Top Image & Video AI repos on GitHub

Image & Video AI

Generative image and video models and tools.

100 Repos

200.1k Stars

Ranked by stars

Showing 48 of 100

Anil-matcha/Open-Generative-AI (JavaScript)

Open-source alternative to AI video platforms — Free AI image & video generation studio with 200+ models (Flux, Midjourney, Kling, Sora, Veo). No content filters. Self-hosted, MIT licensed.

20.5k

EvoLinkAI/awesome-gpt-image-2-API-and-Prompts (Python)

GPT-Image-2 API and Prompts

16.9k

lucidrains/DALLE2-pytorch (Python)

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

11.3k

lucidrains/imagen-pytorch (Python)

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

8.4k

jamez-bondos/awesome-gpt4o-images (JavaScript)

Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capabilities.

8.1k

XavierXiao/Dreambooth-Stable-Diffusion (Jupyter Notebook)

Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion

7.7k

promptslab/Awesome-Prompt-Engineering (TypeScript)

This repository contains hand-curated resources for Prompt Engineering, with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.

6.1k

lucidrains/DALLE-pytorch (Python)

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

5.6k

HisMax/RedInk (Python)

Red Ink - A one-stop Xiaohongshu image-and-text generator based on 🍌Nano Banana Pro🍌: "One Sentence, One Image: Generate Xiaohongshu Text and Images."

5.3k

lucidrains/deep-daze (Python)

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

4.3k

Lightricks/ComfyUI-LTXVideo (Python)

LTX-Video Support for ComfyUI

3.8k

Hunyuan-PromptEnhancer/PromptEnhancer (Python)

[CVPR 2026] PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image generation.

3.7k

SamurAIGPT/Generative-Media-Skills (Shell)

Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.

3.6k

kuprel/min-dalle (Python)

min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch

3.5k

filipecalegario/awesome-generative-ai

A curated list of Generative AI tools, works, models, and references

3.5k

YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy

Diffusion model papers, survey, and taxonomy

3.3k

wuyoscar/GPT-Image2-Skill (Python)

GPT Image 2 prompt gallery, image prompt library, agentic skill, and CLI for OpenAI image generation/editing

3.2k

ai-forever/Kandinsky-2 (Jupyter Notebook)

Kandinsky 2 — multilingual text2image latent diffusion model

2.8k

saharmor/dalle-playground (JavaScript)

A playground to generate images from any text prompt using Stable Diffusion (past: using DALL-E Mini)

2.7k

FurkanGozukara/Stable-Diffusion (HTML)

FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, ComfyUI, Google Colab, RunPod, Kaggle, NoteBooks, ControlNet, Voice Cloning, AI, AI News, ML, ML News, News, Tech, Tech News, Kohya, Midjourney

2.7k

bytedance/InfiniteYou (Python)

🔥 [ICCV 2025 Highlight] InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

2.7k

nerdyrodent/VQGAN-CLIP (Python)

Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

2.6k

lucidrains/big-sleep (Python)

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

2.6k

Yutong-Zhou-cv/Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2.4k

Anil-matcha/awesome-generative-ai-apps (JavaScript)

50+ open-source generative AI apps you can clone, deploy, and monetize — image generators, video tools, virtual try-ons, AI SaaS templates, and platform integrations. One-click Vercel deploy on every template.

2.2k

llmsresearch/paperbanana (Python)

Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.

2.0k

carefree0910/carefree-creator (Jupyter Notebook)

AI magic meets an infinite drawing board.

1.9k

YangLing0818/RPG-DiffusionMaster (Jupyter Notebook)

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

1.8k

zai-org/CogView (Python)

Text-to-Image generation. The repo for NeurIPS 2021 paper "CogView: Mastering Text-to-Image Generation via Transformers".

1.8k

TencentARC/BrushNet (Python)

[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"

1.7k

omerbt/TokenFlow (Python)

Official Pytorch Implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" presenting "TokenFlow" (ICLR 2024)

1.7k

ai-forever/ru-dalle (Jupyter Notebook)

Generate images from text prompts in Russian.

1.6k

FoundationVision/Infinity (Python)

[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

1.6k

MiniMax-AI/MiniMax-MCP (Python)

Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.

1.5k

fofr/cog-face-to-many (Python)

Turn any face into a video game character, pixel art, claymation, 3D or toy

1.4k

bytedance/UNO (Python)

[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning

1.4k

Capsize-Games/airunner (Python)

Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows

1.3k

PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models

A collection of resources on controllable generation with text-to-image diffusion models.

1.1k

zai-org/CogView4 (Python)

CogView4, CogView3-Plus, and CogView3 (ECCV 2024)

1.1k

lukasHoel/text2room (Python)

Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV 2023).

1.1k

foxhui/WebAI2API (JavaScript)

WebAI2API: a Camoufox-based tool that converts web AI services into an OpenAI API. Supports LMArena, Gemini, and more, with multi-window concurrency and account isolation.

1.1k

omerbt/MultiDiffusion (Jupyter Notebook)

Official Pytorch Implementation for "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" presenting "MultiDiffusion" (ICML 2023)

1.1k

iptag/jimeng-api (TypeScript)

Reverse-engineered the official API for Jimeng/Dreamina’s text-to-image and image-to-image features. Drew inspiration from several experts’ projects and made some tweaks, which significantly improved stability.

1.0k

ddPn08/Radiata (Python)

Stable diffusion webui based on diffusers.

963

zai-org/CogView2 (Python)

Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers"

957

lucidrains/muse-maskgit-pytorch (Python)

Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch

919

finegrain-ai/refiners (Python)

A microframework on top of PyTorch with first-class citizen APIs for foundation model adaptation

833

haofanwang/Lora-for-Diffusers (Python)

The most easy-to-understand tutorial for using LoRA (Low-Rank Adaptation) within the diffusers framework, for AI generation researchers 🔥

822
