Top Audio AI repos on GitHub


Top GitHub Category

Audio AI

Speech, music and audio generation models.

100 Repos

740.4k Stars

Ranked by stars

Showing 48 of 100

unslothai/unsloth (Python)

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

67.1k

RVC-Boss/GPT-SoVITS (Python)

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

59.0k

coqui-ai/TTS (Python)

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

45.6k

2noise/ChatTTS (Python)

A generative speech model for daily dialogue.

39.5k

babysor/MockingBird (Python)

🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

36.9k

myshell-ai/OpenVoice (Python)

Instant voice cloning by MIT and MyShell. Audio foundation model.

36.8k

OpenBMB/VoxCPM (Python)

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

31.3k

FunAudioLLM/CosyVoice (Python)

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

21.8k

index-tts/index-tts (Python)

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

21.3k

nari-labs/dia (Python)

A TTS model capable of generating ultra-realistic dialogue in one pass.

19.3k

jianchang512/pyvideotrans (Python)

Translate the video from one language to another and embed dubbing & subtitles.

18.1k

leon-ai/leon (TypeScript)

🧠 Leon is your open-source personal assistant.

17.3k

k2-fsa/sherpa-onnx (C++)

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, support 12 programming languages

13.1k

calesthio/OpenMontage (Python)

World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.

12.6k

supertone-inc/supertonic (Swift)

Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.

12.6k

rany2/edge-tts (Python)

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

11.3k

rhasspy/piper (C++)

A fast, local neural text to speech system

11.1k

abus-aikorea/voice-pro (Python)

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

11.0k

mozilla/TTS (Jupyter Notebook)

🤖💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

10.2k

espnet/espnet (Python)

End-to-End Speech Processing Toolkit

9.9k

open-mmlab/Amphion (Python)

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

9.9k

netease-youdao/EmotiVoice (Python)

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

8.5k

Plachtaa/VALL-E-X (Python)

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

7.9k

jaywalnut310/vits (Python)

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

7.9k

myshell-ai/MeloTTS (Python)

High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.

7.5k

debpalash/OmniVoice-Studio (Python)

The open-source ElevenLabs alternative for local voice cloning, design, create, dubbing and dictation Desktop App

7.4k

Blaizzy/mlx-audio (Python)

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

7.4k

espeak-ng/espeak-ng (C)

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

6.6k

yl4579/StyleTTS2 (Python)

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

6.3k

argmaxinc/argmax-oss-swift (Swift)

On-device Speech AI for Apple Silicon

6.2k

promptslab/Awesome-Prompt-Engineering (TypeScript)

This repository contains hand-curated resources for Prompt Engineering, with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.

6.1k

snakers4/silero-models (Jupyter Notebook)

Silero Models: pre-trained text-to-speech models made embarrassingly simple

6.0k

remsky/Kokoro-FastAPI (Python)

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/multiplatform CPU, AMD, NVIDIA GPU PyTorch support, handling, and auto-stitching

5.1k

denizsafak/abogen (Python)

Generate audiobooks from EPUBs, PDFs and text with synchronized captions.

4.9k

MoonInTheRiver/DiffSinger (Python)

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

4.8k

gradio-app/fastrtc (JavaScript)

The Python library for real-time communication

4.6k

dograh-hq/dograh (Python)

Open source voice AI platform. Self-hosted alternative to Vapi and Retell. On Prem, BYOK across Speech to Speech or LLM/STT/TTS, with a visual workflow builder, MCP native and telephony support.

4.6k

metavoiceio/metavoice-src (Python)

Foundational model for human-like, expressive TTS

4.2k

collabora/WhisperLive (Python)

A nearly-live implementation of OpenAI's Whisper.

4.1k

TensorSpeech/TensorFlowTTS (Python)

😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

4.0k

KoljaB/RealtimeTTS (Python)

Converts text to speech in realtime

4.0k

OpenMOSS/MOSS-TTS (Python)

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.

3.6k

IAHispano/Applio (Python)

A simple, high-quality voice conversion tool focused on ease of use and performance.

3.4k

rsxdalv/TTS-WebUI (TypeScript)

A single Gradio + React WebUI with extensions for ACE-Step, OmniVoice, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, MusicGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and Bark!

3.2k

elevenlabs/elevenlabs-python (Python)

The official Python SDK for the ElevenLabs API.

3.0k

enhuiz/vall-e (Python)

An unofficial PyTorch implementation of the audio LM VALL-E

3.0k

readbeyond/aeneas (Python)

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

2.8k

Camb-ai/MARS5-TTS (Jupyter Notebook)

MARS5 speech model (TTS) from CAMB.AI

2.8k
