How to Auto-Generate Captions with Whisper and Upload an SRT
Transcribe your video locally with Whisper, produce a clean SRT, then attach it to the video through the captions endpoint.
YouTube auto-captions are hit or miss, especially with names and jargon. This guide transcribes your audio with OpenAI Whisper to get a clean SRT, then uploads that caption track to your video so viewers get accurate subtitles.
What you need
- ffmpeg installed (Whisper uses it to read media)
- Python 3.9+ and the openai-whisper package
- Your client_secret.json and the video id to caption
Step 1: Install Whisper and ffmpeg
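Install Whisper with pip install -U openai-whisper, and install ffmpeg with your operating system's package manager. The snippet below is a minimal sketch that confirms both are visible to Python before you transcribe. The helper name check_prereqs is hypothetical and not part of Whisper.

```python
import importlib.util
import shutil


def check_prereqs():
    """Report whether ffmpeg is on PATH and the whisper package is importable."""
    return {
        # Whisper runs the ffmpeg binary in a subprocess to decode audio and video.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # The openai-whisper package installs an importable module named "whisper".
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


if __name__ == "__main__":
    for tool, ok in check_prereqs().items():
        print(f"{tool}: {'found' if ok else 'MISSING'}")
```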
Step 2: Transcribe to SRT
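The Whisper command-line tool can write several output formats: txt, vtt, srt, tsv, and json. Pass --output_format srt to get only the SRT, which YouTube accepts directly as a caption file. The small model (--model small) is a good balance of speed and accuracy for spoken English.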
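If you would rather stay in Python than call the CLI, Whisper's Python API does the same job. whisper.load_model loads the model, and model.transcribe returns a result whose "segments" carry start and end times in seconds plus the text. This is a minimal sketch: the SRT writer is hand-rolled for illustration, and the names transcribe_to_srt and segments_to_srt are hypothetical. The final.srt output name is an assumption chosen to match the upload step.

```python
import sys


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        # Each cue: index line, timing line, text line; cues are separated by a blank line.
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(media_path, srt_path, model_name="small"):
    """Transcribe a media file with Whisper and write the result as SRT."""
    import whisper  # imported lazily so the helpers above work without Whisper installed

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))


if __name__ == "__main__":
    transcribe_to_srt(sys.argv[1], "final.srt")
```

Run it as python transcribe.py video.mp4. The script name is illustrative.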
Step 3: Upload the caption track
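Use the captions.insert endpoint of the YouTube Data API, passing the video id in the request body and the SRT file as the media body. Reuse the service() helper from the upload guide for authentication, but make sure it requests the https://www.googleapis.com/auth/youtube.force-ssl scope. captions.insert requires that scope, and the youtube.upload scope alone is not enough. If you add the scope, re-authorize so the new consent takes effect.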
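Before calling the endpoint, you can sanity-check the SRT locally. This catches problems such as missing cue numbers or reversed timestamps before the file leaves your machine. This is a minimal sketch: validate_srt is a hypothetical helper that checks only the basics, namely sequential cue numbers, HH:MM:SS,mmm --> HH:MM:SS,mmm timing lines, and end times that come after start times.

```python
import re
import sys

# One SRT timing line, e.g. "00:00:01,500 --> 00:00:03,000".
TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(h, m, s, ms):
    """Convert captured timestamp parts to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def validate_srt(text):
    """Return a list of problems found in SRT text; an empty list means it looks OK."""
    problems = []
    # Normalize Windows line endings, then split cues on blank lines.
    cues = [c for c in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()) if c]
    for n, cue in enumerate(cues, start=1):
        lines = cue.split("\n")
        if len(lines) < 3:
            problems.append(f"cue {n}: expected index, timing and text lines")
            continue
        if lines[0].strip() != str(n):
            problems.append(f"cue {n}: index is {lines[0].strip()!r}")
        match = TIMING.match(lines[1].strip())
        if not match:
            problems.append(f"cue {n}: bad timing line {lines[1]!r}")
            continue
        g = match.groups()
        if to_ms(*g[4:]) <= to_ms(*g[:4]):
            problems.append(f"cue {n}: end time is not after start time")
    return problems


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = validate_srt(f.read())
    print("\n".join(issues) or "SRT looks OK")
```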
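```python
from googleapiclient.http import MediaFileUpload

from upload import service  # reuse the auth helper from the upload guide


def add_caption(video_id, srt_path, language="en", name="English (Whisper)"):
    """Attach an SRT file to a video as a published caption track."""
    body = {
        "snippet": {
            "videoId": video_id,
            "language": language,  # BCP-47 language tag, e.g. "en" or "es"
            "name": name,  # label shown in the CC menu
            "isDraft": False,  # publish now instead of saving as a draft
        }
    }
    # captions.insert accepts the caption file as application/octet-stream.
    media = MediaFileUpload(srt_path, mimetype="application/octet-stream")
    # The credentials behind service() need the youtube.force-ssl scope for this call.
    res = service().captions().insert(part="snippet", body=body, media_body=media).execute()
    print("Caption track id:", res["id"])
    return res["id"]


if __name__ == "__main__":
    add_caption("dQw4w9WgXcQ", "final.srt")  # replace with your own video id
```
Result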
The video now shows an English (Whisper) caption track in the CC menu. Run the same flow per language by translating the SRT and changing the language code.
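To confirm the upload from code instead of the CC menu, you can list the video's caption tracks with captions.list. In that response, uploaded tracks report a trackKind of "standard", while YouTube's automatic captions report "asr". This is a minimal sketch that assumes the same service() helper from the upload guide. summarize_tracks and list_tracks are hypothetical helpers, and the parsing is kept separate so it can be checked without network access.

```python
def summarize_tracks(list_response):
    """Extract (language, name, trackKind) tuples from a captions.list response."""
    return [
        (
            item["snippet"].get("language"),
            item["snippet"].get("name", ""),
            item["snippet"].get("trackKind"),
        )
        for item in list_response.get("items", [])
    ]


def list_tracks(video_id):
    """Fetch and summarize every caption track on a video."""
    from upload import service  # auth helper from the upload guide (assumed, not shown)

    res = service().captions().list(part="snippet", videoId=video_id).execute()
    return summarize_tracks(res)


if __name__ == "__main__":
    for lang, name, kind in list_tracks("dQw4w9WgXcQ"):  # replace with your video id
        print(lang, name or "(unnamed)", kind)
```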