How to Create an AI Voiceover With Text to Speech in CapCut
Type a script and let CapCut's Text to Speech read it in a natural AI voice, then sync it to your footage.
Text to Speech turns a typed caption into spoken narration using an AI voice. It is useful when you do not want to record yourself or when your footage has no audio. You write the words once and CapCut generates a voice clip that you can position anywhere on the timeline.
What you need
- CapCut desktop or mobile, signed in
- A short script (one or two sentences per scene works best)
- Footage that the narration will play over
Step 1: Add your script as text
Click Text then Add text, and type the line you want spoken. The Text to Speech engine reads the contents of this text box, so write it exactly as you want it pronounced.
Step 2: Open Text to Speech
With the text box selected, find the Text to speech option in the right panel. Browse the voice library, which is grouped into categories like clear, warm, and character voices. Click a voice to hear a short sample.
Step 3: Generate the audio
Pick your voice and click Generate. CapCut creates a new audio clip on its own track, synced to the text box. You can now delete the visible text if you only want the voice, or keep it as an on-screen caption.
Step 4: Adjust pace and timing
Drag the generated audio clip so it lands on the right shot. If the voice talks faster or slower than your visuals, select the audio and change its speed slightly, or split your script into shorter boxes so each line matches one scene.
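If you prefer to plan the timing before touching the timeline, the arithmetic can be sketched in a few lines of Python. This is only a planning aid, not part of CapCut: the function names split_script and speed_factor are made up for this example, and it assumes the speed control works as a multiplier where values above 1 play the clip faster. Read the clip and scene durations off your own timeline.

```python
import re


def split_script(script: str) -> list[str]:
    """Split a script into sentence-sized lines, one per text box.

    Hypothetical helper: splits after '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [part for part in parts if part]


def speed_factor(audio_seconds: float, scene_seconds: float) -> float:
    """Return the speed multiplier that makes the audio last as long as the scene.

    Assumes a multiplier above 1 speeds the clip up and below 1 slows it down.
    """
    if audio_seconds <= 0 or scene_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / scene_seconds


if __name__ == "__main__":
    for line in split_script("Meet the trail. It climbs fast! Ready?"):
        print(line)
    # A 6-second voice clip over a 5-second shot needs to play slightly faster.
    print(round(speed_factor(6.0, 5.0), 2))
```

A factor close to 1 usually means a small speed change is enough. If the factor is far from 1, splitting the script into shorter boxes is likely to sound more natural than a large speed change.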
Result: a silent b-roll sequence now has clean narration that you typed in a couple of minutes, with no microphone and no recording session.