How to Create an AI Voiceover With Text to Speech in CapCut

Type a script and let CapCut's Text to Speech read it in a natural AI voice, then sync it to your footage.

6 min | Beginner

Text to Speech turns a typed caption into spoken narration using an AI voice, which is perfect when you do not want to record yourself or your footage has no audio. You write the words once and CapCut generates a voice clip that you can position anywhere on the timeline.

What you need

  • CapCut desktop or mobile, signed in
  • A short script (one or two sentences per scene works best)
  • Footage that the narration will play over

Step 1: Add your script as text

Click Text, then Add text, and type the line you want spoken. The Text to Speech engine reads the contents of this text box, so write it exactly as you want it pronounced.

CapCut - Text on timeline
Player                          | Edit text
--------------------------------+-------------------
Welcome to the channel,         | [ Text to speech ]
today we build a desk.          | Voice: choose...
                                |
T1 |### text box ###|           |
A text box holds the script that will be voiced.

Step 2: Open Text to Speech

With the text box selected, find the Text to speech option in the right panel. Browse the voice library, which is grouped into categories like clear, warm, and character voices. Click a voice to hear a short sample.

Step 3: Generate the audio

Pick your voice and click Generate. CapCut creates a new audio clip on its own track, synced to the text box. You can now delete the visible text if you only wanted the voice, or keep it as an on-screen caption.

CapCut - Generated voice track
Timeline
------------------------------------------------
V1 |==== b_roll_desk_build.mp4 ==============|
T1 |### Welcome to the channel ###|
A2 |~~ AI voiceover (generated) ~~|

Step 4: Adjust pace and timing

Drag the generated audio clip so it lands on the right shot. If the voice talks faster or slower than your visuals, select the audio and change its speed slightly, or split your script into shorter boxes so each line matches one scene.
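The speed change in this step is simple arithmetic: divide the length of the voice clip by the time the shot allows. Below is a minimal Python sketch of that calculation, assuming you read both durations off the timeline by hand. The function names and the 0.9x to 1.15x "still sounds natural" range are illustrative assumptions, not CapCut settings or part of any CapCut API.

```python
def speed_to_fit(voice_seconds: float, shot_seconds: float) -> float:
    """Return the speed multiplier that makes the voice clip last shot_seconds."""
    if voice_seconds <= 0 or shot_seconds <= 0:
        raise ValueError("durations must be positive")
    # Playing a clip at N x speed divides its length by N,
    # so the multiplier is the ratio of the two durations.
    return voice_seconds / shot_seconds


def advice(voice_seconds: float, shot_seconds: float,
           low: float = 0.9, high: float = 1.15) -> str:
    """Suggest a small speed change, or a script split if the gap is too large.

    low and high are assumed comfort limits; tune them by ear.
    """
    speed = speed_to_fit(voice_seconds, shot_seconds)
    if low <= speed <= high:
        return f"set speed to about {speed:.2f}x"
    return "split the script into shorter text boxes instead"


if __name__ == "__main__":
    # A 5.5 s voice clip over a 5 s shot needs only a slight speed-up.
    print(advice(5.5, 5.0))
    # An 8 s voice clip over a 5 s shot is too far off to fix with speed alone.
    print(advice(8.0, 5.0))
```

If the suggested multiplier falls outside the range you are comfortable with, shortening the line or splitting it across two text boxes usually sounds better than a large speed change.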

Write for the ear
AI voices take their pacing from punctuation. Add commas where you want a breath, and fix mispronunciations by spelling tricky words phonetically (for example, write 'Cap-Cut' if a name comes out wrong).
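If you draft longer scripts outside CapCut, the same advice can be applied before you paste anything into a text box. The Python sketch below splits a script into one sentence per line (one line per text box) and swaps in phonetic respellings. The RESPELLINGS table and both function names are hypothetical helpers for illustration; CapCut itself does not read this file or expose these functions.

```python
import re

# Hypothetical respellings: add whatever words your chosen voice gets wrong.
RESPELLINGS = {"CapCut": "Cap-Cut"}


def respell(line: str, respellings: dict = RESPELLINGS) -> str:
    """Replace whole words with their phonetic spelling."""
    for word, spoken in respellings.items():
        line = re.sub(rf"\b{re.escape(word)}\b", spoken, line)
    return line


def split_script(script: str) -> list:
    """Split a script into sentences, one per text box, with respellings applied."""
    # Break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [respell(part) for part in parts if part]


if __name__ == "__main__":
    for line in split_script("Welcome to CapCut. Today we build a desk."):
        print(line)
```

Each printed line can then be pasted into its own text box, which keeps every voice clip short enough to match a single scene.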

Result: a silent b-roll sequence now has clean narration that you typed in two minutes, with no microphone and no recording session.

Tags
#text-to-speech #voiceover #ai-voice #narration