How to Make a Faceless YouTube Short with AI

Build a vertical, faceless Short end to end with an AI script, a synthetic voiceover, stock visuals, and captions. No camera needed.

10 min · Beginner

A faceless video carries its message with voiceover, visuals, and on-screen text instead of a person on camera. It is one of the fastest formats to produce at scale because nothing depends on lighting, a set, or your willingness to film yourself. This guide assembles one, from script to finished MP4, using free or cheap AI tools.

What you need

  • A script tool (ChatGPT, Claude, or your own writing)
  • A text-to-speech voice (ElevenLabs free tier works well)
  • A clip source for visuals (Pexels, Pixabay, or AI b-roll)
  • An editor that burns captions (CapCut is free)
  • About 30 minutes for your first one

Step 1: Write a tight 45-second script

A 9:16 Short should run 30 to 60 seconds, which is roughly 90 to 150 spoken words. Put the hook in the first sentence as a promise or a surprising fact, deliver three or four quick points, then close with a single takeaway. Ask your AI tool to write in short sentences that a narrator can read aloud comfortably.

script-prompt
Write a 130-word voiceover script for a 45-second faceless Short.
Topic: three money habits that quietly keep people broke.
Style: punchy, second person, one idea per sentence.
Start with a hook in the first 6 words. No intro, no sign-off.
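
The word-count rule from this step can be checked before you generate any audio. The following is a minimal sketch, assuming the narration speed falls between the two rates the guide implies (90 words in 30 seconds and 150 words in 60 seconds). The function names are hypothetical, and real narrators and text-to-speech voices will vary.

```python
import re

# Rates implied by the guide: 90 words / 30 s = 3.0 and 150 words / 60 s = 2.5.
# These are assumptions for estimation, not measured values for any voice.
FAST_WORDS_PER_SECOND = 3.0
SLOW_WORDS_PER_SECOND = 2.5


def count_words(script: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for token in script.split() if re.search(r"\w", token))


def estimate_seconds(word_count: int) -> tuple[float, float]:
    """Return (shortest, longest) estimated narration time in seconds."""
    return (
        word_count / FAST_WORDS_PER_SECOND,
        word_count / SLOW_WORDS_PER_SECOND,
    )


def fits_short(word_count: int, min_s: float = 30.0, max_s: float = 60.0) -> bool:
    """True if the whole estimated range stays inside the target length."""
    shortest, longest = estimate_seconds(word_count)
    return shortest >= min_s and longest <= max_s


if __name__ == "__main__":
    shortest, longest = estimate_seconds(130)
    print(f"130 words: about {shortest:.0f} to {longest:.0f} seconds")
```

By this estimate a 130-word script lands between roughly 43 and 52 seconds, so trim a sentence if the rendered voiceover runs longer than you want.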

Step 2: Generate the voiceover

Paste the script into a text-to-speech tool. In ElevenLabs, pick a clear voice, set Stability around 50 and Similarity high, then generate and download the MP3. Listen once for any odd pronunciations and fix them by spelling tricky words phonetically before re-rendering.

ElevenLabs - text to speech
Voice: Adam (deep, narration)
Stability ----o------ 50
Similarity -------o-- 82
[ paste script here ... 130 words ]
[ Generate ] [ Download .mp3 ]
Paste the script, pick a voice, then generate and download.
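
If the same words get mispronounced on every render, the phonetic respelling can be applied automatically before you paste the script. This is a small sketch of that idea, not a feature of ElevenLabs. The `RESPELLINGS` entries and the `respell` helper are hypothetical examples, and the right spelling for a given voice has to be found by ear.

```python
import re

# Hypothetical respellings; build your own list from what you hear on playback.
RESPELLINGS = {
    "APR": "A P R",
    "401(k)": "four oh one kay",
}


def respell(script: str, respellings: dict[str, str]) -> str:
    """Replace each tricky term with its phonetic spelling (whole terms only)."""
    # Longest terms first so multi-word entries win over shorter ones.
    for term in sorted(respellings, key=len, reverse=True):
        # Do not match inside a longer word, e.g. "APR" inside "APRIL".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        replacement = respellings[term]
        # A function replacement keeps the text literal (no backslash escapes).
        script = re.sub(pattern, lambda match, r=replacement: r, script)
    return script


if __name__ == "__main__":
    print(respell("Max your 401(k) before you chase a low APR.", RESPELLINGS))
```

Matching is case-sensitive, so add separate entries if a term appears with different capitalization.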

Step 3: Gather vertical visuals

Collect 6 to 10 short vertical clips, one for roughly every sentence. Pexels and Pixabay both offer free stock video you can filter to portrait orientation. Download clips that loosely match each line: a wallet for money, a clock for time, and so on. Literal matches are fine for faceless content.
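
Since the plan is roughly one clip per sentence, it helps to turn the script into a numbered shot list before searching for footage. The sketch below assumes a simple split on sentence-ending punctuation; the function names are hypothetical, and the splitter is naive (an abbreviation such as "e.g." would be split as well).

```python
import re


def split_sentences(script: str) -> list[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [part for part in parts if part]


def shot_list(script: str) -> list[tuple[int, str]]:
    """Number each sentence so every line of the script gets one clip slot."""
    return list(enumerate(split_sentences(script), start=1))


if __name__ == "__main__":
    sample = "You are broke. Why? Fees eat you alive!"
    for number, sentence in shot_list(sample):
        print(f"Clip {number}: {sentence}")
```

If the list comes out much longer than 6 to 10 entries, the script probably has more sentences than a 45-second Short can carry.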

Step 4: Assemble in CapCut and auto-caption

Open CapCut, create a 9:16 project, and drop in your voiceover first so the audio sets the timeline length. Place each visual on a track above the audio and trim it to the sentence it illustrates. Then use Captions > Auto captions to transcribe the voiceover, and style the text large and centered so it reads on a phone.

CapCut - 9:16 timeline
Canvas: 1080 x 1920 (9:16)
V2 [clip1][clip2][clip3][clip4][clip5]
V1 [====== captions (auto) ======]
A1 [========= voiceover.mp3 =========]
[ Export 1080p 30fps ]
Voiceover on the audio track, b-roll above, captions auto-generated.

Cut every 2 to 3 seconds

Faceless Shorts hold attention through motion. Swap the visual every couple of seconds so the screen never sits still, even if the audio keeps rolling on one point.
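
The cutting tip above can be turned into a rough edit plan. This is a sketch under stated assumptions: a fixed interval of 2.5 seconds (the midpoint of the 2 to 3 second range) and clips reused in rotation when there are fewer clips than segments. The function names and clip file names are hypothetical, and CapCut does not read this output; it is only a planning aid.

```python
import math


def cut_points(duration_s: float, interval_s: float = 2.5) -> list[float]:
    """Start time of every visual segment, spaced interval_s apart."""
    if duration_s <= 0 or interval_s <= 0:
        raise ValueError("duration_s and interval_s must be positive")
    count = math.ceil(duration_s / interval_s)
    return [round(i * interval_s, 2) for i in range(count)]


def assign_clips(starts: list[float], clip_names: list[str]) -> list[tuple[float, str]]:
    """Cycle through the available clips so every segment gets a visual."""
    if not clip_names:
        raise ValueError("clip_names must not be empty")
    return [
        (start, clip_names[i % len(clip_names)])
        for i, start in enumerate(starts)
    ]


if __name__ == "__main__":
    starts = cut_points(45, 2.5)
    print(f"{len(starts)} segments")
    for start, clip in assign_clips(starts, ["wallet.mp4", "clock.mp4", "card.mp4"]):
        print(f"{start:>5.1f}s  {clip}")
```

A 45-second voiceover cut every 2.5 seconds has 18 segments, so with 6 to 10 clips each one either appears more than once or is split into several shorter cuts.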

Step 5: Export and check it on a phone

Export at 1080x1920, 30fps. Before posting, watch it on an actual phone with the sound off, because many viewers scroll with audio muted at first. If the captions carry the message without audio, you are ready to upload.
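
The export settings can also be sanity-checked from the file's properties. The sketch below assumes you have already read the width, height, and frame rate from your file manager or a media inspector; `export_problems` is a hypothetical helper, and it does not open the MP4 itself.

```python
from fractions import Fraction


def export_problems(width: int, height: int, fps: float) -> list[str]:
    """List every way the export differs from 1080x1920 at 30fps."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    problems = []
    if Fraction(width, height) != Fraction(9, 16):
        problems.append("aspect ratio is not 9:16")
    if (width, height) != (1080, 1920):
        problems.append("resolution is not 1080x1920")
    # Rounding treats 29.97fps as 30fps, which is an assumption of this sketch.
    if round(fps) != 30:
        problems.append("frame rate is not 30fps")
    return problems


if __name__ == "__main__":
    print(export_problems(1080, 1920, 30) or "export settings look right")
```

An empty list means the numbers match this step; the muted phone check still has to be done by eye.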

Result: a polished, faceless 45-second Short built without a camera, repeatable in under half an hour once the workflow is set.

Tags
#faceless #shorts #voiceover #ai