How to Make a Faceless YouTube Short with AI
Build a vertical, faceless Short end to end with an AI script, a synthetic voiceover, stock visuals, and captions. No camera needed.
A faceless video carries a message with voiceover, visuals, and text instead of a person on camera. It is one of the fastest formats to produce at scale because nothing depends on lighting, a set, or your willingness to film yourself. This guide assembles one, from script to finished MP4, using free or cheap AI tools.
What you need
- A script tool (ChatGPT, Claude, or your own writing)
- A text-to-speech voice (ElevenLabs free tier works well)
- A clip source for visuals (Pexels, Pixabay, or AI b-roll)
- An editor that burns captions (CapCut is free)
- About 30 minutes for your first one
Step 1: Write a tight 45-second script
A 9:16 Short should run 30 to 60 seconds, which is roughly 90 to 150 spoken words. Put the hook in the first sentence as a promise or a surprising fact, deliver three or four quick points, then close with a single takeaway. Ask your AI tool to write it in short sentences a narrator can read aloud, using a prompt like this:
Write a 130-word voiceover script for a 45-second faceless Short.
Topic: three money habits that quietly keep people broke.
Style: punchy, second person, one idea per sentence.
Start with a hook in the first 6 words. No intro, no sign-off.
Step 2: Generate the voiceover
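Before generating audio, it can help to confirm that the script's length matches the target runtime. The sketch below is only a rough estimate: the default pace of 170 words per minute is an assumption, chosen because it is close to the pace implied by 130 words in 45 seconds, and real voices will run faster or slower. The function names are illustrative.

```python
import re


def count_words(script: str) -> int:
    """Count word-like tokens (letters or digits, allowing apostrophes and hyphens)."""
    return len(re.findall(r"[A-Za-z0-9][A-Za-z0-9'’-]*", script))


def estimate_seconds(script: str, words_per_minute: float = 170.0) -> float:
    """Estimate narration time. The default pace is an assumption, not a measured value."""
    return count_words(script) / words_per_minute * 60.0


def fits_short(
    script: str,
    min_seconds: float = 30.0,
    max_seconds: float = 60.0,
    words_per_minute: float = 170.0,
) -> bool:
    """Return True if the estimated runtime falls inside the 30 to 60 second window."""
    seconds = estimate_seconds(script, words_per_minute)
    return min_seconds <= seconds <= max_seconds


if __name__ == "__main__":
    draft = "Stop doing this with your money. " * 20  # placeholder text, not a real script
    print(count_words(draft), "words,", round(estimate_seconds(draft), 1), "seconds")
    print("Fits a Short:", fits_short(draft))
```

If the estimate lands outside the window, trimming a point from the script is usually easier than speeding up the voice afterwards.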
Paste the script into a text-to-speech tool. In ElevenLabs, pick a clear voice, set Stability around 50 and Similarity high, then generate and download the MP3. Listen once for any odd pronunciations and fix them by spelling tricky words phonetically before re-rendering.
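The phonetic-spelling fix can be kept repeatable by storing the respellings in one place and applying them to the script before each render. This is a minimal sketch using plain text replacement. It does not call ElevenLabs or any other service, and the example respelling is a made-up illustration rather than a recommended spelling.

```python
import re


def apply_respellings(script: str, respellings: dict[str, str]) -> str:
    """Replace whole-word matches (case-insensitive) with their phonetic respellings."""
    if not respellings:
        return script
    # Try longer entries first so multi-word phrases win over shorter overlaps.
    keys = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {key.lower(): value for key, value in respellings.items()}
    return pattern.sub(lambda match: lookup[match.group(1).lower()], script)


if __name__ == "__main__":
    # Hypothetical respelling for an acronym the voice might mispronounce.
    fixes = {"IRA": "eye-ar-ay"}
    print(apply_respellings("Open a Roth IRA today.", fixes))
```

Keeping the original script untouched and respelling only the copy sent to the voice tool means the captions can still use the correct spelling.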
Step 3: Gather vertical visuals
Collect 6 to 10 short vertical clips, one for roughly every sentence. Pexels and Pixabay both offer free stock video you can filter to portrait orientation. Download clips that loosely match each line: a wallet for money, a clock for time, and so on. Literal matches are fine for faceless content.
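After downloading, a quick check of each clip's dimensions can catch landscape files that slipped through the portrait filter. The sketch below assumes you already know each clip's width and height, for example from the download page or a media inspection tool. The Clip class, file names, and tolerance value are illustrative assumptions.

```python
from dataclasses import dataclass

TARGET_RATIO = 9 / 16  # width divided by height for a vertical Short


@dataclass
class Clip:
    name: str
    width: int
    height: int


def is_portrait(clip: Clip) -> bool:
    """A clip is portrait when it is taller than it is wide."""
    return clip.height > clip.width


def is_near_9x16(clip: Clip, tolerance: float = 0.02) -> bool:
    """True when the aspect ratio is within the (assumed) tolerance of 9:16."""
    return abs(clip.width / clip.height - TARGET_RATIO) <= tolerance


def usable_clips(clips: list[Clip]) -> list[Clip]:
    """Keep portrait clips only. Off-ratio portrait clips can still be cropped in the editor."""
    return [clip for clip in clips if is_portrait(clip)]


if __name__ == "__main__":
    # Hypothetical downloads.
    downloads = [
        Clip("wallet.mp4", 1080, 1920),
        Clip("clock.mp4", 1920, 1080),
        Clip("coins.mp4", 2160, 4096),
    ]
    for clip in usable_clips(downloads):
        note = "exact fit" if is_near_9x16(clip) else "needs cropping"
        print(clip.name, note)
```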
Step 4: Assemble in CapCut and auto-caption
Open CapCut, create a 9:16 project, and drop in your voiceover first so the audio sets the timeline length. Place each clip on the track above the audio and trim it to the length of the sentence it illustrates. Then use Captions > Auto captions to transcribe the voiceover, and style the text large and centered so it reads on a phone.
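To get rough trim points before opening the editor, you can split the voiceover length across sentences in proportion to their word counts. This is an approximation under the assumption that the voice reads at a steady pace. The audio waveform in the editor remains the real reference, and the function names here are illustrative.

```python
import re


def split_sentences(script: str) -> list[str]:
    """Split on whitespace that follows a period, exclamation mark, or question mark."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [part for part in parts if part]


def clip_timings(script: str, total_seconds: float) -> list[tuple[float, float, str]]:
    """Return (start, end, sentence) tuples, sharing the total time by word count."""
    sentences = split_sentences(script)
    counts = [len(sentence.split()) for sentence in sentences]
    total_words = sum(counts)
    if total_words == 0:
        return []
    timings = []
    start = 0.0
    for sentence, count in zip(sentences, counts):
        duration = total_seconds * count / total_words
        timings.append((round(start, 2), round(start + duration, 2), sentence))
        start += duration
    return timings


if __name__ == "__main__":
    # Placeholder script and voiceover length, for illustration only.
    demo = "You are broke. Here is why it happens."
    for start, end, sentence in clip_timings(demo, total_seconds=8.0):
        print(f"{start:5.2f}s to {end:5.2f}s  {sentence}")
```

Use the voiceover file's actual duration as total_seconds, then nudge each cut to the nearest pause you can see in the waveform.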
Step 5: Export and check it on a phone
Export at 1080x1920 and 30 fps. Before posting, watch it on an actual phone with the sound off, because many viewers scroll muted at first. If the captions carry the message without audio, you are ready to upload.
Result: a polished, faceless 45-second Short built without a camera, repeatable in under half an hour once the workflow is set.