Intermediate · 9 min

Layering B-roll, Sound Design, and Pace

A cut that holds attention is rarely one continuous shot. It layers a spine of narration with b-roll that illustrates each beat, plus sound design that you feel more than hear. This lesson is about rhythm.

Step 1: Build the spine first

Lay your voiceover or main shots end to end as the spine. Everything else hangs off this. Get the spine timing right before you add a single piece of b-roll.

Step 2: Cut b-roll to the words

When the narration says a noun, show that noun. Generate short b-roll clips in Runway or Kling for each key idea, then place them over the spine so the visual changes exactly when the meaning changes.
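One way to make "the visual changes exactly when the meaning changes" concrete is to work from word timestamps. This is a minimal Python sketch, not a feature of CapCut, Runway, or Kling. It assumes you already have (word, start time) pairs from a transcript and a hand-picked set of key nouns. `place_broll` and `BrollSlot` are hypothetical names used only for illustration. Each key noun opens a b-roll slot that holds until the next key noun is spoken, or until the spine ends.

```python
# Sketch only: place_broll and BrollSlot are hypothetical helpers, not part of
# any editing tool. The transcript timestamps are assumed to come from
# speech-to-text output.
from dataclasses import dataclass


@dataclass
class BrollSlot:
    keyword: str
    start: float  # seconds on the spine where this b-roll clip starts
    end: float    # seconds where the next key idea (or the spine) takes over


def place_broll(words, keywords, spine_end):
    """Open a b-roll slot on each key noun and hold it until the next one.

    words: time-ordered list of (word, start_seconds) pairs.
    keywords: set of lowercase key nouns picked by hand.
    spine_end: length of the spine in seconds.
    """
    # Timestamps where a key noun is spoken, in narration order.
    hits = [(w, t) for w, t in words if w.lower() in keywords]
    slots = []
    for i, (word, start) in enumerate(hits):
        # The visual switches exactly when the next key idea is spoken.
        end = hits[i + 1][1] if i + 1 < len(hits) else spine_end
        slots.append(BrollSlot(word, start, end))
    return slots


if __name__ == "__main__":
    # Hypothetical timestamps for a short line of narration.
    transcript = [
        ("Every", 0.0), ("city", 0.4), ("needs", 0.9), ("water", 1.3),
        ("but", 2.6), ("the", 2.8), ("pipes", 3.1), ("are", 3.6),
        ("failing", 3.9),
    ]
    for slot in place_broll(transcript, {"city", "water", "pipes"}, spine_end=5.0):
        print(f"{slot.keyword:>6}: {slot.start:.1f}s -> {slot.end:.1f}s")
```

The output is only a cue sheet. You would still drop each generated clip onto the b-roll track by hand. Any time before the first key noun stays on the spine's main shot.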

Step 3: Add sound design under the cut

Whooshes on transitions, a soft room tone under quiet moments, and a deep impact hit on a reveal make AI footage feel grounded. Sound often hides the small unrealities in generated motion more effectively than a visual fix would.
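Those three kinds of sound can be planned as a simple cue list before you touch the SFX track. This is a minimal Python sketch under stated assumptions. Cut times and narration segments are known in seconds, one reveal time is marked by hand, and the resulting cues are placed manually in the editor. `sfx_cues` is a hypothetical helper. The 0.5 s minimum pause for room tone is an illustrative default, not a rule from this lesson.

```python
# Sketch only: sfx_cues is a hypothetical helper; min_gap=0.5 is an
# illustrative threshold, not a CapCut setting.
from typing import List, Tuple

Cue = Tuple[float, str, float]  # (start seconds, sound, length seconds; 0.0 = one-shot)


def sfx_cues(cut_times: List[float],
             speech: List[Tuple[float, float]],
             reveal_time: float,
             min_gap: float = 0.5) -> List[Cue]:
    """Whoosh on every cut, impact on the reveal, room tone under quiet gaps."""
    cues: List[Cue] = [(t, "whoosh", 0.0) for t in cut_times]
    cues.append((reveal_time, "impact", 0.0))
    # speech is a time-ordered list of (start, end) narration segments.
    # A pause of at least min_gap seconds between segments gets room tone.
    for (_, prev_end), (next_start, _) in zip(speech, speech[1:]):
        gap = next_start - prev_end
        if gap >= min_gap:
            cues.append((prev_end, "room tone", gap))
    # Sort by start time so the list reads left to right like the timeline.
    return sorted(cues)


if __name__ == "__main__":
    cuts = [1.3, 3.1]                  # hypothetical b-roll cut points
    speech = [(0.0, 2.5), (3.0, 5.0)]  # hypothetical narration segments
    for start, sound, length in sfx_cues(cuts, speech, reveal_time=4.2):
        extra = f" for {length:.1f}s" if length else ""
        print(f"{start:5.2f}s  {sound}{extra}")
```

Keeping the cues as data makes it easy to re-run the list after a re-cut, instead of nudging every whoosh by hand.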

The three-second rule
On social platforms, attention tends to drop when a shot has not changed for three seconds. Vary shot length and cut on motion to keep the eye moving.
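The three-second rule can be checked mechanically once you list your cut times. This is a minimal Python sketch, assuming the edit is described only by its cut points in seconds, from 0.0 to the last frame. It cannot see camera moves or new elements inside a shot, so a flagged hold may already be fine if something changes on screen. `long_holds` is a hypothetical helper name.

```python
# Sketch only: long_holds is a hypothetical helper. It looks at cut times
# alone and ignores motion inside a shot.
from typing import List, Tuple


def long_holds(cut_times: List[float], limit: float = 3.0) -> List[Tuple[float, float]]:
    """Return (start, end) for every shot that runs longer than `limit` seconds."""
    # Consecutive cut times bound each shot.
    return [
        (start, end)
        for start, end in zip(cut_times, cut_times[1:])
        if end - start > limit
    ]


if __name__ == "__main__":
    cuts = [0.0, 1.2, 2.0, 6.5, 8.0]  # hypothetical cut points for a short edit
    for start, end in long_holds(cuts):
        print(f"Hold from {start:.1f}s to {end:.1f}s lasts {end - start:.1f}s: add a cut or a move.")
```

Raising `limit` for a slower, emotional section is one way to apply the same check without forcing every part of the edit to sprint.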
CapCut: layered timeline
V1 [== voiceover spine =================]
V2 [b-roll][ b-roll ][ b-roll ]
A3 [============= music bed ==========]
A4 ^whoosh ^impact ^whoosh
Spine on track 1, b-roll on track 2, music bed on track 3, SFX on track 4.

Step 4: Ride the pace to the message

Fast cuts for energy, longer holds for emotion. A tutorial breathes; an ad sprints. Match the pace to the intent rather than cutting fast everywhere because it feels modern.

Result: a multi-layer cut where narration, b-roll, and sound design reinforce each other. This is where your work starts to feel produced rather than generated.

Hands-on tasks