
How to Generate a Short Video with Veo in Gemini

Use the Veo video model inside the Gemini app to turn a text prompt into a short clip, then refine it.

7 min | Beginner

Veo is Google's text-to-video model, available to Gemini subscribers inside the chat app. You describe a scene and it generates a short clip with motion and sound. This guide walks through writing a prompt that produces a usable result and shows how to iterate when the first take is off.

What you need

A Gemini plan that includes Veo video generation
A clear visual idea: subject, setting, and motion
A few minutes per clip for rendering

Step 1: Open the video tool

In the Gemini app, open the tools menu near the prompt box and pick the video option (often labelled Video or Veo). The interface switches to a mode built for generating clips rather than text replies.

Gemini - tools menu

Tools

------------------

Deep Research

Canvas

> Video (Veo)

Image (Imagen)

Select the video tool to switch into Veo mode.

Step 2: Write a specific prompt

Good video prompts name the subject, the action, the camera movement, and the style. Vague prompts produce generic footage. Treat the prompt like a one-line shot description a director would hand a camera operator.

veo-prompt.txt

A golden retriever puppy running across a sunny beach at sunrise,
slow motion, camera tracking alongside, warm cinematic lighting,
soft waves in the background.
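
If you draft many prompts, it can help to keep the four parts separate and join them only at the end. The sketch below is one possible way to do that in Python; the `ShotPrompt` class and its field names are hypothetical helpers for organizing text and are not part of Gemini or Veo. The output is plain text that you paste into the prompt box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotPrompt:
    """Hypothetical container for the four parts of a video prompt."""
    subject: str
    action: str
    camera: str
    style: str
    extras: tuple[str, ...] = ()  # optional details, e.g. background elements

    def render(self) -> str:
        # Subject and action form the opening clause; the remaining parts
        # follow as comma-separated modifiers, like the example prompt above.
        parts = [f"{self.subject} {self.action}", self.camera, self.style, *self.extras]
        return ", ".join(p.strip() for p in parts if p.strip()) + "."


puppy = ShotPrompt(
    subject="A golden retriever puppy",
    action="running across a sunny beach at sunrise",
    camera="slow motion, camera tracking alongside",
    style="warm cinematic lighting",
    extras=("soft waves in the background",),
)
print(puppy.render())
```

Keeping the parts in named fields makes it obvious when one of them, such as the camera movement, has been left empty.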

Step 3: Review, then refine

When the clip finishes, watch it and decide what to change. Rather than rewriting from scratch, adjust one element at a time, such as the camera angle or time of day, so you can tell what each change does.

Gemini - refine the clip

You

Same scene but at night with a full moon and cooler blue tones.

Gemini

Generating a new version with night lighting...

Iterate by changing a single attribute.
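
The one-change-at-a-time habit can be sketched as a small Python helper that produces variants of a base description, each differing in exactly one attribute. This is an illustration of the idea only: the attribute names (`time_of_day`, `camera`, `palette`) and the function are assumptions made up for this example, not anything the Gemini app exposes.

```python
def single_attribute_variants(
    base: dict[str, str], changes: dict[str, list[str]]
) -> list[dict[str, str]]:
    """Return copies of `base`, each differing from it in exactly one key."""
    variants = []
    for key, options in changes.items():
        if key not in base:
            raise KeyError(f"unknown attribute: {key}")
        for value in options:
            if value != base[key]:  # skip options identical to the base
                variants.append({**base, key: value})
    return variants


# Hypothetical attribute names, chosen for this example.
base = {
    "subject": "a golden retriever puppy running across a beach",
    "time_of_day": "sunrise",
    "camera": "camera tracking alongside",
    "palette": "warm cinematic lighting",
}
variants = single_attribute_variants(
    base,
    {
        "time_of_day": ["night with a full moon"],
        "camera": ["low-angle shot from the sand"],
    },
)
for variant in variants:
    changed = [k for k in base if variant[k] != base[k]]
    print(changed[0], "->", variant[changed[0]])
```

Because every variant differs from the base in a single field, any difference you see in the resulting clip can be traced to that field.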

Describe motion, not just the scene

Movement is what separates video prompts from image prompts. Spell out what moves and how the camera behaves (pan, zoom, track) to avoid a clip that looks like a barely animated still.
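
As a rough self-check before submitting, you could scan a draft prompt for camera-movement vocabulary. The Python sketch below is a crude heuristic under stated assumptions: the stem list is illustrative and far from complete, and prefix matching can produce false positives (for example, "panda" would match "pan").

```python
import re

# Illustrative, non-exhaustive stems for common camera moves (an assumption).
CAMERA_STEMS = ("pan", "zoom", "track", "dolly", "tilt", "orbit")


def camera_moves_mentioned(prompt: str) -> list[str]:
    """Return the stems from CAMERA_STEMS that some word in the prompt starts with."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return [stem for stem in CAMERA_STEMS if any(w.startswith(stem) for w in words)]


draft = "A golden retriever puppy running across a sunny beach, camera tracking alongside"
print(camera_moves_mentioned(draft))
print(camera_moves_mentioned("A puppy on a beach"))
```

An empty result is a hint that the prompt describes a scene but not how the camera moves through it.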

Note

Clips are short by design, often around 8 seconds. For a longer sequence, generate several clips and stitch them in a video editor rather than expecting one long render.
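
To plan a longer sequence, divide the target length by the clip length and round up. The Python sketch below assumes the roughly 8-second figure mentioned above as a default; the actual clip length may differ, so treat the default as a placeholder.

```python
import math


def clips_needed(target_seconds: float, clip_seconds: float = 8.0) -> int:
    """Number of clips of `clip_seconds` needed to cover `target_seconds`."""
    if target_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    # Round up: a partial clip still has to be generated as a whole clip.
    return math.ceil(target_seconds / clip_seconds)


print(clips_needed(30))  # 30 / 8 = 3.75, rounded up to 4
```

For a 30-second sequence this gives four clips, with the last one trimmed in the editor.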

Result

You get a short, downloadable clip from a text description, and a fast loop for refining it. Generating two or three variations and picking the best one is usually quicker than perfecting a single prompt.