
How to Tune Stability and Similarity for Better-Sounding Voiceovers

Adjust the voice setting sliders to balance consistency against expressiveness and fix robotic or unstable output.

6 min read · Intermediate

Two settings do most of the work in shaping how an ElevenLabs voice sounds: Stability and Similarity. Getting them right is the difference between a flat, robotic read and a believable performance. This guide explains what each one does and how to dial them in.

What you need

Any ElevenLabs voice open in Text to Speech or Studio
A short test sentence with some emotion in it
A few minutes to compare generations

Step 1: Open the voice settings

Below the voice picker, expand the settings panel. You will see Stability and Similarity sliders and, on some models, Style and Speaker boost controls as well.

Voice settings
----------------------------------------
Stability   [----o-----------] 35%
Similarity  [-------------o--] 80%
Style       [--o-------------] 15%
[x] Speaker boost
[ Reset ]

Lower stability is more expressive; higher is more consistent.
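
If you drive the same voice from a script instead of the sliders, the percentages in the panel have to be expressed as fractions between 0 and 1. The sketch below shows one way to do that conversion. The class and function names are hypothetical, and the output key names (stability, similarity_boost, style, use_speaker_boost) are an assumption about the API's voice settings object, so check them against the current API reference before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliderSettings:
    """Slider positions as shown in the panel, in percent (0-100)."""
    stability: float
    similarity: float
    style: float = 0.0
    speaker_boost: bool = True


def to_voice_settings(s: SliderSettings) -> dict:
    """Convert panel percentages to 0.0-1.0 fractions.

    The key names are assumed, not taken from this guide;
    verify them against the API documentation.
    """
    for name in ("stability", "similarity", "style"):
        value = getattr(s, name)
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return {
        "stability": s.stability / 100,
        "similarity_boost": s.similarity / 100,
        "style": s.style / 100,
        "use_speaker_boost": s.speaker_boost,
    }


# The values shown in the example panel above.
panel = SliderSettings(stability=35, similarity=80, style=15)
print(to_voice_settings(panel))
```

Validating the range up front catches a common slip: passing 35 where the request expects 0.35.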

Step 2: Understand stability

Stability controls how much the delivery varies from one generation to the next. Low stability lets the voice be more emotional and varied but can occasionally glitch or wander. High stability is steady and repeatable but can sound monotone. For narration, mid to high stability works well; for lively characters, go lower.

Step 3: Understand similarity

Similarity controls how closely the output sticks to the original voice's character. Higher values track the source voice tightly, which is usually what you want for a clone. Pushing it too high can amplify artefacts from a noisy sample.

Step 4: A/B test the same line

Generate your test sentence, change one slider, and generate again. Changing only one variable at a time tells you which slider caused the difference. Keep the test sentence identical so the comparison is fair.
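
The one-variable-at-a-time rule can be planned before you generate anything. The following sketch builds a list of runs in which every variant differs from the baseline in exactly one slider. The function name, the test sentence, and the candidate values are illustrative assumptions; the code only plans the runs and does not call any ElevenLabs API.

```python
def one_at_a_time(baseline: dict, candidates: dict) -> list:
    """Return (label, settings) pairs; each variant changes one slider only."""
    runs = [("baseline", dict(baseline))]
    for slider, values in candidates.items():
        if slider not in baseline:
            raise KeyError(f"unknown slider: {slider}")
        for value in values:
            if value == baseline[slider]:
                continue  # identical to the baseline, nothing to compare
            variant = dict(baseline)  # copy so the baseline is never mutated
            variant[slider] = value
            runs.append((f"{slider}={value}", variant))
    return runs


# Keep the sentence identical across every run so the comparison is fair.
TEST_SENTENCE = "I can't believe we actually made it!"

baseline = {"stability": 50, "similarity": 75}
plan = one_at_a_time(baseline, {"stability": [30, 70], "similarity": [60, 90]})

for label, settings in plan:
    print(f"{label:16} {settings}  <- {TEST_SENTENCE!r}")
```

Labelling each run with the slider that changed makes it easy to name the audio files and trace a difference you hear back to its cause.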

Sensible starting points

For clean narration, try Stability around 50% and Similarity around 75%. For expressive characters, drop Stability to around 30% and raise Style. Adjust from there in small steps.
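
As a rough sketch, these starting points can be written down as a preset plus a helper that moves one slider by a fixed step and keeps it within 0 to 100. The narration values come from the guide; the Style baseline of 0 and the amount Style is raised by are assumptions, since the guide gives no numbers for them, and the helper name is hypothetical.

```python
# Narration starting point from the guide, in percent.
# The style value of 0 is an assumed baseline, not stated in the guide.
NARRATION = {"stability": 50, "similarity": 75, "style": 0}


def nudge(settings: dict, slider: str, step: int) -> dict:
    """Return a copy with one slider moved by `step`, clamped to 0-100."""
    if slider not in settings:
        raise KeyError(f"unknown slider: {slider}")
    updated = dict(settings)
    updated[slider] = max(0, min(100, settings[slider] + step))
    return updated


# Expressive variant: stability down to 30, style raised.
# The +10 for style is an illustrative guess; tune it by ear.
expressive = nudge(nudge(NARRATION, "stability", -20), "style", +10)
print(expressive)
```

Returning a copy instead of editing the preset in place keeps the original starting point available to fall back on.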

Result: a voice that holds its character across a long script without drifting into either a robotic monotone or random glitches.