How to translate a video while keeping the background music
Updated June 2026 · 5 min read
The short answer
To translate a video without losing its background music, use a tool that separates speech from background audio before translating, then mixes the new dubbed speech back over the original music and ambience. WaveShift does this automatically, so the soundtrack, sound effects, and room tone stay intact.
Why most translation tools flatten your audio
Many video translation tools treat the audio as one inseparable track. When they generate dubbed speech, they either replace the whole soundtrack or talk over it, so the original music, sound effects, and ambient room tone get muffled or lost entirely.
For a talking-head clip that may be acceptable. For a music video, a vlog with a soundtrack, an ad, or a tutorial with ambient demonstration sound, losing the background audio changes how the video feels — and viewers notice immediately.
The fix: separate speech from background before you translate
The reliable way to keep the music is to split the audio into two stems first: the spoken voice, and everything else (music, effects, ambience). You translate and re-voice only the speech stem, then mix the new dubbed speech back over the untouched background stem.
Because the background stem is never regenerated, the music and sound effects carry over from the original recording instead of being re-synthesized. Only the spoken words change language.
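To make the remix step concrete, here is a minimal sketch of laying a dubbed speech stem over an untouched background stem. It assumes both stems are already separated and decoded to float samples between -1.0 and 1.0 at the same sample rate. The function name `mix_stems` and the `speech_gain` parameter are illustrative only and are not taken from WaveShift or any specific library.

```python
from typing import List


def mix_stems(
    background: List[float],
    dubbed_speech: List[float],
    speech_gain: float = 1.0,
) -> List[float]:
    """Overlay dubbed speech on a background stem (hypothetical helper).

    The background passes through at unit gain, so wherever the speech
    stem is silent the output equals the original background sample.
    """
    length = max(len(background), len(dubbed_speech))
    mixed = []
    for i in range(length):
        # Treat the shorter stem as silence once it runs out.
        bg = background[i] if i < len(background) else 0.0
        sp = dubbed_speech[i] if i < len(dubbed_speech) else 0.0
        sample = bg + speech_gain * sp
        # Hard-clip to the valid range; a real mixer would likely use a limiter.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


# Speech is silent on the first and last sample, so the background is unchanged there.
print(mix_stems([0.5, 0.5, 0.5], [0.0, 0.25]))  # [0.5, 0.75, 0.5]
```

The point of the sketch is that only the speech input ever changes between languages. The background list is read but never rewritten.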
How WaveShift preserves background audio, step by step
WaveShift runs this separate-translate-remix pipeline automatically. You do not configure stems or touch an audio editor:
- Add your video by uploading a file or pasting a YouTube, Bilibili, or direct video link.
- WaveShift separates the speech from the background music and sound effects.
- It translates the speech and clones each speaker's voice so the dubbed audio keeps the original speaker identity where possible.
- It mixes the translated speech back over the original background audio, so music and effects stay intact.
- Playback streams as the video renders, so you hear the first dubbed segment within minutes while the remaining segments are still processing.
- If one line sounds off, you can edit that single subtitle line and regenerate only that line, without redoing the whole video.
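The single-line regeneration described in the last step can be pictured as per-line bookkeeping. The sketch below is an assumption about how such a workflow could be modeled, not a description of WaveShift's internals or API. `Line`, `dub_all`, and `redub_line` are hypothetical names, and the `translate` callable stands in for whatever translation and voice-cloning stage is actually used.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Line:
    index: int
    source_text: str
    translated_text: str = ""
    audio_version: int = 0  # incremented each time this line is re-voiced


def dub_all(lines: List[Line], translate: Callable[[str], str]) -> None:
    """First pass: translate and voice every line once."""
    for line in lines:
        line.translated_text = translate(line.source_text)
        line.audio_version += 1


def redub_line(lines: List[Line], index: int, new_text: str) -> None:
    """Re-voice one edited line; every other line keeps its existing audio."""
    for line in lines:
        if line.index == index:
            line.translated_text = new_text
            line.audio_version += 1
            return
    raise KeyError(f"no line with index {index}")


lines = [Line(0, "hola"), Line(1, "adios")]
dub_all(lines, str.upper)       # stand-in "translation" for the demo
redub_line(lines, 1, "BYE")     # fix only the second line
print([l.audio_version for l in lines])  # [1, 2]
```

Because each line tracks its own audio, correcting one subtitle only touches that line, and the background stem is not involved at all.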
When keeping the background matters most
Background preservation is the difference between a usable localized video and one you have to re-edit. It matters most for:
- Music videos and performances, where the track is the point.
- Vlogs and lifestyle content with a continuous soundtrack.
- Ads and trailers, where sound design carries the mood.
- Tutorials and product demos with ambient or on-screen sound.
Tips for the cleanest result
A few habits make the separated-and-remixed output sound natural:
- Start from a source where the speech is reasonably clear over the music — cleaner input separates better.
- Review one representative segment first, including a line with music underneath it, before processing a long video.
- Use single-line re-dubbing to fix any line that lands awkwardly instead of regenerating the whole file.
Try it on your own video
New accounts get 15 free minutes. Upload a file or paste a YouTube or Bilibili link and hear the first dubbed segment in minutes.
