WaveShift

How to translate a video while keeping the background music

Updated June 2026 · 5 min read

The short answer

To translate a video without losing its background music, use a tool that separates speech from background audio before translating, then mixes the new dubbed speech back over the original music and ambience. WaveShift does this automatically, so the soundtrack, sound effects, and room tone stay intact.

Why most translation tools flatten your audio

Many video translation tools treat the audio as one inseparable track. When they generate dubbed speech, they either replace the whole soundtrack or talk over it, so the original music, sound effects, and ambient room tone get muffled or lost entirely.

For a talking-head clip that may be acceptable. For a music video, a vlog with a soundtrack, an ad, or a tutorial with ambient demonstration sound, losing the background audio changes how the video feels — and viewers notice immediately.

The fix: separate speech from background before you translate

The reliable way to keep the music is to split the audio into two stems first: the spoken voice, and everything else (music, effects, ambience). You translate and re-voice only the speech stem, then mix the new dubbed speech back over the untouched background stem.

Because the background stem is never regenerated, the music and sound effects come through exactly as they were in the original — only the spoken words change language.
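
As a rough illustration of the remix step only, here is a minimal Python sketch that overlays a dubbed speech stem on a background stem. It is not WaveShift's implementation: the function name `mix_stems`, the `speech_gain` parameter, and the representation of audio as plain lists of mono samples in the range -1.0 to 1.0 are all assumptions made for the example. The separation step itself is not shown, since it requires a source-separation model.

```python
from typing import List


def mix_stems(
    background: List[float],
    dubbed_speech: List[float],
    speech_gain: float = 1.0,
) -> List[float]:
    """Overlay dubbed speech on a background stem (hypothetical helper).

    Both inputs are assumed to be mono samples in [-1.0, 1.0] at the
    same sample rate. The background samples are never modified; they
    are only summed with the (optionally scaled) speech samples.
    """
    length = max(len(background), len(dubbed_speech))
    mixed: List[float] = []
    for i in range(length):
        # Treat the shorter stem as silence once it runs out.
        bg = background[i] if i < len(background) else 0.0
        sp = dubbed_speech[i] if i < len(dubbed_speech) else 0.0
        sample = bg + speech_gain * sp
        # Hard-clip so the sum stays inside the valid sample range.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

Wherever the dubbed speech is silent, the output equals the background sample for sample, which is the property the paragraph above describes. A production mixer would typically use a limiter rather than hard clipping.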

How WaveShift preserves background audio, step by step

WaveShift runs this separate-translate-remix pipeline automatically. You do not configure stems or touch an audio editor:

  • Add your video by uploading a file or pasting a YouTube, Bilibili, or direct video link.
  • WaveShift separates the speech from the background music and sound effects.
  • It translates the speech and clones each speaker's voice so the dubbed audio keeps the original speaker identity where possible.
  • It mixes the translated speech back over the original background audio, so music and effects stay intact.
  • Playback streams as it renders — you hear the first dubbed segment in minutes while the rest continues.
  • If one line sounds off, you can edit that single subtitle line and regenerate only that line, without redoing the whole video.
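
To make the single-line regeneration idea concrete, the sketch below models a dub as a list of subtitle lines, each with its own rendered audio. Everything here is hypothetical and for illustration only: `Line`, `dub_all`, `redub_line`, and the stand-in `fake_synthesize` are not WaveShift APIs, and real synthesis would call a voice model instead of building a string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Line:
    """One translated subtitle line and its rendered audio (hypothetical)."""
    index: int
    text: str
    audio: str = ""   # placeholder for the rendered speech clip
    renders: int = 0  # how many times this line has been synthesized


def fake_synthesize(line: Line) -> None:
    # Stand-in for a real text-to-speech call; records that a render happened.
    line.audio = f"audio:{line.text}"
    line.renders += 1


def dub_all(lines: List[Line], synthesize: Callable[[Line], None]) -> None:
    # First pass: render every line in order, so early lines are ready first.
    for line in lines:
        synthesize(line)


def redub_line(
    lines: List[Line],
    index: int,
    new_text: str,
    synthesize: Callable[[Line], None],
) -> Line:
    # Edit one line and re-render only that line; the others are untouched.
    line = lines[index]
    line.text = new_text
    synthesize(line)
    return line
```

Because each line owns its audio, fixing one line costs one render rather than a full pass over the video.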

When keeping the background matters most

Background preservation is the difference between a usable localized video and one you have to re-edit. It matters most for:

  • Music videos and performances, where the track is the point.
  • Vlogs and lifestyle content with a continuous soundtrack.
  • Ads and trailers, where sound design carries the mood.
  • Tutorials and product demos with ambient or on-screen sound.

Tips for the cleanest result

A few habits make the separated-and-remixed output sound natural:

  • Start from a source where the speech is reasonably clear over the music — cleaner input separates better.
  • Review one representative segment first, including a line with music underneath it, before processing a long video.
  • Use single-line re-dubbing to fix any line that lands awkwardly instead of regenerating the whole file.

Frequently asked questions

Will translating my video remove the background music?
No. WaveShift separates speech from the background audio, translates only the speech, then mixes the dubbed speech back over the original music and sound effects, so the soundtrack stays intact.

Will the dubbed voice sound like the original speaker?
Where possible, yes. WaveShift clones each speaker's voice so the translated speech keeps the original speaker identity rather than replacing it with a generic voice.

How do I add a video?
You can upload a video file or paste a YouTube, Bilibili, or direct video link. WaveShift ingests the video and runs the full translate-and-dub pipeline.

Can I fix one line without redoing the whole video?
Yes. WaveShift supports single-line re-dubbing, so you can edit one subtitle line and regenerate only that line.

How are minutes counted?
Minutes are calculated from the source video duration, regardless of the target language. A 10-minute video uses 10 minutes.

Try it on your own video

New accounts get 15 free minutes. Upload a file or paste a YouTube or Bilibili link and hear the first dubbed segment in minutes.