1. Upload video
Drop your MP4 or other video file into the browser. Most common video formats are supported.
Video to stems
If your source is a video file, upload it here. The audio gets extracted and split into vocals and accompaniment so you can preview and download what you need.
Free, in your browser. No signup. MP3, WAV, FLAC, M4A, OGG, or video.
Audio is pulled from the video container and split into vocals and accompaniment. The video itself is not stored.
Listen first, then download only what sounds good. Quality expectations are the same as for audio file separation.
The tool reads the audio track from these video containers. Video resolution does not affect results — only the audio codec and bitrate matter.
| Container | Common audio codec | Typical source | Separation quality |
|---|---|---|---|
| MP4 (.mp4, .m4v) | AAC 128–256 kbps | Music videos, screen recordings | Good — AAC at 256 kbps is close to CD quality |
| WebM (.webm) | Opus 128–160 kbps | Browser recordings, web exports | Decent — Opus is efficient but lower bitrate hurts |
| MOV (.mov) | AAC or PCM | iPhone/iPad recordings, Final Cut exports | Varies — PCM is lossless and excellent; AAC depends on bitrate |
| AVI (.avi) | MP3 or PCM | Legacy files, older screen recorders | Depends entirely on the audio codec inside |
| MKV (.mkv) | AAC, FLAC, or Opus | Ripped media, OBS recordings | Good if FLAC; variable otherwise |
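To see which row of the table applies to a given file, the audio codec and bitrate can be read locally before uploading. The following is a minimal sketch, assuming `ffprobe` (part of FFmpeg) is installed and on the PATH. The function names `probe_audio` and `summarize_audio` are illustrative and are not part of the tool.

```python
import json
import subprocess


def probe_audio(path):
    """Ask ffprobe for the first audio stream of a file, as a JSON string."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",  # first audio stream only
        "-show_entries", "stream=codec_name,bit_rate,channels",
        "-of", "json",
        path,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def summarize_audio(ffprobe_json):
    """Extract codec, bitrate in kbps, and channel count from ffprobe JSON."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return None  # the container has no audio track
    stream = streams[0]
    # Some containers (often MKV and WebM) do not report a per-stream bitrate.
    bit_rate = str(stream.get("bit_rate", ""))
    return {
        "codec": stream.get("codec_name"),
        "kbps": round(int(bit_rate) / 1000) if bit_rate.isdigit() else None,
        "channels": stream.get("channels"),
    }
```

Calling `summarize_audio(probe_audio("clip.mp4"))` on a hypothetical file returns a small dictionary with the codec name, bitrate, and channel count. A missing bitrate does not imply poor audio; it only means the container does not store that field per stream.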
Quick checks before uploading.
Not all video audio is created equal. Here's what to expect from each platform and when the video route makes sense.
Typical figures: YouTube music videos carry roughly 128 kbps Opus (WebM) or up to 192 kbps AAC (MP4); TikTok is around 96 kbps AAC mono; Instagram Reels is around 128 kbps mono. Exact numbers vary by upload and change over time. Lower bitrate means less detail for the AI, so expect noticeably worse separation than from a Spotify-quality source. If you can find the same track on a music streaming service or as an MP3, use that instead.
YouTube's Terms of Service prohibit downloading videos except through features YouTube itself provides. The least contentious cases for tools like yt-dlp or youtube-dl are videos you uploaded yourself and Creative Commons-licensed videos, whose license permits reuse. Purchased content is usually DRM-protected and is best accessed through the store's own download feature. Educational use (language learning, transcription) may qualify as fair use in some jurisdictions, but that is judged case by case and is not guaranteed. Commercial use requires explicit licensing. When in doubt, link to the original video instead of rehosting.
Video-only sources are the right pick for: live concert footage where the audio isn't on Spotify, YouTube uploads with rare or niche content, language immersion clips, music in films or TV soundtracks not released separately, and custom footage where audio is bespoke (interviews, lectures, tutorials). For commercial releases available elsewhere, always prefer the audio-direct source.
Upload any song and hear the separated stems in seconds. Free, no account needed.
A standalone MP3 or WAV avoids the extra step of container extraction and often has higher bitrate audio than video files.
Audio in official releases is typically 256 kbps AAC or higher. Screen recordings and social media rips are usually much lower quality.
Extraction from the container is fast. The stem separation step takes the same time regardless of whether the source was video or audio.
Audio captured by phone microphones includes room reflections, crowd noise, and aggressive compression that degrades every stem.
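The container-extraction step mentioned in these tips can also be done locally, which yields a standalone audio file without a second round of lossy encoding. Below is a minimal sketch, assuming FFmpeg is installed. The codec-to-extension table and the function name `build_extract_command` are assumptions made for illustration, not something the tool prescribes.

```python
from pathlib import Path

# Assumed mapping from ffprobe codec names to a file type that can hold the
# stream unchanged. Extend it for other codecs as needed.
COPY_EXTENSIONS = {
    "aac": ".m4a",
    "mp3": ".mp3",
    "opus": ".opus",
    "flac": ".flac",
    "pcm_s16le": ".wav",
}


def build_extract_command(video_path, codec):
    """Build an ffmpeg command that copies the first audio track as-is."""
    ext = COPY_EXTENSIONS.get(codec)
    if ext is None:
        raise ValueError(f"no stream-copy target known for codec {codec!r}")
    out_path = str(Path(video_path).with_suffix(ext))
    # -map 0:a:0 selects the first audio stream; -c:a copy avoids re-encoding.
    return ["ffmpeg", "-i", video_path, "-map", "0:a:0", "-c:a", "copy", out_path]
```

Passing the returned list to `subprocess.run(cmd, check=True)` writes the audio next to the video, for example `clip.m4a` for an AAC track in `clip.mp4`. Copying the stream keeps the original bitrate; it cannot add detail the video never had.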
Yes. The browser extracts the audio and splits it automatically.
MP4, WebM, MOV, and other common video formats. The tool extracts the audio track for processing.
Video resolution does not matter. The audio track quality inside the video is what determines separation quality.
Yes. Upload the MP4, extract audio, remove vocals, and use the accompaniment stem as a karaoke track.
Not necessarily. If the video contains AAC at 256 kbps or higher, the audio quality is comparable to a good MP3. PCM audio in MOV files is lossless.
No. The browser tool needs the video file itself, not a URL. Download the video first (yt-dlp is the standard open-source option), then upload the file. Services like y2mate and clipto.com offer URL-to-MP3 conversion for YouTube, but they operate against YouTube's ToS and re-encode the audio. For your own uploads or Creative Commons videos, yt-dlp is the lower-risk choice and can fetch the original audio stream without re-encoding.
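For a video you uploaded yourself or otherwise have the right to download, the download step might look like the sketch below. It assumes the `yt_dlp` Python package is installed; the helper names `build_audio_options` and `download_audio` and the `downloads` folder are hypothetical.

```python
import os


def build_audio_options(out_dir):
    """Options asking yt-dlp for the best audio-only stream, left unconverted."""
    return {
        # Prefer an audio-only stream; fall back to the best combined file.
        "format": "bestaudio/best",
        "outtmpl": os.path.join(out_dir, "%(title)s.%(ext)s"),
        "noplaylist": True,  # fetch a single video even if the URL names a playlist
    }


def download_audio(url, out_dir="downloads"):
    """Download the audio for one video into out_dir."""
    from yt_dlp import YoutubeDL  # imported here so the helper above works without it

    with YoutubeDL(build_audio_options(out_dir)) as ydl:
        ydl.download([url])
```

No conversion step is configured, so the file keeps whatever codec the site served, typically Opus in WebM or AAC in M4A. Skipping the conversion to MP3 avoids an extra generation of lossy compression before separation.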
Because audio quality inside videos varies dramatically. A 4K music video on YouTube might have 192 kbps AAC audio, which is near-CD quality. A 720p TikTok clip might have 96 kbps mono AAC. A phone recording at a concert can be as low as 64 kbps. The AI works with what's inside, and lossy compression at low bitrates destroys the harmonic detail the model needs to separate cleanly.
Partially. The AI separates 'vocals' from 'accompaniment'; it doesn't distinguish between the streamer's voice and the original song's vocals. You'll get all vocal content (streamer plus singer) in one stem. For true reaction-video use, you'd need separation that handles two vocal signals, which calls for more specialized models (MVSep's karaoke models are one option to try) or manual spectral editing in iZotope RX.
Remove vocals from any song and keep the instrumental.
Isolate vocals for remixes, mashups, and covers.
Create backing tracks for singing and practice.
Split a song into individual instrument stems.
Remove drums, bass, or piano from a track.
Full 5-stem separation on iOS, Android, and Mac.