Music & Voice Separation

MVSEP separates audio into voice and music parts.

September News

We've had a lot of changes since our last news update. The list is provided below.

1) We've added a high-quality model based on the BS Roformer architecture, which separates tracks into 6 stems: bass, drums, guitar, piano, vocals, and other. It is now the default model for first-time users. It is available under the name "BS Roformer SW (vocals, bass, drums, guitar, piano, other)".

The quality table below shows the SDR values from the Multisong dataset and from the leaderboards for piano and guitar:

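| Stem | vocals | instrum | bass | drums | guitar | piano | other |
|------|--------|---------|------|-------|--------|-------|-------|
| SDR  | 11.30  | 17.50   | 14.62 | 14.11 | 9.05  | 7.83  | 8.71  |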
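
For readers unfamiliar with the metric, SDR (signal-to-distortion ratio) compares a separated stem with the true stem and is expressed in decibels, so higher is better. The sketch below uses one common textbook definition, 10 * log10(signal energy / error energy). It is an assumption that this matches the exact evaluation procedure behind the numbers above; the function name and the sample values are illustrative only.

```python
import math


def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true stem.
    signal_energy = sum(r * r for r in reference)
    # Energy of the difference between the true stem and the separated stem.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, any error is pure distortion
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy samples (hypothetical values, not taken from any dataset).
reference = [0.0, 0.5, -0.5, 1.0]
estimate = [0.0, 0.45, -0.55, 0.9]
print(round(sdr_db(reference, estimate), 2))
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the error energy for the same stem.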

2) We have updated the models for the following algorithms:

  • MVSep Piano (SDR increased from 6.20 to 7.83)
  • MVSep Guitar (SDR increased from 7.51 to 9.05)
  • MVSep Bass (SDR increased from 14.07 to 14.87)
  • MVSep Drums (SDR increased from 13.78 to 14.35)
  • MVSep Strings (SDR increased from 3.84 to 5.41)
  • MVSep Wind (SDR increased from 7.22 to 9.82)

3) We've added a new model for vocals based on the BS Roformer architecture, which surpasses all available alternatives in separation quality (by the SDR metric). The vocal SDR metric increased from 11.31 to 11.89 on the Multisong dataset and from 13.56 to 14.58 on the Synth dataset. See the comparison with the previous best model in the table below.

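| Algorithm name | Multisong: SDR Vocals | Multisong: SDR Instrumental | Synth: SDR Vocals | Synth: SDR Instrumental |
|----------------|-----------------------|-----------------------------|-------------------|-------------------------|
| BS Roformer (ver. 2024.08) | 11.31 | 17.62 | 13.56 | 13.27 |
| BS Roformer (ver. 2025.07) | 11.89 | 18.20 | 14.58 | 14.28 |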

4) We have added several new models for individual instruments:

  • MVSep Acoustic Guitar
  • MVSep Violin
  • MVSep Viola
  • MVSep Cello
  • MVSep Flute
  • MVSep Trumpet

5) All model ensembles have been updated to include the new and improved models.

The vocal ensemble "Ensemble (vocals, instrum)" has been updated:

  • It now has 3 versions: Best SDR, High Vocals Fullness, and High Instrum Fullness.
  • The Best SDR version achieves a SOTA (state-of-the-art) metric on the Multisong dataset: 11.93.
  • The high fullness versions maintain high SDR and Freq L1 scores compared to the high fullness versions of the MelBand Roformer models.

Quality metrics:

  • Best SDR: https://mvsep.com/quality_checker/entry/8479
  • High Vocals Fullness: https://mvsep.com/quality_checker/entry/8482
  • High Instrum Fullness: https://mvsep.com/quality_checker/entry/8483

The large ensembles have also been updated.

Ensemble (vocals, instrum, bass, drums, other):

  • The new quality scores compared to the previous version are shown in the table below and at the link. The algorithm includes the current best ensembles for drums, bass, and vocals. https://mvsep.com/quality_checker/entry/8504
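
| Algorithm name | Multisong: SDR Bass | Multisong: SDR Drums | Multisong: SDR Other | Multisong: SDR Vocals | Multisong: SDR Instrumental | Synth: SDR Vocals | Synth: SDR Instrumental |
|----------------|---------------------|----------------------|----------------------|-----------------------|-----------------------------|-------------------|-------------------------|
| SDR average: 13.07 (v. 2024.12.28) | 14.14 | 13.57 | 8.10 | 11.61 | 17.92 | 14.09 | 13.79 |
| SDR average: 13.67 (v. 2025.06.30) | 14.85 | 14.33 | 9.00 | 11.93 | 18.23 | 14.58 | 14.28 |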
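
The "SDR average" in each row label appears to be the mean of the five Multisong scores (bass, drums, other, vocals, instrumental). That is an inference from the numbers rather than something the article states, so treat the short check below as a sketch; the dictionary and function names are illustrative.

```python
def average_sdr(scores):
    """Arithmetic mean of per-stem SDR values."""
    return sum(scores.values()) / len(scores)


# Multisong scores copied from the two rows of the table above.
v2024 = {"bass": 14.14, "drums": 13.57, "other": 8.10,
         "vocals": 11.61, "instrumental": 17.92}
v2025 = {"bass": 14.85, "drums": 14.33, "other": 9.00,
         "vocals": 11.93, "instrumental": 18.23}

for label, scores in (("v. 2024.12.28", v2024), ("v. 2025.06.30", v2025)):
    print(label, round(average_sdr(scores), 2))

# Per-stem improvement between the two versions, in dB.
gains = {stem: round(v2025[stem] - v2024[stem], 2) for stem in v2024}
print(gains)
```

Under this assumption the two means round to the 13.07 and 13.67 shown in the row labels, and the largest per-stem gain is on the "other" stem.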

Ensemble All-In (vocals, bass, drums, piano, guitar, lead/back vocals, other):

  • Includes the same updates as the Ensemble (vocals, instrum, bass, drums, other) model.
  • Now uses 2 new karaoke models.
  • Uses a new drumsep ensemble built from the two best Mel Roformer models.
  • Uses the new guitar and piano models.
  • Additionally, strings and wind instruments have been added.

6) Four new Karaoke models have been added for lead/backing vocal separation:

  • A model from @gabox. Lead vocal SDR: 9.67.
  • A model based on merged weights from @gabox and @aufr33/@viperx. This model has a higher lead vocal SDR: 9.85.
  • A model based on the SCNet XL IHF architecture from @becruily. SDR: 9.53. Despite its lower SDR, it handles some tracks better than the other models do.
  • Finally, the latest model from @frazer and @becruily, based on the BS Roformer architecture, with a lead vocal SDR of 10.11. It is currently the highest-quality model available.

All these models are available as options in MVSep MelBand Karaoke (lead/back vocals).

7) We've added a new text-to-audio generation algorithm: Stable Audio Open Gen. It is located in the "Experimental" section. The audio is generated in stereo at a 44.1 kHz sample rate with a duration of up to 47 seconds. The quality is quite high. Text prompts work best in English.

Examples of text prompts:

  • Generating sound effects: cats meow, lion roar, dog bark
  • Generating a sample: 128 BPM tech house drum loop
  • Generating specific instruments: A Coltrane-style jazz solo: fast, chaotic passages (200 BPM), with piercing saxophone screams and sharp dynamic changes
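
To get a feel for the size of a generated clip, the arithmetic below works out how many samples a maximum-length output contains, using the stereo 44.1 kHz format and 47-second limit mentioned above. The 16-bit PCM depth is an assumption made only for the byte estimate; the article does not state the output bit depth or file format.

```python
SAMPLE_RATE_HZ = 44_100   # from the article
CHANNELS = 2              # stereo, from the article
MAX_DURATION_S = 47       # from the article
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM, not stated in the article

frames = SAMPLE_RATE_HZ * MAX_DURATION_S   # sample frames per channel
samples = frames * CHANNELS                # total samples across both channels
pcm_bytes = samples * BYTES_PER_SAMPLE     # uncompressed size, headers excluded

print(frames, samples, pcm_bytes)
print(f"{pcm_bytes / 2**20:.1f} MiB uncompressed")
```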

8) We've added the Parakeet model from NVIDIA for the speech recognition (ASR) task. It is designed for accurate and efficient transcription of spoken English into text. Unlike Whisper, this model works only with English, but it provides higher-quality results for that language. It also generates quite accurate timestamps. The quality metric is a WER (word error rate) of 6.03 on the Hugging Face Open ASR Leaderboard. It is listed right after Whisper in the model list on our site. More details are available on the model's page on Hugging Face.
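
WER stands for word error rate: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. Lower is better, and leaderboards usually report it as a percentage. The sketch below is a minimal, unnormalized version of that calculation; real evaluations typically normalize case and punctuation first, and the function name and sentences here are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{100 * wer:.2f}%")
```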

9) We've added the "Matchering (by sergree)" algorithm to the "Experimental" section. Matchering is a new tool for audio matching and mastering. It is based on a simple idea: you take TWO audio files and upload them to Matchering:

  • TARGET (the track you want to master so that it sounds like the reference)
  • REFERENCE (another track, for example, a professional, well-known song that you want your target track to sound like)

The algorithm matches both of these tracks and provides you with a processed TARGET track that has the same RMS, frequency response, peak amplitude, and stereo width as the REFERENCE track. The algorithm is based on the code by @sergree.
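
To illustrate just one of the properties listed above, the sketch below scales a target signal so that its RMS level equals that of a reference. This is not Matchering's code and it ignores frequency response, peak amplitude, and stereo width entirely; the function names and sample values are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_rms(target, reference):
    """Return a copy of target scaled to the RMS level of reference."""
    target_rms = rms(target)
    if target_rms == 0:
        return list(target)  # silence cannot be scaled to a louder level
    gain = rms(reference) / target_rms
    return [s * gain for s in target]


# Toy mono signals (hypothetical values): the reference is four times louder.
target = [0.1, -0.2, 0.1, -0.1]
reference = [0.4, -0.8, 0.4, -0.4]
matched = match_rms(target, reference)
print(round(rms(target), 4), round(rms(reference), 4), round(rms(matched), 4))
```

A single gain like this can push peaks above full scale, which is one reason a real mastering tool also has to control peak amplitude.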

10) We have added a site mirror: https://mirror.mvsep.com

This may be useful if you are experiencing slow file uploads or if the main site is inaccessible without a VPN.

11) Changes have been made to the site's interface and documentation:

  • Tags have been added to the model selection menu. They can help you navigate the large number of available models.
  • A 'Reprocess' button has been added next to each audio file. It allows you to apply another algorithm to a file without re-uploading it, or to process the output of one model with another.
  • An explanation of the Fullness/Bleedless concept has been added to the FAQ.
  • In the Quality Checker section, you can now sort models by various quality metrics.