Basic Pitch is an open-source neural network from Spotify’s Audio Intelligence Lab that transcribes melodic audio recordings into MIDI notes. Unlike older single-note converters, the model can "hear" not only individual notes but also chords, along with subtle nuances of a performance. Official page: https://github.com/spotify/basic-pitch
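As a rough illustration of what the conversion looks like in code, here is a minimal sketch. It assumes the `basic-pitch` Python package is installed (`pip install basic-pitch`) and exposes a `predict` helper; the file names are hypothetical.

```python
from basic_pitch.inference import predict

# Transcribe a melodic recording; "melody.wav" is a hypothetical example file.
model_output, midi_data, note_events = predict("melody.wav")

# midi_data is a pretty_midi.PrettyMIDI object, so it can be written straight to disk.
midi_data.write("melody.mid")
```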
Key Features
- Polyphonic recognition: Basic Pitch handles complex material with ease. You can feed it recordings of piano, guitar, or whole ensembles, and the model recognizes multiple notes sounding simultaneously.
- Nuance preservation (Pitch Bend): Most converters "quantize" sound to the nearest note, stripping away expression. Basic Pitch preserves pitch changes (pitch bends): if you sing with vibrato or bend a note on guitar, those details remain in the MIDI file (see the sketch after this list).
- Versatility: The model is trained on a massive dataset and works with most melodic instruments.
- Speed and efficiency: It is a lightweight model that processes audio quickly without requiring powerful servers.
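To illustrate the pitch-bend point from the list above, the sketch below inspects the bend events attached to a transcription. It assumes the same hypothetical `predict` helper and that the returned object is a `pretty_midi.PrettyMIDI`; the recording name is made up.

```python
from basic_pitch.inference import predict

# "vibrato_take.wav" is a hypothetical recording with vibrato or guitar bends.
_, midi_data, _ = predict("vibrato_take.wav")

# Pitch changes are kept as MIDI pitch-bend events on the instrument tracks
# instead of being snapped to the nearest semitone.
for track in midi_data.instruments:
    print(f"{len(track.notes)} notes, {len(track.pitch_bends)} pitch-bend events")

midi_data.write("vibrato_take.mid")
```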
What instruments does the model work with?
Basic Pitch is an "instrument-agnostic" model. This means it handles different timbres equally well:
- Vocals: Hum a melody into a microphone, and the neural network will turn your voice into a synthesizer part.
- Strings: Acoustic and electric guitar, violin, cello.
- Keyboards: Pianos, organs, and synthesizers.
- Winds: Flute, saxophone, trumpet, and others.
Important: The model is designed for melodic instruments. It is not suitable for drums or percussion, as it focuses on pitch rather than rhythmic noise.
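Because the model is instrument-agnostic, the same call can be applied to every source listed above. Here is a small batch sketch, again assuming the hypothetical `predict` helper and example file names.

```python
from pathlib import Path
from basic_pitch.inference import predict

# Hypothetical recordings of different melodic instruments.
recordings = ["vocal_hum.wav", "acoustic_guitar.wav", "piano_riff.wav", "flute_line.wav"]

for path in recordings:
    # The same model handles every timbre; no per-instrument configuration is needed.
    _, midi_data, _ = predict(path)
    midi_data.write(str(Path(path).with_suffix(".mid")))
```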
