Rift Studio
Deep Dive

Inside Stem Separation

Source separation — extracting individual instruments from a mixed audio file — has been one of the most requested features since the Rift Studio beta. Today we're pulling back the curtain on how we built it.

The Challenge

Most source separation tools run in the cloud. That means uploading audio, waiting, downloading stems — and trusting a server with your unreleased music. We wanted Rift Studio's implementation to be entirely offline.

Our Approach

We trained a lightweight neural network optimized for CPU inference. The model runs in a separate thread, so your audio playback is never interrupted. Processing a 4-minute track takes roughly 20 seconds on a modern Intel Core i7.
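To illustrate the threading model, here is a minimal Python sketch of running heavy inference off the main thread and handing the result back through a queue. Everything here (`separate_stems`, the queue handoff) is a hypothetical stand-in, not Rift Studio's actual internals.

```python
import threading
import queue

def separate_stems(audio):
    # Placeholder for the CPU inference pass; in the real app this
    # would run the neural network over the mixed audio buffer.
    return {"vocals": audio, "drums": audio, "bass": audio, "other": audio}

def run_separation(audio, results):
    # The expensive work happens here, off the playback thread.
    results.put(separate_stems(audio))

results = queue.Queue()
worker = threading.Thread(
    target=run_separation, args=([0.0] * 4, results), daemon=True
)
worker.start()

# The playback/UI thread stays responsive; it only blocks when it
# actually needs the finished stems.
stems = results.get()
print(sorted(stems))  # → ['bass', 'drums', 'other', 'vocals']
```

The key design point is that the audio thread never waits on inference; it polls or blocks on the queue only at the moment the stems are needed.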

The model supports two modes:

- 4-stem separation: vocals, drums, bass, other
- 2-stem separation: vocals and accompaniment

Results are written as new clips directly onto your timeline, ready for editing and mixing.
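The two modes map naturally to a mode-to-stems lookup. The sketch below is a hypothetical illustration of that mapping, not Rift Studio's real interface; the stem names mirror the two modes described above.

```python
# Hypothetical mode-to-stems mapping for the two separation modes.
STEM_MODES = {
    "4-stem": ["vocals", "drums", "bass", "other"],
    "2-stem": ["vocals", "accompaniment"],
}

def stems_for_mode(mode):
    # Validate early so a bad mode fails before inference starts.
    if mode not in STEM_MODES:
        raise ValueError(f"unknown separation mode: {mode}")
    return STEM_MODES[mode]

print(stems_for_mode("2-stem"))  # → ['vocals', 'accompaniment']
```

Each returned stem name would then correspond to one new clip placed on the timeline.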

Quality Tradeoffs

Offline models will always lag behind cloud-based models with billions of parameters. But for production workflows — extracting a vocal to add reverb, isolating drums for sampling, removing a bass line for a remix — the quality is more than sufficient.

We chose to optimize for speed and reliability over state-of-the-art accuracy. The stems are clean enough to mix with, and the processing is fast enough to use mid-session without breaking your flow.

What's Next

We're exploring real-time separation for monitoring purposes, and investigating fine-tuning the model on specific genres to improve accuracy for electronic music production.
