AI RVC v2 Voice Modeling

This project is a full pipeline for retrieval-based voice conversion, built in Python and PyTorch. It takes raw audio, prepares it, trains a voice model, and runs inference to convert one voice into another. I worked on it with a team of six, and together we trained and publicly released more than ten voice models.

How it works

The pipeline automates the whole process from audio to finished model. It segments and cleans raw recordings, removes background noise, and extracts pitch so the training data stays consistent. Training runs on the GPU, and each finished model is paired with a FAISS retrieval index that helps the converted voice keep the detail and character of the target speaker. Batch workflows let the team prepare large amounts of audio and train several models at once without repeating manual steps.
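
To make the retrieval step concrete, here is a minimal sketch of how a source feature vector can be pulled toward the nearest stored target-speaker features. It is an illustration under stated assumptions, not the project's actual code: the names `nearest_neighbors`, `blend_with_index`, `k`, and `index_rate` are hypothetical, the inverse-squared-distance weighting is an assumed choice, and a brute-force search over plain Python lists stands in for the FAISS index so the example runs without dependencies.

```python
import math


def nearest_neighbors(query, index_vectors, k):
    """Return the k (distance, vector) pairs closest to query.

    Brute-force Euclidean search; a stand-in for a FAISS index lookup.
    """
    scored = [(math.dist(query, vec), vec) for vec in index_vectors]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def blend_with_index(query, index_vectors, k=2, index_rate=0.5):
    """Pull a source feature vector toward its nearest target-speaker features.

    index_rate=0 keeps the source feature unchanged; index_rate=1 replaces it
    with the weighted average of the retrieved neighbors. (Hypothetical
    parameter names, chosen for this sketch.)
    """
    neighbors = nearest_neighbors(query, index_vectors, k)
    # Assumed weighting: inverse squared distance, so closer neighbors count more.
    weights = [1.0 / (dist ** 2 + 1e-8) for dist, _ in neighbors]
    total = sum(weights)
    retrieved = [
        sum(w * vec[i] for w, (_, vec) in zip(weights, neighbors)) / total
        for i in range(len(query))
    ]
    # Linear mix of the retrieved target feature and the original source feature.
    return [
        index_rate * r + (1.0 - index_rate) * q
        for r, q in zip(retrieved, query)
    ]


if __name__ == "__main__":
    # Toy 2-D "features" standing in for the target speaker's stored vectors.
    target_features = [[0.0, 0.0], [10.0, 10.0], [4.0, 0.0]]
    source_feature = [1.0, 0.0]
    print(blend_with_index(source_feature, target_features, k=2, index_rate=0.5))
```

The mixing rate is the design choice that matters here: a higher value leans harder on the target speaker's stored features, while a lower one preserves more of the source performance. With real feature dimensions and thousands of stored vectors, the brute-force search would be replaced by the FAISS index.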

What I took from it

This was where I learned to build a machine learning system that other people actually use, from data preparation all the way to a released model. Handling audio at scale, keeping training reproducible, and tuning models for quality were the core of the work.
