AI RVC v2 Voice Modeling
An end-to-end voice conversion pipeline used to train and publicly release more than ten voice models.
This project is a full pipeline for retrieval-based voice conversion, built in Python and PyTorch. It takes raw audio, prepares it, trains a voice model, and runs inference to convert one voice into another. I worked on it with a team of six, and together we trained and publicly released more than ten voice models.
How it works
The pipeline automates the whole process from audio to finished model. It segments and cleans raw recordings, removes background noise, and extracts pitch so the training data stays consistent. Training runs on the GPU, and each finished model is paired with a FAISS retrieval index that helps the converted voice keep the detail and character of the target speaker. Batch workflows let the team prepare large amounts of audio and train several models at once without repeating manual steps.
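To make the retrieval step more concrete, here is a minimal sketch of how an index can pull a converted frame toward the target speaker. It is an illustration under stated assumptions, not the project's actual code: a brute-force nearest-neighbor search stands in for FAISS so the example runs with only the standard library, and the names `nearest_neighbors`, `blend_with_index`, `k`, and `index_rate` are hypothetical. The inverse-squared-distance weighting and the linear blend are assumptions as well, chosen because they are a common way to mix retrieved features with source features.

```python
import math


def nearest_neighbors(query, index_vectors, k):
    """Return (distance, vector) pairs for the k index vectors closest to query.

    Brute-force Euclidean search; a FAISS index would answer the same
    question much faster on a large set of training features.
    """
    scored = [(math.dist(query, vec), vec) for vec in index_vectors]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def blend_with_index(frame, index_vectors, k=2, index_rate=0.75):
    """Mix one source feature frame with features retrieved from the index.

    index_rate = 0.0 keeps the source frame unchanged;
    index_rate = 1.0 replaces it with the retrieved target-speaker features.
    (Hypothetical parameter names, assumed for this sketch.)
    """
    neighbors = nearest_neighbors(frame, index_vectors, k)

    # Closer neighbors get larger weights (assumed inverse squared distance).
    weights = [1.0 / (dist * dist + 1e-8) for dist, _ in neighbors]
    total = sum(weights)

    # Weighted average of the retrieved vectors, one dimension at a time.
    retrieved = [
        sum(w * vec[i] for w, (_, vec) in zip(weights, neighbors)) / total
        for i in range(len(frame))
    ]

    # Linear blend between retrieved features and the original frame.
    return [
        index_rate * r + (1.0 - index_rate) * f
        for r, f in zip(retrieved, frame)
    ]


if __name__ == "__main__":
    # Toy 2-dimensional "features" standing in for real speaker embeddings.
    target_features = [[0.0, 0.0], [10.0, 10.0]]
    source_frame = [1.0, 1.0]
    print(blend_with_index(source_frame, target_features, k=1, index_rate=0.5))
```

A higher `index_rate` leans harder on the target speaker's stored features, which tends to preserve more of their character at the cost of following the source performance less closely.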
What I took from it
This was where I learned to build a machine learning system that other people actually use, from data preparation all the way to a released model. Handling audio at scale, keeping training reproducible, and tuning models for quality were the core of the work.
Hear it in action
The same clip before and after conversion with a trained voice model.