AI Cover Vocals · TelkNet

Voice & vocals · GPU workload

AI Cover Vocals

5 credits/use · Audio file input · Max 100 MB · Estimated 180-900 seconds

Input

Clean vocals, cover material, or separated vocal stems through an RVC v2 conversion chain

Audio formats

mp3, wav, flac, ogg, m4a, aac
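The format list and the 100 MB cap can be checked before an upload is attempted. The following is a minimal sketch, not TelkNet's own validation: the function name `check_upload` is hypothetical, and reading "100 MB" as 100 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Formats and size cap come from this page; the byte interpretation of
# "100 MB" (binary megabytes) is an assumption.
ALLOWED_EXTENSIONS = {"mp3", "wav", "flac", "ogg", "m4a", "aac"}
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Compare extensions case-insensitively and without the leading dot.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 100 MB")
    return problems
```

For example, `check_upload("lead_vocal.WAV", 5_000_000)` returns an empty list, while a `.wma` file is reported as an unsupported format.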

Output

Converted vocals, original vocal references, and downloadable audio for demos and listening tests

Best for

Users making voice-color experiments, cover demos, or character voice trials

Models, parameters, and sources

The AI cover pipeline combines vocal separation, RVC voice conversion, and mixdown. The core inputs are a vocal file and a target voice model.

RVC v2

RVC v2 converts the timbre of the input vocal to that of the selected character voice model.

RVC Project · TelkNet AI-RVC adapter · HuBERT

Pipeline: RVC v2 converts the separated source vocal to the timbre of the selected voice model; the converted vocal is then mixed with the accompaniment.
Recommended TelkNet defaults: current pipeline, RoFormer separator, ensemble:vocal_rvc, hybrid F0, index_ratio 0.50.
Voice model: the selected voice_model_id chooses the RVC checkpoint and optional FAISS index.
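The recommended defaults above can be captured in a small settings object. This is an illustrative sketch only: `voice_model_id`, `index_ratio`, and `vocal_rvc` appear on this page, while the class name `CoverSettings`, the other field names, the lowercase separator identifiers, and the 0 to 1 range check on `index_ratio` are assumptions rather than TelkNet's actual request schema.

```python
from dataclasses import dataclass, asdict

# Separator names follow the parameter guide on this page; the lowercase
# identifiers themselves are hypothetical.
KNOWN_SEPARATORS = {"roformer", "uvr5", "demucs"}


@dataclass(frozen=True)
class CoverSettings:
    voice_model_id: str            # chooses the RVC checkpoint and optional FAISS index
    separator: str = "roformer"    # quality-first default separator
    ensemble: str = "vocal_rvc"    # from "ensemble:vocal_rvc"
    f0_method: str = "hybrid"      # hybrid F0 (RMVPE plus alternate extractors)
    index_ratio: float = 0.50      # recommended default from this page

    def __post_init__(self):
        if not self.voice_model_id:
            raise ValueError("voice_model_id is required")
        if self.separator not in KNOWN_SEPARATORS:
            raise ValueError(f"unknown separator: {self.separator}")
        # Assumption: the ratio is a fraction between 0 and 1.
        if not 0.0 <= self.index_ratio <= 1.0:
            raise ValueError("index_ratio must be between 0 and 1")


settings = CoverSettings(voice_model_id="demo-voice")  # placeholder model id
settings_dict = asdict(settings)
```

Keeping the defaults in one frozen object makes it easy to see which values were overridden for a given run, for example `CoverSettings(voice_model_id="demo-voice", separator="demucs")`.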

RMVPE / hybrid F0

RMVPE is the robust vocal pitch estimator used by the RVC path and the hybrid F0 option.

RVC Project · RMVPE

Pitch role: tracks source vocal F0 before conversion so the target timbre follows the melody.
Hybrid mode: combines RMVPE with alternate extractors for quality-first singing coverage.
Public evidence: the RMVPE Interspeech paper reports top raw pitch accuracy (RPA), raw chroma accuracy (RCA), and overall accuracy (OA) on multiple polyphonic vocal pitch datasets.
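For readers unfamiliar with the metrics named above, here is a small sketch of how raw pitch accuracy and raw chroma accuracy are commonly computed from frame-wise F0 tracks. It follows the usual definition (a voiced reference frame counts as correct when the estimate is within 50 cents, with octave errors forgiven for chroma accuracy); the function names and the inclusive tolerance are choices made for this illustration, not code from the RMVPE paper.

```python
import math


def cents_between(est_hz: float, ref_hz: float) -> float:
    """Signed pitch distance in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(est_hz / ref_hz)


def raw_pitch_accuracy(ref_hz, est_hz, tol_cents=50.0):
    """Share of voiced reference frames whose estimate is within tol_cents."""
    voiced = [(r, e) for r, e in zip(ref_hz, est_hz) if r > 0]  # 0 Hz = unvoiced
    if not voiced:
        return 0.0
    hits = sum(1 for r, e in voiced
               if e > 0 and abs(cents_between(e, r)) <= tol_cents)
    return hits / len(voiced)


def raw_chroma_accuracy(ref_hz, est_hz, tol_cents=50.0):
    """Like raw pitch accuracy, but octave errors are folded away."""
    voiced = [(r, e) for r, e in zip(ref_hz, est_hz) if r > 0]
    if not voiced:
        return 0.0
    hits = 0
    for r, e in voiced:
        if e <= 0:
            continue
        d = cents_between(e, r)
        d -= 1200.0 * math.floor(d / 1200.0 + 0.5)  # fold to the nearest octave
        if abs(d) <= tol_cents:
            hits += 1
    return hits / len(voiced)


reference = [220.0, 220.0, 0.0, 440.0]   # Hz per frame; 0 marks an unvoiced frame
estimate = [220.0, 440.0, 100.0, 445.0]  # second frame is an octave error
rpa = raw_pitch_accuracy(reference, estimate)
rca = raw_chroma_accuracy(reference, estimate)
```

In this toy example the octave error lowers the raw pitch accuracy to two out of three voiced frames, while the raw chroma accuracy stays at three out of three.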

RoFormer / Karaoke / Demucs separation

The AI cover pipeline separates vocals/accompaniment before conversion and mixdown.

python-audio-separator · facebookresearch/demucs · MVSep algorithms · Mel-Band RoFormer · BS-RoFormer · HT Demucs

Primary separator: RoFormer vocal_rvc ensemble prepares clean vocals and accompaniment before RVC.
Harmony path: Karaoke lead/back separation can export lead vocals, backing vocals, and harmony-aware accompaniment.
Fallbacks: Demucs and UVR5 routes remain selectable for compatibility, but RoFormer is the quality-first default.

Model comparison

Model: RVC v2
Sources: RVC Project · TelkNet AI-RVC adapter · HuBERT
Role: Voice conversion core
Why it is used: RVC v2 supplies the selected target timbre and index-assisted conversion.
Output impact: Converted vocals

Model: RMVPE / hybrid F0
Sources: RVC Project · RMVPE
Role: Pitch extraction
Why it is used: RMVPE is robust on polyphonic singing and is the quality-first pitch path.
Output impact: Melody/F0 contour

Model: RoFormer vocal_rvc ensemble
Sources: python-audio-separator · MVSep algorithms · Mel-Band RoFormer · BS-RoFormer
Role: Vocal preparation
Why it is used: RoFormer vocal_rvc is the default separator for cleaner RVC input.
Output impact: Vocals and accompaniment stems

Model: Demucs / UVR5 fallback
Sources: facebookresearch/demucs · HT Demucs · python-audio-separator
Role: Compatibility separator
Why it is used: Demucs/UVR5 remain useful when a specific source behaves better outside RoFormer.
Output impact: Alternative stems

Paper and benchmark notes

HuBERT: Self-Supervised Speech Representation Learning

IEEE/ACM TASLP - 2021

HuBERT provides self-supervised speech representations used by RVC-style voice conversion pipelines.

arXiv:2106.07447

RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music

Interspeech - 2023

RMVPE is a robust vocal pitch estimator designed for polyphonic music; it is used here for quality-first F0 extraction.

ISCA PDF

RVC Project

Official repository - current

The RVC project is the official technical source for the VITS/HuBERT retrieval-based voice conversion workflow.

RVC WebUI

Official / repositories

RVC Project · TelkNet AI-RVC adapter · python-audio-separator · facebookresearch/demucs · MVSep algorithms

Papers / technical notes

HuBERT · RMVPE · Mel-Band RoFormer · BS-RoFormer · HT Demucs

Parameter guide

Voice model: selects the target character weights and retrieval index.
Pitch shift: adjusts the source by semitones to match the target range.
Index ratio: controls how much the retrieval index affects voice color.
F0 method: selects RMVPE, CREPE, Harvest, or another pitch extraction path.
Separator: uses RoFormer, UVR5, or Demucs before conversion.
Conversion pipeline: defaults to the bundled AI-RVC cover chain; the official 1:1 route can still be selected manually.
Mix controls: adjust vocals, accompaniment, reverb, and original-vocal blend.
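Three of these parameters have simple arithmetic behind them, sketched below. The semitone-to-ratio formula is standard equal temperament. The index-ratio blend is assumed to be a linear mix of original and retrieved feature vectors, which is how retrieval-based conversion is commonly described; TelkNet's adapter may differ. The mixdown helper and all function names are hypothetical illustrations, and reverb and the original-vocal blend are left out.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio of an equal-tempered shift; +12 semitones doubles the pitch."""
    return 2.0 ** (semitones / 12.0)


def blend_features(original, retrieved, index_ratio):
    """Assumed linear blend: 0 keeps the original features, 1 uses only retrieved ones."""
    return [index_ratio * r + (1.0 - index_ratio) * o
            for o, r in zip(original, retrieved)]


def mixdown(vocals, accompaniment, vocal_gain=1.0, accompaniment_gain=1.0):
    """Sum two equal-length mono stems with per-stem gain, clipped to [-1, 1]."""
    mixed = []
    for v, a in zip(vocals, accompaniment):
        sample = vocal_gain * v + accompaniment_gain * a
        mixed.append(max(-1.0, min(1.0, sample)))  # hard clip to avoid overflow
    return mixed


# A source sung at 220 Hz, shifted up 12 semitones, lands at 440 Hz.
shifted_hz = 220.0 * semitone_ratio(12)
```

A pitch shift of +12 or -12 semitones is a typical starting point when the source and target voices sit roughly an octave apart; smaller shifts follow the same formula.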

Use cases

Build cover demos and character-voice trials
Validate a voice direction before deeper production
Feed separated vocals into a downstream singing workflow

Workflow

1. Upload already-separated vocal material where possible
2. Submit the conversion task with the target voice direction
3. Listen to the result before deciding on further mixing or polishing

Pre-flight checklist

The cleaner the vocal, the more stable the conversion result
Separating vocals first usually beats feeding mixed audio directly into RVC
Output suits demos and direction validation; post-production is still recommended before public release
