OmniVoice Studio
Self-hosted voice cloning, video dubbing, and audio processing suite. Clone a voice from a 3-second clip, dub video into 646 languages, and isolate vocals locally with no cloud dependency.
Quick Start
docker compose up -d
Overview
OmniVoice Studio is a self-hosted audio and voice AI suite that covers voice cloning, video dubbing, vocal isolation, and real-time dictation from a single desktop application. The voice cloning engine needs only a three-second audio clip to build a speaker profile, and that profile can then be used to generate speech or dub video into one of 646 supported languages, all running on your own hardware.
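OmniVoice Studio's own input validation isn't described here, so the following is a hypothetical pre-flight check. It is a minimal Python sketch that assumes the reference clip is an uncompressed PCM WAV file and uses only the standard-library `wave` module to confirm a clip meets the three-second minimum before it is handed to any cloning step. The function names are illustrative, not part of the project's API.

```python
import io
import wave

# The three-second minimum quoted in the overview (assumed to be a hard floor).
MIN_REFERENCE_SECONDS = 3.0


def clip_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an uncompressed WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_usable_reference(wav_bytes: bytes, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True if the clip is at least `minimum` seconds long (hypothetical helper)."""
    return clip_duration_seconds(wav_bytes) >= minimum


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build an in-memory mono 16-bit silent WAV, handy for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(is_usable_reference(make_silent_wav(3.0)))  # True
    print(is_usable_reference(make_silent_wav(1.5)))  # False
```

Dividing the frame count by the frame rate gives the duration without decoding any samples. Compressed formats such as MP3 or AAC would have to be decoded to PCM first.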
The feature set is broad for a project this young. Vocal isolation uses Demucs to separate stems, speaker diarization uses Pyannote to identify who spoke when, and an AudioSeal watermarking layer embeds an inaudible signal into generated audio for provenance tracking. The desktop app runs on macOS, Windows, and Linux; a Docker path is also available for server deployments.
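Diarization output is essentially a list of time-stamped, speaker-labelled segments. The sketch below assumes a plain `(start_seconds, end_seconds, speaker_label)` tuple format. This is a hypothetical representation, not the documented output type of OmniVoice Studio or Pyannote. It shows two common post-processing steps: totalling talk time per speaker, and merging short gaps between consecutive segments from the same speaker so that a downstream step could work on whole turns. Speaker labels and the 0.5-second gap are illustrative.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical segment shape: (start_s, end_s, speaker_label)
Segment = tuple[float, float, str]


def talk_time(segments: Iterable[Segment]) -> dict[str, float]:
    """Sum speaking time per speaker from diarization segments."""
    totals: dict[str, float] = defaultdict(float)
    for start, end, speaker in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {start}-{end}")
        totals[speaker] += end - start
    return dict(totals)


def merge_adjacent(segments: list[Segment], gap: float = 0.5) -> list[Segment]:
    """Join consecutive same-speaker segments separated by at most `gap` seconds."""
    merged: list[Segment] = []
    for start, end, speaker in sorted(segments):
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


if __name__ == "__main__":
    segs = [(0.0, 2.5, "SPEAKER_00"), (2.8, 4.0, "SPEAKER_00"), (4.0, 7.0, "SPEAKER_01")]
    print(talk_time(segs))
    print(merge_adjacent(segs))
```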
The hardware requirements are substantial. The project recommends 16GB of RAM and 8GB of VRAM for GPU-accelerated inference. GPU acceleration is supported on NVIDIA CUDA, Apple Silicon (MPS), and AMD ROCm, and a CPU fallback is available for machines without a supported GPU, but processing times will reflect whatever the hardware can deliver.
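How OmniVoice Studio chooses a compute device isn't documented here. As a hedged sketch, assuming a PyTorch-based stack (an assumption, though Demucs and Pyannote both run on PyTorch), backend selection and a check against the recommended 16GB RAM / 8GB VRAM could look like this. PyTorch's ROCm builds expose AMD GPUs through the `torch.cuda` API, which is why there is no separate ROCm branch. The CUDA-before-MPS ordering and all function names are illustrative.

```python
# Recommended figures quoted in the overview.
RECOMMENDED_RAM_GB = 16
RECOMMENDED_VRAM_GB = 8


def choose_backend(cuda: bool, mps: bool) -> str:
    """Prefer a GPU backend and fall back to CPU (hypothetical ordering)."""
    if cuda:
        return "cuda"  # NVIDIA CUDA; PyTorch ROCm builds also report via torch.cuda
    if mps:
        return "mps"  # Apple Silicon
    return "cpu"


def detect_backend() -> str:
    """Probe PyTorch if it is installed; otherwise assume CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent on very old PyTorch versions
    return choose_backend(torch.cuda.is_available(), bool(mps and mps.is_available()))


def shortfalls(ram_gb: float, vram_gb: float) -> list[str]:
    """List which recommended minimums a machine misses (empty list means none)."""
    issues = []
    if ram_gb < RECOMMENDED_RAM_GB:
        issues.append(f"RAM {ram_gb} GB < {RECOMMENDED_RAM_GB} GB")
    if vram_gb < RECOMMENDED_VRAM_GB:
        issues.append(f"VRAM {vram_gb} GB < {RECOMMENDED_VRAM_GB} GB")
    return issues


if __name__ == "__main__":
    print(detect_backend())
    print(shortfalls(8, 4))
```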
Two things to understand before you build on this. First, it is an active beta. The project launched in April 2026 and the codebase is moving fast, which means breaking changes happen. Second, the licence is FSL-1.1-ALv2 (the Functional Source License). It permits any use except a "Competing Use", meaning offering the software, or something substantially similar, as a commercial product or service, and each release converts to Apache 2.0 two years after it is published. Competing commercial use is therefore possible on releases older than two years but restricted on current ones. For personal projects, research, or internal tooling, the licence is not an obstacle.
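Because the Apache 2.0 conversion is counted per release, checking whether a given version has already converted is simple date arithmetic. This is a minimal sketch that assumes the conversion date is the second anniversary of the release date. The licence text itself is authoritative. Rolling a 29 February release to 1 March is an assumption, and the function names are hypothetical.

```python
from datetime import date

CONVERSION_YEARS = 2  # FSL-1.1-ALv2: Apache 2.0 applies two years after release


def apache_conversion_date(release: date) -> date:
    """Date on which a release's code becomes available under Apache 2.0."""
    try:
        return release.replace(year=release.year + CONVERSION_YEARS)
    except ValueError:
        # Assumption: a 29 Feb release converts on 1 Mar, since the
        # target year has no 29 Feb.
        return date(release.year + CONVERSION_YEARS, 3, 1)


def is_apache_licensed(release: date, today: date) -> bool:
    """True once the release has passed its conversion date."""
    return today >= apache_conversion_date(release)


if __name__ == "__main__":
    print(apache_conversion_date(date(2026, 4, 15)))
```

For example, a hypothetical release dated 15 April 2026 would convert on 15 April 2028, so an organisation tracking commercial eligibility would record the release date of every version it deploys.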
For anyone who has been paying ElevenLabs for voice cloning or Descript for video dubbing and wants to move those workloads off the cloud, this is the most capable self-hosted option in that space right now.
OmniVoice Studio: Pros & Cons
| Pros (The Wins) | Cons (The Friction) |
|---|---|
| Voice cloning: 3-second clip is enough; no cloud upload required. | Active beta: Launched April 2026; breaking changes between versions likely. |
| 646 languages: Video dubbing uses the cloned voice across language targets. | Hardware floor: 16GB RAM, 8GB VRAM recommended; CPU fallback is slow. |
| Full audio pipeline: Vocal isolation, diarization, and AI watermarking included. | Licence limits: FSL-1.1-ALv2 restricts commercial use on recent releases. |
| Desktop + Docker: macOS, Windows, Linux app or server via Docker. | No hosted option: Self-hosting is the only path; no managed tier available. |
Use Cases
Specific ways to use OmniVoice Studio for your workflow.
Deployment Strategy
Recommended ways to host OmniVoice Studio in your own environment.