OmniVoice Studio

ai, media

Self-hosted voice cloning, video dubbing, and audio processing suite. Clone a voice from a 3-second clip, dub video into 646 languages, and isolate vocals locally with no cloud dependency.

#ai #voice-cloning #tts #dubbing #audio #local-ai #self-hosted #elevenlabs-alternative
Alternative to ElevenLabs, Descript

Quick Start

docker compose up -d

Overview

OmniVoice Studio is a self-hosted audio and voice AI suite that covers voice cloning, video dubbing, vocal isolation, and real-time dictation from a single desktop application. The voice cloning engine needs only a three-second audio clip to build a speaker profile, and that profile can then be used to generate speech or dub video into one of 646 supported languages, all running on your own hardware.

The feature set is broad for a project this young. Vocal isolation uses Demucs to separate stems, speaker diarization uses Pyannote to identify who spoke when, and an AudioSeal watermarking layer embeds an inaudible signal into generated audio for provenance tracking. The desktop app runs on macOS, Windows, and Linux; a Docker path is also available for server deployments.

Hardware requirements are real. The project recommends 16GB of RAM and 8GB of VRAM for GPU-accelerated inference. GPU acceleration works on NVIDIA CUDA, Apple Silicon MPS, and AMD ROCm. A CPU fallback exists, but processing times will reflect whatever the hardware can deliver.
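The backend choice and the recommended figures above can be sketched in a few lines of Python. This is only an illustration: `pick_backend` and `meets_recommendation` are hypothetical helpers, not part of OmniVoice Studio, and the priority order (CUDA, then ROCm, then MPS, then CPU) is an assumption rather than the project's documented behaviour.

```python
# Recommended figures quoted in the text above.
RECOMMENDED_RAM_GB = 16
RECOMMENDED_VRAM_GB = 8


def pick_backend(cuda: bool, rocm: bool, mps: bool) -> str:
    """Return the first available accelerator, falling back to the CPU.

    The flags are supplied by the caller; the ordering is an assumption.
    """
    if cuda:
        return "cuda"
    if rocm:
        return "rocm"
    if mps:
        return "mps"
    return "cpu"


def meets_recommendation(ram_gb: float, vram_gb: float) -> bool:
    """True when the machine matches both recommended memory figures."""
    return ram_gb >= RECOMMENDED_RAM_GB and vram_gb >= RECOMMENDED_VRAM_GB
```

A machine with no supported GPU would get `pick_backend(False, False, False)`, which returns `"cpu"`, the slow path described above.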

Two things to understand before you build on this. First, it is an active beta. The project launched in April 2026 and the codebase is moving fast, so expect breaking changes between versions. Second, the licence is FSL-1.1-ALv2 (Functional Source License). Each release converts to Apache 2.0 two years after it ships; until then the FSL bars competing commercial use, so commercial use is possible on older versions but restricted on current ones. For personal projects, research, or internal tooling, the licence is not an obstacle.
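To work out when a given release becomes Apache 2.0, add two years to its release date. The sketch below assumes you know the release date of the version you are running; `apache_conversion_date` is a hypothetical helper, and the 28 February handling for leap-day releases is an assumption, not something the licence text is quoted on here.

```python
from datetime import date


def apache_conversion_date(release: date) -> date:
    """Return the second anniversary of a release date."""
    try:
        return release.replace(year=release.year + 2)
    except ValueError:
        # Release fell on 29 February and the target year has no such day.
        # Falling back to 28 February is an assumption made for this sketch.
        return release.replace(year=release.year + 2, day=28)


# Hypothetical release date; the text above only says "April 2026".
print(apache_conversion_date(date(2026, 4, 15)))
```

Any version whose conversion date has already passed can be used under Apache 2.0 terms; newer versions remain under the FSL restriction.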

For anyone who has been paying ElevenLabs for voice cloning or Descript for video dubbing and wants to move those workloads off the cloud, this is the most capable self-hosted option in that space right now.

OmniVoice Studio: Pros & Cons

Pros (The Wins)
Voice cloning: 3-second clip is enough; no cloud upload required.
646 languages: Video dubbing uses the cloned voice across language targets.
Full audio pipeline: Vocal isolation, diarization, and AI watermarking included.
Desktop + Docker: macOS, Windows, Linux app or server via Docker.

Cons (The Friction)
Active beta: Launched April 2026; breaking changes between versions likely.
Hardware floor: 16GB RAM, 8GB VRAM recommended; CPU fallback is slow.
Licence limits: FSL-1.1-ALv2 restricts commercial use on recent releases.
No hosted option: Self-hosting is the only path; no managed tier available.

Use Cases

Specific ways to use OmniVoice Studio for your workflow.

01 Clone a voice from a short audio clip without sending recordings to a third-party service
02 Dub a video into another language using the original speaker's voice
03 Strip vocals from a music track for remixing or transcription
04 Run real-time dictation locally on a machine with no internet requirement

Deployment Strategy

Recommended ways to host OmniVoice Studio in your own environment.

docker
self-hosted
desktop