Windows audio translation without virtual cables or drivers

A new open-source Windows tool called Voxis translates whatever your computer is playing (videos, games, calls) into spoken output in another language, without installing virtual audio cables or drivers. The app captures the system's post-mix output in real time, streams it to a translation model, and plays back the translated speech while the original audio continues uninterrupted.
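The overall flow can be sketched in plain Python: capture feeds a translator, the translated result is played back, and capture never waits on translation. This is a minimal illustration, not Voxis's code. `fake_translate` is a hypothetical stand-in for the translation model and playback, and the bounded queue that drops the oldest chunk on overflow is an assumed design choice.

```python
import queue
import threading
from typing import Iterable, List, Optional


def fake_translate(chunk: bytes) -> str:
    # Hypothetical stand-in for the translation model plus speech playback.
    return f"translated({len(chunk)} bytes)"


def run_pipeline(chunks: Iterable[bytes], max_pending: int = 8) -> List[str]:
    # Bounded hand-off between the capture side and the translation side.
    pending: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=max_pending)
    played: List[str] = []

    def capture() -> None:
        for chunk in chunks:
            try:
                pending.put_nowait(chunk)  # capture never blocks on translation
            except queue.Full:
                try:
                    pending.get_nowait()  # assumed policy: drop the oldest chunk
                except queue.Empty:
                    pass  # the consumer drained the queue in the meantime
                pending.put_nowait(chunk)  # only this thread puts, so room exists now
        pending.put(None)  # sentinel: end of stream (may wait for the consumer)

    def translate_and_play() -> None:
        while (chunk := pending.get()) is not None:
            played.append(fake_translate(chunk))

    workers = [threading.Thread(target=capture), threading.Thread(target=translate_and_play)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return played
```

For example, three 640-byte chunks (20 ms each at 16 kHz mono, assuming 16-bit samples) come back as three translated entries in order. Dropping stale audio rather than blocking keeps the capture side real-time at the cost of skipping speech when translation falls behind.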
How it captures without cables or drivers
Most tools that translate Windows audio rely on virtual cables such as VB-CABLE or VoiceMeeter, or join calls as bots. Voxis avoids both by using the ApplicationLoopback (process loopback) API introduced in Windows 10 version 2004, which supports loopback capture scoped to a process tree. In its “exclude target process tree” mode, the captured stream contains everything the user hears except audio from the excluded process and its children. Voxis excludes itself, so its own translated speech never re-enters the capture, which prevents feedback loops without extra audio routing.
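A “process tree” here means the target process plus everything it spawned. The sketch below models only that selection rule on a made-up process table. It is a conceptual illustration, not how Windows implements loopback capture, and the PIDs and the `parent_of` mapping are hypothetical.

```python
from typing import Dict, Optional, Set


def process_tree(root: int, parent_of: Dict[int, Optional[int]]) -> Set[int]:
    # Collect root plus every process whose ancestor chain reaches root.
    tree: Set[int] = set()
    for pid in parent_of:
        seen: Set[int] = set()
        current: Optional[int] = pid
        while current is not None and current not in seen:
            if current == root:
                tree.add(pid)
                break
            seen.add(current)  # guard against malformed (cyclic) tables
            current = parent_of.get(current)
    return tree


def captured_pids(target: int, parent_of: Dict[int, Optional[int]], exclude: bool = True) -> Set[int]:
    # Exclude mode: every process except the target's tree. Include mode: only the tree.
    tree = process_tree(target, parent_of)
    return set(parent_of) - tree if exclude else tree


# Hypothetical process table: PID -> parent PID.
parent_of = {
    100: None,  # browser playing a video
    200: None,  # game
    300: None,  # Voxis
    301: 300,   # hypothetical helper process spawned by Voxis
}
```

Excluding PID 300 leaves `{100, 200}`, so both Voxis and its child helper stay out of the capture; include mode with the same target would yield only `{300, 301}`.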
To activate the loopback client, the app fills an AUDIOCLIENT_ACTIVATION_PARAMS structure with its own process ID and the exclude mode, wraps it in a PROPVARIANT blob, and requests the special device string “VAD\Process_Loopback.” Activation is asynchronous: ActivateAudioInterfaceAsync reports completion through a custom COM completion handler. This is where a subtle requirement tripped up development: the handler must implement both IActivateAudioInterfaceCompletionHandler and the marker interface IAgileObject, or the call fails silently.
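A ctypes mirror of the activation structure makes its layout concrete. This sketch assumes the field layout and enum values from the Windows SDK header audioclientactivationparams.h and only builds the bytes; native code would then point a PROPVARIANT of type VT_BLOB at them before calling ActivateAudioInterfaceAsync. The helper `exclude_self_params` is hypothetical. The COM completion handler (in comtypes, a `COMObject` subclass naming both interfaces in `_com_interfaces_`) needs Windows and is not shown.

```python
import ctypes
import os
from typing import Optional

# Enum values mirror AUDIOCLIENT_ACTIVATION_TYPE and PROCESS_LOOPBACK_MODE.
AUDIOCLIENT_ACTIVATION_TYPE_DEFAULT = 0
AUDIOCLIENT_ACTIVATION_TYPE_PROCESS_LOOPBACK = 1
PROCESS_LOOPBACK_MODE_INCLUDE_TARGET_PROCESS_TREE = 0
PROCESS_LOOPBACK_MODE_EXCLUDE_TARGET_PROCESS_TREE = 1

# Device string passed to ActivateAudioInterfaceAsync for process loopback.
VIRTUAL_AUDIO_DEVICE_PROCESS_LOOPBACK = "VAD\\Process_Loopback"


class AUDIOCLIENT_PROCESS_LOOPBACK_PARAMS(ctypes.Structure):
    _fields_ = [
        ("TargetProcessId", ctypes.c_uint32),   # DWORD
        ("ProcessLoopbackMode", ctypes.c_int),  # PROCESS_LOOPBACK_MODE enum
    ]


class _ActivationUnion(ctypes.Union):
    _fields_ = [("ProcessLoopbackParams", AUDIOCLIENT_PROCESS_LOOPBACK_PARAMS)]


class AUDIOCLIENT_ACTIVATION_PARAMS(ctypes.Structure):
    _anonymous_ = ("u",)  # lets callers write params.ProcessLoopbackParams directly
    _fields_ = [
        ("ActivationType", ctypes.c_int),  # AUDIOCLIENT_ACTIVATION_TYPE enum
        ("u", _ActivationUnion),
    ]


def exclude_self_params(pid: Optional[int] = None) -> AUDIOCLIENT_ACTIVATION_PARAMS:
    # Hypothetical helper: capture everything except this process tree.
    params = AUDIOCLIENT_ACTIVATION_PARAMS()
    params.ActivationType = AUDIOCLIENT_ACTIVATION_TYPE_PROCESS_LOOPBACK
    loopback = params.ProcessLoopbackParams  # shares memory with params
    loopback.TargetProcessId = os.getpid() if pid is None else pid
    loopback.ProcessLoopbackMode = PROCESS_LOOPBACK_MODE_EXCLUDE_TARGET_PROCESS_TREE
    return params
```

The structure is 12 bytes (a 4-byte activation type followed by the 8-byte loopback parameters), and `bytes(exclude_self_params())` is exactly what the VT_BLOB would carry.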
Streaming at 16 kHz mono, without stalls
The capture targets a 16 kHz mono WAVEFORMATEX stream, a format chosen to balance translation quality against real-time safety. WASAPI lets the app initialize the loopback client with this exact format, so Voxis does not have to resample the audio itself and avoids the latency spikes an extra conversion step could add. To keep the ring buffer from overflowing when downstream components stutter, Voxis runs the capture loop on a high-priority thread and keeps buffer sizes conservative.
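The format arithmetic is easy to check in code. The sketch below builds a WAVEFORMATEX for 16 kHz mono and sizes a capture buffer. Several details are assumptions: 16-bit PCM (the source does not state the sample type), the helper names, and the 200 ms example buffer. IAudioClient::Initialize takes its buffer duration as a REFERENCE_TIME in 100-nanosecond units, which is what `hnsBufferDuration` computes.

```python
import ctypes
from typing import Dict

WAVE_FORMAT_PCM = 1
HNS_PER_MS = 10_000  # REFERENCE_TIME counts 100-nanosecond units


class WAVEFORMATEX(ctypes.Structure):
    _pack_ = 1  # the Windows header uses 1-byte packing, giving an 18-byte struct
    _fields_ = [
        ("wFormatTag", ctypes.c_uint16),
        ("nChannels", ctypes.c_uint16),
        ("nSamplesPerSec", ctypes.c_uint32),
        ("nAvgBytesPerSec", ctypes.c_uint32),
        ("nBlockAlign", ctypes.c_uint16),
        ("wBitsPerSample", ctypes.c_uint16),
        ("cbSize", ctypes.c_uint16),
    ]


def pcm_format(rate: int = 16_000, channels: int = 1, bits: int = 16) -> WAVEFORMATEX:
    # Hypothetical helper; 16-bit PCM is an assumption, not stated by the source.
    block_align = channels * bits // 8  # bytes per frame
    return WAVEFORMATEX(WAVE_FORMAT_PCM, channels, rate, rate * block_align, block_align, bits, 0)


def buffer_plan(fmt: WAVEFORMATEX, buffer_ms: int) -> Dict[str, int]:
    # Hypothetical helper: duration for Initialize plus the frames/bytes it holds.
    frames = fmt.nSamplesPerSec * buffer_ms // 1000
    return {
        "hnsBufferDuration": buffer_ms * HNS_PER_MS,
        "frames": frames,
        "bytes": frames * fmt.nBlockAlign,
    }
```

At 16 kHz mono with 2-byte samples the stream is only 32,000 bytes per second, so a 200 ms buffer holds 3,200 frames in 6,400 bytes.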
Voxis is open-source, with the capture engine written in Python using comtypes. The project’s documentation calls out the sharp edges—like the IAgileObject requirement—and acknowledges limits that aren’t under its control. For users tired of virtual cables and driver installs, it’s a step toward zero-setup audio processing on Windows.
Source: DEV Community. AI-assisted editorial synthesis — TechnoExpress.

