# Voice Agent

## STT Engine Comparison

faster-whisper (current) vs. NVIDIA Parakeet TDT (stt-local)

## Deploy Server

85.31.235.30, Debian 12 (bookworm)

| Spec | Value |
|---|---|
| CPU | AMD EPYC 7543P |
| Cores | 2 vCPU |
| RAM | 7.8 GB |
| GPU | None |
| Disk | 85 GB free |
| Python | 3.11 / 3.13 |
## Engine Overview

### faster-whisper (currently in use)

OpenAI Whisper · CTranslate2 runtime

- Model: Whisper small (244M params)
- Runtime: CTranslate2 (C++)
- Hardware: CPU / CUDA GPU
- Languages: 99
- WER (LibriSpeech): ~4.2%
- Speed (2-core CPU): ~2–4× realtime
- Memory: ~500 MB
- Runs on server: Yes
### Parakeet TDT 0.6B (stt-local candidate)

NVIDIA NeMo · parakeet-mlx runtime

- Model: Parakeet TDT v3 (600M params)
- Runtime: MLX (Apple only)
- Hardware: Apple Silicon only
- Languages: 25 (European)
- WER (LibriSpeech): 1.93%
- Speed (M1 Ultra): ~50× realtime
- Memory: ~2.5 GB
- Runs on server: No
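The realtime-factor (RTF) figures in the two spec lists convert directly into expected transcription latency. A quick sketch of that arithmetic, using the numbers quoted above:

```python
def transcription_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time to transcribe a clip at a given realtime factor.

    RTF = audio duration / processing time, so processing time = duration / RTF.
    """
    return audio_seconds / realtime_factor

# Illustrative: a 60 s utterance on each engine, figures from the spec lists.
whisper_cpu = transcription_seconds(60, 3)    # ~3x RT on the 2-core EPYC -> 20.0 s
parakeet_m1 = transcription_seconds(60, 50)   # ~50x RT on an M1 Ultra -> 1.2 s
```

At ~3× realtime the CPU path is still comfortably faster than the audio arrives, which is why faster-whisper remains viable on 2 vCPUs.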
## Head-to-Head

| Metric | faster-whisper | Parakeet TDT |
|---|---|---|
| Accuracy (WER, lower is better) | ~4.2% | 1.93% |
| Speed (higher is better) | ~3× realtime (2-core CPU) | ~50× realtime (M1 Ultra) |
| Language coverage | 99 languages | 25 languages |
| Server compatibility (Debian 12, x86, no GPU) | Fully compatible | Gateway only (no inference) |
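For reference, the WER numbers compared above are word error rates: the word-level edit distance between reference and hypothesis transcripts, divided by the reference length. A minimal stdlib sketch with made-up example strings:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / len(ref)

# One substituted word out of four -> 25% WER.
print(wer("the cat sat down", "the cat sat down"))  # 0.0
print(wer("the cat sat down", "the hat sat down"))  # 0.25
```

So the gap in the table (~4.2% vs. 1.93%) means roughly half as many word-level mistakes per utterance for Parakeet on that benchmark.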
## Architecture

### Current: faster-whisper, all-in-one on the Debian server

Browser (mic PCM) → WebSocket → voice-ui → STT (faster-whisper) → LLM (+ MCP tools) → TTS (edge-tts) → Browser (speaker)

Everything runs on a single Debian server.
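The all-in-one flow can be sketched as a chain of stages on one host. The stage functions below are stubs standing in for faster-whisper, the LLM call, and edge-tts; their names and return values are illustrative, not the project's actual API:

```python
# Stub stages: each stands in for a real component on the Debian server.
def stt(pcm: bytes) -> str:
    """Stands in for faster-whisper: PCM audio in, transcript out."""
    return "what time is it"

def llm(text: str) -> str:
    """Stands in for the LLM + MCP tools: transcript in, reply text out."""
    return f"reply to: {text}"

def tts(text: str) -> bytes:
    """Stands in for edge-tts: reply text in, speaker audio out."""
    return text.encode()

def handle_utterance(pcm: bytes) -> bytes:
    """One round trip: mic PCM in, speaker audio out, all on one server."""
    return tts(llm(stt(pcm)))

print(handle_utterance(b"\x00\x01"))  # b'reply to: what time is it'
```

The point of the sketch is the topology: every stage is a local function call, so there is no second machine to provision or keep reachable.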

### stt-local: Gateway + Worker split

Browser (mic PCM) → Gateway (Debian server) → Worker (Apple Silicon Mac, required) → Parakeet MLX model

Requires an Apple Silicon Mac as the worker; inference cannot run on the x86 CPU.
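In the split topology the gateway has to forward PCM chunks to the worker over the network. The wire format is not specified here; a plausible minimal framing (magic, sequence number, length prefix, all field names hypothetical) could look like:

```python
import struct

# Hypothetical frame layout: 4-byte magic, u32 sequence number,
# u32 payload length (big-endian), then the raw PCM bytes.
MAGIC = b"PCM0"
HEADER = struct.Struct(">4sII")

def pack_chunk(seq: int, pcm: bytes) -> bytes:
    """Frame one PCM chunk for the gateway -> worker stream."""
    return HEADER.pack(MAGIC, seq, len(pcm)) + pcm

def unpack_chunk(frame: bytes) -> tuple[int, bytes]:
    """Parse a frame back into (sequence number, PCM payload)."""
    magic, seq, length = HEADER.unpack_from(frame)
    if magic != MAGIC:
        raise ValueError("bad frame")
    return seq, frame[HEADER.size:HEADER.size + length]

seq, pcm = unpack_chunk(pack_chunk(7, b"\x01\x02\x03"))
```

Sequence numbers let the worker detect drops on a lossy tunnel (e.g. over ngrok); the length prefix makes frames self-delimiting on a byte stream.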
## Compatibility Matrix

| Requirement | Server has | faster-whisper | Parakeet (stt-local) |
|---|---|---|---|
| x86_64 CPU | AMD EPYC 7543P | ✓ Supported | ✗ Needs Apple Silicon |
| NVIDIA GPU | None | ✓ Optional | ✗ N/A (MLX only) |
| RAM ≥ 2 GB | 7.8 GB | ✓ ~500 MB | ⚠ ~2.5 GB (if it could run) |
| Python ≥ 3.11 | 3.11 + 3.13 | ✓ | Needs ≥ 3.12 |
| Linux / Debian | Debian 12 | ✓ Native | ⚠ Gateway only |
| Offline / local | | ✓ Fully local | ✓ Fully local |
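The matrix can be encoded as a small preflight check. The thresholds are lifted from the rows above; the function names and parameters are illustrative, not part of either project:

```python
def parakeet_inference_ok(os_name: str, arch: str, py: tuple[int, int]) -> bool:
    """Parakeet via the MLX runtime needs Apple Silicon macOS and Python >= 3.12."""
    return os_name == "Darwin" and arch == "arm64" and py >= (3, 12)

def faster_whisper_ok(ram_gb: float) -> bool:
    """faster-whisper small needs roughly 0.5 GB of RAM and runs on any CPU."""
    return ram_gb >= 0.5

# The deploy server: Debian 12 on x86_64 with 7.8 GB RAM and Python 3.13.
print(faster_whisper_ok(7.8))                             # True
print(parakeet_inference_ok("Linux", "x86_64", (3, 13)))  # False
```

This is exactly the asymmetry the matrix shows: the server passes every faster-whisper row but fails the hard architecture requirement for Parakeet inference.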
## Options & Recommendations

### 2. Gateway + Mac worker

Run the stt-local gateway on Debian and connect an Apple Silicon Mac as a remote worker via ngrok. Best accuracy (1.93% WER), but requires dedicated Mac hardware.

*Needs Mac hardware · Medium effort*

### 3. Parakeet via ONNX on CPU

Export Parakeet to ONNX and run it via onnxruntime on the x86 CPU. This path is untested and likely slower than faster-whisper on 2 cores without a GPU.

*High effort · Risky*

### 4. Add a GPU to the server

Upgrade to a VPS with an NVIDIA GPU. This enables both faster-whisper large-v3 and Parakeet via NeMo/TensorRT, and is the best long-term path for quality and speed.

*Hosting change · All options open*