Benchmarks for realtime voice agents.
A public lab notebook for the Duplex thesis: benchmark the best realtime voice stacks, then route each customer to the right provider for the use case.
Why Duplex benchmarks realtime voice stacks instead of betting on one provider
The market is moving too fast to hard-code one model. Duplex should compare OpenAI Realtime, Gemini Live, PersonaPlex, Pipecat pipelines, and emerging speech-native models against the actual use case: routed voice agents for communities and agent teams.
Benchmark the stack
Compare realtime voice candidates against the same receptionist routing script.
Publish the market map
Explain where Discord, e-commerce, DevOps, creators, and agent teams actually need voice.
Route to deployment
Turn research into provider recommendations and use-case-specific deployment paths.
Receptionist routing is the killer demo for voice agents
A voice agent should not just answer. It should route. The Duplex demo starts with a receptionist, transfers to specialists, returns home, and carries context across every handoff.
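The handoff flow described above can be sketched in a few lines. This is a minimal illustration, not the Duplex implementation: the `CallContext` class, the `SPECIALISTS` keyword table, and the agent names are all hypothetical, and keyword matching stands in for whatever intent detection a real voice stack would use.

```python
from dataclasses import dataclass, field

# Hypothetical keyword -> specialist table; a real system would use
# intent detection from the voice provider rather than substring checks.
SPECIALISTS = {"billing": "billing_agent", "deploy": "devops_agent"}


@dataclass
class CallContext:
    """Shared state that travels with the caller across every handoff."""
    caller: str
    notes: list[str] = field(default_factory=list)
    path: list[str] = field(default_factory=lambda: ["receptionist"])

    @property
    def current(self) -> str:
        # The agent currently holding the call is the last entry in the path.
        return self.path[-1]


def route(ctx: CallContext, utterance: str) -> str:
    """Record the utterance, then hand off if a specialist keyword matches."""
    ctx.notes.append(f"{ctx.current}: {utterance}")
    for keyword, agent in SPECIALISTS.items():
        if keyword in utterance.lower():
            ctx.path.append(agent)
            return agent
    return ctx.current  # no match: the current agent keeps the call


def return_home(ctx: CallContext) -> str:
    """Send the caller back to the receptionist, keeping all notes."""
    ctx.path.append("receptionist")
    return ctx.current
```

Calling `route` with a billing question moves the call to `billing_agent`, and `return_home` brings it back to the receptionist. Because the same `CallContext` object is passed through every step, `ctx.notes` and `ctx.path` still hold the full history after the caller returns.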
Make the Playground a living benchmark.
Each benchmark post should link back to the Playground, publish the test script, score the provider, and end with a clear deployment recommendation: managed OpenAI path, multimodal Gemini path, self-hosted PersonaPlex path, Pipecat adapter path, or another SOTA candidate.
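One way to keep every post consistent is to treat it as a small record with the four parts listed above. The sketch below is an assumption about how that could look, not a published Duplex schema: the `BenchmarkPost` fields, the provider ids, the criterion names, and the weighted-average scoring are all hypothetical, and any numbers passed in are placeholders rather than measured results.

```python
from dataclasses import dataclass

# Deployment paths named in the text, keyed by hypothetical provider ids.
DEPLOYMENT_PATHS = {
    "openai-realtime": "managed OpenAI path",
    "gemini-live": "multimodal Gemini path",
    "personaplex": "self-hosted PersonaPlex path",
    "pipecat": "Pipecat adapter path",
}


@dataclass
class BenchmarkPost:
    provider: str            # hypothetical provider id, see DEPLOYMENT_PATHS
    playground_url: str      # link back to the Playground
    test_script: str         # the published receptionist routing script
    scores: dict[str, float]  # criterion name -> score in [0, 1]

    def overall(self, weights: dict[str, float]) -> float:
        """Weighted average over the given criteria; missing scores count as 0."""
        total = sum(weights.values())
        weighted = sum(self.scores.get(name, 0.0) * w for name, w in weights.items())
        return weighted / total

    def recommendation(self) -> str:
        """Map the provider to its deployment path, with a catch-all fallback."""
        return DEPLOYMENT_PATHS.get(self.provider, "another SOTA candidate")
```

For example, `BenchmarkPost("pipecat", "https://example.com/playground", "receptionist-routing", {"latency": 0.5, "handoff": 1.0})` uses a placeholder URL and placeholder scores; its `recommendation()` returns the Pipecat adapter path, and `overall({"latency": 2, "handoff": 1})` weights latency twice as heavily as handoff quality.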