
Agent Vibes: Finally, Your AI Agents Can Talk Back


I built Agent Vibes to solve a problem that started bothering me once I began spending most of my day talking to Claude Code. With WhisperFlow installed, I could speak naturally — dictate my prompts, describe the bug, explain what I wanted. That half of the loop felt right. The missing piece was obvious: the agent couldn’t talk back.

Agent Vibes closes that loop: after hearing me speak and brain-dump, my agents can finally talk back with voice, personality, and even a background music track.


The Core Idea

Agent Vibes injects text-to-speech hooks directly into your AI coding environment. When Claude Code (or Copilot, Codex, or Hermes) acknowledges a task, surfaces a finding, or completes work, the agent speaks the response aloud through your speakers. You don’t have to do anything differently in how you code. You install it once and it runs transparently in the background.

It’s open source, Apache 2.0, and built around free TTS providers — no API keys, no subscriptions, no cloud calls unless you want them.
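
If you're curious what "injects hooks" means concretely: Claude Code reads hook definitions from plain JSON, and a Stop hook fires a shell command whenever the agent finishes a response. After installing, you can inspect what was registered (the settings path below is the default Claude Code location; the exact entry Agent Vibes adds is best checked in your own file):

# Claude Code keeps hook definitions in ~/.claude/settings.json;
# a "Stop" hook runs a command each time the agent finishes responding.
jq '.hooks' ~/.claude/settings.json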


Four Free Voice Providers, Zero API Keys

The voice quality story is better than you’d expect from free software:

Piper TTS is the default on Linux and WSL. It runs 904 offline neural voices from Hugging Face's rhasspy/piper-voices model hub: VITS-based models trained on high-quality speech data, covering 35+ languages, fully offline after the initial model download. Voices have friendly names (Ryan, Sarah, Joe) rather than technical model IDs.

Soprano gives the best audio quality if you have the hardware for it, advertising 20x speedups on CPU and up to 2000x on GPU. Install it once with pip install soprano-tts and Agent Vibes will use it automatically.

macOS Say is the zero-setup option on Mac. Over 100 voices, high quality, built into the OS. If you’re on macOS, you get good TTS immediately with no additional installs.

Windows SAPI is the equivalent on Windows — built-in, zero setup, ready to use the moment you run the installer.

All four integrate through the same hook interface. You pick the one that fits your platform and hardware; Agent Vibes handles the rest.
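
Under the hood these providers are ordinary command-line tools, so you can sanity-check any of them before an agent ever speaks. A rough sketch (the Piper voice file is an assumption; download one from the rhasspy/piper-voices hub first):

# Piper reads text on stdin and writes a WAV (voice model path assumed)
echo "Build finished without errors" | \
  piper --model en_US-ryan-high.onnx --output_file test.wav

# macOS: list the built-in voices, then speak with one of them
say -v '?'
say -v Samantha "Build finished without errors"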


914 Voices and the TUI

Choosing from 914 voices sounds like it would be a UI problem. The Agent Vibes terminal user interface handles it well.

Run npx agentvibes and you get a full interactive TUI with keyboard navigation. The Voices tab (press V) lets you browse all 914 voices, preview any of them with a spacebar press, select with Enter, and mark favorites with +/−. Letter keys jump alphabetically, which matters when you’re browsing hundreds of options.

The 914 voices break down as 904 Piper neural speaker variations and 10 hand-curated personality voices covering the most common tonal archetypes: Professional, Friendly, Authoritative, Warm, Energetic, Technical, Calm, Narrator, Conversational, Enthusiastic. The curated voices are the fastest starting point — they’re selected specifically because they work well for agent narration.

Beyond the Voices tab, the TUI has tabs for Setup (configure each supported LLM independently), Music (manage background tracks), and BMAD (assign voices to each BMAD agent). Everything configurable through the TUI is also accessible via MCP commands or slash commands if you prefer staying in the terminal without launching the full interface.


Audio Effects: Reverb

Agent Vibes has an optional audio effects layer built on sox. TTS works fine without it, but reverb is what gives different agents a distinct sonic character rather than just a different voice.

The TUI exposes five reverb levels (none through heavy) as a single dial. For agents that should sound authoritative or spacious, a light reverb makes the voice feel more deliberate. For agents that should sound close and direct, none or minimal works better.
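
Under the hood this maps to sox's reverb effect, where the first argument is a reverberance percentage. A minimal sketch; the exact values Agent Vibes uses per level are an assumption:

sox voice.wav voice_light.wav reverb 20    # light: a subtle sense of space
sox voice.wav voice_heavy.wav reverb 80    # heavy: noticeably spacious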

ffmpeg handles background music mixing when it’s enabled. Both sox and ffmpeg install in one line on most Linux distros and are optional — Agent Vibes works without them.
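
The mixing step is a standard ffmpeg filter graph. Something like the following, sketched rather than taken from Agent Vibes' source, turns the music down and ends the mix when the voice track ends:

ffmpeg -i voice.wav -i music.mp3 -filter_complex \
  "[1:a]volume=0.2[bg];[0:a][bg]amix=inputs=2:duration=first" mixed.wav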


Background Music: 18 Tracks Made with Suno AI

The Music tab in the TUI manages background music that plays quietly underneath the agent's voice. I made all 18 included tracks myself using Suno AI; each one is a distinct genre, looped to work as an ambient coding backdrop:

Soft Flamenco · Dark Chill Step · Bossa Nova · Chillwave · Salsa · Bachata · Cumbia · Goa Trance · Arabic · Gnawa Ambient · Celtic Harp · Harpsichord · Hawaiian Slack Key Guitar · Japanese City Pop · Tabla Dream Pop · Drifting Down the Hall · Late Night Hip Hop Groove · Midnight Charleston Stomp

You can sample all of them from the TUI before committing. Volume control goes from 0.10 (barely perceptible) to 0.40 (genuinely present). Custom uploads are also supported — .mp3, .wav, .ogg, .m4a — for when none of the included tracks fit your mood.

For custom tracks, 30–90 seconds is the recommended duration for seamless looping, with a 300-second/50MB maximum.
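
ffmpeg also makes it easy to prepare a custom track that fits those limits. A sketch that trims a longer song to 60 seconds and fades the tail so the loop point is less jarring (filenames are placeholders):

ffmpeg -i my-track.mp3 -t 60 -af "afade=t=out:st=57:d=3" loop-ready.mp3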


Personalities and Sentiments

Personalities change both the voice and the speaking style simultaneously. A pirate personality means a pirate voice plus pirate speech patterns applied to whatever the agent is saying. There are 19 built-in personalities.

Sentiments are distinct from personalities: they keep your current voice and change only the speaking style. So you can keep Aria's voice with a sarcastic sentiment applied, rather than switching to a full new persona. Both are set via slash commands:

/agent-vibes:personality pirate
/agent-vibes:sentiment sarcastic

The distinction matters when you’ve found a voice you like but want to vary the tone situationally rather than making a permanent switch.


Verbosity: Control What Gets Said

Not every agent output needs to be spoken. Agent Vibes has three verbosity levels:

LOW — acknowledgments and completions only. The agent says what it’s starting and what it finished. Minimal interruption.

MEDIUM — adds major decisions and key findings. Good for active development where you’re away from the screen frequently.

HIGH — full narration of reasoning, decisions, and findings. Useful when you’re in a long autonomous session and want continuous situational awareness.

Claude uses emoji markers in its responses (💭 🤔 ✓) that Agent Vibes detects and uses to decide what to speak based on the verbosity setting. No changes to how you prompt or how Claude responds are required. The detection runs at the hook level.
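
The filtering idea is simple enough to sketch in a few lines of shell. This is purely illustrative: the marker-to-level mapping below is an assumption, not Agent Vibes' actual rules:

# Speak only lines whose marker matches the current verbosity level.
case "$VERBOSITY" in
  LOW)    pattern='✓' ;;
  MEDIUM) pattern='✓|🤔' ;;
  HIGH)   pattern='💭|🤔|✓' ;;
esac
grep -E "$pattern" response.txt    # matching lines get sent to TTS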


BMAD Party Mode: 12 Agents, 12 Voices

This is the feature that I think most clearly shows what voice does for multi-agent workflows.

The BMAD Method lets you run multiple specialized agents simultaneously in party mode: the architect, the developer, the QA engineer, the product manager, and others all active and contributing to the same session. If you want to understand the full roster of BMAD agents and the 50 advanced elicitation techniques they apply, I've written detailed guides on both. With Agent Vibes installed and BMAD integration configured, each agent has its own distinct voice.

When the Product Manager speaks, you hear Jessica. When the Developer responds, you hear Matthew. When the QA Engineer weighs in, you hear Burt. The voices aren’t just cosmetic — they’re how you track which agent is talking without reading the header on each response.

The BMAD tab in the Agent Vibes TUI lets you assign voice, reverb, and intro text to each of the 12 BMAD agents independently. The default mappings are sensible starting points, and you can override any of them. The integration works by injecting TTS markers into BMAD agent files during installation; Agent Vibes detects which agent is responding and routes audio accordingly. It’s compatible with both BMAD v4 and v6-alpha.

In party mode discussions, the effect is something close to being in a room where different people with different perspectives are actually speaking. It’s a significant UX change from reading labeled text blocks.


Per-LLM Audio Routing

Starting in v5.5, Agent Vibes gives each supported LLM its own independent audio profile.

Claude Code can have one voice, Copilot another, Codex a third, and Hermes a fourth. Each profile is independent: its own voice, reverb setting, background music, intro text, and audio destination. The MCP server auto-detects which LLM is calling and routes audio to the correct profile.

This matters most if you switch between AI tools during the day. You don’t have to reconfigure anything — the audio context switches automatically when the active LLM changes.


The AgentVibes Receiver: Remote Development Without Silent Agents

Most serious development happens on remote servers. If you haven’t already moved your development environment to a VPS, here’s why it’s worth considering. The machine running Claude Code is in a data center somewhere. It has no speakers. This is the problem the AgentVibes Receiver solves.

The Receiver routes audio from your voiceless server to your local laptop or desktop via SSH tunnel. The architecture:

Remote server (Claude Code + Agent Vibes)
  → Generates TTS audio
  → SSH tunnel (encrypted, port 14713)
  → Local device running AgentVibes Receiver
  → Plays through your speakers
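
In practice the tunnel can be one ssh command run from your local machine. A plausible shape, assuming the Receiver is listening on local port 14713 (the documented setup may differ in detail):

ssh -R 14713:localhost:14713 you@dev-server
# The server's localhost:14713 now forwards back through the encrypted
# tunnel to the Receiver on your laptop, which plays whatever arrives.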

On Windows, an additional queue-based watcher handles the Session 0 audio isolation problem — SSH connections on Windows land in a session that has no audio access. The watcher runs invisibly in your user session, auto-starts on login, monitors a queue directory, and plays each audio file as it arrives. Setup is a single PowerShell script run once from an Admin terminal.
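
The watcher pattern itself is generic; sketched here in POSIX shell rather than the actual PowerShell, with the queue directory and player as placeholders:

while true; do
  for f in "$HOME/audio-queue"/*.wav; do
    [ -e "$f" ] || continue      # glob matched nothing; queue is empty
    aplay "$f" && rm -- "$f"     # play the clip, then clear it
  done
  sleep 1
done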

The result: you can develop on a remote Linux server, have Agent Vibes generating TTS output there, and hear it through the speakers on your local machine. Tailscale VPN is supported as an alternative to direct SSH for teams that use it for remote access.


MCP Server: Natural Language Control

Agent Vibes ships with an MCP server that exposes the full configuration surface as natural language commands within Claude Code and Claude Desktop.

Instead of running /agent-vibes:switch Aria, you can just say “switch to Aria voice.” Instead of /agent-vibes:set-language spanish, you say “speak in Spanish.” The MCP server understands context and routes to the appropriate underlying command.

Available through MCP: voice switching, personality changes, language selection, verbosity control, reverb adjustment, speed control, mute/unmute, background music management, audio replay, voice listing, and more.

The tradeoff is tokens — MCP adds roughly 1500–2000 tokens to your context window per session. If you’re doing a long autonomous run and want zero overhead, the slash commands are the better choice. If you prefer natural language and don’t mind the token cost, MCP is more convenient.


Installation

One command:

npx agentvibes@latest install

This launches the TUI and walks you through selecting which LLM you’re attaching Agent Vibes to — Claude Code, Copilot, Codex, or Hermes. You pick your TTS provider, sample some voices, configure a verbosity level, and you’re done in a few minutes.

On Windows without Node.js:

.\setup-windows.ps1

On Android (via Termux from F-Droid, not Google Play):

pkg update && pkg upgrade && pkg install nodejs-lts
npx agentvibes install

For BMAD integration specifically, Agent Vibes detects which version of BMAD is installed and injects TTS markers into the appropriate agent files automatically during the standard install flow. Nothing extra required.


The Bigger Picture

WhisperFlow (or any speech-to-text tool) gives you voice input to your AI agents. Agent Vibes gives you voice output from them. Together, they make a fully voice-driven development session possible: speak your intent, hear the agent’s response, keep your eyes on the code.

I built Agent Vibes because working with AI agents all day long via text felt like it was missing something obvious. Voice is how humans exchange information at speed. It keeps your attention oriented without requiring you to read every response before continuing.

If you’ve been using Claude Code or any of the other supported tools, it’s worth a try. Install it, configure a voice you like, run a session in party mode with BMAD, and see what changes.

Source is at github.com/paulpreibisch/AgentVibes. The project is Apache 2.0 and actively maintained.
