I spend a lot of time evaluating what’s actually useful versus what’s impressive in a demo and disappointing in practice. The following eight tools have all cleared that bar in different ways. Some are things I use daily. Some are things I’m watching carefully. All of them are worth understanding if you work at the intersection of AI and building things.
Anthropic
The story of how Anthropic became the dominant AI company for developers is actually the story of Claude Code — and it starts with an internal accident.
Anthropic built Claude Code for their own engineers. It wasn’t a product; it was a tool their developers used in-house to move faster. The reaction from their internal teams was so strong that it became obvious they had to release it. What they shipped wasn’t what most people expected: not a polished GUI, not an IDE plugin, but a command-line tool. That decision turned out to be the whole game.
By putting Claude Code in the terminal, Anthropic made it universal. It doesn’t care what editor you’re in. VS Code, PhpStorm, Neovim, Zed, Cursor — none of it matters. You open a terminal, you have Claude Code. Developers who were locked into a specific IDE could use it. Developers who switched editors didn’t have to reinstall anything. It became ubiquitous precisely because it stayed at the lowest common denominator of developer tooling. Copilot lives inside VS Code. Claude Code lives everywhere.
The benchmark comparison against Copilot is stark. On CursorBench — the evaluation developers actually watch — Claude Code running Opus 4.7 scores 70%, well clear of Copilot. But raw scores miss what matters in practice: Claude Code holds context across an entire codebase, plans multi-file changes before making them, runs tests and iterates on failures, and understands why you’re building something, not just what the next token should be. Copilot finishes lines. Claude Code ships features.
Anthropic has since built out the Claude ecosystem around that terminal foundation. Claude Cowork brings the same agentic depth to knowledge workers — analysts, researchers, operations teams — working in documents rather than codebases. Claude Desktop gives the agent direct access to your local files once you grant it permission, so it can read, edit, and create files without you copying and pasting context. These three tools — Code, Cowork, and Desktop — form a coherent stack where the same underlying model capability expresses itself appropriately depending on whether you’re a developer, an analyst, or a general knowledge worker.
If you’re a developer looking to extract the maximum leverage from Claude Code specifically, the combination that I’ve found most powerful is pairing it with the BMad Method — a structured multi-agent framework that assigns different Claude agents to different roles (architect, developer, QA, product manager) and coordinates them through a defined workflow. Claude Code handles the execution; BMad handles the orchestration. The result is closer to having a full engineering team than a single AI assistant. I’ve written a full breakdown of the BMad Method, including every agent in the roster and a complete reference for all 50 advanced elicitation techniques.
The model family underpinning all of this — Opus 4.7, Sonnet 4.6, Haiku 4.5 — is what makes the product case possible. Opus 4.7 sits at the top of every major developer coding benchmark, with self-verification capability and a 3× improvement in vision resolution over its predecessor. Sonnet 4.6 covers the bulk of day-to-day work at a lower cost. Haiku 4.5 handles the high-volume sub-agent tasks that don’t need deep reasoning. The three tiers aren’t marketing; they’re how the economics of running parallel agent sessions actually work at scale.
Claude Code
This is my sole development environment. I don’t use Copilot. I don’t use Cursor. Claude Code on the Max20 plan ($200/month) is the only coding tool I need, and the economics make sense when you account for what it replaces.
Claude Code is a terminal-native agent — not an IDE plugin, not a tab-completion tool. You give it a goal. It reads your codebase, makes a plan, edits files, runs tests, monitors CI, and iterates on failures. It can read GitHub issues, write the implementation, and submit a PR without you switching contexts. The Max20 plan unlocks multiple parallel agent sessions simultaneously, which changes the shape of a workday: you can have one agent working through a refactor while another handles a new feature, both reporting back to you rather than you babysitting a single thread.
What makes Max20 worth it over Pro ($20) or Max5 ($100) is specifically the parallel sessions. A complex project that would take a full day sequentially can often be completed in a few hours with agents working different parts of the codebase at the same time. For anyone using this professionally rather than occasionally, the productivity return is clear. The developer satisfaction data backs this up — 46% of developers surveyed call it their most loved tool — but honestly the numbers don’t capture the qualitative shift in how you think about what’s possible in a session.
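If you want to see what that looks like mechanically, here is a minimal sketch that drives two headless Claude Code sessions from Python. It assumes the `claude` CLI is installed and supports `-p` (non-interactive print mode); the prompts and working directories are hypothetical stand-ins for your own tasks.

```python
# Minimal sketch: two headless Claude Code sessions running in parallel.
# Assumes the `claude` CLI is on PATH and supports -p (non-interactive
# print mode). Prompts and directories are hypothetical placeholders.
import subprocess

tasks = [
    ("refactor the payment module and make the tests pass", "services/payments"),
    ("add CSV export to the reporting endpoint", "services/reporting"),
]

procs = [
    subprocess.Popen(
        ["claude", "-p", prompt],
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    for prompt, cwd in tasks
]

for (prompt, _), proc in zip(tasks, procs):
    out, err = proc.communicate()
    print(f"=== {prompt} ===\n{out or err}")
```

In day-to-day use you’d more likely run these in separate terminal tabs or tmux panes; the sketch just makes the point that the sessions are independent processes working at the same time.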
Claude Cowork
This one is newer and less talked about. Cowork is Anthropic’s attempt to bring the Claude Code experience to knowledge workers — researchers, analysts, ops teams, legal, finance — people who work in documents rather than codebases.
The pitch: you grant Cowork access to folders on your machine, give it a goal, and it delivers a finished artifact. Not a summary of what you should do. The actual deliverable — a slide deck, a spreadsheet analysis, a drafted document. It spins up sub-agents for complex tasks, reads and writes real files, connects to Google Drive, Gmail, FactSet, DocuSign. It can run on a schedule: pull metrics every Monday, generate a weekly digest, and send it.
This is Anthropic’s direct competition with Microsoft Copilot for M365, and the interesting thing is that the agentic depth is meaningfully different. Copilot surfaces suggestions; Cowork executes tasks. Early access is available on paid plans, via a desktop app for macOS and Windows. I’ve been watching the enterprise connector rollout closely — the FactSet integration specifically is interesting for anyone doing financial analysis workflows.
n8n
If you’re still paying per-task pricing for automation, you should spend an afternoon with n8n.
n8n is a workflow automation platform that you can self-host for free and run on infrastructure that costs $5–10/month. A workflow that would cost $500+/month on Zapier at scale runs essentially free. The catch is that it’s not quite OSI open source — it’s “fair-code,” meaning you can’t resell it or offer it as a hosted service — but for internal automation, which is what 99% of teams need, it functions identically to open source.
The reason n8n matters specifically in 2026 is the AI integration depth. It has native LangChain support, which means you can build full agentic workflows visually: LLM reasoning node, tool use, memory, retrieval, human-in-the-loop approval, all in the same canvas. It supports every major LLM provider — including local models via Ollama, which matters for regulated industries that can’t send data to OpenAI. And it speaks MCP (Model Context Protocol), which means it can expose your workflows as tools that Claude Code can call directly.
That last point is worth sitting with. There’s an official n8n MCP server, which means you can open Claude Code in your terminal, describe the automation workflow you want to build, and have it generate and wire up the n8n workflow for you. You get the reasoning power of Claude Code combined with n8n’s 500+ integrations — it collapses what would normally be hours of drag-and-drop configuration into a single conversation.
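To make that concrete, here is a minimal sketch of the pattern using FastMCP from the official MCP Python SDK: a tiny server exposing one tool that fires an n8n webhook. This is an illustration of the idea, not the official n8n MCP server, and the webhook URL and payload are hypothetical.

```python
# A toy MCP server exposing one n8n workflow as a callable tool.
# Requires the official MCP Python SDK (pip install mcp) and a running
# n8n instance; the webhook URL and payload below are hypothetical.
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("n8n-bridge")

@mcp.tool()
def run_weekly_report(topic: str) -> str:
    """Trigger the n8n 'weekly report' workflow via its webhook node."""
    resp = requests.post(
        "https://n8n.example.com/webhook/weekly-report",  # hypothetical URL
        json={"topic": topic},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text  # whatever the workflow's respond node returns

if __name__ == "__main__":
    mcp.run()  # serves over stdio so an MCP client like Claude Code can attach
```

Once a server like this is registered with Claude Code, the workflow becomes just another tool the agent can decide to call mid-task.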
The $2.5B valuation from last year’s Series C is justified by what’s actually shipping. The Draft/Published workflow split in n8n 2.0 is a quiet but real improvement for teams running production automation — you can edit without touching live workflows. Worth evaluating if you haven’t looked at it recently.
Hermes Agent (Nous Research)
Something significant happened in April 2026: Hermes Agent hit number one on OpenRouter, overtaking OpenClaw with 271 billion tokens versus OpenClaw’s 245 billion. For context, OpenRouter is where hundreds of thousands of developers route their AI tools every day — it’s the scoreboard the AI tooling community actually watches. Hermes only went active in March 2026; it took a few weeks to overtake a project with a significant head start.
Hermes Agent is built on Nous Research’s DeepHermes 3 models, but the agent itself is more interesting than the underlying model. This is a persistent, self-improving AI agent: it remembers things across sessions, evolves its own behaviour based on past interactions, and builds up a model of how you work and what you need over time. It’s not a stateless chatbot you restart fresh each session — it’s closer to a working assistant that gets more useful the longer you use it.
The interface options matter here. Hermes runs on the command line (which is how I use it most), but it also has a desktop companion application and a web UI via Hermes Workspace — an open-source dashboard on GitHub with swarm mode for coordinating multiple agents simultaneously. You’re not locked into the terminal if you prefer something visual, and swarm mode in particular is worth exploring if you want to run parallel agent tasks with a proper overview of what each one is doing.
The key feature that both Hermes and OpenClaw share — and the thing that makes them genuinely different from standard AI tools — is native integration with messaging platforms. Telegram, WhatsApp, Discord, Slack, iMessage: whichever you already use becomes the interface for your local AI agent. You message it the way you’d message a colleague, from your phone, from wherever you are. For anyone who’s tried to build serious automation workflows, the ability to trigger and interact with an agent through a chat app you already have open all day is a meaningful quality-of-life difference.
The community and team are also worth mentioning. Nous Research co-founder Teknium is active on X, shares community projects, responds to people building with the tools, and communicates openly about what’s coming. Nearly 1,000 contributors have worked on Hermes, which now sits at 140,000 GitHub stars. That kind of community momentum is hard to manufacture — it reflects genuine utility.
The trajectory from OpenClaw’s decline is instructive too. OpenClaw accumulated massive early traction, then shipped updates at high frequency that introduced instability. The people getting the most reliable results from OpenClaw today are often running older versions and not updating. Hermes has taken the opposite approach: slower, more deliberate releases with a clear focus on stability and security. It’s still in early beta — not yet at 1.0 — but it just works in a way that matters for anyone relying on it for business workflows.
OpenClaw
OpenClaw went from zero to 60,000 GitHub stars in 72 hours when it went viral in late 2025. At its peak it was registering nearly 450 billion tokens on OpenRouter. That kind of early adoption doesn’t happen without something genuinely compelling at the core — and OpenClaw’s core concept is still good.
The idea: OpenClaw runs locally on your machine, connects to whatever LLM backend you bring (Claude, GPT-4, DeepSeek — your API key), and uses messaging platforms as its primary interface. WhatsApp, Telegram, Slack, Discord, iMessage — whichever you already use becomes the control surface for your local AI agent. This is the main selling point and it’s a real one. Instead of opening a separate app to talk to an AI, the agent is reachable through the chat platforms you already have open all day. You can trigger workflows, ask questions, get proactive alerts, all from your phone via apps you’re already in.
The Heartbeat feature extends this further — the agent can act proactively on a schedule, without waiting for your prompt. Monitoring, scheduled tasks, alerts. The skills system gives it modular capabilities: 100+ built-in, 13,700+ on the ClawHub marketplace.
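The pattern itself is simple enough to sketch. The loop below is not OpenClaw’s code, just the shape of the idea: a timer wakes the agent, a check runs, and anything noteworthy is pushed to you through whatever messaging connector you’ve wired up. Both the check and the sender here are stand-ins.

```python
# The heartbeat pattern in miniature: a scheduled, proactive check.
# check_site() and send_message() are stand-ins for a real skill and a
# real messaging connector (Telegram, Slack, ...).
import random
import time

HEARTBEAT_SECONDS = 600  # the agent wakes up on this interval

def check_site() -> str | None:
    # Stand-in for a real skill: poll something, decide whether to alert.
    latency_ms = random.uniform(50, 500)  # pretend measurement
    return f"Latency high: {latency_ms:.0f} ms" if latency_ms > 400 else None

def send_message(text: str) -> None:
    # Stand-in for a messaging connector.
    print(f"[agent -> you] {text}")

while True:
    if (alert := check_site()):
        send_message(alert)
    time.sleep(HEARTBEAT_SECONDS)
```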
The honest current picture is that OpenClaw has become less reliable as it’s shipped more updates. The token usage on OpenRouter tells the story — down significantly from that 450B peak. The users getting the most consistent results right now tend to be running older versions rather than updating to the latest. This doesn’t mean the architecture is wrong; it means the project is in a turbulent growth phase where stability fell behind feature velocity. Whether it stabilises is genuinely uncertain.
The ClawHub security concern is also real and worth stating plainly: an audit found 820+ skills flagged as malicious or suspicious — credential harvesters and cryptominers. There’s no mandatory code review before publication. Install only from sources you can verify, and audit anything from unknown publishers before giving it access to your machine.
ElevenLabs
I’ve used ElevenLabs extensively — it powers the voice layer in some of my own projects, including AgentVibes. The quality gap between ElevenLabs and everything else in English naturalness is real and audible.
The v3 model (launched June 2025) is a meaningful step forward, specifically because of inline audio tags: [laughs], [whispers], [sighs], [strong British accent], [sings]. These move emotional direction from post-production into the script itself. For voice-driven workflows — agents, narration, interactive content — this is the kind of control that makes a difference. The jump from 28 to 70+ languages in v3 is also worth noting for anyone building multilingual products.
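Because the tags live in the script text itself, driving them from the API is plain string composition. Here is a minimal sketch against the documented text-to-speech endpoint; the voice ID and API key are placeholders, and `eleven_v3` as the v3 model ID is an assumption worth checking against the current docs.

```python
# Minimal ElevenLabs TTS call with v3 inline audio tags in the script.
# VOICE_ID and API_KEY are placeholders; "eleven_v3" as the model ID is
# an assumption; verify against the current API reference.
import requests

VOICE_ID = "YOUR_VOICE_ID"
API_KEY = "YOUR_XI_API_KEY"

script = "[whispers] It worked on the first try. [laughs] I can't believe it."

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={"text": script, "model_id": "eleven_v3"},
    timeout=60,
)
resp.raise_for_status()

with open("line.mp3", "wb") as f:
    f.write(resp.content)  # the endpoint returns raw audio bytes
```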
The Conversational AI platform is what I watch most closely. It handles turn-taking, interruption, multi-LLM routing, and session management — the full stack for building a real-time voice agent without assembling the pieces yourself. SOC 2 Type II, HIPAA, PCI DSS Level 1 compliance means it’s viable for regulated industries, not just consumer apps.
Pricing is higher than competitors, which is the honest trade-off. At the API tiers ($99/month Pro, $330/month Scale) you’re paying for the combination of quality, breadth of product, and ecosystem maturity. For prototyping there are cheaper alternatives; for production voice products that need to sound good, the premium is usually worth it.
HeyGen
HeyGen is the tool I point people to when they ask about AI video. It’s moved past “interesting demo” territory into something that’s genuinely changing content economics for marketing, training, and localization work.
Avatar V is the headline: a studio-quality AI avatar from a 15-second recording, with identity consistency across multi-angle footage, varied lighting, and long-form content. Not a deepfake effect — a model of your specific dental structure, skin texture, talking rhythm, and habitual expressions. The LiveAvatar variant handles real-time interactive use cases.
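On the API side, here is a hedged sketch of requesting a short avatar video, following the shape of HeyGen’s documented v2 generate endpoint; the avatar ID, voice ID, and API key are placeholders from your own account.

```python
# Minimal sketch: request a short avatar video from HeyGen's v2 API.
# Endpoint and body follow the documented v2 generate call; avatar_id,
# voice_id, and the API key are placeholders. Generation is async, so
# the response contains a video_id you poll for the finished file.
import requests

API_KEY = "YOUR_HEYGEN_API_KEY"

payload = {
    "video_inputs": [
        {
            "character": {"type": "avatar", "avatar_id": "YOUR_AVATAR_ID"},
            "voice": {
                "type": "text",
                "input_text": "Welcome to the demo.",
                "voice_id": "YOUR_VOICE_ID",
            },
        }
    ],
    "dimension": {"width": 1280, "height": 720},
}

resp = requests.post(
    "https://api.heygen.com/v2/video/generate",
    headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # includes the video_id to poll for status
```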
The more commercially interesting feature is video translation: localize any video into 175+ languages and dialects with lip-sync and cloned voice that preserves the speaker’s vocal identity. The Precision Mode engine handles occlusion, multi-speaker content, and context-aware translation. The output is a single shareable URL with a viewer-side language switcher — no managing per-language video files. For teams distributing training content or marketing material globally, the economics here are compelling: translate once from the source recording, not per-language.
Video Agent is what I’m watching for the next evolution. You give it a prompt, it writes a script, generates B-roll (via Sora 2 and Google Veo 3.1 integrations), selects an avatar, assembles the video, and delivers it in under four minutes. That’s not a production workflow with AI assistance; that’s a production workflow where AI is the producer. The quality ceiling isn’t there yet for everything, but for certain use cases it already crosses the “good enough” threshold.
Worth noting: HeyGen integrates ElevenLabs voices for TTS, which means the two platforms are complementary rather than competing. A HeyGen avatar speaking with a cloned ElevenLabs voice is currently one of the better combinations available for realistic AI-generated video.
The pattern I keep noticing across all of these: the most interesting tools are the ones that don’t just assist with tasks but can execute them end-to-end. Claude Code, Cowork, OpenClaw, HeyGen’s Video Agent — these are qualitatively different from tools that surface suggestions. The shift from “AI helps you do the work” to “AI does the work and you review it” is happening faster than most people expected, and the tools on this list are where that’s becoming real.
If you’re building something in this space or want to think through how any of these fit into your workflow, get in touch.