I spend most of my day working alongside AI coding tools: Codex CLI, Claude Code, various MCP integrations. After months of daily use, I've developed opinions about what matters.
This isn't a feature comparison. It's a framework for evaluating any AI coding tool.
Context Window Is King
The single most important factor: how much code can the tool see at once?
A 4K context window means the AI is working blind. It can see the function you're editing but not the interfaces it implements, the tests that cover it, or the other files that import it.
A 128K+ window changes everything. The tool can hold your entire module in memory. It understands relationships between files. It catches breaking changes before you make them.
When evaluating tools, I check context limits first. Everything else is secondary.
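The difference is easy to quantify with the rough ~4 characters per token heuristic (an approximation only; real tokenizers vary by model, and the file sizes below are made up for illustration):

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count with the ~4 characters/token heuristic."""
    return len(text) // 4

def fits_in_context(files: dict[str, str], window_tokens: int) -> bool:
    """Check whether the combined file contents fit the window."""
    total = sum(estimate_tokens(content) for content in files.values())
    return total <= window_tokens

# Hypothetical module: ~60K characters, roughly 15K tokens.
module = {"handler.py": "x" * 40_000, "tests.py": "x" * 20_000}
print(fits_in_context(module, 4_000))    # 4K window: False
print(fits_in_context(module, 128_000))  # 128K window: True
```

A two-file module already blows past a 4K window, which is why the small-window tool is editing blind.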
Speed Over Features
A tool that responds in 500ms gets used. A tool that takes 10 seconds doesn't.
I've tried feature-rich IDEs with AI baked in. They have autocomplete, chat, refactoring, documentation generation. And I don't use them because every action takes forever.
My current setup is fast by design. The CLI fires, gets a response, streams it back. No spinners, no waiting for "AI is thinking..."
Speed compounds. A fast tool gets used more. More usage means more learning. More learning means better results.
Escape Hatches Matter
AI tools mess up. They hallucinate functions that don't exist. They suggest deprecated patterns. They misunderstand requirements.
The best tools make recovery easy:
- Clear undo functionality
- Diff view before applying changes
- Ability to edit AI output before accepting
- Version control integration for rolling back
If a tool applies changes directly with no way to review or revert, I won't use it. The risk is too high.
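A minimal version of "diff view before applying changes" takes only the standard library. This is my own sketch, not any tool's actual API; the function name and sample edit are invented for illustration:

```python
import difflib

def review_change(path: str, original: str, proposed: str) -> str:
    """Render a unified diff of an AI-proposed edit before applying it."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)

# Show the proposed edit; nothing touches disk until a human approves.
original = "def add(a, b):\n    return a + b\n"
proposed = "def add(a: int, b: int) -> int:\n    return a + b\n"
print(review_change("math_utils.py", original, proposed))
```

The point is the workflow shape: render, review, then apply, never apply-then-hope.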
CLI > IDE Plugin
This is personal preference, but: I prefer CLI tools over IDE plugins.
CLI tools are composable. I can pipe output, script interactions, integrate with other tools. They work in any environment—my local machine, SSH sessions, CI pipelines.
IDE plugins are locked to one editor. They might be more polished, but they're less flexible.
When I evaluate a new tool, I check: can I use this from a terminal? If not, it has to be dramatically better to justify the IDE lock-in.
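Composability is concrete: a few lines of subprocess glue script any stdin/stdout tool the same way. The wrapper below is a sketch; `cat` stands in for an AI CLI so the example runs anywhere:

```python
import subprocess

def pipe_through(command: list[str], prompt: str) -> str:
    """Send a prompt to any CLI tool via stdin and capture its stdout.

    The same wrapper works for any binary that reads stdin and
    writes stdout, which is the whole appeal of CLI-first tools.
    """
    result = subprocess.run(
        command, input=prompt, capture_output=True, text=True, check=True
    )
    return result.stdout

# `cat` as a placeholder: it simply echoes the prompt back.
print(pipe_through(["cat"], "explain this diff"))
```

Swap `["cat"]` for the real tool's invocation and the rest of the script doesn't change.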
Context Control
Some tools try to be smart about context. They automatically include "relevant" files. They analyze your repo structure and pull in what they think you need.
This is usually worse than explicit control.
I want to tell the tool exactly what to look at. Include these three files. Ignore that directory. Focus on this function.
Automatic context selection sounds good until it pulls in the wrong files and the AI starts suggesting changes that break things you didn't even know it was considering.
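Explicit control can be as simple as assembling the context yourself. This sketch is my own convention, not any tool's API: include exactly the files you name, skip anything under an ignored directory:

```python
from pathlib import Path

def build_context(include: list[str], ignore_dirs: set[str]) -> str:
    """Assemble prompt context from an explicit file list.

    Files under an ignored directory are skipped even if listed,
    so nothing reaches the model that you didn't choose to send.
    """
    parts = []
    for name in include:
        path = Path(name)
        if any(part in ignore_dirs for part in path.parts):
            continue  # explicitly excluded; never read, never sent
        parts.append(f"# file: {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

Three named files in, three files out. No repo-wide guessing about relevance.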
What I Actually Use
My current stack:
Codex CLI for most work. Fast, CLI-first, good context handling. Integrates with my task system through MCP.
Claude Code for complex refactoring. Better at understanding large architectural changes. Slower but more thorough.
MCP servers for domain-specific operations. GitHub integration, file system access, structured tool execution.
I switch between tools based on the task. Quick fixes go to Codex. Multi-file refactors go to Claude Code. Structured operations go through MCP.
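That routing rule is simple enough to sketch. The task categories and tool keys below are my own shorthand, not a real configuration format:

```python
# Illustrative task-to-tool routing, following the split described above.
ROUTES = {
    "quick_fix": "codex-cli",
    "multi_file_refactor": "claude-code",
    "structured_operation": "mcp-server",
}

def route(task_type: str) -> str:
    """Pick a tool for a task; default to the fast general-purpose CLI."""
    return ROUTES.get(task_type, "codex-cli")

print(route("multi_file_refactor"))  # claude-code
```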
Red Flags
Things that make me avoid a tool:
- No offline mode. If the tool requires constant internet, it's useless on planes, in cafes with bad Wi-Fi, or when the API is down.
- Unclear pricing. Usage-based pricing without clear estimates means surprise bills. I want to know what a month of heavy use costs.
- Vendor lock-in. Tools that only work with one model provider lose value when better models appear. I want to swap backends.
- Magic over control. "Just let the AI handle it" sounds nice until you need to understand what it did.
The Only Metric
Here's my real evaluation criterion: Does using this tool let me ship more, faster, with fewer bugs?
Features don't matter. Marketing doesn't matter. What matters is: when I finish a coding session, did I accomplish more than I would have without the tool?
That's the only test that counts.