Three weeks post-ban: local models in practice
After the Anthropic incident I spent three weeks testing alternatives. Not one-off tests — real agentic workloads on actual tasks: writing code, code review, generating tests, analysing documentation.
Models that went through the test: Qwen 2.5 Coder (72B), DeepSeek Coder V2 and their smaller variants. I tested them locally via Ollama and through OpenRouter as a cloud alternative.
OpenRouter: first hello for $0.11
OpenRouter works as a model access aggregator. Instead of a separate API key for each provider, you have one endpoint. For agentic workloads it's an elegant solution: the orchestrator calls OpenRouter, OpenRouter forwards to Qwen, DeepSeek, or whatever else.
The first test with DeepSeek via OpenRouter cost $0.11 for an entire work cycle that would have cost approximately $1.40 through Anthropic Claude Sonnet. The difference is substantial.
Model comparison: what practice revealed
| Model | Price (input / 1M tokens) | Price (output / 1M tokens) | Code quality | Latency |
|---|---|---|---|---|
| Claude Sonnet 3.7 | $3.00 | $15.00 | Excellent | Medium |
| Claude Opus 4 | $15.00 | $75.00 | Excellent | Higher |
| DeepSeek Coder V2 (OpenRouter) | $0.14 | $0.28 | Good | Low |
| Qwen 2.5 Coder 72B (local) | Hardware cost | Hardware cost | Good | Depends on HW |
Where local models fell short
On straightforward coding tasks — writing new functions, refactoring, generating tests — local models were comparable to Claude. On complex reasoning — architectural decisions, dependency analysis, security review — the gap was noticeable.
Another issue was context window size. Agentic workloads need a large context — full files, step history, tool outputs. Smaller local models struggled here.
Why we returned to Anthropic
After three weeks I returned to Claude — but differently configured. The ban was resolved (it was indeed a false positive); I added rate limiting on the orchestrator side and configured retry logic with exponential backoff.
Local models stayed on as backup and as the choice for routine tasks where cost matters. Claude remained for complex work. The orchestrator switches automatically based on task type.
This is probably the optimal architecture for most teams: a hybrid approach with the ability to switch.
Want to see how business processes can be automated? Book a consultation — we start where vibe-coding ends.
In the next episode
Day 6: An unexpected problem. The agent started signing code as co-author. And that has legal implications nobody anticipated.