Local Models vs OpenRouter: 3 Weeks After the Ban

Note on pricing: The figures in the table below reflect prices as of Q2 2026. The AI model market moves fast — verify current rates with providers before making decisions.

Three weeks post-ban: local models in practice

After the Anthropic incident I spent three weeks testing alternatives. Not one-off tests — real agentic workloads on actual tasks: writing code, code review, generating tests, analysing documentation.

Models that went through the test: Qwen 2.5 Coder (72B), DeepSeek Coder V2 and their smaller variants. I tested them locally via Ollama and through OpenRouter as a cloud alternative.

OpenRouter: first hello for $0.11

OpenRouter works as a model access aggregator. Instead of a separate API key for each provider, you have one endpoint. For agentic workloads it's an elegant solution: the orchestrator calls OpenRouter, OpenRouter forwards to Qwen, DeepSeek, or whatever else.

The first test with DeepSeek via OpenRouter cost $0.11 for an entire work cycle that would have cost approximately $1.40 through Anthropic Claude Sonnet. The difference is substantial.

Model comparison: what practice revealed

Model	Price (input / 1M tokens)	Price (output / 1M tokens)	Code quality	Latency
Claude Sonnet 3.7	$3.00	$15.00	Excellent	Medium
Claude Opus 4	$15.00	$75.00	Excellent	Higher
DeepSeek Coder V2 (OpenRouter)	$0.14	$0.28	Good	Low
Qwen 2.5 Coder 72B (local)	Hardware cost	Hardware cost	Good	Depends on HW

Where local models fell short

On straightforward coding tasks — writing new functions, refactoring, generating tests — local models were comparable to Claude. On complex reasoning — architectural decisions, dependency analysis, security review — the gap was noticeable.

Another issue was context window size. Agentic workloads need a large context — full files, step history, tool outputs. Smaller local models struggled here.

Why we returned to Anthropic

After three weeks I returned to Claude — but differently configured. The ban was resolved (it was indeed a false positive); I added rate limiting on the orchestrator side and configured retry logic with exponential backoff.

Local models stayed on as backup and as the choice for routine tasks where cost matters. Claude remained for complex work. The orchestrator switches automatically based on task type.

This is probably the optimal architecture for most teams: a hybrid approach with the ability to switch.

Want to see how business processes can be automated? Book a consultation — we start where vibe-coding ends.

In the next episode

Day 6: An unexpected problem. The agent started signing code as co-author. And that has legal implications nobody anticipated.

Three weeks post-ban: local models in practice

OpenRouter: first hello for $0.11

Model comparison: what practice revealed

Where local models fell short

Why we returned to Anthropic

In the next episode

Cookie settings

Essential cookies

Analytics cookies

Marketing cookies

Are you sure you want to leave?

Book a free consultation