AI is a Great Servant, Terrible Master: MCP and Human Control

Why paranoia is the right answer

After seven days of experimentation I'm certain of one thing: the right attitude toward agentic AI is not trust but managed distrust. The agent will do whatever you allow it to do, and if you allow too much, it will do exactly that.

Guardrails aren't a productivity constraint. They're the conditions under which agentic development is safely usable. Without them you're experimenting — not deploying.

MCP as a control layer

The Model Context Protocol (MCP) offers a natural place for guardrails: MCP servers. Instead of calling tools directly, the agent calls them through an MCP server that implements the control logic.

Concrete examples of MCP guardrails:

Filesystem MCP: the agent can read and write only inside permitted directories. Attempts to reach paths outside that perimeter are logged and rejected.
Git MCP: commits are permitted, but pushing to main requires human approval. The agent commits to a feature branch; the human merges.
Database MCP: the agent has read-only access, or access to an isolated development database. It has no access to production data.
Terminal MCP: a whitelist of permitted commands. The agent can run tests, but it cannot delete files.
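To make the filesystem and terminal guardrails concrete, here is a minimal sketch of the two checks such a server might run before touching anything. It is not tied to any particular MCP SDK. The names `ALLOWED_ROOTS`, `ALLOWED_COMMANDS`, `is_path_allowed` and `parse_allowed_command`, and the example paths and commands, are assumptions made for illustration.

```python
import shlex
from pathlib import Path
from typing import List, Optional

# Hypothetical policy values, not taken from the article.
ALLOWED_ROOTS = [Path("/workspace/project")]
ALLOWED_COMMANDS = [("pytest",), ("git", "status")]
SHELL_METACHARACTERS = set(";&|<>`$")


def is_path_allowed(candidate: str, roots=ALLOWED_ROOTS) -> bool:
    """True only if the fully resolved path lies inside a permitted root."""
    resolved = Path(candidate).resolve()  # collapses ".." and follows symlinks
    for root in roots:
        root = root.resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False


def parse_allowed_command(command_line: str,
                          allowed=ALLOWED_COMMANDS) -> Optional[List[str]]:
    """Return argv if the command starts with an allowed prefix, else None."""
    if SHELL_METACHARACTERS & set(command_line):
        return None  # refuse chaining, pipes, redirects and substitutions
    try:
        argv = shlex.split(command_line)
    except ValueError:
        return None  # unbalanced quotes
    for prefix in allowed:
        if tuple(argv[: len(prefix)]) == prefix:
            return argv
    return None
```

The returned argv is meant to be executed without a shell, for example with `subprocess.run(argv, shell=False)`, so that nothing in the arguments is interpreted as a shell operator. Prefix matching is deliberately coarse; a real server would probably also constrain the arguments.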

Hooks as safety valves

Git hooks are the second line of defence. While MCP guardrails control what the agent can call, hooks control what makes it into the repository.

Pre-commit hook

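The pre-commit hook checks every commit before it is recorded. In our configuration it verifies:

No secrets or API keys in the code (regex against common patterns)
No direct changes to config files outside permitted paths
Commit message meets the minimum format requirements (Git passes the message only to the companion commit-msg hook, so this check runs there rather than in pre-commit itself)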
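The secret check can be sketched roughly as follows. The patterns are illustrative assumptions, not the configuration used in this series, and the helper names `find_secrets`, `staged_added_lines` and `main` are hypothetical. Only the lines added in the staged diff are scanned.

```python
import re
import subprocess

# Illustrative patterns only; a real hook would carry a longer, tuned list.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "generic assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*"
        r"['\"][^'\"]{12,}['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, label) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


def staged_added_lines():
    """Collect the lines added in the staged diff, without the leading '+'."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    return "\n".join(
        line[1:] for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )


def main():
    findings = find_secrets(staged_added_lines())
    for lineno, label in findings:
        print(f"possible secret ({label}) in added line {lineno}")
    return 1 if findings else 0  # a non-zero exit status aborts the commit
```

Saved as `.git/hooks/pre-commit` with a final `raise SystemExit(main())` and the executable bit set, a non-zero return stops the commit. Regex scanning catches only known shapes, so treat it as a tripwire rather than proof that the commit is clean.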

Pre-push hook

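Requires interactive confirmation before a push to a protected branch. The hook runs client-side, so nothing in the system prompt can reconfigure it. Git does skip the hook when the push is run with --no-verify, however, so keep that flag off the terminal whitelist or back the hook with server-side branch protection.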
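A pre-push hook along these lines might look like the sketch below. Git feeds the hook one line per ref on standard input, in the form `<local ref> <local sha> <remote ref> <remote sha>`, so the confirmation has to be read from the controlling terminal instead. The set `PROTECTED_REFS` and the function names are assumptions for illustration, and `/dev/tty` assumes a Unix-like system.

```python
import sys

# Hypothetical list of protected branches.
PROTECTED_REFS = {"refs/heads/main", "refs/heads/master"}


def protected_targets(stdin_lines, protected=PROTECTED_REFS):
    """Return the protected remote refs named in the hook's stdin lines."""
    hits = []
    for line in stdin_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore blank or malformed lines
        _local_ref, _local_sha, remote_ref, _remote_sha = parts
        if remote_ref in protected:
            hits.append(remote_ref)
    return hits


def confirm_on_tty(prompt):
    """Ask a human on the controlling terminal; no terminal means refusal."""
    try:
        with open("/dev/tty") as tty:
            sys.stderr.write(prompt)
            sys.stderr.flush()
            return tty.readline().strip().lower() == "yes"
    except OSError:
        return False  # headless run, e.g. an agent without a terminal


def main():
    hits = protected_targets(sys.stdin)
    if not hits:
        return 0
    question = f"Push to {', '.join(hits)}? Type 'yes' to continue: "
    return 0 if confirm_on_tty(question) else 1  # non-zero aborts the push
```

Installed as `.git/hooks/pre-push` with a final `raise SystemExit(main())`, the hook fails closed: when no terminal is attached, the push to a protected branch is refused.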

Limiting tools: less is more

One of the most effective guardrails is the simplest: give the agent fewer tools than you think it needs. An agent with access to the terminal, filesystem, database and external APIs is powerful — and unpredictable. An agent with access only to the filesystem and git is slower, but safer.

Add tools incrementally, based on demonstrated need rather than pre-emptively. Every tool you add expands the surface area for unexpected behaviour.
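One way to keep that discipline is a deny-by-default registry that refuses to enable a tool without a recorded reason. This is a minimal sketch; the `ToolPolicy` class and the tool names are hypothetical and do not belong to any MCP library.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToolPolicy:
    """Deny-by-default registry: a tool is usable only once explicitly enabled."""

    enabled: Dict[str, str] = field(default_factory=dict)  # tool -> reason

    def enable(self, tool: str, reason: str) -> None:
        if not reason.strip():
            raise ValueError("record the demonstrated need before enabling a tool")
        self.enabled[tool] = reason

    def is_enabled(self, tool: str) -> bool:
        return tool in self.enabled


# Start with the two tools the article treats as the safe baseline.
policy = ToolPolicy()
policy.enable("filesystem", "agent edits source files")
policy.enable("git", "agent commits to feature branches")
```

Anything not listed, such as a database or terminal tool, stays unavailable until someone writes down why it is needed.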

Human review at end of cycle

Technical guardrails aren't enough. At the end of every agentic work cycle there must be a human review — not as a formality, but as a genuine review.

Minimum checklist for reviewing agentic output:

Does the code do what it was asked to do?
Are the tests meaningful, or merely increasing coverage?
Are existing conventions preserved?
Are there visible security issues (SQL injection, XSS, IDOR)?
Does the code respect architectural boundaries?

Review doesn't need to be exhaustive for every commit, but every feature or logical unit should get one.

Key conclusion: Guardrails are an investment, not overhead. One hour spent setting up the right controls saves days of cleanup after an uncontrolled agent.

