Why paranoia is the right answer
After seven days of experimentation, I'm certain of one thing: the right attitude toward agentic AI is not trust, but managed distrust. The agent will do whatever you allow, and if you allow too much, it will use all of it.
Guardrails aren't a productivity constraint. They're the conditions under which agentic development is safely usable. Without them you're experimenting — not deploying.
MCP as a control layer
The Model Context Protocol (MCP) offers a natural place for guardrails: MCP servers. Instead of calling tools directly, the agent calls them through an MCP server that implements the control logic and can reject or log each request.
Concrete examples of MCP guardrails:
- Filesystem MCP: the agent can only read and write inside permitted directories. Attempts to access paths outside the perimeter are logged and rejected.
- Git MCP: commits are permitted, but pushing to main requires human approval. The agent commits to a feature branch; the human merges.
- Database MCP: the agent has read-only access, or access to an isolated development database. No access to production data.
- Terminal MCP: an allowlist of permitted commands. The agent can run tests but cannot delete files.
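The filesystem and terminal checks above can be sketched in a few lines of Python. This is a minimal illustration, not a real MCP server: the names `ALLOWED_ROOTS`, `ALLOWED_COMMANDS`, `is_path_allowed` and `is_command_allowed` are hypothetical, and the directory and commands are placeholder assumptions.

```python
import logging
import shlex
from pathlib import Path

logger = logging.getLogger("mcp_guardrails")

# Hypothetical values for illustration; replace with your own perimeter.
ALLOWED_ROOTS = [Path("/workspace/project").resolve()]
ALLOWED_COMMANDS = [("pytest",), ("npm", "test")]
SHELL_METACHARACTERS = set(";&|`$<>")


def is_path_allowed(candidate, roots=ALLOWED_ROOTS):
    """Return True only if the resolved path lies inside a permitted root."""
    # resolve() collapses ".." segments and follows existing symlinks,
    # so "project/../secrets" cannot sneak past a plain prefix check.
    resolved = Path(candidate).resolve()
    for root in roots:
        if resolved == root or root in resolved.parents:
            return True
    logger.warning("rejected path outside perimeter: %s", resolved)
    return False


def is_command_allowed(command_line, allowed=ALLOWED_COMMANDS):
    """Return True only if the command starts with an allowlisted prefix."""
    # Refuse anything that tries to chain or redirect commands.
    if SHELL_METACHARACTERS & set(command_line):
        logger.warning("rejected command with shell metacharacters: %s", command_line)
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:
        logger.warning("rejected unparsable command: %s", command_line)
        return False
    for prefix in allowed:
        if tuple(argv[: len(prefix)]) == tuple(prefix):
            return True
    logger.warning("rejected command not on allowlist: %s", command_line)
    return False
```

If you build on a sketch like this, run the parsed argument list without a shell, so that the check and the execution see exactly the same tokens.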
Hooks as safety valves
Git hooks are the second line of defence. While MCP guardrails control what the agent can call, hooks control what makes it into the repository.
Pre-commit hook
Runs before every commit is recorded and can abort it. In our configuration it verifies:
- No secrets or API keys in the code (regex against common patterns)
- No direct changes to config files outside permitted paths
- Minimum commit message format requirements met (enforced in the companion commit-msg hook, because pre-commit runs before the message exists)
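As an illustration of the secrets check, here is a minimal Python sketch that scans the added lines of a staged diff. The three patterns and the names `SECRET_PATTERNS`, `find_secrets` and `staged_diff` are assumptions chosen for the example, not the configuration described above; a real hook would need a broader pattern set.

```python
import re
import subprocess

# Illustrative patterns only (assumptions); real setups need many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?:api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]",
        re.IGNORECASE,
    ),
}


def find_secrets(diff_text):
    """Return (diff line number, pattern name) for each suspicious added line."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


def staged_diff():
    """Return the staged changes as unified diff text with no context lines."""
    result = subprocess.run(
        ["git", "diff", "--cached", "-U0"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A hook script built on this would call `find_secrets(staged_diff())` and exit with a non-zero status when the list is not empty, which makes Git abort the commit.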
Pre-push hook
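Requires interactive confirmation on push to protected branches. The hook runs client-side and cannot be changed from the system prompt, but Git skips it when a push is made with --no-verify. The agent therefore must not be able to pass that flag, and server-side branch protection should remain the final backstop.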
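A pre-push hook of this kind could look roughly like the following Python sketch. Git passes one line per ref on standard input in the form `<local ref> <local sha> <remote ref> <remote sha>`, so the confirmation has to be read from the terminal rather than from stdin. The set `PROTECTED_REFS` and the function names are hypothetical, and the `/dev/tty` approach assumes a Unix-like system.

```python
import sys

# Hypothetical list of protected branches (assumption for this sketch).
PROTECTED_REFS = {"refs/heads/main"}


def protected_targets(lines, protected=PROTECTED_REFS):
    """Return the protected remote refs named in Git's pre-push input."""
    hits = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore blank or malformed lines
        _local_ref, _local_sha, remote_ref, _remote_sha = parts
        if remote_ref in protected:
            hits.append(remote_ref)
    return hits


def confirm(prompt):
    """Ask a human on the controlling terminal; refuse if there is none."""
    try:
        with open("/dev/tty", "w") as tty_out:
            tty_out.write(prompt)
        with open("/dev/tty", "r") as tty_in:
            return tty_in.readline().strip().lower() == "yes"
    except OSError:
        return False  # no terminal available, e.g. an unattended agent


def main(stdin=sys.stdin):
    """Return the hook's exit status: 0 allows the push, 1 blocks it."""
    hits = protected_targets(stdin)
    if hits and not confirm(f"Push to {', '.join(hits)}? Type 'yes': "):
        print("push to protected branch refused", file=sys.stderr)
        return 1
    return 0
```

Saved as an executable `.git/hooks/pre-push` script that ends with `sys.exit(main())`, this blocks the push whenever nobody at a terminal types "yes".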
Limiting tools: less is more
One of the most effective guardrails is the simplest: give the agent fewer tools than you think it needs. An agent with access to the terminal, filesystem, database and external APIs is powerful — and unpredictable. An agent with access only to the filesystem and git is slower, but safer.
Add tools incrementally based on demonstrated need, not preventively. Every tool you add expands the surface area for unexpected behaviour.
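One way to make "demonstrated need" concrete is a default-deny tool policy where every grant must carry a written justification. The sketch below is hypothetical: `ToolPolicy` and the tool names are illustrative and not part of any MCP library.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    """Default-deny registry: a tool is usable only after an explicit grant."""

    granted: dict = field(default_factory=dict)  # tool name -> justification

    def grant(self, tool, reason):
        # Refuse grants without a stated, demonstrated need.
        if not reason.strip():
            raise ValueError(f"grant for {tool!r} needs a justification")
        self.granted[tool] = reason

    def is_allowed(self, tool):
        return tool in self.granted


# Start small: filesystem and git only (illustrative tool names).
policy = ToolPolicy()
policy.grant("filesystem", "edit source files on the feature branch")
policy.grant("git", "commit work to the feature branch")
```

Here `policy.is_allowed("database")` stays false until someone adds a grant with a reason, which also leaves a record of why each tool was enabled.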
Human review at end of cycle
Technical guardrails aren't enough. At the end of every agentic work cycle there must be a human review — not as a formality, but as a genuine review.
Minimum checklist for reviewing agentic output:
- Does the code do what it was asked to do?
- Are the tests meaningful, or merely increasing coverage?
- Are existing conventions preserved?
- Are there visible security issues (SQL injection, XSS, IDOR)?
- Does the code respect architectural boundaries?
Review doesn't need to be exhaustive for every commit, but every feature or logical unit should get one.