From Prototype to Production:
An AI Agent Security Checklist
Your agent works in development. It passed your test suite. Now you need to ship it to production — where real users send unexpected inputs, retrieved documents contain hidden instructions, and tool calls chain in ways you never anticipated. Here are 8 things to check before you deploy.
Scan all inputs before they reach the LLM
Every user message, retrieved document, and API response entering your agent is a potential attack vector. Scan inputs for prompt injection, social engineering, and manipulation patterns before the LLM processes them.
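As a rough illustration of what input scanning can look like, here is a minimal pattern-based sketch. The pattern names, the `ScanResult` type, and the `scan_input` function are all hypothetical and are not Rune's API. Three regexes will not stop a determined attacker, and real scanners typically combine many more patterns with classifiers. The point is only the shape: every piece of text gets checked before it reaches the model.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumed for this sketch); real coverage must be far broader.
INJECTION_PATTERNS = {
    "instruction_override": re.compile(
        r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I
    ),
    "prompt_exfiltration": re.compile(
        r"(reveal|print|show)\s+(your\s+)?(system\s+prompt|hidden\s+instructions)", re.I
    ),
    "role_hijack": re.compile(
        r"you\s+are\s+now\s+(in\s+)?(developer|dan|unrestricted)\s+mode", re.I
    ),
}


@dataclass
class ScanResult:
    allowed: bool
    matches: list[str]


def scan_input(text: str) -> ScanResult:
    """Check one piece of inbound text (user message, document, API response)."""
    matches = [name for name, pattern in INJECTION_PATTERNS.items() if pattern.search(text)]
    return ScanResult(allowed=not matches, matches=matches)
```

You would call `scan_input` on user messages and also on retrieved documents and tool results, since those reach the model through the same context window.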
Scan all outputs before they reach the user
Your agent might leak API keys, credentials, PII, or internal data in its responses. Scan every output for sensitive information before it leaves the system — especially when the agent has access to databases or file systems.
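A minimal sketch of output scanning, assuming a simple regex-based redaction pass. The `redact_output` function and the pattern set are hypothetical. The AWS pattern relies on the documented `AKIA` prefix for access key IDs, and the email pattern is deliberately loose. Production scanners usually cover many more secret formats and PII types.

```python
import re

# A small, assumed set of detectors; extend it for the secrets your stack actually uses.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact_output(text: str) -> tuple[str, list[str]]:
    """Replace anything that looks sensitive and report which detectors fired."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            findings.append(name)
    return text, findings
```

Returning the list of findings alongside the redacted text lets you log or alert on a leak while still sending the user a safe response.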
Restrict tool access per agent
Apply the principle of least privilege to your agents. A customer support agent doesn't need write access to your production database. A research agent doesn't need the ability to send emails. Define which tools each agent can call.
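One way to express least privilege is a per-agent allowlist checked on every tool call. The sketch below is an assumption about how you might wire this yourself. The agent names, tool names, and `call_tool` helper are hypothetical.

```python
class ToolNotAllowed(PermissionError):
    """Raised when an agent requests a tool outside its allowlist."""


# Hypothetical agents and tools; anything not listed is denied.
AGENT_TOOL_ALLOWLIST = {
    "support_agent": frozenset({"search_kb", "read_order"}),
    "research_agent": frozenset({"web_search", "read_file"}),
}


def call_tool(agent: str, tool: str, registry: dict, **kwargs):
    """Dispatch a tool call only if the agent is explicitly permitted to use it."""
    allowed = AGENT_TOOL_ALLOWLIST.get(agent, frozenset())  # unknown agents get no tools
    if tool not in allowed:
        raise ToolNotAllowed(f"{agent} may not call {tool}")
    return registry[tool](**kwargs)
```

Denying by default matters here: a new tool added to the registry stays unreachable until someone deliberately grants it to an agent.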
Monitor tool call sequences, not just individual calls
Individual tool calls can look benign while the sequence constitutes an attack. Reading a config file is fine. Sending its contents to an external API is not. Monitor the chain of tool calls across each agent session.
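The read-then-send example above can be sketched as a small per-session state machine. This is a simplified illustration with hypothetical tool names and a single rule. A real monitor would track what data flowed where, not just which tools ran.

```python
from collections import defaultdict

# Hypothetical tool categories for this sketch.
SENSITIVE_READS = {"read_file", "read_config", "query_db"}
EXTERNAL_SENDS = {"http_post", "send_email"}


class SequenceMonitor:
    """Flags an external send that follows a sensitive read in the same session."""

    def __init__(self):
        self._has_sensitive_read = defaultdict(bool)  # session_id -> bool

    def check(self, session_id: str, tool: str) -> str:
        """Return 'allow' or 'flag' based on the session's call history."""
        if tool in SENSITIVE_READS:
            self._has_sensitive_read[session_id] = True
            return "allow"
        if tool in EXTERNAL_SENDS and self._has_sensitive_read[session_id]:
            return "flag"
        return "allow"
```

Note that the same `http_post` call is allowed in a fresh session and flagged after a config read, which is exactly the distinction a per-call check cannot make.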
Run in monitor mode in staging first
Don't go straight to enforce mode in production. Run Rune in monitor mode in your staging environment first. Observe what it detects. Tune your policies and thresholds. Then flip to enforce mode in production with confidence.
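To make the difference between the two modes concrete, here is a generic sketch of the decision logic. It is not Rune's implementation or API. `Mode` and `apply_verdict` are hypothetical names. In monitor mode a detection is logged and the request continues. In enforce mode the same detection stops it.

```python
import logging
from enum import Enum

logger = logging.getLogger("agent.security")


class Mode(Enum):
    MONITOR = "monitor"
    ENFORCE = "enforce"


def apply_verdict(mode: Mode, threat_detected: bool, detail: str = "") -> bool:
    """Return True if the request may proceed."""
    if not threat_detected:
        return True
    # Both modes record the detection; only enforce mode blocks on it.
    logger.warning("threat detected (mode=%s): %s", mode.value, detail)
    return mode is Mode.MONITOR
```

Because both modes log the same events, the detections you review in staging are the ones that would have been blocked, which is what makes tuning before enforcement possible.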
Set up real-time alerts
When Rune blocks a threat or detects an anomaly, your team needs to know immediately. Route alerts to Slack, email, or webhooks. Set severity thresholds so critical alerts wake someone up and low-severity alerts get reviewed in the morning.
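Severity-based routing can be as simple as a table of minimum thresholds. The sketch below assumes four severity levels and three hypothetical channel names. Actual delivery to Slack, email, or a webhook is left out.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# (minimum severity, channel) pairs; the channel names are placeholders.
ROUTES = [
    (Severity.CRITICAL, "pager_webhook"),
    (Severity.MEDIUM, "slack"),
    (Severity.LOW, "email_digest"),
]


def channels_for(severity: Severity) -> list[str]:
    """Return every channel whose threshold this alert meets."""
    return [channel for minimum, channel in ROUTES if severity >= minimum]
```

With this table a critical alert pages someone and also lands in Slack and the digest, while a low-severity alert only appears in the digest for morning review.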
Define policies as code
Write your security policies in YAML and check them into version control alongside your agent code. This makes policies reviewable, testable, and auditable. When the policy changes, the PR shows exactly what changed.
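Once policies live in version control, you can check them in CI like any other code. The sketch below assumes the YAML file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and validates it against a made-up schema. The field names `mode`, `rules`, `name`, and `action` are assumptions for illustration, not Rune's policy format.

```python
VALID_MODES = {"monitor", "enforce"}
VALID_ACTIONS = {"allow", "flag", "block"}


def validate_policy(policy: dict) -> list[str]:
    """Return a list of problems; an empty list means the policy passes."""
    errors = []
    if policy.get("mode") not in VALID_MODES:
        errors.append("mode must be 'monitor' or 'enforce'")
    for i, rule in enumerate(policy.get("rules", [])):
        if not rule.get("name"):
            errors.append(f"rules[{i}]: missing name")
        if rule.get("action") not in VALID_ACTIONS:
            errors.append(f"rules[{i}]: action must be one of {sorted(VALID_ACTIONS)}")
    return errors


# The shape a parsed policy file might have under this hypothetical schema.
EXAMPLE_POLICY = {
    "mode": "monitor",
    "rules": [
        {"name": "block-prompt-injection", "action": "block"},
        {"name": "flag-external-send", "action": "flag"},
    ],
}
```

Running a check like this on every pull request means a typo in a policy file fails the build instead of silently disabling a rule.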
Review your agent's risk score weekly
Rune assigns a risk score to each agent based on threat frequency, severity, and tool access patterns. Review scores weekly to catch trends — an increasing score means your agent is seeing more threats or exhibiting riskier behavior.
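If you export weekly scores, a simple trend check can tell you when a review is overdue. The sketch below does not reproduce how Rune computes its score. It only assumes you have a list of weekly numbers, and it compares the latest one against the average of the preceding weeks. The function name, window, and tolerance are hypothetical defaults.

```python
from statistics import mean


def score_is_trending_up(
    weekly_scores: list[float], window: int = 4, tolerance: float = 0.10
) -> bool:
    """True if the latest score exceeds the mean of the prior `window` weeks
    by more than `tolerance` (as a fraction of that mean)."""
    if len(weekly_scores) < 2:
        return False  # not enough history to compare against
    history = weekly_scores[-(window + 1):-1]
    baseline = mean(history)
    return weekly_scores[-1] > baseline * (1 + tolerance)
```

A check like this is a prompt to look closer, not a verdict: a rising score can mean more attacks, a newly granted tool, or simply more traffic.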
The pattern
Every item on this checklist follows the same principle: don't trust, verify. Don't trust user inputs — scan them. Don't trust agent outputs — scan them. Don't trust tool calls — intercept them. Don't trust that testing caught everything — monitor in production. The goal isn't to make your agent slower or more restrictive. It's to make it trustworthy.
Ship agents you can trust
Rune handles all eight items on this checklist with one integration: three lines of code. The free plan includes 10K events/mo.