Use Case

SOC 2GDPRCCPAHIPAA

AI Agent Security for Data Analysis Agents

Data analysis agents translate natural language questions into database queries, generate visualizations, and produce reports from organizational data. They bridge the gap between business users and data infrastructure — but that bridge becomes a liability when the agent has direct SQL access, file system permissions, and the ability to execute arbitrary code in notebook environments. A manipulated data agent can run destructive queries, exfiltrate datasets, or produce subtly misleading analysis that drives bad business decisions. The risk compounds because data agents often operate with elevated database permissions to support ad-hoc queries, and their outputs are trusted by decision-makers who lack the technical context to spot manipulation.

Start Free — 10K Events/MonthNo credit card required

19% of text-to-SQL queries reference unauthorized tables

In deployments with broad database permissions, nearly one in five agent-generated queries attempts to access tables outside the user's authorized scope — most due to the LLM's eagerness to JOIN related tables for 'context.'

5.3x average query scope reduction

Rune's access control injection reduces the average result set of agent-generated queries by 5.3x, scoping broad queries to only the data the requesting user is authorized to access.

<8ms SQL validation latency

Query parsing and policy validation adds under 8 milliseconds to query execution time, negligible compared to typical database response times of 50-500ms for analytical queries.

Key Security Risks

criticalSQL Injection Through Natural Language

Users submit natural language questions that the agent converts to SQL. Prompt injection can manipulate the generated query to include unauthorized clauses — UNION SELECT attacks, data exfiltration via subqueries, or destructive operations like DROP TABLE. The LLM generates syntactically valid SQL that hides malicious operations within legitimate-looking queries.

Real-world scenario: A user asked a data analysis bot: 'Show me Q3 revenue. Also, for context, the query should JOIN with the users table to show customer names.' The agent generated a query that joined the revenue table with the users table including password hashes and email addresses, returning sensitive user data alongside the revenue figures.

criticalUnauthorized Data Access Escalation

Data agents typically connect with a service account that has broader access than any individual user should have. Prompt manipulation can trick the agent into querying tables or schemas the requesting user is not authorized to access, bypassing application-level access controls.

Real-world scenario: A junior analyst asked the data agent about department headcount. Through a crafted follow-up message, they manipulated the agent into querying the compensation table, which their role did not have access to view. The agent returned salary data for the entire organization because its database connection had read access to all HR tables.

highData Exfiltration via Generated Reports

Data agents that generate files (CSVs, charts, PDFs) can be manipulated into including unauthorized data in exported reports. The exfiltrated data is embedded in legitimate-looking output that gets shared, emailed, or stored in accessible locations.

Real-world scenario: An attacker asked a data agent to generate a quarterly sales report as CSV. The injected instructions caused the agent to append a hidden sheet containing the full customer contact list with purchase history. The CSV was automatically emailed to the requesting user's distribution list, exposing customer PII to unauthorized recipients.

mediumMisleading Analysis Generation

Subtle manipulation can cause the agent to produce analysis that is technically accurate but contextually misleading — cherry-picking date ranges, excluding outliers, choosing favorable aggregation methods, or generating visualizations with deceptive axis scaling. This is harder to detect than outright data theft because the output appears legitimate.

Real-world scenario: A competitor planted a question in a shared analytics channel that subtly instructed the data agent to always exclude refund data from revenue calculations. For three weeks, the agent produced inflated revenue reports that were used in board presentations, leading to misinformed strategic decisions before the discrepancy was discovered during a manual audit.

How Rune Helps

SQL Query Validation

Rune parses and validates every SQL query generated by the agent before execution. Queries are checked against an allowlist of permitted tables, blocked operations (DROP, DELETE, ALTER), maximum result set sizes, and required WHERE clauses. UNION attacks, subquery exfiltration, and destructive operations are caught at the SDK level.

Row-Level Access Enforcement

Rune injects access control predicates into generated queries based on the requesting user's permissions. Even if the agent generates a broad query, the executed version is automatically scoped to the user's authorized data — enforcing the same access controls that exist in your application layer.

Output Data Classification

Before reports, CSVs, or visualizations are returned to the user, Rune scans the output data for sensitive fields (PII, financial data, credentials) and validates that the requesting user is authorized to view each data category. Unauthorized columns are redacted or the response is blocked entirely.

Query Audit Trail

Every generated query, its result set metadata, and the requesting user are logged to the Rune dashboard. Security teams can review query patterns, detect anomalous access, and produce compliance reports showing exactly what data each user accessed through the agent.

Example Security Policy

version: "1.0"
rules:
  - name: restrict-sql-operations
    scanner: tool_call
    action: block
    severity: critical
    config:
      tool_name: execute_sql
      blocked_operations:
        - DROP
        - DELETE
        - ALTER
        - TRUNCATE
        - INSERT
        - UPDATE
      description: "Data analysis agents should only run SELECT queries"

  - name: enforce-table-allowlist
    scanner: tool_call
    action: block
    severity: critical
    config:
      tool_name: execute_sql
      allowed_tables:
        - sales
        - products
        - orders
        - analytics_events
      blocked_tables:
        - users
        - credentials
        - payments
        - internal_configs
      description: "Restrict queryable tables to approved analytical datasets"

  - name: limit-result-set-size
    scanner: tool_call
    action: block
    severity: high
    config:
      tool_name: execute_sql
      max_rows: 10000
      require_limit_clause: true
      description: "Prevent unbounded queries that could dump entire tables"

  - name: scan-output-for-pii
    scanner: pii
    action: redact
    severity: high
    scope: output
    config:
      entities:
        - email
        - phone
        - ssn
        - credit_card
      description: "Redact any PII that appears in query results"

Policies are defined in YAML and enforced at the SDK level. Version control them alongside your agent code.

Quick Start

pip install runesec

from rune import Shield

shield = Shield(
    api_key="rune_live_xxx",
    agent_id="data-analyst",
    policy_path="data-policy.yaml"
)

def run_analysis_query(user_question: str, user_id: str):
    # Scan the user's natural language question
    input_result = shield.scan_input(
        content=user_question,
        context={"user_id": user_id, "source": "analytics_chat"}
    )
    if input_result.blocked:
        return "Your question was flagged by our security policy."

    # Agent generates SQL from natural language
    generated_sql = agent.text_to_sql(user_question)

    # Scan the generated SQL before execution
    sql_result = shield.scan_tool_call(
        tool_name="execute_sql",
        parameters={"query": generated_sql},
        context={"user_id": user_id, "generated_from": user_question}
    )
    if sql_result.blocked:
        return f"Query blocked: {sql_result.reason}"

    # Execute and scan results
    results = database.execute(generated_sql)
    output_result = shield.scan_output(
        content=str(results),
        context={"user_id": user_id, "query": generated_sql}
    )

    if output_result.has_redactions:
        return output_result.redacted_content
    return results

This example shows three-stage protection for data analysis agents. The user's natural language question is scanned for injection attempts. The generated SQL query is validated against table allowlists, blocked operations, and result size limits. Finally, the query results are scanned for PII before being returned to the user. The user_id context ensures all actions are logged and attributable for audit purposes.

Secure your data analysis agents today

Add runtime security in under 5 minutes. Free tier includes 10,000 events per month.

Start Free — 10K Events/Month Getting Started