Runtime Monitoring for AI Agents: Baselines, Anomaly Scoring, and Auto-Revocation

This is Part 3 of a three-part series on AI agent security. Part 1 covered the threat landscape. Part 2 covered cryptographic controls. This post covers runtime monitoring.

Cryptographic controls stop unauthorised actions. Runtime monitoring detects when authorised actions are being used in unauthorised ways — the behavioural layer of agent security.

Why Scope Alone Is Not Enough

A trading agent with a €500,000 maximum trade scope could, in principle, execute 100 trades of €4,999 each within a single hour. Each individual trade is within scope. Collectively, if the agent's baseline is, say, eight trades per hour, they represent a roughly 12× deviation from its normal activity: a behaviour pattern consistent with compromise, or with prompt injection causing the agent to systematically probe its limits.

Scope enforcement asks: "Is this specific action permitted?" Runtime monitoring asks: "Is the pattern of actions consistent with normal behaviour for this agent?" Both layers are necessary.
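The two questions can be put side by side in a few lines. This is a minimal sketch using the numbers from the example above; the baseline of eight trades per hour and both function names are assumptions made for illustration.

```python
MAX_TRADE_EUR = 500_000        # per-trade scope from the example above
BASELINE_TRADES_PER_HOUR = 8   # assumed for illustration, not a measured baseline


def within_scope(amount_eur: float) -> bool:
    """Scope enforcement: is this single action permitted?"""
    return 0 < amount_eur <= MAX_TRADE_EUR


def rate_deviation(trades_last_hour: int) -> float:
    """Runtime monitoring: how far is the hourly pattern from normal?"""
    return trades_last_hour / BASELINE_TRADES_PER_HOUR


trades = [4_999.0] * 100  # 100 trades inside one hour

all_in_scope = all(within_scope(t) for t in trades)
deviation = rate_deviation(len(trades))
print(all_in_scope, deviation)
```

Every trade passes the scope check, yet the hourly rate is more than twelve times the assumed baseline. Only the second function sees the problem.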

Establishing the Baseline

Baseline collection happens during a controlled observation period — typically 7 to 14 days — before full anomaly enforcement activates. During this period, the agent operates normally but all actions are logged for statistical analysis.

Metrics collected:

— Transaction size distribution (p25, p50, p75, p95, p99)

— Transactions per hour (average, p95, p99, max)

— Counterparty distribution (which counterparties, what proportion of volume)

— Instrument distribution (which markets, what concentration)

— Time-of-day profile (when is the agent active, in what pattern)

— Tool call frequency and sequence patterns

— Geographic origin of signing requests

The observation period must cover a representative sample. If the agent is only active on weekdays, for example, the baseline should span at least two full weeks rather than the 7-day minimum, so that it captures both the weekday activity and the weekend absence pattern.
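As a rough illustration of how some of these metrics could be derived from logged actions, here is a sketch using only the Python standard library. The tuple layout, the `build_baseline` name, and the synthetic data are assumptions; this is not Kakunin's implementation, and it only counts hours in which the agent was active.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, quantiles


def percentiles(values, wanted=(25, 50, 75, 95, 99)):
    """Return selected percentiles; needs at least two data points."""
    cuts = quantiles(values, n=100, method="inclusive")  # cut points p1..p99
    return {f"p{p}": cuts[p - 1] for p in wanted}


def build_baseline(actions):
    """actions: list of (timestamp, size_eur, counterparty) tuples."""
    sizes = [size for _, size, _ in actions]

    # Count actions per clock hour; hours with no activity are not counted.
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts, _, _ in actions
    )
    counts = list(per_hour.values())

    # Share of total volume per counterparty.
    volume = Counter()
    for _, size, counterparty in actions:
        volume[counterparty] += size
    total = sum(volume.values())

    return {
        "size": percentiles(sizes),
        "rate": {"average": mean(counts), "max": max(counts),
                 **percentiles(counts, (95, 99))},
        "counterparty_share": {cp: v / total for cp, v in volume.items()},
        "active_hours": sorted({ts.hour for ts, _, _ in actions}),
    }


# Synthetic observation period: 14 days, 3 active hours a day, 4 trades an hour.
start = datetime(2024, 1, 1)
observed = [
    (start + timedelta(days=d, hours=h, minutes=10 * i),
     100 * (i + 1),
     "CP-A" if i < 3 else "CP-B")
    for d in range(14) for h in (9, 10, 11) for i in range(4)
]
baseline = build_baseline(observed)
print(baseline["size"], baseline["active_hours"])
```

Instrument distribution, tool call sequences, and geographic origin would follow the same pattern: group the logged actions by the relevant field and store the resulting distribution.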

Anomaly Scoring Model

Each incoming action is scored against the baseline using a weighted deviation model. The score is in [0, 1] — 0 is no anomaly, 1 is maximum anomaly.

Scoring components:

— Size anomaly (weight 0.35): deviation above p99 of baseline size distribution

— Frequency anomaly (weight 0.25): current hourly rate vs. baseline p99 frequency

— Counterparty anomaly (weight 0.20): action involves a counterparty outside the baseline distribution

— Time-of-day anomaly (weight 0.15): action occurs outside the baseline active period

— Geographic anomaly (weight 0.05): signing origin outside baseline geographic pattern

Weights are configurable per agent type. A geographically fixed trading bot should have higher weight on geographic anomaly than a distributed data processing agent.
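The weighted model above can be sketched as follows. The weights come from the list above; everything else is an assumption for illustration, in particular the `ramp` function, which maps a value at or below p99 to 0 and rises linearly to 1 at twice p99, and the dictionary keys used for actions and baselines.

```python
WEIGHTS = {
    "size": 0.35,
    "frequency": 0.25,
    "counterparty": 0.20,
    "time_of_day": 0.15,
    "geo": 0.05,
}


def ramp(observed, p99, saturation=2.0):
    """0 at or below p99, rising linearly to 1 at saturation * p99 (assumed shape)."""
    if p99 <= 0:
        return 1.0 if observed > 0 else 0.0
    excess = (observed - p99) / (p99 * (saturation - 1.0))
    return min(1.0, max(0.0, excess))


def score_action(action, baseline, weights=WEIGHTS):
    """Return (total score in [0, 1], per-dimension component scores)."""
    components = {
        "size": ramp(action["size"], baseline["size_p99"]),
        "frequency": ramp(action["hourly_rate"], baseline["rate_p99"]),
        "counterparty": 0.0 if action["counterparty"] in baseline["counterparties"] else 1.0,
        "time_of_day": 0.0 if action["hour"] in baseline["active_hours"] else 1.0,
        "geo": 0.0 if action["origin"] in baseline["origins"] else 1.0,
    }
    total = sum(weights[name] * value for name, value in components.items())
    return total, components


example_baseline = {
    "size_p99": 10_000,
    "rate_p99": 10,
    "counterparties": {"CP-A", "CP-B"},
    "active_hours": set(range(8, 18)),
    "origins": {"DE"},
}
normal = {"size": 5_000, "hourly_rate": 6, "counterparty": "CP-A", "hour": 10, "origin": "DE"}
suspicious = {"size": 20_000, "hourly_rate": 15, "counterparty": "CP-X", "hour": 3, "origin": "DE"}

normal_score, _ = score_action(normal, example_baseline)
suspicious_score, breakdown = score_action(suspicious, example_baseline)
print(normal_score, round(suspicious_score, 3), breakdown)
```

Because the weights sum to 1 and each component is clamped to [0, 1], the total stays in [0, 1]. Returning the per-dimension breakdown alongside the total is what lets an alert say which dimensions scored high.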

Threshold Configuration

Score < 0.3: low. Action is allowed and logged normally.

Score ≥ 0.3 and < 0.75: medium. Action is allowed. Log verbosity increases. Running average tracked.

Score ≥ 0.75 and < 0.85: high. Pre-revocation warning sent via webhook. On-call notified. Grace period starts (configurable; default 300 seconds).

Score ≥ 0.85: critical. Certificate automatically revoked. Agent halts immediately. Replacement agent queued.

The 0.75 threshold for human notification and 0.85 for automatic revocation are defaults. High-frequency trading agents typically use lower thresholds (0.65/0.75). Data analysis agents with inherently variable workloads may use higher thresholds (0.80/0.90).
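A small sketch of how the bands and the per-agent-type overrides might be expressed. The `Thresholds` class, the profile names, and `classify` are hypothetical; the numeric values are the ones given above, and the 0.3 boundary for the medium band is kept the same for every profile since the text does not say it varies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    medium: float = 0.30
    warn: float = 0.75     # pre-revocation warning, human notified
    revoke: float = 0.85   # automatic revocation


DEFAULT = Thresholds()
HIGH_FREQUENCY_TRADING = Thresholds(warn=0.65, revoke=0.75)
DATA_ANALYSIS = Thresholds(warn=0.80, revoke=0.90)


def classify(score: float, t: Thresholds = DEFAULT) -> str:
    """Map a score in [0, 1] to one of the four bands."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= t.revoke:
        return "critical"
    if score >= t.warn:
        return "high"
    if score >= t.medium:
        return "medium"
    return "low"


print(classify(0.78), classify(0.78, HIGH_FREQUENCY_TRADING), classify(0.78, DATA_ANALYSIS))
```

Checking the highest band first means each score lands in exactly one band, with no overlap at the boundaries.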

The Pre-Revocation Warning Window

Automatic revocation at 0.85 is a hard stop — by that point, the pattern is severe enough that waiting for human review risks further damage. But the 0.75 threshold creates a window for human investigation before revocation becomes automatic.

During the pre-revocation window:

1. On-call is notified with the anomaly details (which dimensions scored high, what the expected vs. observed values were)

2. The operator can ACK the warning (accept the behaviour as legitimate — perhaps a special market event) or manually revoke immediately

3. If neither happens within the grace period and the score remains at or above 0.75, the system re-evaluates. If the score has reached 0.85 or higher, automatic revocation triggers.

This architecture is designed to support the human oversight requirement in Article 14 of the EU AI Act: humans have a defined intervention point before automatic action, but the system does not depend on human response to stop a confirmed threat.
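The decision logic of the pre-revocation window can be sketched as a single function. The names and the `Outcome` values are hypothetical. The text above does not say what happens when the grace period expires with the score still between 0.75 and 0.85, so this sketch simply keeps the warning open in that case; that choice is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    MANUALLY_REVOKED = "manually_revoked"
    AUTO_REVOKED = "auto_revoked"
    ACKED = "acked"
    RECEDED = "receded"
    WAITING = "waiting"            # grace period still running
    WARNING_OPEN = "warning_open"  # grace period over, score still in the warning band


def resolve_warning(score, elapsed_s, operator_action=None,
                    grace_s=300, warn=0.75, revoke=0.85):
    """Decide the state of an open pre-revocation warning."""
    if operator_action == "revoke":
        return Outcome.MANUALLY_REVOKED
    if score >= revoke:
        return Outcome.AUTO_REVOKED   # hard stop, independent of human response
    if operator_action == "ack":
        return Outcome.ACKED
    if score < warn:
        return Outcome.RECEDED
    if elapsed_s < grace_s:
        return Outcome.WAITING
    return Outcome.WARNING_OPEN       # assumption: unspecified in the text above


print(resolve_warning(0.78, elapsed_s=60))
print(resolve_warning(0.90, elapsed_s=10, operator_action="ack"))
```

The ordering encodes the oversight property: an operator can always revoke, and a score at or above the revocation threshold halts the agent whether or not anyone has responded.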

Rolling Baseline Recalibration

Agent behaviour evolves legitimately over time — new markets open, trading strategy adapts, counterparty relationships change. A baseline set 12 months ago may not reflect current normal behaviour.

Kakunin recalibrates the baseline quarterly (or on certificate renewal). The recalibration uses the most recent 90 days of observed behaviour. The new baseline requires compliance officer approval before activating — this prevents gradual drift from being automatically accepted as "new normal".
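The approval gate is the important part of this process, and it can be sketched in a few lines. The `BaselineStore` class, its method names, and the single `max_size` metric are hypothetical stand-ins; a real candidate baseline would carry the full metric set.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


def recent_window(observations, today, days=90):
    """Keep (day, size) observations from the most recent `days` days."""
    cutoff = today - timedelta(days=days)
    return [(day, size) for day, size in observations if cutoff < day <= today]


@dataclass
class BaselineStore:
    active: dict
    pending: Optional[dict] = None

    def propose(self, observations, today):
        """Compute a candidate baseline; it does NOT become active yet."""
        sizes = [size for _, size in recent_window(observations, today)]
        if not sizes:
            raise ValueError("no observations in the recalibration window")
        self.pending = {"computed_on": today, "samples": len(sizes), "max_size": max(sizes)}

    def approve(self, officer_id):
        """Activate the candidate only after explicit compliance approval."""
        if self.pending is None:
            raise RuntimeError("no candidate baseline awaiting approval")
        self.active = {**self.pending, "approved_by": officer_id}
        self.pending = None


store = BaselineStore(active={"max_size": 6_000})
history = [(date(2024, 1, 15), 9_000), (date(2024, 5, 1), 5_000), (date(2024, 6, 20), 7_000)]

store.propose(history, today=date(2024, 7, 1))
before_approval = dict(store.active)   # still the old baseline
store.approve("officer-1")
print(before_approval, store.active)
```

Scoring continues against `active` for as long as the candidate sits in `pending`, so drifted behaviour cannot promote itself into the baseline.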

Audit Log as Monitoring Evidence

Every anomaly event — the risk score, the specific dimensions that triggered it, the action payload, and the resolution (ACKed, manually revoked, auto-revoked, or score receded below threshold) — is written to the WORM audit log.

This creates a complete monitoring evidence trail for regulators: not just what the agent did, but how the monitoring system responded to every deviation. MiCA Article 72 requires "robust procedures for testing and monitoring", and the audit log is direct evidence that such procedures are in place and operating.
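To make the record contents concrete, here is one possible shape for an anomaly event before it is written to the log. The field names and the JSON line format are assumptions, and the write-once guarantee itself comes from the storage layer, which this sketch does not model.

```python
import json
from dataclasses import asdict, dataclass

# The four resolutions named above.
RESOLUTIONS = {"acked", "manually_revoked", "auto_revoked", "receded"}


@dataclass(frozen=True)
class AnomalyEvent:
    agent_id: str
    risk_score: float
    triggered_dimensions: dict   # dimension name -> component score
    action_payload: dict
    resolution: str

    def __post_init__(self):
        if not 0.0 <= self.risk_score <= 1.0:
            raise ValueError("risk_score must be in [0, 1]")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unknown resolution: {self.resolution}")

    def to_log_line(self) -> str:
        """Serialise deterministically so identical events produce identical lines."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


event = AnomalyEvent(
    agent_id="agent-001",
    risk_score=0.87,
    triggered_dimensions={"size": 1.0, "frequency": 0.5},
    action_payload={"type": "trade", "amount_eur": 20_000},
    resolution="auto_revoked",
)
line = event.to_log_line()
print(line)
```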

Detecting Slow Drift

Point-in-time anomaly scoring catches sudden deviations. Detecting gradual drift requires a separate signal: the 30-day rolling average of the anomaly score.

If an agent's average risk score increases from 0.05 to 0.18 over 30 days — with no single event above 0.30 — the trend is security-relevant even though no individual threshold was breached. Kakunin tracks rolling averages and alerts when the 30-day trend shows significant upward movement.
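A sketch of the drift signal, reproducing the 0.05 to 0.18 example above. Several things here are assumptions: the input is one mean anomaly score per day, the trend is measured by comparing the latest 30-day average with the one from 30 days earlier, and the alert fires on a rise of 0.10 or more. The text does not define what counts as "significant".

```python
DRIFT_ALERT_RISE = 0.10  # assumed alert threshold, not a documented default


def rolling_mean(values, window=30):
    """Mean of each consecutive `window`-day slice."""
    return [sum(values[i - window:i]) / window for i in range(window, len(values) + 1)]


def drift(daily_scores, window=30):
    """Latest rolling mean minus the rolling mean one window earlier."""
    means = rolling_mean(daily_scores, window)
    if len(means) <= window:
        return None  # fewer than two full windows of history
    return means[-1] - means[-1 - window]


# 30 quiet days at 0.05, then a slow climb from 0.07 to 0.29.
scores = [0.05] * 30 + [0.07 + i * 0.22 / 29 for i in range(30)]

rise = drift(scores)
alert = rise is not None and rise >= DRIFT_ALERT_RISE
print(round(rise, 3), alert, round(max(scores), 3))
```

No daily score in the series reaches 0.30, so the point-in-time bands stay at "low" throughout, while the rolling average rises by 0.13.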

Integrating with Your Incident Response

Kakunin delivers monitoring events via webhook. Connect them to PagerDuty, Opsgenie, or your own incident management system:

— agent.pre_revocation_warning: risk score ≥ 0.75 — page on-call with anomaly details

— agent.certificate_revoked: automatic or manual revocation — trigger incident workflow

— agent.anomaly_resolved: score dropped below 0.3 — close the alert

— agent.baseline_drift_alert: 30-day rolling average rising — schedule compliance review

The webhook payload includes the agent ID, risk score, anomaly breakdown, and a direct link to the relevant audit log records.
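A receiving service might route these events along the following lines. The four event names are the ones listed above; the payload field names (`type`, `agent_id`, `risk_score`, `audit_log_url`), the routing labels, and the example URL are assumptions, since the exact payload schema is not given here. Signature verification and the actual calls to a paging system are left out.

```python
import json

# Event name -> what the incident tooling should do with it.
ROUTES = {
    "agent.pre_revocation_warning": "page_on_call",
    "agent.certificate_revoked": "open_incident",
    "agent.anomaly_resolved": "close_alert",
    "agent.baseline_drift_alert": "schedule_compliance_review",
}


def route_event(raw_body):
    """Parse a webhook body and decide how to handle it."""
    event = json.loads(raw_body)
    return {
        "action": ROUTES.get(event.get("type"), "ignore"),
        "agent_id": event.get("agent_id"),
        "risk_score": event.get("risk_score"),
        "audit_log_url": event.get("audit_log_url"),
    }


body = json.dumps({
    "type": "agent.pre_revocation_warning",
    "agent_id": "agent-001",
    "risk_score": 0.78,
    "anomaly_breakdown": {"size": 1.0, "frequency": 0.5},
    "audit_log_url": "https://example.invalid/audit/123",
}).encode()

decision = route_event(body)
print(decision)
```

Unknown event types fall through to `ignore` rather than raising, so a new event type added upstream does not break the receiver.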

What Runtime Monitoring Does Not Cover

Runtime monitoring detects behavioural deviations. It does not detect:

— Correct execution of a maliciously injected task (if the injected task looks like normal baseline behaviour)

— Vulnerabilities in the downstream systems the agent calls

— Data exfiltration via channels within the agent's scope (reading and transmitting authorised data)

These gaps are the responsibility of the other layers: cryptographic scope enforcement (Part 2) limits what any single action can do, and standard application security controls protect the systems the agent interacts with.

Series Summary

AI agent security requires three layers: a threat model that accounts for the unique properties of autonomous agents (Part 1); cryptographic controls that enforce authority limits independent of LLM decision-making (Part 2); and runtime monitoring that detects behavioural deviations and responds automatically (Part 3).

No single layer is sufficient. Cryptographic controls without monitoring miss gradual drift. Monitoring without cryptographic controls can be bypassed by prompt injection. The threat model without implementation is academic.

Kakunin implements all three layers. The governance processes that authorise scope, approve baselines, and review incidents remain with the operating organisation — as they must for regulatory accountability.