Autonomous agents operate with real authority: executing trades, calling APIs, modifying infrastructure, processing payments. Traditional application security was designed for deterministic software controlled by humans. Agents are different — they make decisions at runtime, spawn sub-agents, use external tools, and act without synchronous human approval.
This guide covers the threat model, defensive architecture, and operational controls required to deploy AI agents securely in regulated environments.
Traditional application security assumes the application logic is fixed at deploy time. Autonomous agents introduce three new threat vectors:
1. Prompt Injection
An adversarial payload embedded in external data (a document, a search result, a user message) causes the agent to execute unintended actions. Unlike SQL injection, the exploit surface is natural language rather than syntax.
2. Tool Scope Escalation
An agent with access to read_file and execute_bash can be coerced into combining them in ways the operator never intended. Scope policies on the certificate layer constrain what the agent may do, independent of what the LLM decides.
3. Identity Spoofing
Without cryptographic identity, there's no way to verify that the agent performing a transaction is the same agent that was authorised. An attacker who gains container access can impersonate the agent unless identity is bound to a hardware-backed key in KMS/HSM.
| Vector | Exploited By | Mitigation |
|---|---|---|
| Prompt injection | Adversarial content in tool outputs | Sandboxed tool execution; output validation |
| Stolen API key | Network interception; env var leak | Replace API keys with X.509 certificates |
| Container escape | Runtime vulnerability | KMS-backed keys; key never in container memory |
| Rogue sub-agent | LLM-orchestrated agent spawning | Sub-agent certificate scope = subset of parent |
| Baseline drift | Gradual objective deviation | Continuous behavioral profiling; rolling baseline |
| Replay attack | Captured signed request | Signed nonces; short-lived certificate validity |
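The "sub-agent certificate scope = subset of parent" mitigation in the table can be made concrete with a small check. The sketch below is an illustration only: the `Scope` shape mirrors the scope fields used in this guide's certificate example, and `isScopeWithinParent` is a hypothetical helper, not part of the Kakunin SDK.

```typescript
// Hypothetical scope shape, mirroring the fields in this guide's certificate example.
interface Scope {
  maxTransactionSize: number;
  allowedCounterparties: string[];
  allowedActions: string[];
  allowedRegions: string[];
}

// True when every element of `child` also appears in `parent`.
function isSubset(child: string[], parent: string[]): boolean {
  const allowed = new Set(parent);
  return child.every((item) => allowed.has(item));
}

// A sub-agent scope is acceptable only if it never exceeds the parent's authority.
function isScopeWithinParent(child: Scope, parent: Scope): boolean {
  return (
    child.maxTransactionSize <= parent.maxTransactionSize &&
    isSubset(child.allowedCounterparties, parent.allowedCounterparties) &&
    isSubset(child.allowedActions, parent.allowedActions) &&
    isSubset(child.allowedRegions, parent.allowedRegions)
  );
}

const parentScope: Scope = {
  maxTransactionSize: 10000,
  allowedCounterparties: ['stripe', 'revolut'],
  allowedActions: ['charge', 'refund'],
  allowedRegions: ['eu-west-1', 'eu-central-1'],
};

// Narrower than the parent on every field.
const refundOnly: Scope = {
  maxTransactionSize: 500,
  allowedCounterparties: ['stripe'],
  allowedActions: ['refund'],
  allowedRegions: ['eu-west-1'],
};

// Tries to add an action the parent never had.
const escalated: Scope = { ...refundOnly, allowedActions: ['refund', 'payout'] };

console.log(isScopeWithinParent(refundOnly, parentScope)); // true
console.log(isScopeWithinParent(escalated, parentScope)); // false
```

Running a check like this at issuance time, before a sub-agent certificate is signed, means a coerced orchestrator cannot hand a child more authority than it holds itself.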
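API keys are a weak basis for agent identity: they can be intercepted on the network or leaked (for example, through env dumps). X.509 certificates issued by Kakunin address these weaknesses: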
// Register agent: one certificate per agent instance
const agent = await kakunin.agents.create({
  name: 'payment-processor-eu-v3',
  metadata: {
    version: '3.1.0',
    deployment: 'k8s-eu-west-1',
    instance: process.env.POD_NAME, // per-pod identity
  },
});

const cert = await kakunin.agents.getCertificate(agent.id, {
  validityDays: 365,
  scope: {
    // Scope limits enforce authority at the cryptographic layer
    maxTransactionSize: 10000, // EUR
    allowedCounterparties: ['stripe', 'revolut'],
    allowedActions: ['charge', 'refund'],
    allowedRegions: ['eu-west-1', 'eu-central-1'],
  },
});

// cert.pem: public certificate (safe to share, embed in requests)
// cert.kmsKeyArn: private key stays in KMS, never leaves
Kakunin Root CA (AWS KMS RSA_4096, eu-west-1)
└── Kakunin Intermediate CA (per-tenant)
    └── Agent Certificate (per-agent-instance)
        ├── Subject: CN=payment-processor-eu-v3
        ├── SAN: agent-id=a_xyz123
        ├── Scope extensions (custom X.509 extensions)
        └── Validity: 365 days
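The snippets in this guide refer to an agent certificate by its fingerprint (the `certificateFingerprint` field, and the `sha256:` prefix in the pod annotation). How Kakunin derives that value is not specified here. A common convention, assumed in this sketch, is the SHA-256 digest of the certificate's DER encoding. The name `computeFingerprint` matches the helper the signing example calls, but the implementation below is hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Strip the PEM armor and decode the base64 body into DER bytes.
function pemToDer(pem: string): Buffer {
  const body = pem
    .replace(/-----BEGIN CERTIFICATE-----/, '')
    .replace(/-----END CERTIFICATE-----/, '')
    .replace(/\s+/g, '');
  return Buffer.from(body, 'base64');
}

// Assumed convention: "sha256:" followed by the hex SHA-256 of the DER bytes.
function computeFingerprint(pem: string): string {
  const digest = createHash('sha256').update(pemToDer(pem)).digest('hex');
  return `sha256:${digest}`;
}

// Placeholder bytes stand in for a real certificate; only the hashing is illustrated.
const placeholderDer = Buffer.from('placeholder certificate bytes');
const placeholderPem = [
  '-----BEGIN CERTIFICATE-----',
  placeholderDer.toString('base64'),
  '-----END CERTIFICATE-----',
].join('\n');

console.log(computeFingerprint(placeholderPem));
```

Hashing the DER bytes rather than the PEM text keeps the fingerprint stable when line wrapping or trailing whitespace changes.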
Every significant action should be signed with the agent's private key before submission:
import { createVerify, randomUUID } from 'crypto';
import { KMS } from '@aws-sdk/client-kms'; // assuming AWS SDK v3

const kms = new KMS({});

async function signAction(payload: object, kmsKeyArn: string) {
  // Sign the payload together with the timestamp and nonce, so neither can be
  // swapped on a captured request without invalidating the signature
  const envelope = { payload, timestamp: Date.now(), nonce: randomUUID() };
  const message = Buffer.from(JSON.stringify(envelope));
  // KMS signs; the private key never leaves the HSM
  const signature = await kms.sign({
    KeyId: kmsKeyArn,
    Message: message,
    MessageType: 'RAW',
    SigningAlgorithm: 'RSASSA_PKCS1_V1_5_SHA_256',
  });
  return {
    ...envelope,
    signature: Buffer.from(signature.Signature!).toString('base64'),
    certificateFingerprint: computeFingerprint(agentCert), // agentCert: the agent's loaded certificate
  };
}
Downstream systems verify the signature before processing:
async function verifySignedAction(signedAction: SignedAction) {
  // 1. Verify certificate is valid and not revoked
  const cert = await kakunin.certificates.verify(signedAction.certificateFingerprint);
  if (!cert.valid) throw new Error(`Certificate invalid: ${cert.reason}`);

  // 2. Verify signature over the same envelope that was signed (same field order)
  const { payload, timestamp, nonce } = signedAction;
  const verifier = createVerify('RSA-SHA256');
  verifier.update(Buffer.from(JSON.stringify({ payload, timestamp, nonce })));
  const valid = verifier.verify(cert.publicKey, signedAction.signature, 'base64');
  if (!valid) throw new Error('Signature verification failed');

  // 3. Check nonce (prevent replay). SET with NX succeeds only on first use;
  //    the argument order assumes an ioredis client
  const stored = await redis.set(`nonce:${nonce}`, '1', 'EX', 3600, 'NX');
  if (stored !== 'OK') throw new Error('Replay detected: nonce already used');

  // 4. Enforce scope
  const action = signedAction.payload;
  if (action.amount > cert.scope.maxTransactionSize) {
    throw new Error(`Action exceeds certificate scope: ${action.amount} > ${cert.scope.maxTransactionSize}`);
  }
  return { verified: true, agentId: cert.agentId };
}
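One property worth checking in any scheme like this: a nonce only prevents replay if the signature covers it. Otherwise a captured request can be resubmitted with a fresh nonce. The sketch below demonstrates the idea end to end with a locally generated RSA key standing in for the KMS key and an in-memory `Set` standing in for Redis. Both stand-ins, and the names `signEnvelope` and `verifyEnvelope`, are assumptions made for illustration.

```typescript
import { createSign, createVerify, generateKeyPairSync, randomUUID } from 'node:crypto';

// Local stand-in for the KMS key; in the guide's design the private key stays in KMS.
const { privateKey, publicKey } = generateKeyPairSync('rsa', { modulusLength: 2048 });

interface Envelope {
  payload: object;
  timestamp: number;
  nonce: string;
}

// Fixed field order so signer and verifier hash identical bytes.
function serialize(e: Envelope): Buffer {
  return Buffer.from(JSON.stringify({ payload: e.payload, timestamp: e.timestamp, nonce: e.nonce }));
}

function signEnvelope(payload: object): Envelope & { signature: string } {
  const envelope: Envelope = { payload, timestamp: Date.now(), nonce: randomUUID() };
  const signature = createSign('RSA-SHA256').update(serialize(envelope)).sign(privateKey, 'base64');
  return { ...envelope, signature };
}

// In-memory stand-in for the Redis nonce store.
const seenNonces = new Set<string>();

function verifyEnvelope(signed: Envelope & { signature: string }): boolean {
  const signatureOk = createVerify('RSA-SHA256')
    .update(serialize(signed))
    .verify(publicKey, signed.signature, 'base64');
  if (!signatureOk) return false; // payload, timestamp, or nonce was altered
  if (seenNonces.has(signed.nonce)) return false; // exact replay
  seenNonces.add(signed.nonce);
  return true;
}

const signed = signEnvelope({ action: 'charge', amount: 4200 });
const firstAttempt = verifyEnvelope(signed);
const exactReplay = verifyEnvelope(signed);
const replayWithNewNonce = verifyEnvelope({ ...signed, nonce: randomUUID() });

console.log(firstAttempt, exactReplay, replayWithNewNonce); // true false false
```

The third case is the important one: because the nonce is inside the signed bytes, swapping it breaks the signature instead of bypassing the replay check.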
Scope policies are embedded in the certificate as custom X.509 extensions. They cannot be modified without reissuing the certificate, which requires the Kakunin CA private key held in KMS. Enforcement happens at three layers:
Layer 1 — Gateway (API Layer)
Middleware reads certificate scope before routing requests. Requests outside scope are rejected with 403 before reaching business logic.
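The gateway decision can be sketched as a pure function that a middleware would call before routing. This is a minimal illustration under stated assumptions: the `Scope` fields mirror the certificate example in this guide, while `ActionRequest`, `Decision`, and `checkScope` are hypothetical names rather than Kakunin APIs.

```typescript
interface Scope {
  maxTransactionSize: number;
  allowedCounterparties: string[];
  allowedActions: string[];
  allowedRegions: string[];
}

// Hypothetical shape of an incoming request after the certificate has been verified.
interface ActionRequest {
  action: string;
  amount: number;
  counterparty: string;
  region: string;
}

interface Decision {
  status: 200 | 403;
  reason?: string;
}

// Returns 403 with a reason for the first scope rule the request violates.
function checkScope(scope: Scope, req: ActionRequest): Decision {
  if (!scope.allowedActions.includes(req.action)) {
    return { status: 403, reason: `action not in scope: ${req.action}` };
  }
  if (!scope.allowedCounterparties.includes(req.counterparty)) {
    return { status: 403, reason: `counterparty not in scope: ${req.counterparty}` };
  }
  if (!scope.allowedRegions.includes(req.region)) {
    return { status: 403, reason: `region not in scope: ${req.region}` };
  }
  if (req.amount > scope.maxTransactionSize) {
    return { status: 403, reason: `amount ${req.amount} exceeds ${scope.maxTransactionSize}` };
  }
  return { status: 200 };
}

const gatewayScope: Scope = {
  maxTransactionSize: 10000,
  allowedCounterparties: ['stripe', 'revolut'],
  allowedActions: ['charge', 'refund'],
  allowedRegions: ['eu-west-1', 'eu-central-1'],
};

const inScope = checkScope(gatewayScope, { action: 'charge', amount: 9900, counterparty: 'stripe', region: 'eu-west-1' });
const tooLarge = checkScope(gatewayScope, { action: 'charge', amount: 15000, counterparty: 'stripe', region: 'eu-west-1' });
const wrongAction = checkScope(gatewayScope, { action: 'payout', amount: 100, counterparty: 'stripe', region: 'eu-west-1' });

console.log(inScope.status, tooLarge.status, wrongAction.status); // 200 403 403
```

Keeping the check free of I/O makes it easy to unit test and to reuse in the tool guard layer.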
Layer 2 — Tool Guard (LLM Tool Layer)
Each tool call passes through verify_agent_scope before execution:
import { ToolGuard } from '@kakunin/sdk';

const guard = new ToolGuard({
  apiKey: process.env.KAKUNIN_API_KEY,
  agentId: process.env.AGENT_ID,
  certificatePath: '/var/certs/cert.pem',
  kmsKeyArn: process.env.KMS_KEY_ARN,
});

// Wrap every tool
const tools = {
  charge: guard.wrap('charge', async (params) => {
    // Guard checks: cert valid? amount within scope? counterparty allowed?
    // Throws if any check fails; the LLM gets a tool error, not unguarded execution
    return await stripe.charges.create(params);
  }),
  refund: guard.wrap('refund', async (params) => {
    return await stripe.refunds.create(params);
  }),
};
Layer 3 — Behavioral Anomaly (Monitoring Layer)
Even scope-compliant actions are checked against the agent's behavioral baseline. For example, 100 charges of €9,900 in 10 minutes are each within scope, but the burst is a 12× deviation from the hourly baseline and triggers a pre-revocation warning.
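The arithmetic behind that example can be worked through directly. The guide does not state the baseline figure; the sketch below assumes a hypothetical baseline of 50 transactions per hour, which is the value that makes a 100-charge burst in 10 minutes a 12× deviation. The function names are illustrative, not Kakunin APIs.

```typescript
// Extrapolate a burst observed over `windowMinutes` to an hourly rate.
function hourlyRate(count: number, windowMinutes: number): number {
  return (count / windowMinutes) * 60;
}

// How many times the baseline rate the observed burst represents.
function deviationRatio(count: number, windowMinutes: number, baselinePerHour: number): number {
  return hourlyRate(count, windowMinutes) / baselinePerHour;
}

const maxTransactionSize = 10000; // EUR, from the certificate scope
const chargeAmount = 9900; // EUR, each individual charge
const assumedBaselinePerHour = 50; // hypothetical baseline, not stated in the guide

const eachChargeInScope = chargeAmount <= maxTransactionSize;
const observedPerHour = hourlyRate(100, 10);
const ratio = deviationRatio(100, 10, assumedBaselinePerHour);

console.log(eachChargeInScope); // true
console.log(observedPerHour); // 600
console.log(ratio); // 12
```

Scope checks look at one action at a time, so only a rate-based measure like this catches a burst of individually valid charges.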
import os

import stripe  # assuming the official stripe package
from kakunin import ToolGuard
from langchain.tools import tool

guard = ToolGuard(
    api_key=os.environ["KAKUNIN_API_KEY"],
    agent_id=os.environ["AGENT_ID"],
)

@tool
@guard.verify_scope("charge")
def process_payment(amount: float, currency: str, customer_id: str) -> str:
    """Process a customer payment."""
    # Guard verifies: cert valid, amount within scope, anomaly score < threshold
    # Stripe expects the amount in the smallest currency unit (cents for EUR)
    result = stripe.Charge.create(
        amount=round(amount * 100),
        currency=currency.lower(),
        customer=customer_id,
    )
    return f"Charged {amount} {currency} to {customer_id}: {result.id}"
Run the agent in observation mode for 7–14 days before enforcing behavioral limits:
// Week 1: Permissive mode (observe, don't block)
const agent = await kakunin.agents.create({
  name: 'payment-processor-eu-v3',
  mode: 'observe', // Log anomalies, don't block
  anomalyThreshold: 1.0, // Never block during baseline collection
});

// Week 2: Review collected baseline stats
const stats = await kakunin.monitoring.getStats(agent.id, {
  window: '7d',
  metrics: ['transaction_size', 'frequency', 'counterparty_distribution', 'hour_of_day'],
});

// Approve and lock baseline
await kakunin.monitoring.setBaseline(agent.id, {
  transaction_size: { p50: stats.transaction_size.p50, p99: stats.transaction_size.p99 },
  transactions_per_hour: { p95: stats.frequency.p95 },
  preferred_counterparties: stats.counterparty_distribution.top(5),
  active_hours: stats.hour_of_day.activePeriods,
});
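The baseline above is expressed in percentiles (p50, p95, p99). How Kakunin computes them is not described in this guide; as an illustration of what those figures mean, the sketch below uses the nearest-rank method on a small, made-up sample of transaction sizes. The `percentile` helper and the sample values are hypothetical.

```typescript
// Nearest-rank percentile: the smallest value with at least p% of the sample at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile of empty sample');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up transaction sizes (EUR) from an observation window.
const observedSizes = [120, 80, 9900, 250, 400, 60, 310, 150, 95, 500];

const p50 = percentile(observedSizes, 50);
const p99 = percentile(observedSizes, 99);

console.log(p50); // 150
console.log(p99); // 9900
```

With only ten samples, a single large charge becomes the p99. That sensitivity is one reason to collect a full 7 to 14 days of observations before locking the baseline.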
| Score | Band | Action |
|---|---|---|
| < 0.3 | Low | Allow; log normally |
| 0.3 – 0.74 | Medium | Allow; increase log verbosity |
| 0.75 – 0.84 | High | Issue pre-revocation warning; page on-call |
| >= 0.85 | Critical | Auto-revoke certificate; halt all agent actions |
These thresholds are configurable per agent. Adjust for risk tolerance and false-positive rate of the specific agent's task profile.
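The banding can be sketched as a small classifier with per-agent overrides. The default cut-offs come from the table above; the `Thresholds` shape and the `classifyScore` name are assumptions for illustration.

```typescript
type Band = 'low' | 'medium' | 'high' | 'critical';

// Lower bound of each band above "low".
interface Thresholds {
  medium: number;
  high: number;
  critical: number;
}

const defaultThresholds: Thresholds = { medium: 0.3, high: 0.75, critical: 0.85 };

// Check the highest band first so each score maps to exactly one band.
function classifyScore(score: number, t: Thresholds = defaultThresholds): Band {
  if (score >= t.critical) return 'critical';
  if (score >= t.high) return 'high';
  if (score >= t.medium) return 'medium';
  return 'low';
}

console.log(classifyScore(0.29)); // low
console.log(classifyScore(0.3)); // medium
console.log(classifyScore(0.75)); // high
console.log(classifyScore(0.85)); // critical

// A stricter, hypothetical profile for a high-risk agent.
console.log(classifyScore(0.7, { medium: 0.2, high: 0.6, critical: 0.8 })); // high
```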
Prompt injection leading to data exfiltration:
Compromised container (stolen cert used from outside):
Gradual objective drift:
When an anomaly score reaches 0.75, Kakunin issues a pre-revocation warning; at 0.85 it revokes the certificate automatically. Both events are delivered by webhook:
// Webhook received when anomaly threshold breached
app.post('/webhook/kakunin', async (req, res) => {
  const event = req.body;

  if (event.type === 'agent.pre_revocation_warning') {
    const { agent_id, risk_score, anomaly_details } = event.data;
    // 1. Page on-call
    await pagerduty.createIncident({
      title: `Agent ${agent_id} anomaly score ${risk_score}`,
      severity: 'high',
      details: anomaly_details,
    });
    // 2. Optional: suspend non-critical tasks while investigating
    await agentOrchestrator.pause(agent_id, { reason: 'anomaly_investigation' });
    return res.json({ acknowledged: true });
  }

  if (event.type === 'agent.certificate_revoked') {
    const { agent_id, revocation_reason } = event.data;
    // 1. Hard stop: refuse all new tasks
    await agentOrchestrator.terminate(agent_id);
    // 2. Quarantine transactions from the anomaly window
    await compliance.flagForReview({
      agent_id,
      from: event.data.anomaly_start,
      to: event.data.revoked_at,
    });
    // 3. Spin up replacement agent (new certificate, fresh identity)
    await agentOrchestrator.spawn({
      template: agent_id,
      reason: 'post_revocation_replacement',
    });
    return res.json({ acknowledged: true });
  }

  // Respond to any other event type so the request does not hang
  return res.status(204).end();
});
After an incident, the audit log provides complete reconstruction:
// Pull all actions for the affected agent during the incident window
const { data: auditTrail, error } = await supabase
  .from('audit_log')
  .select('*')
  .eq('tenant_id', tenantId)
  .eq('actor_id', agentId)
  .gte('created_at', incidentStart.toISOString())
  .lte('created_at', incidentEnd.toISOString())
  .order('created_at', { ascending: true });
if (error) throw error; // the Supabase client reports failures in `error` instead of throwing

// Each row contains:
// - Signed action payload (proves what the agent intended)
// - Signature (proves it was this agent's certificate)
// - Risk score at time of action
// - Tool call parameters
// - Outcome
Every row is write-once, read-many (WORM): no updates, no deletes. This supports EU AI Act Article 12 record-keeping and MiCA Article 71 audit trail requirements.
apiVersion: v1
kind: Pod
metadata:
  name: payment-agent-pod
  labels:
    app: payment-agent # matched by the egress NetworkPolicy's podSelector
  annotations:
    kakunin.ai/agent-id: "a_xyz123"
    kakunin.ai/certificate-fingerprint: "sha256:abc..."
spec:
  serviceAccountName: payment-agent-sa # Minimal RBAC
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: agent
      image: myorg/payment-agent:v3.1.0
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: [ALL]
        readOnlyRootFilesystem: true
      env:
        - name: AGENT_ID
          value: "a_xyz123"
        - name: KMS_KEY_ARN
          valueFrom:
            secretKeyRef:
              name: kakunin-creds
              key: kms-key-arn
        # KAKUNIN_API_KEY from Doppler / external-secrets
      volumeMounts:
        - name: cert-volume
          mountPath: /var/certs
          readOnly: true
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: cert-volume
      projected:
        sources:
          - secret:
              name: agent-certificate
    - name: tmp
      emptyDir: {}
Restrict egress to only required endpoints:
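apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-agent-egress
spec:
  podSelector:
    matchLabels:
      app: payment-agent
  policyTypes: [Egress]
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0 # HTTPS to any host; narrow to the Stripe and Kakunin API ranges where known
            except:
              - 169.254.169.254/32 # Cloud instance metadata endpoint
      ports:
        - protocol: TCP
          port: 443
# All other egress is denied, including DNS on port 53 (no DNS exfiltration).
# If the agent must resolve hostnames, add a rule allowing port 53 to the cluster DNS service only.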
| Control | Kakunin Feature | Regulation |
|---|---|---|
| Agent identity documentation | X.509 certificate with serial number | EU AI Act Art. 11, MiCA Art. 70 |
| Authority limits enforced | Scope policy in certificate | EU AI Act Art. 26, MiCA Art. 67 |
| Continuous monitoring | Behavioral baseline + anomaly detection | EU AI Act Art. 9, MiCA Art. 72 |
| Automatic halt | Auto-revocation at score ≥ 0.85 | EU AI Act Art. 14 (human oversight) |
| Immutable audit trail | WORM audit_log (no UPDATE/DELETE) | EU AI Act Art. 12, MiCA Art. 71 |
| Incident reporting | Webhook + pre-revocation warnings | EU AI Act Art. 73, MiCA Art. 67 |