Test Results

369 / 390 Tests Passing

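Every compliance-critical path is covered. RLS isolation is tested against a live PostgreSQL database rather than mocked; those 21 tests are currently skipped in CI (see Cross-Tenant RLS Isolation below). Numbers below are generated from the real suite on each build (commit 63a93f0).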

369 passed · 0 failed · 21 skipped · ~11.2s runtime · last run 2026-07-07

Featured Test Suites

10 highlighted · 215 assertions · full breakdown below

SDK

21/21

@kakunin/sdk — TypeScript SDK

sdk/typescript/tests/sdk.test.ts

Public SDK unit tests. Mocks fetch globally. Covers client construction, model hash utility, all resource wrappers, retry logic, error mapping, and HMAC webhook verification.

Constructor throws if apiKey missing

kak_test_ prefix detected as sandbox mode

computeModelHash → sha256: prefixed 64-char hex

agents.certify throws on 409 (already certified)

events.ingest flags high-risk events (revocation_check_queued: true)

verify.cert returns verified agent info — no Authorization header sent

Retry: 500 then 200 → succeeds on second attempt

webhooks.constructEvent verifies valid HMAC-SHA256 signature

webhooks.constructEvent throws on stale timestamp (> tolerance)
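
The last two cases cover HMAC-SHA256 signature verification with a timestamp tolerance. A minimal sketch of that kind of check follows. The function names, the `<timestamp>.<body>` signed-payload layout, and the 300-second default tolerance are assumptions made for illustration; they are not taken from the actual `webhooks.constructEvent` implementation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper used only to produce signatures for this sketch.
// ASSUMPTION: the signed payload is `<timestamp>.<body>`, hex-encoded.
function signWebhook(body: string, timestampSec: number, secret: string): string {
  return createHmac("sha256", secret).update(`${timestampSec}.${body}`).digest("hex");
}

// Hypothetical verifier: throws on a stale timestamp or a bad signature,
// otherwise returns the parsed JSON body.
function verifyWebhook(
  body: string,
  signatureHex: string,
  timestampSec: number,
  secret: string,
  nowSec: number = Math.floor(Date.now() / 1000),
  toleranceSec = 300, // ASSUMPTION: default tolerance is illustrative
): unknown {
  // Reject timestamps outside the tolerance window before any crypto work.
  if (Math.abs(nowSec - timestampSec) > toleranceSec) {
    throw new Error("timestamp outside tolerance");
  }
  const expected = createHmac("sha256", secret)
    .update(`${timestampSec}.${body}`)
    .digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (received.length !== expected.length || !timingSafeEqual(received, expected)) {
    throw new Error("invalid signature");
  }
  return JSON.parse(body);
}
```

Comparing with `timingSafeEqual` rather than `===` avoids leaking how many leading bytes of the signature matched.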

SDK

16/16

@kakunin/mcp — MCP Server Tools

mcp/kakunin-mcp/tests/mcp.test.ts

MCP tool handler unit tests. Tests all three tools directly without MCP transport. KakuninMcpClient mocked to isolate handler logic.

verify_agent_scope: denies action not in permitted_actions

verify_agent_scope: denies when certificate is revoked

verify_agent_scope: denies when transaction amount exceeds financial scope

check_risk_score: returns high risk recommendation with revocation warning

audit_log_append: flags revocation_check_queued on high-risk event
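
The three `verify_agent_scope` denials above can be read as an ordered series of guards. The sketch below illustrates that ordering only. Every name except `permitted_actions` is hypothetical, and the real handler's field names, reason strings, and check order may differ.

```typescript
// Hypothetical certificate shape; only `permitted_actions` appears in the test names.
interface AgentCertificate {
  permitted_actions: string[];
  revoked: boolean;
  max_transaction_usd?: number; // ASSUMPTION: name of the financial-scope limit
}

interface ScopeRequest {
  action: string;
  amount_usd?: number;
}

interface ScopeDecision {
  allowed: boolean;
  reason?: string;
}

function verifyAgentScope(cert: AgentCertificate, req: ScopeRequest): ScopeDecision {
  // A revoked certificate denies everything, regardless of the requested action.
  if (cert.revoked) {
    return { allowed: false, reason: "certificate_revoked" };
  }
  // The action must be explicitly listed.
  if (!cert.permitted_actions.includes(req.action)) {
    return { allowed: false, reason: "action_not_permitted" };
  }
  // If both an amount and a limit are present, the amount must not exceed the limit.
  if (
    req.amount_usd !== undefined &&
    cert.max_transaction_usd !== undefined &&
    req.amount_usd > cert.max_transaction_usd
  ) {
    return { allowed: false, reason: "amount_exceeds_financial_scope" };
  }
  return { allowed: true };
}
```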

Unit

23/23

Risk Scoring Engine

__tests__/unit/risk-engine.test.ts

Deterministic expected score for every action type. Band boundary assertions. Auto-revocation threshold checks.

transaction_anomaly → score 0.85, band high

unauthorized_access_attempt → score 0.95, band high

Band boundary: 0.85 is high (not medium)

Auto-revocation: transaction_anomaly ≥ 0.85
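
As a rough illustration of what these assertions pin down, the sketch below uses a lookup table for the two scores named above and an inclusive 0.85 boundary for the high band. The function names and the medium threshold are assumptions; the page only states that 0.85 is high and that auto-revocation applies at or above 0.85.

```typescript
type ActionType = "transaction_anomaly" | "unauthorized_access_attempt";
type RiskBand = "low" | "medium" | "high";

// Scores copied from the test names above; the real engine covers more action types.
const BASE_SCORES: Record<ActionType, number> = {
  transaction_anomaly: 0.85,
  unauthorized_access_attempt: 0.95,
};

const HIGH_THRESHOLD = 0.85; // inclusive, per "0.85 is high (not medium)"
const MEDIUM_THRESHOLD = 0.5; // ASSUMPTION: the page does not state this boundary

function scoreFor(action: ActionType): number {
  return BASE_SCORES[action];
}

function riskBand(score: number): RiskBand {
  if (score >= HIGH_THRESHOLD) return "high";
  if (score >= MEDIUM_THRESHOLD) return "medium";
  return "low";
}

// Auto-revocation is considered once the score reaches the high threshold.
function shouldAutoRevoke(action: ActionType): boolean {
  return scoreFor(action) >= HIGH_THRESHOLD;
}
```

Because scoring is a pure lookup, the same action type always yields the same score, which is what makes exact-value assertions possible.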

Integration

30/30

Certificate Lifecycle

__tests__/api/certify.test.ts · verify.test.ts · revoke.test.ts · halt.test.ts

Issuance (KMS signing → DB → audit), public edge-cached verification, manual revocation + CRL refresh, and the cryptographic kill switch with signed halt receipt.

Issues certificate and writes audit_log on success

Revoked cert returns valid: false with s-maxage=60

Enqueues CRL regeneration via QStash after revocation

Halts agent and returns signed halt receipt

KMS signing failure is non-fatal — signed_by_ca: false
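
The non-fatal signing case can be sketched as a try/catch around an injected signer. The `Signer` type, `issueCertificate`, and the certificate shape are hypothetical; only the `signed_by_ca: false` outcome comes from the test name above.

```typescript
// Hypothetical certificate shape; only `signed_by_ca` is named on this page.
interface IssuedCertificate {
  agent_id: string;
  signature: string | null;
  signed_by_ca: boolean;
}

// Stand-in for a KMS signing call, injected so tests can make it fail.
type Signer = (payload: string) => Promise<string>;

async function issueCertificate(agentId: string, sign: Signer): Promise<IssuedCertificate> {
  const payload = JSON.stringify({ agent_id: agentId });
  try {
    const signature = await sign(payload);
    return { agent_id: agentId, signature, signed_by_ca: true };
  } catch {
    // Signing failure is non-fatal: issue the certificate unsigned and say so.
    return { agent_id: agentId, signature: null, signed_by_ca: false };
  }
}
```

Injecting the signer is also what lets a suite mock KMS without live AWS credentials.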

Integration

17/17

Behavior Event Ingestion

__tests__/api/events.test.ts

Core event pipeline: Zod validation → quota check → risk scoring → DB write → async side effects.

High-risk event queues revocation check via QStash

Financial: amount > max_single_trade_usd auto-elevates risk

Returns 404 on cross-tenant agent injection attempt

QStash failure is non-blocking — response still 200
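
Two of the behaviours above, risk elevation on an oversized trade and a non-blocking queue failure, are sketched below. Validation, quota checks, and the DB write are omitted. `ingestEvent`, the `Enqueue` type, the event shape, and the choice to elevate to exactly 0.85 are assumptions; only `max_single_trade_usd` and `revocation_check_queued` appear on this page.

```typescript
interface BehaviorEvent {
  action: string;
  base_score: number; // score already assigned by the risk engine
  amount_usd?: number;
}

interface AgentLimits {
  max_single_trade_usd?: number;
}

// Stand-in for the job queue client, injected so it can be made to fail.
type Enqueue = (job: { kind: "revocation_check"; action: string }) => Promise<void>;

const HIGH_RISK = 0.85;

async function ingestEvent(event: BehaviorEvent, limits: AgentLimits, enqueue: Enqueue) {
  let risk = event.base_score;
  // ASSUMPTION: an over-limit trade is raised to at least the high-risk threshold.
  if (
    event.amount_usd !== undefined &&
    limits.max_single_trade_usd !== undefined &&
    event.amount_usd > limits.max_single_trade_usd
  ) {
    risk = Math.max(risk, HIGH_RISK);
  }
  let queued = false;
  if (risk >= HIGH_RISK) {
    try {
      await enqueue({ kind: "revocation_check", action: event.action });
      queued = true;
    } catch {
      // Queue failure must not fail ingestion; the event is still accepted.
    }
  }
  return { status: 200, risk_score: risk, revocation_check_queued: queued };
}
```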

Security · Live DB

21/21

Cross-Tenant RLS Isolation

__tests__/security/rls-isolation.test.ts

Live Supabase test DB. Tenant A cannot read Tenant B data at the PostgreSQL layer — no application-level filter can fake this. Currently skipped in CI until the dedicated RLS test project is reprovisioned.

agents table: Tenant A cannot read Tenant B agents

certificates table: Tenant A cannot read Tenant B certificates

audit_log immutability: UPDATE blocked by DB rule (append-only)

Unit

16/16

RFC 8693 Delegation Chains (v2)

__tests__/unit/delegation.test.ts

Human→agent→sub-agent authority via nested RFC 8693 act-claims, as a stateless signed token (HS256). Build/parse round-trip, depth bounds, sign/verify/tamper/expiry.

builds nested act with current actor outermost

round-trips build → parse for a 3-link chain

preserves a non-human principal type round-trip (service root)

rejects a tampered token

rejects an expired token
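
RFC 8693 represents a delegation chain as nested `act` claims, with the current actor outermost and earlier actors nested inside. The sketch below shows that nesting plus a bare HS256 sign/verify using Node's `crypto`. All function names and the claim set are illustrative assumptions; depth bounds and principal types from the real module are left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface ActClaim { sub: string; act?: ActClaim }
interface DelegationClaims { sub: string; exp: number; act?: ActClaim }

// `actors` is ordered from first delegate to current actor.
// Each later actor wraps the previous chain, so the current actor ends up outermost.
function buildAct(actors: string[]): ActClaim | undefined {
  let act: ActClaim | undefined;
  for (const sub of actors) act = act ? { sub, act } : { sub };
  return act;
}

// Inverse of buildAct: walk inward and restore first-delegate-first order.
function parseAct(act?: ActClaim): string[] {
  const out: string[] = [];
  for (let cur = act; cur; cur = cur.act) out.unshift(cur.sub);
  return out;
}

const b64url = (s: string): string => Buffer.from(s).toString("base64url");

function signToken(claims: DelegationClaims, secret: string): string {
  const head = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(claims));
  const sig = createHmac("sha256", secret).update(`${head}.${body}`).digest("base64url");
  return `${head}.${body}.${sig}`;
}

// Always verifies as HS256; the header's `alg` is never trusted.
function verifyToken(token: string, secret: string, nowSec: number): DelegationClaims {
  const [head, body, sig] = token.split(".");
  if (!head || !body || !sig) throw new Error("malformed token");
  const expected = createHmac("sha256", secret).update(`${head}.${body}`).digest();
  const received = Buffer.from(sig, "base64url");
  if (received.length !== expected.length || !timingSafeEqual(received, expected)) {
    throw new Error("bad signature");
  }
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as DelegationClaims;
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) throw new Error("token expired");
  return claims;
}
```

Because the token is self-contained and signed, a verifier needs only the shared secret and a clock, not a database lookup.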

Unit

24/24

OTLP Observability Export (v2)

__tests__/unit/otlp-transform.test.ts

Vendor-neutral OTLP/HTTP JSON for metrics, logs, and traces. Deterministic span/trace IDs for idempotent retries, PII-safe by construction, per-stream keyset export cursor with a filter-injection guard.

deterministic trace/span IDs from chain/event UUIDs

events_in_window is an idempotent gauge with int values

PII guard: never emits source_ip or raw payload

readExportCursor drops a cursor with a non-UUID id (injection guard)

readExportCursor drops a ts carrying PostgREST metacharacters
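
One way to get deterministic IDs and a strict cursor guard is sketched below. OTLP trace IDs are 16 bytes and span IDs are 8 bytes, so the sketch truncates a SHA-256 digest to those sizes. The hashing scheme, the helper names, the cursor shape, and the signature given to `readExportCursor` are assumptions; only the function's name and its two rejection cases come from the tests above.

```typescript
import { createHash } from "node:crypto";

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
// Strict ISO-8601 timestamp: no commas, parentheses, or other filter syntax can pass.
const ISO_TS_RE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,6})?(Z|[+-]\d{2}:\d{2})$/;

// ASSUMPTION: IDs are derived by hashing the source UUID; the real derivation may differ.
// Same input always yields the same ID, so a retried export does not create duplicates.
function traceIdFromUuid(chainId: string): string {
  return createHash("sha256").update(`trace:${chainId}`).digest("hex").slice(0, 32); // 16 bytes
}

function spanIdFromUuid(eventId: string): string {
  return createHash("sha256").update(`span:${eventId}`).digest("hex").slice(0, 16); // 8 bytes
}

interface ExportCursor { ts: string; id: string }

// Returns null for any cursor that is not exactly a timestamp plus a UUID,
// so untrusted values never reach a query filter.
function readExportCursor(raw: unknown): ExportCursor | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { ts, id } = raw as Record<string, unknown>;
  if (typeof id !== "string" || !UUID_RE.test(id)) return null;
  if (typeof ts !== "string" || !ISO_TS_RE.test(ts)) return null;
  return { ts, id };
}
```

Dropping a suspicious cursor and restarting from the beginning is safe here precisely because the IDs are deterministic: re-exported records carry the same trace and span IDs.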

Unit

32/32

Content-Risk Monitoring (v2)

__tests__/unit/content-risk.test.ts

EU AI Act Art. 5 manipulation/deception detection on agent output. Short-text saturating risk curve, risk bands, deterministic span anchoring, tolerant LLM-output parsing.

shortTextRiskScore grades a strong span into medium (not binary 1.0)

contentRiskBand bands by 0.3 / 0.6 thresholds

anchorSpans drops hallucinated text not in the message

buildRisk maps to a behavior_event shape with 0–1 risk_score
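
The banding and span-anchoring cases can be illustrated with two small pure functions. The 0.3 and 0.6 thresholds come from the test name above; treating them as inclusive lower bounds, the band labels, and the signatures shown here are assumptions.

```typescript
type ContentRiskBand = "low" | "medium" | "high";

// ASSUMPTION: a score exactly on a threshold falls into the higher band.
function contentRiskBand(score: number): ContentRiskBand {
  if (score >= 0.6) return "high";
  if (score >= 0.3) return "medium";
  return "low";
}

interface AnchoredSpan {
  text: string;
  start: number; // inclusive offset into the message
  end: number; // exclusive offset into the message
}

// Keep only candidate spans that literally occur in the message.
// Anything the model "quoted" that is not there is dropped as hallucinated.
function anchorSpans(message: string, candidates: string[]): AnchoredSpan[] {
  const anchored: AnchoredSpan[] = [];
  for (const text of candidates) {
    if (text.length === 0) continue;
    const start = message.indexOf(text);
    if (start === -1) continue;
    anchored.push({ text, start, end: start + text.length });
  }
  return anchored;
}
```

Anchoring to the first literal occurrence keeps the result deterministic for a given message and candidate list.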

Unit

15/15

GitHub Cert Gate + Forensics (v2)

__tests__/unit/github-gate.test.ts · forensics-proof.test.ts

CI deploy gate on peak behavioral risk (pass / action_required / fail+revoke), and tamper-evident HMAC-SHA256 forensics proofs over an ordered event set.

gate uses peak (not average) risk — one high event blocks

gate fail (≥0.85) flags should_revoke

forensics proof is deterministic for the same ordered events

forensics proof detects a tampered event
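
A compact sketch of both ideas follows: a gate that decides on the peak score, and an HMAC-SHA256 proof computed over events in order. The 0.85 fail threshold is from the test name above. The `action_required` threshold, the function names, the event fields, and the serialization are assumptions.

```typescript
import { createHmac } from "node:crypto";

type GateDecision = "pass" | "action_required" | "fail";
interface GateResult { decision: GateDecision; peak_risk: number; should_revoke: boolean }

const FAIL_THRESHOLD = 0.85; // from "gate fail (≥0.85)"
const ACTION_REQUIRED_THRESHOLD = 0.6; // ASSUMPTION: not stated on this page

// Decide on the maximum score: a single high-risk event blocks the deploy,
// no matter how many benign events would pull an average down.
function evaluateGate(riskScores: number[]): GateResult {
  const peak = riskScores.reduce((max, s) => Math.max(max, s), 0);
  if (peak >= FAIL_THRESHOLD) return { decision: "fail", peak_risk: peak, should_revoke: true };
  if (peak >= ACTION_REQUIRED_THRESHOLD) {
    return { decision: "action_required", peak_risk: peak, should_revoke: false };
  }
  return { decision: "pass", peak_risk: peak, should_revoke: false };
}

// Hypothetical event shape for the proof.
interface ForensicEvent { id: string; action: string; risk_score: number }

// Feed events into one HMAC in order, using a fixed field order per event.
// Changing, dropping, or reordering any event changes the digest.
function forensicsProof(events: ForensicEvent[], secret: string): string {
  const mac = createHmac("sha256", secret);
  for (const e of events) {
    mac.update(JSON.stringify([e.id, e.action, e.risk_score]));
    mac.update("\n");
  }
  return mac.digest("hex");
}
```

Serializing each event as a fixed-order array, rather than as an object, avoids depending on property order for determinism.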

Full Suite Breakdown

39 files · 390 assertions · live

Test file	Passed	Skipped
`__tests__/api/agents.test.ts`	14	0
`__tests__/api/assessment.test.ts`	4	0
`__tests__/api/certify.test.ts`	6	0
`__tests__/api/events.test.ts`	19	0
`__tests__/api/halt.test.ts`	9	0
`__tests__/api/mcp.test.ts`	3	0
`__tests__/api/provision-inbox.test.ts`	2	0
`__tests__/api/report-quotas.test.ts`	2	0
`__tests__/api/revoke.test.ts`	10	0
`__tests__/api/verify.test.ts`	5	0
`__tests__/api/webhook-delivery.test.ts`	3	0
`__tests__/api/webhooks.test.ts`	6	0
`__tests__/api/webmcp.test.ts`	3	0
`__tests__/assessment/llm.test.ts`	5	0
`__tests__/assessment/scan.test.ts`	5	0
`__tests__/email/assessment-template.test.ts`	4	0
`__tests__/email/system-templates.test.ts`	4	0
`__tests__/security/rls-isolation.test.ts`	0	21
`__tests__/unit/assessment-rate-limit.test.ts`	4	0
`__tests__/unit/audit-hmac.test.ts`	14	0
`__tests__/unit/bearer-auth.test.ts`	6	0
`__tests__/unit/blog-route-safety.test.ts`	1	0
`__tests__/unit/connections-cursor.test.ts`	14	0
`__tests__/unit/content-risk.test.ts`	28	0
`__tests__/unit/credentials-crypto.test.ts`	8	0
`__tests__/unit/delegation.test.ts`	11	0
`__tests__/unit/forensics-proof.test.ts`	5	0
`__tests__/unit/github-gate.test.ts`	7	0
`__tests__/unit/inbox-providers.test.ts`	9	0
`__tests__/unit/openrouter-client.test.ts`	13	0
`__tests__/unit/otlp-transform.test.ts`	25	0
`__tests__/unit/risk-engine.test.ts`	23	0
`__tests__/unit/standards-map.test.ts`	8	0
`__tests__/unit/url-guard.test.ts`	25	0
`__tests__/unit/vc-decode.test.ts`	5	0
`mcp/kakunin-mcp/tests/mcp.test.ts`	16	0
`sdk/middleware/tests/verify.test.ts`	9	0
`sdk/typescript/tests/sdk.test.ts`	21	0
`sdk/typescript/tests/verify.test.ts`	13	0

Live API Smoke (v2)

5 passed · 0 failed · 1 skipped

Exercised end-to-end against https://www.kakunin.ai on 2026-07-07: delegation issue+verify, content-risk submission, forensics export, and the OTLP connection lifecycle.

Result	Check	Detail
PASS	delegation: issue + verify round-trip	valid token, 2-link chain
PASS	delegation: rejects a tampered token	rejected as expected
PASS	content-risk: accepts a submission (202)	enqueued (benign text → scored async, below persist threshold)
PASS	forensics: returns a signed proof	forensics export OK
PASS	otlp: connect → status → delete lifecycle	connect/status/delete all OK (cleaned up)
SKIP	github gate: returns a decision	mutating; set KAKUNIN_SMOKE_RUN_GATE=1 (throwaway agent only)

What Is Tested vs. Mocked

Component	Approach	Reason
Risk scoring logic	Real code, no mock	Pure function — tests the actual algorithm
Content-risk curve + bands	Real code, no mock	Pure scoring — deterministic short-text saturating curve
Delegation tokens	Real code, no mock	HS256 sign/verify via Node crypto — real signatures
OTLP transform + cursor	Real code, no mock	Pure payload builders + keyset cursor validation
Certificate issuance	Real code, KMS mocked	KMS calls require live AWS credentials
Supabase DB operations	Mocked	Avoids test DB dependency; tests business logic in isolation
QStash job queue	Mocked	Fire-and-forget — tested for invocation, not delivery
AWS KMS signing	Mocked	Requires live AWS credentials with KMS permissions
RLS policies	Live test DB	Must run against real Postgres RLS engine — cannot mock
v2 API routes (live smoke)	Live deployment	Exercised end-to-end against a running deployment via scripts/v2-smoke.ts

Run the Tests

Commands

All tests (no live DB required): `npm run test`

Regenerate this page's results artifact: `npm run test:results`

Full suite including RLS isolation: `node --env-file=.env.rls.test node_modules/.bin/vitest run`

v2 live API smoke: `npx tsx scripts/v2-smoke.ts`

@kakunin/sdk (21 unit tests): `cd sdk/typescript && npm test`

Vitest 4.1

TypeScript strict

Node.js environment

GitHub Actions CI

RLS on live Supabase test project