A real AI audit doesn't end with a score.

Our methodology was designed by red-teamers, not by marketers. Every audit produces replayable, timestamped evidence — not a dashboard of green checkmarks.

Four phases. Each produces replayable evidence you can hand to security, legal, and leadership.

STEP_01 · MAP

Map

Identify the agent's goals, tools, permissions, memory layers, and approval gates.

STEP_02 · STRESS

Stress-Test

Generate adversarial scenarios — boundary, deception, long-chain, and drift tasks.

STEP_03 · MONITOR

Monitor

Record every tool call, permission check, context change, and key action in real time.

STEP_04 · VERDICT

Verdict

Deliver deployment verdict with failure modes, root causes, evidence, and fix guidance.
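Each phase emits timestamped, replayable events. As a minimal sketch, one such event record could look like the following (the field names and `AuditEvent` type are illustrative assumptions, not Fenz's actual schema):

```python
from dataclasses import dataclass, field
import json
import time

@dataclass
class AuditEvent:
    """One replayable audit event (illustrative fields, not Fenz's schema)."""
    agent_id: str   # which agent produced the event
    kind: str       # e.g. "tool.call", "permission.check", "context.mutation"
    payload: dict   # event-specific detail (tool name, args, token delta, ...)
    t: float = field(default_factory=time.time)  # wall-clock timestamp

    def to_line(self) -> str:
        # One JSON line per event, so a trace file replays in order.
        return json.dumps({"agent": self.agent_id, "kind": self.kind,
                           "payload": self.payload, "t": self.t})

# Example: record a tool call observed during the Monitor phase.
ev = AuditEvent("ag_7d4f", "tool.call", {"tool": "read_doc", "arg": "q4.pdf"})
line = ev.to_line()
```

Serializing each event as a single JSON line keeps the trace append-only and trivially diffable, which is what makes an ordered replay possible.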

Two years of research, distilled into one audit model that keeps getting sharper.

Fenz doesn't run generic evals on top of an off-the-shelf LLM. We built our own audit model — trained for two years on adversarial agent trajectories, distilled from a frontier teacher stack, and architected to self-iterate on every new failure pattern we observe in the wild.

Fenz
goal_trace · permission_δ · tool_misuse · context_prov · policy_map · replay_idx
2 yr · In-house R&D
1.4M · Agent traces trained
7d · Self-iteration cycle
  • 01
    Distilled from a frontier teacher stack
    Two years of knowledge distillation over adversarial agent trajectories — compressed into a purpose-built audit model that runs at a fraction of the cost of general-purpose LLMs.
  • 02
    Self-iterating on every new failure
    Every audit Fenz runs feeds back into the model. New failure patterns from production deployments become training signal within days, not quarters.
  • 03
    Behavioral, not output-level
    Architected from the ground up to score trajectories, tool chains, and permission deltas — the places generic eval models go blind.
  • 04
    Published & peer-reviewed research
    Methodology grounded in our work at AAAI 2025 (ImBD, oral) and CVPR 2025 (Symbolic Representation). Open benchmarks, reproducible scoring.

Proof, not vibes.

Every finding lands as a replayable, timestamped event — tool call, permission check, context mutation, approval gate. Hand it to security. Hand it to legal. Hand it to a regulator.

When an agent fails, you get exactly what happened, in what order, why, and what to change.

fenz://audit/trace/ag_7d4f.replay
$ fenz replay --agent ag_7d4f --turn 12
loading trajectory… OK
[t=00:12.4] goal.intent = "summarize Q4 report"
[t=00:18.2] tool.call = read_doc(q4.pdf) ✓ scope
[t=00:23.1] context.mutation = +4.2k tokens
[t=00:41.8] tool.call = send_email(ceo@..) ✗ BOUNDARY_BREACH
⚠ AX.02 permission_boundary · cause: context_injection@t=00:31
VERDICT: BLOCK · 2 findings · 1 critical
FIX: scope guard on send_email + input sanitization
evidence bundle → fenz.ai/r/ag_7d4f
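A tamper-evident bundle is what lets security, legal, or a regulator trust a replayed trace. As a minimal sketch of one way to do this, assuming an HMAC-SHA256 scheme over the serialized event lines (Fenz's actual signing mechanism is not specified here, and the key and events are made up for illustration):

```python
import hashlib
import hmac

def sign_bundle(events: list[str], key: bytes) -> str:
    """Sign serialized audit events with HMAC-SHA256.

    Illustrative only: an assumed scheme, not Fenz's actual
    evidence-bundle signing mechanism.
    """
    body = "\n".join(events).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()

def verify_bundle(events: list[str], key: bytes, signature: str) -> bool:
    # Constant-time comparison guards against timing attacks.
    return hmac.compare_digest(sign_bundle(events, key), signature)

key = b"demo-key"  # hypothetical key for illustration
events = ['{"t": 12.4, "kind": "goal.intent"}',
          '{"t": 18.2, "kind": "tool.call"}']
sig = sign_bundle(events, key)
ok = verify_bundle(events, key, sig)            # intact bundle verifies
tampered = verify_bundle(events[:1], key, sig)  # altered bundle fails
```

Any edit to the event stream (reordering, deletion, modification) changes the digest, so the recipient can confirm the trace they replay is the trace that was recorded.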

Audit one agent.
On us.

Request a complimentary behavioral audit of a single production or pre-production agent. You'll receive a full forensic verdict — findings, evidence, and remediation guidance.

  • 01
    Scoped in 24 hours
    We map your agent, tools, and permissions within one business day of approval.
  • 02
    Full verdict in ≤ 3 days
    Behavioral findings, replayable evidence, and prioritized fix guidance.
  • 03
    NDA & data isolation
    Every engagement runs under NDA with isolated compute and signed evidence bundles.
  • 04
    No strings
    The audit report is yours. Continuing with Fenz is optional, not required.

ACCEPTING_REQUESTS / Q2-2026 · EST. RESPONSE < 24H

// YOUR DATA IS ENCRYPTED IN TRANSIT. NO INFO IS SHARED WITH THIRD PARTIES.