Human-approved AI conduct QA

Multi-agent review of call transcripts with evidence-grounded citations. Supervisor corrections become rollbackable Policy Genes: institutional memory that regulators can audit.


UCWS Singapore 2026 · License: MIT

Problem → solution

QA teams repeat the same corrections

Policy Genes preserve supervisor-approved patterns

LLMs hallucinate policy

Evidence-grounded citations + abstain when unsure

Regulators need accountability

Supervisor approval, audit trail, rollback

Supervisor-approved conduct QA with Policy Genes — auditable institutional memory, not black-box scoring.

How the swarm works

Three agents debate each transcript; the supervisor approves before a Policy Gene is stored.

Prosecutor

Surfaces policy violations with evidence-grounded citations from the knowledge base.

Defender

Challenges weak citations and flags where the transcript does not support a verdict.

Arbiter

Synthesizes a verdict of pass, needs review, or fail; an operator stays in the loop before any gene is stored.
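The three roles above can be sketched as a small pipeline. This is a toy illustration, not the project's implementation: keyword matching stands in for the LLM agents, and the names `Finding`, `review`, `KB`, `REBUTTALS`, and the policy ID `POL-ESC` are all hypothetical. The supervisor approval step is deliberately left out; the sketch stops at the Arbiter's verdict.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    policy_id: str  # hypothetical KB identifier
    quote: str      # transcript phrase cited as evidence


def prosecutor(transcript: str, kb: Dict[str, List[str]]) -> List[Finding]:
    """Cite every KB violation phrase (lowercase) found in the transcript."""
    text = transcript.lower()
    return [Finding(pid, phrase)
            for pid, phrases in kb.items()
            for phrase in phrases
            if phrase in text]


def defender(transcript: str, findings: List[Finding],
             rebuttals: Dict[str, List[str]]) -> List[Finding]:
    """Return the findings that the transcript itself gives reason to doubt."""
    text = transcript.lower()
    return [f for f in findings
            if any(r in text for r in rebuttals.get(f.policy_id, []))]


def arbiter(findings: List[Finding], challenged: List[Finding]) -> str:
    """Unchallenged finding -> fail; only challenged ones -> needs_review."""
    unchallenged = [f for f in findings if f not in challenged]
    if unchallenged:
        return "fail"
    if challenged:
        return "needs_review"
    return "pass"


def review(transcript: str, kb: Dict[str, List[str]],
           rebuttals: Dict[str, List[str]]) -> Tuple[str, List[Finding]]:
    findings = prosecutor(transcript, kb)
    challenged = defender(transcript, findings, rebuttals)
    return arbiter(findings, challenged), findings


# Hypothetical policy data, for illustration only.
KB = {"POL-ESC": ["cannot escalate"]}
REBUTTALS = {"POL-ESC": ["get a supervisor"]}

verdict, evidence = review(
    "Agent: I cannot escalate this myself, but let me get a supervisor.",
    KB, REBUTTALS)
```

Returning the findings alongside the verdict keeps every decision tied to the evidence that produced it, which is the property the citation requirement is after.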

Policy Genes

When a supervisor overrides a verdict, the system stores a supervisor-approved Policy Gene — auditable, versioned, rollbackable. The swarm reuses approved patterns on similar cases instead of re-litigating the same correction every week.
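One way to picture "auditable, versioned, rollbackable" is an in-memory store like the sketch below. It is an assumption-laden illustration rather than the repo's data model: `PolicyGene`, `GeneStore`, and their fields are hypothetical names, and a real deployment would persist the history instead of holding it in a dict.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PolicyGene:
    pattern: str      # e.g. "escalation_offered"
    verdict: str      # the supervisor-approved verdict for this pattern
    approved_by: str  # who signed off
    version: int      # never reused, even after a rollback


class GeneStore:
    """Hypothetical store: active genes per pattern plus an append-only audit log."""

    def __init__(self) -> None:
        self._active: Dict[str, List[PolicyGene]] = {}
        self._next_version: Dict[str, int] = {}
        self.audit_log: List[str] = []

    def learn(self, pattern: str, verdict: str, approved_by: str) -> PolicyGene:
        version = self._next_version.get(pattern, 1)
        self._next_version[pattern] = version + 1
        gene = PolicyGene(pattern, verdict, approved_by, version)
        self._active.setdefault(pattern, []).append(gene)
        self.audit_log.append(f"learn {pattern} v{version} -> {verdict} by {approved_by}")
        return gene

    def rollback(self, pattern: str, requested_by: str) -> Optional[PolicyGene]:
        """Deactivate the newest gene; the audit log keeps the record of it."""
        stack = self._active.get(pattern)
        if not stack:
            raise KeyError(f"no active gene for {pattern!r}")
        removed = stack.pop()
        self.audit_log.append(f"rollback {pattern} v{removed.version} by {requested_by}")
        return stack[-1] if stack else None

    def apply(self, pattern: str, swarm_verdict: str) -> str:
        """Prefer the active supervisor-approved verdict, else keep the swarm's."""
        stack = self._active.get(pattern)
        return stack[-1].verdict if stack else swarm_verdict


store = GeneStore()
before = store.apply("escalation_offered", "needs_review")
store.learn("escalation_offered", "pass", approved_by="supervisor-1")
after = store.apply("escalation_offered", "needs_review")
```

Rolling back only deactivates a gene. Its `learn` entry stays in `audit_log`, so the history a reviewer sees is never rewritten.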

IMDA MGF alignment · Endpoint: POST /genes/learn

Proof

Held-out eval — numbers from the synthetic suite.

Scenario pass rate

16 / 16

Synthetic held-out eval suite — 100% pass in mock mode.

Citation coverage

100%

Every verdict traceable to policy sources in the KB.

Abstain rate

~6.3%

CASE-007: explicit abstain instead of guessing (1 of 16 scenarios).
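The two rates above can be recomputed from per-case results with a few lines. The sketch below makes several assumptions that the page does not state: that exactly one of the 16 scenarios (CASE-007) abstained, that case IDs run CASE-001 to CASE-016, and that citation coverage is measured over non-abstained verdicts. The `Result` type, the placeholder verdicts, and the citation ID `POL-EXAMPLE` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    case_id: str
    verdict: str          # "pass", "needs_review", "fail", or "abstain"
    citations: List[str]  # KB policy IDs backing the verdict


def abstain_rate(results: List[Result]) -> float:
    return sum(r.verdict == "abstain" for r in results) / len(results)


def citation_coverage(results: List[Result]) -> float:
    # Assumption: abstained cases carry no verdict, so they are excluded.
    decided = [r for r in results if r.verdict != "abstain"]
    return sum(bool(r.citations) for r in decided) / len(decided)


# Placeholder data shaped like the suite: 16 cases, one abstain.
results = [
    Result(f"CASE-{i:03d}", "abstain", []) if i == 7
    else Result(f"CASE-{i:03d}", "pass", ["POL-EXAMPLE"])
    for i in range(1, 17)
]

rate = abstain_rate(results)          # 1 / 16 = 0.0625, i.e. 6.25%
coverage = citation_coverage(results)
```

Under these assumptions the abstain rate is exactly 6.25%, which matches the ~6.3% shown on the card.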

Gene learning

CASE-002 → CASE-005

escalation_offered: needs_review → pass once the supervisor-approved gene applies.

Try the stack

Streamlit 5-panel demo, FastAPI at :8090, mock mode for CI — live stack with Qdrant + BGE rerank documented in the repo.
