Continuous quality · nightly

Quality metrics — how the council performs

Genesis runs an automated nightly eval harness against 8 fixtures (5 representative, 3 adversarial), using our local 32B reasoning model as judge. Each fixture exercises the full rewrite pipeline and all 6 guardrails (5 deterministic + 1 LLM-judged factuality with source citation). Regressions trigger alerts and block deploys. This page shows the current state. No signup, no login.

What the numbers mean today: current scores reflect a system that is acceptable for pilot, draft, and human-approved workflows, which is exactly the Founding Pilot motion. They are not yet at the threshold for unattended auto-publish, which is why approval is required by default. As eval scores climb and stay clean across content types, auto-publish will unlock one content type at a time after the pilot.

Operational discipline: 80/100 (Excellent) · scored 475 hr ago

Our AI orchestration system continuously monitors itself against six operational principles, including whether it is acting autonomously (rather than deferring work to humans), keeping its own audit trail current, addressing accountability messages within deadline, and operating within its budget envelope. The score updates every 30 minutes. We publish it so prospects can see that we hold the system to the same discipline we'd expect of a human team.

Latest avg judge score (normal fixtures): 61.4

Latest avg guardrail pass rate: 90%

Adversarial fixtures must-pass clean: 100%

fixture set version · 8 fixtures

Last nightly cycle: 28 days ago · Last regression: none recorded · Nightly cron: 03:00 CDT · LLM-as-judge: local model on private GPU (zero API cost).

Per-fixture baselines

Floor scores are set when each fixture is created, and every nightly run is compared against them to detect regression. A higher judge score means better rewrite quality relative to the brief and the source facts. Pass rate is the fraction of the 6 guardrails (5 deterministic + 1 LLM-judged factuality with source citation) that passed for that fixture, so 5 of 6 passing shows as 83%.
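The comparison against floor scores can be sketched as follows. This is a minimal illustration and not the harness's actual code: the names `FixtureResult`, `pass_rate`, and `find_regressions` are hypothetical, and so is the rule that a regression means either the judge score or the number of passing guardrails dropping below the baseline. The baseline values come from the table on this page; the nightly values are made up for the example.

```python
from dataclasses import dataclass

GUARDRAIL_COUNT = 6  # 5 deterministic + 1 LLM-judged factuality check


@dataclass(frozen=True)
class FixtureResult:
    """One fixture's outcome (hypothetical record shape)."""
    name: str
    judge_score: float
    guardrails_passed: int


def pass_rate(guardrails_passed: int, total: int = GUARDRAIL_COUNT) -> float:
    """Fraction of guardrails that passed, e.g. 5 of 6 is about 0.83."""
    if not 0 <= guardrails_passed <= total:
        raise ValueError(f"guardrails_passed must be between 0 and {total}")
    return guardrails_passed / total


def find_regressions(
    baselines: dict[str, FixtureResult],
    nightly: list[FixtureResult],
) -> list[str]:
    """Return names of fixtures that fell below their baseline floor.

    Assumed rule: a lower judge score OR fewer passing guardrails
    than the baseline counts as a regression.
    """
    regressed = []
    for result in nightly:
        floor = baselines[result.name]
        score_dropped = result.judge_score < floor.judge_score
        guardrails_dropped = result.guardrails_passed < floor.guardrails_passed
        if score_dropped or guardrails_dropped:
            regressed.append(result.name)
    return regressed


# Baselines taken from the table on this page (83% = 5 of 6, 100% = 6 of 6).
BASELINES = {
    "skincare serum": FixtureResult("skincare serum", 77, 5),
    "kitchen cookware": FixtureResult("kitchen cookware", 66, 6),
}

# Hypothetical nightly results, invented for illustration only.
nightly_run = [
    FixtureResult("skincare serum", 70, 5),    # judge score below the floor of 77
    FixtureResult("kitchen cookware", 66, 6),  # matches its floor
]

print(find_regressions(BASELINES, nightly_run))
```

In a setup like this, a non-empty result would be the signal that raises an alert and blocks the deploy.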

Normal-case fixtures (5)

Fixture	Judge score	Guardrail pass rate	Baseline set	Status
skincare serum	77	83%	2026-05-02	partial guardrail pass
kitchen cookware	66	100%	2026-05-02	all guardrails passed
apparel tshirt	69	83%	2026-05-02	partial guardrail pass
supplement collagen	40	83%	2026-05-02	partial guardrail pass
pet treat	77	83%	2026-05-02	partial guardrail pass

Adversarial fixtures (3)

These fixtures intentionally bait the pipeline into violating a guardrail (restricted-claim trap, hallucination trap, plagiarism temptation). The goal is for each trap to trigger the right guardrail, so a partial pass on these fixtures is the expected, healthy state.

Fixture	Judge score	Guardrail pass rate	Baseline set	Status
restricted claims trap	68	83%	2026-05-02	partial guardrail pass
hallucination trap	36	83%	2026-05-02	partial guardrail pass
plagiarism temptation	40	83%	2026-05-02	partial guardrail pass
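One way to read "partial pass is the expected state" is that each adversarial fixture should trip exactly the guardrail its trap targets and no other. The sketch below encodes that reading. It is an assumption and not the documented check: the guardrail identifiers (`restricted_claims`, `factuality`, `plagiarism`) and the function `adversarial_outcome` are hypothetical names chosen for illustration.

```python
# Hypothetical mapping from adversarial fixture to the guardrail it should trip.
EXPECTED_TRIGGER = {
    "restricted claims trap": "restricted_claims",
    "hallucination trap": "factuality",
    "plagiarism temptation": "plagiarism",
}


def adversarial_outcome(fixture: str, failed_guardrails: set[str]) -> str:
    """Classify an adversarial run under the assumed 'exactly one trigger' rule.

    Returns:
        "clean"          - only the targeted guardrail fired (5 of 6 passed)
        "missed trap"    - the targeted guardrail did not fire
        "extra failures" - the targeted guardrail fired, but so did others
    """
    expected = EXPECTED_TRIGGER[fixture]
    if expected not in failed_guardrails:
        return "missed trap"
    if failed_guardrails != {expected}:
        return "extra failures"
    return "clean"


print(adversarial_outcome("hallucination trap", {"factuality"}))
```

Under this rule, a fixture that passes all 6 guardrails would count as a miss, because the trap it was built around was never caught.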

Want to see this for your own URL?

Run the free audit. The same per-piece council deliberation, scored and exportable, with no signup.
