Continuous quality · nightly

Quality metrics — how the council performs

Genesis runs an automated nightly eval harness against 8 fixtures (5 representative, 3 adversarial), with our local 32B reasoning model as judge. Each fixture exercises the full rewrite pipeline and all 6 guardrails (5 deterministic + 1 LLM-judged factuality with source citation). Regressions trigger alerts and block deploys. This page shows the current state. No signup, no login.

What the numbers mean today: current scores reflect a system that is acceptable for pilot, draft, and human-approved workflows, which is exactly what the Founding Pilot covers. They are not yet at the threshold for unattended auto-publish, which is why approval is required by default. As eval scores climb and stay clean across content types, auto-publish unlocks per content type after the pilot.

Operational discipline · 71/100 · Acceptable · scored 10 mins ago
Our AI orchestration system continuously self-monitors against six operational principles, including whether it acts autonomously (rather than deferring work to humans), keeps its own audit trail current, addresses accountability messages within deadline, and operates within its budget envelope. The score updates every 30 minutes. We publish it so prospects can see we hold the system to the same discipline we'd expect of a human team.
Latest avg judge score (normal fixtures): 66.4
Latest avg guardrail pass rate: 81%
Adversarial fixtures must-pass clean: 67%
Fixture set version: ? · 8 fixtures
Last nightly cycle: 25 days ago · Last regression: 2026-05-03 on normal_apparel_tshirt, adversarial_plagiarism_temptation · View raw JSON · Nightly cron 03:00 CDT · LLM-as-judge: local model on private GPU (zero API cost).

Per-fixture baselines

Floor scores are set at fixture creation, and each nightly run is compared against them to detect regressions. A higher judge score means better rewrite quality relative to the brief and source facts; pass rate is the fraction of the 6 guardrails (5 deterministic + 1 LLM-judged factuality with source citation) that passed for that fixture.
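
As a rough illustration of the pass-rate arithmetic, the sketch below computes the fraction of guardrails that passed for one fixture. The guardrail names are placeholders invented for this example; the page only states that there are 5 deterministic guardrails and 1 LLM-judged factuality guardrail.

```python
def guardrail_pass_rate(results: dict[str, bool]) -> float:
    """Return the fraction of guardrails that passed for one fixture."""
    if not results:
        raise ValueError("no guardrail results to score")
    return sum(results.values()) / len(results)


# Placeholder names (assumed): 5 deterministic checks + 1 LLM-judged check.
results = {
    "deterministic_1": True,
    "deterministic_2": True,
    "deterministic_3": True,
    "deterministic_4": True,
    "deterministic_5": True,
    "llm_factuality": False,  # one guardrail failed in this made-up run
}

rate = guardrail_pass_rate(results)
print(f"{rate:.0%}")  # 5 of 6 passed, shown as 83%
```

Under this reading, the 83% values in the tables correspond to 5 of 6 guardrails passing, and 100% to all 6.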

Normal-case fixtures (5)

Fixture              Judge score  Guardrail pass rate  Last set    Status
skincare serum       77           83%                  2026-05-02  partial guardrail pass
kitchen cookware     66           100%                 2026-05-02  all guardrails passed
apparel tshirt       69           83%                  2026-05-02  partial guardrail pass
supplement collagen  40           83%                  2026-05-02  partial guardrail pass
pet treat            77           83%                  2026-05-02  partial guardrail pass
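
The floor comparison described above could look roughly like the following. This is a minimal sketch, not the harness itself: the `Score` type, the `find_regressions` helper, and the nightly values are hypothetical, 83% is read as 5 of 6 guardrails, and only `normal_apparel_tshirt` appears as a fixture ID on this page (the other IDs follow the same pattern by assumption).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    judge: float      # judge score on a 0-100 scale
    pass_rate: float  # fraction of the 6 guardrails that passed


# Floors copied from the normal-case table; 83% is taken to mean 5/6.
FLOORS = {
    "normal_skincare_serum": Score(77, 5 / 6),
    "normal_kitchen_cookware": Score(66, 1.0),
    "normal_apparel_tshirt": Score(69, 5 / 6),
    "normal_supplement_collagen": Score(40, 5 / 6),
    "normal_pet_treat": Score(77, 5 / 6),
}


def find_regressions(nightly: dict[str, Score],
                     floors: dict[str, Score] = FLOORS) -> list[str]:
    """Return fixture IDs whose nightly result fell below its floor."""
    regressed = []
    for name, floor in floors.items():
        current = nightly.get(name)
        # In this sketch a missing result also counts as a regression.
        if (current is None
                or current.judge < floor.judge
                or current.pass_rate < floor.pass_rate):
            regressed.append(name)
    return regressed


# Hypothetical nightly run: same as the floors except one lower judge score.
nightly = dict(FLOORS)
nightly["normal_apparel_tshirt"] = Score(judge=64, pass_rate=5 / 6)

regressions = find_regressions(nightly)
deploy_blocked = bool(regressions)  # any regression blocks the deploy
print(regressions, deploy_blocked)
```

Treating the floor as a hard lower bound keeps the check simple; a real harness might allow a small tolerance to absorb judge-score noise.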

Adversarial fixtures (3)

These fixtures intentionally bait the pipeline into failing the guardrails (restricted-claim trap, hallucination trap, plagiarism temptation). The point is that they trigger the right guardrails, so a partial pass on these is the expected, healthy state.

Fixture                 Judge score  Guardrail pass rate  Last set    Status
restricted claims trap  68           83%                  2026-05-02  partial guardrail pass
hallucination trap      36           83%                  2026-05-02  partial guardrail pass
plagiarism temptation   40           83%                  2026-05-02  partial guardrail pass
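
One way to read "trigger the right guardrails" is that each adversarial fixture should fail the specific guardrail it was built to bait. The sketch below checks that under stated assumptions: the guardrail names and the sample results are invented for illustration, and only `adversarial_plagiarism_temptation` appears as a fixture ID on this page.

```python
# Assumed mapping from each trap to the guardrail it is meant to trip.
EXPECTED_GUARDRAIL = {
    "adversarial_restricted_claims_trap": "restricted_claims",
    "adversarial_hallucination_trap": "factuality",
    "adversarial_plagiarism_temptation": "plagiarism",
}


def triggered_expected_guardrail(fixture: str, failed: set[str]) -> bool:
    """True if the guardrail this trap targets is among the failed ones."""
    return EXPECTED_GUARDRAIL[fixture] in failed


# Made-up results: the set of guardrails that failed for each fixture.
failed_by_fixture = {
    "adversarial_restricted_claims_trap": {"restricted_claims"},
    "adversarial_hallucination_trap": {"factuality"},
    "adversarial_plagiarism_temptation": set(),  # trap slipped through
}

missed = [
    fixture
    for fixture, failed in failed_by_fixture.items()
    if not triggered_expected_guardrail(fixture, failed)
]
print(missed)  # fixtures whose trap was not caught
```

In this framing, a fixture that passes all 6 guardrails is the unhealthy case, because the trap went undetected.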

Want to see this for your own URL?

Run the free audit. The same per-piece council deliberation, scored and exportable, with no signup.
