Quality metrics — how the council performs
Genesis runs an automated nightly eval harness against 8 fixtures (5 representative, 3 adversarial), using our local 32B reasoning model as the judge. Each fixture exercises the full rewrite pipeline plus all 6 guardrails: 5 deterministic checks and 1 LLM-judged factuality check with source citation. Regressions trigger alerts and block deploys. This page shows the current state. No signup, no login.
What the numbers mean today: current scores reflect a system that is acceptable for pilot, draft, and human-approved workflows, which is exactly the Founding Pilot motion. They are not yet at the threshold for unattended auto-publish, which is why approval is required by default. As eval scores climb and stay clean across content types, auto-publish unlocks per content type after the pilot.
Per-fixture baselines
Floor scores are set at fixture creation, and each nightly run is compared against them to detect regressions. A higher judge score means better rewrite quality relative to the brief and the source facts. Pass rate is the fraction of the 6 guardrails that passed for that fixture, so 83% means 5 of 6 passed.
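As a rough illustration of the comparison described above, here is a minimal Python sketch of how a pass rate and a regression check could be computed. It is not the actual Genesis harness: the `Baseline` class, the function names, and the guardrail names are all hypothetical, and it assumes a regression means dropping below either floor value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Floor values recorded when a fixture is created (hypothetical shape)."""
    judge_score: int
    pass_rate: float  # fraction in [0.0, 1.0]


def guardrail_pass_rate(results: dict[str, bool]) -> float:
    """Return the fraction of guardrails that passed for one fixture."""
    if not results:
        raise ValueError("expected at least one guardrail result")
    return sum(results.values()) / len(results)


def is_regression(baseline: Baseline, judge_score: int, pass_rate: float) -> bool:
    """Flag a nightly run that drops below either floor value (assumed rule)."""
    # Small epsilon so float rounding in 5/6 does not raise a false alarm.
    return judge_score < baseline.judge_score or pass_rate < baseline.pass_rate - 1e-9


# Guardrail names are placeholders; this page does not list them.
nightly = {
    "deterministic_1": True,
    "deterministic_2": True,
    "deterministic_3": True,
    "deterministic_4": True,
    "deterministic_5": False,
    "factuality_llm_judge": True,
}

rate = guardrail_pass_rate(nightly)
floor = Baseline(judge_score=77, pass_rate=5 / 6)

print(f"{rate:.0%}")                                         # 83%
print(is_regression(floor, judge_score=77, pass_rate=rate))  # False
print(is_regression(floor, judge_score=70, pass_rate=rate))  # True
```

With 6 guardrails, a displayed 83% corresponds to exactly 5 passing, which is why that value recurs in the tables.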
Normal-case fixtures (5)
| Fixture | Judge score | Guardrail pass rate | Last set | Status |
|---|---|---|---|---|
| skincare serum | 77 | 83% | 2026-05-02 | partial guardrail pass |
| kitchen cookware | 66 | 100% | 2026-05-02 | all guardrails passed |
| apparel tshirt | 69 | 83% | 2026-05-02 | partial guardrail pass |
| supplement collagen | 40 | 83% | 2026-05-02 | partial guardrail pass |
| pet treat | 77 | 83% | 2026-05-02 | partial guardrail pass |
Adversarial fixtures (3)
These fixtures intentionally bait the pipeline with a restricted-claim trap, a hallucination trap, and a plagiarism temptation. The point is that they trigger the right guardrails, so a partial pass on these is the expected, healthy state.
| Fixture | Judge score | Guardrail pass rate | Last set | Status |
|---|---|---|---|---|
| restricted claims trap | 68 | 83% | 2026-05-02 | partial guardrail pass |
| hallucination trap | 36 | 83% | 2026-05-02 | partial guardrail pass |
| plagiarism temptation | 40 | 83% | 2026-05-02 | partial guardrail pass |
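The "partial pass is healthy" idea for adversarial fixtures can be sketched as a check that the targeted guardrail is the one that fired. This is an illustrative assumption, not the harness's documented logic: the `trap_caught` function, the guardrail names, and the rule that every other guardrail must still pass are all hypothetical.

```python
def trap_caught(results: dict[str, bool], expected_trigger: str) -> bool:
    """True when the targeted guardrail fired (failed) and every other one passed."""
    if expected_trigger not in results:
        raise KeyError(f"unknown guardrail: {expected_trigger}")
    others_passed = all(
        ok for name, ok in results.items() if name != expected_trigger
    )
    return not results[expected_trigger] and others_passed


# Placeholder guardrail names; True means the guardrail passed.
hallucination_run = {
    "deterministic_1": True,
    "deterministic_2": True,
    "deterministic_3": True,
    "deterministic_4": True,
    "deterministic_5": True,
    "factuality_llm_judge": False,  # the trap is meant to trip this one
}

print(trap_caught(hallucination_run, "factuality_llm_judge"))  # True
print(trap_caught(hallucination_run, "deterministic_1"))       # False
```

Under this assumed rule, a caught trap always yields 5 of 6 guardrails passing, which matches the 83% shown for each adversarial fixture.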
Want to see this for your own URL?
Run the free audit. The same per-piece council deliberation, scored and exportable, with no signup.
Run my free audit →