verteryx
the bench is quiet · last experiment finished 3 d ago

when families appeal a denied insurance claim for their child, they win most of the time. almost no one appeals — the letters are too hard to write. we're teaching machines to write them, and testing every step in the open.

47 → 0

our biggest problem: AI invents laws that don't exist. it cited fake law in 47 of 50 appeals. our architecture cut that to zero on one lane, and pushed recall from 74% to 93% on another — now a model 1/10 the cost reaches 87%. the system matters more than the model. every number below links to the raw evidence.

the bench

we test the same legal work on four very different machines, to learn what intelligence is actually required, and what we can verify regardless of who wrote it.

sonder · a small model that lives on a mac mini in our office. free to run, private by default. if it can do this work, anyone can afford it.
kimi k2 · a trillion-parameter model we rent by the call. roughly 125× sonder's size (1T vs 8B parameters), and the ceiling we measure against.
claude fable 5 · a frontier model from anthropic, the strongest writer on the bench.
gpt-5.5 · openai's frontier model. the fourth seat, so no lab result depends on any one vendor.

the ledger

3 d ago · finding the law · deepseek_deepseek-v4-pro
87% · of the legal arguments a lawyer would raise, the model found 131 of 151, and invented fake law in 11 of 50 cases

3 d ago · finding the law · sonder (8B, local)
76% · of the legal arguments a lawyer would raise, the model found 114 of 151, and invented fake law in 35 of 50 cases

3 d ago · finding the law · claude fable
16% · of the legal arguments a lawyer would raise, the model found 24 of 151, and invented fake law in 0 of 50 cases

3 d ago · finding the law · gpt-5.5
93% · of the legal arguments a lawyer would raise, the model found 141 of 151, and invented fake law in 2 of 50 cases

4 d ago · finding the law · sonder (8B, local) + citation library + operational-grounds prompt
70% · of the legal arguments a lawyer would raise, the model found 106 of 151, and invented fake law in 28 of 50 cases

4 d ago · finding the law · gpt-5.5 + citation library + operational-grounds prompt
91% · of the legal arguments a lawyer would raise, the model found 137 of 151, and invented fake law in 1 of 50 cases

4 d ago · finding the law · claude fable + citation library + revise loop
68% · of the legal arguments a lawyer would raise, the model found 103 of 151, and invented fake law in 3 of 50 cases

4 d ago · finding the law · claude fable + citation library
87% · of the legal arguments a lawyer would raise, the model found 131 of 151, and invented fake law in 6 of 50 cases

4 d ago · finding the law · sonder (8B, local) + citation library + revise loop
56% · of the legal arguments a lawyer would raise, the model found 85 of 151, and invented fake law in 23 of 50 cases

4 d ago · finding the law · gpt-5.5 + citation library + revise loop
87% · of the legal arguments a lawyer would raise, the model found 131 of 151, and invented fake law in 5 of 50 cases

4 d ago · finding the law · gpt-5.5 + citation library
74% · of the legal arguments a lawyer would raise, the model found 112 of 151, and invented fake law in 1 of 50 cases

4 d ago · finding the law · sonder (8B, local) + citation library
59% · of the legal arguments a lawyer would raise, the model found 89 of 151, and invented fake law in 27 of 50 cases

4 d ago · drafting appeals · sonder (8B, local)
254 issues caught · our verifier reviewed 50 complete appeal letters and flagged every unsupported claim before a human ever would

4 d ago · drafting appeals · kimi k2 (1T, API)
281 issues caught · our verifier reviewed 50 complete appeal letters and flagged every unsupported claim before a human ever would

4 d ago · finding the law · kimi k2 (1T, API) + citation library + revise loop
74% · of the legal arguments a lawyer would raise, the model found 112 of 151, and invented fake law in 7 of 50 cases

5 d ago · finding the law · kimi k2 (1T, API) + citation library
69% · of the legal arguments a lawyer would raise, the model found 104 of 151, and invented fake law in 15 of 50 cases

5 d ago · finding the law · sonder (8B, local)
63% · of the legal arguments a lawyer would raise, the model found 95 of 151, and invented fake law in 44 of 50 cases

5 d ago · finding the law · kimi k2 (1T, API)
74% · of the legal arguments a lawyer would raise, the model found 112 of 151, and invented fake law in 47 of 50 cases

5 d ago · reading denials · kimi k2 (1T, API)
98% · given 50 insurance denial letters, the model identified the real reason for denial in 49, including 90% of denials written to disguise it

5 d ago · reading denials · sonder (8B, local)
92% · given 50 insurance denial letters, the model identified the real reason for denial in 46, including 70% of denials written to disguise it
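
the headline percentages in the ledger can be recomputed from the counts printed beside them. a minimal sketch in python, assuming each percentage is simply the count divided by its total and rounded to the nearest whole number (the lab may compute it differently). the class, field, and function names are hypothetical, chosen for illustration, and are not taken from the lab's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerRow:
    """One 'finding the law' ledger entry (field names are illustrative)."""
    system: str
    found: int        # legal arguments the model found
    total: int        # legal arguments a lawyer would raise
    fabricated: int   # cases in which the model invented fake law
    cases: int        # cases evaluated


def pct(part: int, whole: int) -> int:
    """Return part/whole as a whole-number percentage (assumed rounding)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)


# Counts copied from three 'finding the law' ledger entries.
ROWS = [
    LedgerRow("kimi k2 (1T, API)", found=112, total=151, fabricated=47, cases=50),
    LedgerRow("gpt-5.5", found=141, total=151, fabricated=2, cases=50),
    LedgerRow("deepseek_deepseek-v4-pro", found=131, total=151, fabricated=11, cases=50),
]

for row in ROWS:
    recall = pct(row.found, row.total)
    fake_law = pct(row.fabricated, row.cases)
    print(f"{row.system}: recall {recall}%, fake law in {fake_law}% of cases")
```

the two numbers only mean something together: the claude fable entry that invented fake law in 0 of 50 cases also found just 24 of 151 arguments.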