Limits

Codex Council sharpens judgment. It doesn't replace verification.

I built it to make risk visible, not to manufacture confidence. The final answer still needs tests, a real browser, and your own read of the situation.

What it won't fake

The honesty is part of the design.

No fake provider diversity. This is Codex playing several roles, not several independent models.
No unverified UI claims. A UI isn't "verified" unless Bob, or a real browser, actually ran the path.
No real billing data. Token reports are local estimates, not your actual Codex usage or remaining quota.
No silent expanded runs. Expanded can burn a lot of usage, so it won't start without your confirmation.
No mess in your repo. Sessions, prompts, stats, and history live in plugin-local state and stay gitignored.
No runtime shortcuts. The shadow cannot change the legacy verdict, and savings targets remain unmeasured until replay proves them.
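The "no silent expanded runs" rule amounts to a default-deny confirmation gate. Here is a minimal sketch of that idea, assuming an interactive prompt on stdin. The function name confirm_expanded and the prompt wording are hypothetical and are not taken from the plugin.

```shell
#!/bin/sh
# Hypothetical helper: succeed only on an explicit "yes".
# Anything else (empty input, EOF, "n") is treated as "no".
confirm_expanded() {
  # Prompt on stderr so stdout stays clean for callers.
  printf 'Expanded mode can use a lot of usage. Continue? [y/N] ' >&2
  answer=""
  # A failed read (EOF) leaves the answer empty, which means "no".
  IFS= read -r answer || :
  case "$answer" in
    y|Y|yes|YES) return 0 ;;
    *)           return 1 ;;
  esac
}
```

A caller would branch on the exit status, for example `if confirm_expanded; then echo "starting expanded run"; fi`, so a missed or empty answer never starts the expensive path.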

Privacy and state

Local artifacts are still artifacts.

The helper script keeps session scaffolds, estimates, prompts, outputs, stats, history, and alter overrides in plugin-local state. Before you publish a repo, make sure .codex-council/ is gitignored and uncommitted.

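# Check for local runtime artifacts before publishing.
# --ignored also lists ignored paths (marked "!!"), which a plain status hides.
git status --short --ignored
find . \( -name '.codex-council' -o -name '.DS_Store' -o -name '__pycache__' \) -print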
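The "gitignored and uncommitted" condition can also be checked directly. The sketch below assumes the state directory is .codex-council at the repository root. The function name check_state_dir is a hypothetical helper, not something the plugin ships.

```shell
#!/bin/sh
# Hypothetical helper: report whether a state directory is ignored and untracked.
check_state_dir() {
  dir="${1:-.codex-council}"
  # git check-ignore exits 0 when the path matches an ignore rule.
  # --no-index keeps the answer independent of what is already tracked.
  if git check-ignore -q --no-index "$dir"; then
    echo "ignored: yes"
  else
    echo "ignored: no"
  fi
  # git ls-files lists paths in the index; any output means
  # something under the directory is staged or committed.
  if [ -n "$(git ls-files -- "$dir")" ]; then
    echo "tracked: yes"
  else
    echo "tracked: no"
  fi
}
```

Run from the repository root, the safe result before publishing is "ignored: yes" followed by "tracked: no". An ignore rule alone is not enough, because files that were force-added or committed earlier stay tracked.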

Decision Runtime safety

The shadow may fail. The legacy path must not.

Opt-in

Nothing is projected until you ask for the extra view. It stays local and off by default.

Legacy-safe

Unknown or unhealthy shadow state is ignored; the original session and verdict remain usable.
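The legacy-safe rule can be pictured as a fallback in which the shadow only ever affects whether the extra view is shown. This is an illustrative sketch, not the plugin's code: the function name resolve_verdict, the status strings, and the output format are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: the legacy verdict always passes through unchanged.
# Shadow health only decides whether the extra view is shown.
resolve_verdict() {
  legacy_verdict="$1"
  shadow_status="${2:-unknown}"   # a missing status counts as unknown
  case "$shadow_status" in
    healthy) echo "verdict=$legacy_verdict shadow=shown" ;;
    *)       echo "verdict=$legacy_verdict shadow=ignored" ;;
  esac
}
```

For instance, `resolve_verdict approve corrupt` would still report `verdict=approve`, with the shadow marked as ignored.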

Still sensitive

Shadow data is local state, not encrypted proof. Treat it with the same care as the source session.

See the Decision Runtime overview

Where it comes from

Built on the LLM Council idea, adapted for Codex.

The pattern comes from karpathy/llm-council and llm-council.dev: ask several independent models, anonymize the answers, rank them, then synthesize a final one. Codex Council keeps that shape but runs it through Codex roles, local scoring, token estimates, and optional browser evidence.