Ada Lovelace
Principal Architect. Boundaries, integration, maintainability, migration risk. Lean on Ada for "which design ages well."
Wiki
Everything in one place: how the council actually works, who the reviewers are, how to write a prompt that gets a useful answer, and a cookbook of paste-ready prompts for the situations you'll actually hit. Every prompt has a copy button.
The council borrows the LLM Council pattern and runs it inside Codex. One request becomes four stages.
Up to six reviewers answer independently, in parallel, before seeing each other's work. No anchoring on whoever spoke first.
Outputs lose their authorship and become Candidate A–F. Reviewers rank and score them on a rubric — not on who sounds senior.
Scores are combined deterministically (locally, when reviewer JSON exists). Dissent and blockers are kept, not averaged away.
The main agent writes the final call from saved outputs — winner, dissent, blockers, verification. It doesn't just announce the top score.
Pick the smallest mode that can catch the risk. Escalate per blocker, not by default.
| You're doing | Reach for | Why |
|---|---|---|
| A small, reversible call | fast | A single Chairman pass. No full council, no full cost. |
| An implementation or architecture choice | standard | Six roles + a scoring pass. The default for real decisions. |
| Security, migration, data loss, a close tie | deep | Adds rubric, bias, and implementation reviewers. |
| Frontend / UI / browser behavior | --frontend-review | Adds Leonardo (UX) and Bob (browser evidence). |
| Plugin or skill usability | --type skill --skill-review | A cheap three-lens panel, no reviewers by default. |
The --token-budget flag defaults to compact. Escalate the budget for risk, not for comfort.
expanded needs a separate yes.
Each reviewer guards one concern. Knowing their lenses helps you aim a prompt — or read why one of them blocked you.
Principal Architect. Boundaries, integration, maintainability, migration risk. Lean on Ada for "which design ages well."
Reliability Engineer. Failure modes, tests, rollback, observability. Lean on Grace for "what happens when it breaks."
Security & Governance. Secrets, permissions, privacy, provenance, policy. Lean on Hypatia for "who can abuse this."
Product & Operator. Workflow fit, docs, adoption, operational friction. Lean on Florence for "will anyone actually use it."
Contrarian Red Team. Hidden assumptions, simpler alternatives, overengineering. Lean on Turing for "what if we don't build this."
Performance Engineer. Latency, throughput, memory, cost, scaling. He treats a perf claim as unverified without a workload, baseline, and measurement plan.
You don't pick roles by hand in a normal run — all six answer. But naming a concern in your prompt ("focus on rollback and migration risk") tells the Chairman which lenses to weight.
Frontend work gets a stricter path, because UI bugs hide in behavior. Turn it on with --frontend-review or --type frontend.
A brutally honest UX/UI reviewer. He doesn't add a scoring dimension; his findings move clarity, relevance, completeness, and accuracy. A Leonardo blocker lowers final confidence even when the technical scores are high.
He hunts hidden affordances, weak hierarchy, mobile failures, accessibility gaps, and "clever" flows that slow real users down.
Bob drives a real browser against a runnable app or prototype. He is not a council member and never votes.
He checks the things that only show up live: focus moving into a modal, background staying un-clickable, Escape/backdrop behavior, scroll lock, mobile close buttons. A forced click is never proof — every click needs an observed state change.
Internal skill packs for when you don't want to wire in external skills. Name them in a prompt to bias the run.
Architecture-to-plan: sequencing, ownership, migration path, smallest useful scope.
Missing tests, edge cases, rollback, smoke checks. Use when correctness matters.
Workload assumptions, baseline vs change, measurement gaps, degradation under contention.
Mode choice, loaded references, context pruning, output caps — when a run risks getting expensive.
Sensitive data, license status, audit trail, redaction. For privacy and distribution calls.
Trigger clarity, default prompts, concise output — when adoption and daily workflow matter.
Counterintuitive flows, hidden affordances, weak hierarchy, mobile failure, decorative bloat.
What not to build, cheaper alternatives, hidden assumptions, invalidation conditions.
A council prompt isn't a question — it's a decision to pressure-test. Give it something to argue about.
[Standard|Deep|Frontend] Council: review <the specific decision>.
Context: <the diff, files, or links it should look at>.
Constraints: <hard limits — compatibility, deadline, budget>.
Return: blockers, dissent, confidence, the safest v1 scope,
and the exact verification before approval.
expanded out of habit. Escalate one blocker, not the whole session.Explaining the council is not running it. "How does the council work?" gets a one-line answer, no dispatch. "Run a council on this" dispatches agents. If your intent is ambiguous, the council asks one line before spending anything.
Sixteen situations, each with the mode to use, the reviewers to lean on, and a prompt you can paste as-is.
Standard Council: review this architecture decision —
<option A vs option B>. Preserve dissent.
Return module boundaries, migration risk, the rollback plan,
and the implementation verification before we commit.Council review this diff. Focus on regressions,
missing tests, compatibility breaks, and performance impact.
Return blockers vs refinements and the exact tests to run.Council review this bugfix plan for root-cause evidence
and regression risk. Is this the cause or a symptom?
Return the regression tests we're missing.Council review this change for latency, throughput, memory,
and cost. Treat any perf claim as unverified without a
workload, baseline, and measurement plan. Return the
benchmark we need and a safer alternative.Deep Council: review this migration for data-loss risk,
backwards compatibility, and rollback. Assume it runs against
production data. Return blockers, the dry-run plan, and the
exact rollback signal.Deep Council: security review this change. Look for leaked
secrets, broken authz, privacy exposure, and unsafe defaults.
Red-team it. Return blockers and the verification before merge.Standard Council: review this API design. Is it a breaking
change? Check naming, versioning, error shapes, and the
migration story for existing callers. Return the deprecation path.Council jury: refactor or rewrite this module? Give a
go/no-go with confidence, the conditions that would flip the
decision, and the smallest first step either way.Council review this dependency upgrade. Check breaking
changes, security advisories, transitive risk, and bundle/perf
impact. Return blockers and the smoke tests to run after.Frontend Council: review this modal flow. Activate Leonardo
for brutal UX critique and have Bob verify in the browser:
focus moves in, background isn't clickable, Escape and backdrop
close, mobile shows the close button. Don't claim verified
without Bob's evidence.Frontend Council: accessibility review this page. Check
keyboard path, focus order, labels, and contrast. Have Bob
walk the keyboard-only flow and report what a screen-reader
user would miss.Deep Council: release readiness review. Give a go/no-go,
required blockers, preserved dissent, confidence, the
verification evidence, and the rollback signal.Council review this skill for discoverability, trigger
clarity, and non-expert adoption. Will someone find it, use it,
and not trip over the existing rules? Return the smallest
wording and UX fixes.Council review this incident timeline. Separate root cause
from contributing factors, challenge the convenient story, and
return the prevention items ranked by leverage, not blame.Council review this for cost and token use, compact output.
Where are we paying for detail we don't need? Preserve blockers,
dissent, and verification. Return the safe cuts.Deep Council: review this sharing/export feature. Classify the
data, find disclosure paths, and demand fail-closed redaction.
Return the safest v1 and what needs a threat model before it ships.A council answer leads with the decision and keeps the disagreement visible. Know what each part means.
An alter is a local, bounded tweak to how one reviewer behaves. It's advisory — it can sharpen focus, never switch off a guardrail. Full guide →
Tunable: Ada, Grace, Hypatia, Florence, Turing, Seymour, Leonardo. Off-limits: Bob (he's an evidence runner, not a reviewer). Always preview before you save.
python3 scripts/codex_council.py alters preview --role leonardo \
--tone "more brutally honest about confusing interaction design" \
--domain-focus "mobile UI, modal accessibility, click-through regressions"Cheap by default, expensive only where risk earns it.
expanded.Run this checklist before Standard/Deep work that touches sensitive data, subagents, or anything you'll publish.
Runtime artifacts (sessions, prompts, stats, history) live in plugin-local .codex-council/ and stay gitignored — they don't pollute your repo. Invocation logs are compact and never store prompt text, secrets, topics, or absolute paths.
The helper script is stdlib-only. These are the commands you'll actually use.
# Set the local profile used for estimates (stored with consent).
python3 scripts/codex_council.py profile --plan Pro --model gpt-5.5 --reasoning xhigh
# Estimate before you start, then accept the range in chat.
python3 scripts/codex_council.py estimate --topic "Architecture Review" --mode standard --token-budget compact
# Is this a real run, or just talking about the council?
python3 scripts/codex_council.py classify-invocation --text "explain how council works"
# Scaffold a traceable session after accepting the estimate.
python3 scripts/codex_council.py init --topic "Architecture Review" --root . --mode standard --token-budget compact --confirm-estimate
# Frontend session (Leonardo + Bob); skill-review session.
python3 scripts/codex_council.py init --topic "Modal Review" --root . --mode standard --frontend-review --confirm-estimate
python3 scripts/codex_council.py init --topic "Skill Review" --root . --type skill --skill-review --confirm-estimate
# expanded must be confirmed explicitly.
python3 scripts/codex_council.py init --topic "Migration" --root . --mode deep --token-budget expanded --confirm-expanded
# Aggregate reviewer scores, validate, and close with stats.
python3 scripts/codex_council.py score --input reviews.json
python3 scripts/codex_council.py validate-session --session <session-dir>
python3 scripts/codex_council.py stats --session <session-dir> --write
# Check for a newer release.
python3 scripts/codex_council.py check-update
expanded. Expand one blocker instead.Two passes from one prompt tend to agree with themselves. The council forces independent first opinions, hides authorship, ranks on a rubric, and keeps dissent — so the second look actually disagrees with the first instead of polishing it.
No. It runs entirely inside Codex (plus optional Codex subagents). The "diversity" is role isolation and anonymous review, and the site says so plainly.
Not without asking. Standard/Deep runs show an estimate and wait for acceptance, the default budget is compact, and expanded is blocked until you confirm.
No. Ask in chat ("Standard Council: review…") for everyday use. The CLI is for traceable sessions, estimates, scoring, and stats you want to keep.
Yes — tune its alter (preview first). Tuning is advisory and can never remove a council guardrail. Bob can't be tuned.