Prompt comparison

Same problem, more pressure.

I ran this for real while building the site. The brief was deliberately messy: shareable session reports with automatic redaction, HTML and JSON export, and team handoff links. A plain prompt gave me a tidy outline. The council gave me a safer plan.

Normal Codex prompt

Evaluate proposal:
Add shareable Council reports
with automatic redaction,
HTML/JSON export, and
team handoff links.

What the single pass caught:

Build a local report assembler on top of existing session artifacts.
Run a redaction pass before HTML/JSON export.
Keep handoff local until there's authenticated hosting.
Add tests for secrets, paths, the report schema, and the generated HTML.

All useful, all roughly right. But it still treated the whole thing as an export feature.

Codex Council prompt

Standard Council:
review the same proposal.
Use architecture, reliability,
privacy, product, red-team,
and performance lenses.
Return blockers and
the safest v1 scope.

What the council added:

Ship local, deterministic exports in v1 — nothing else.
Don't sell automatic redaction as a security boundary.
Allowlist a public schema instead of scrubbing raw logs.
Fail closed if redaction or a sensitive-content check breaks.
No recursive walks through session folders, no dumping huge raw output.
Add a lifecycle: preview, dry-run, delete, re-export, retention.
Add performance gates at 1 MB, 25 MB, and 100 MB.

The actual output

Captured straight from the run.

Condensed, but real. The token numbers are estimates, not a bill — and remember this is still one model playing several parts, not a panel of rival providers.

Standard Council preflight

$ python3 scripts/codex_council.py estimate \
  --topic "Add shareable Codex Council session reports..." \
  --mode standard --token-budget compact --type implementation

Mode: standard
Type: implementation
Roles/reviewers/evidence runners: 6/2/0
Estimated total tokens: 12220 (range 9165..16497)
Estimated input/output tokens: 10830/1390

What plain Codex returned

Recommendation:
Build it, but scope v1 as local explicit export.

Architecture:
session artifacts -> ReportAssembler -> RedactionPipeline
-> HTML/JSON exporters -> optional handoff artifact

Risks:
accidental disclosure, telemetry confusion, collaboration
hosting scope creep.

What the council returned

Final recommendation:
Ship local deterministic session exports only.
Use HTML and JSON generated from an allowlisted public schema.
Run strict redaction before rendering or serialization.
Do not ship automatic publishing, public URLs, or
security-boundary claims yet.

Blockers it surfaced

- No raw prompts, outputs, logs, or arbitrary session fields by default.
- Redaction must be field-level and fail closed.
- Report schema must be allowlisted and versioned.
- Partial/legacy sessions must degrade safely, not dump raw data.
- Export must be manifest-driven with no recursive scans.
- Preview, dry-run, delete, re-export, and retention rules are required.
- Team handoff needs a threat model before hosted or public links.
- Performance gates are required because preview/re-export can double work.

The honest read

The single pass had the right shape.
The council made it safer: smaller v1, no premature
sharing claims, and privacy, lifecycle, and performance
constraints that would otherwise have shipped as
production risks.

The takeaway

It's worth it when it changes the answer.

Plain Codex found a workable architecture. The council turned it into a safer release.
Plain Codex said "local export first". The council said "local export only — until sharing has a threat model".
Plain Codex added redaction tests. The council required fail-closed, field-level redaction before anything renders.
Plain Codex listed risks. The council turned them into blockers, a lifecycle, and performance gates.