Proposed change
reason: Manual finding added to frontier state
finding type: theoretical
proposed confidence: 0.81
confidence basis: operator-supplied frontier prior; review required
Reviewable change
Scalable oversight approaches (iterated amplification, recursive reward modeling, debate) provide frameworks for human oversight of superhuman tasks, but they assume the honest strategy can simulate the AI system for exponentially many steps, an assumption that breaks for sufficiently advanced models.
This is a proposal, not yet accepted frontier state. It names what it would change and who proposed it. Only an accepted review event, signed under reviewer authority, writes it into the frontier. Until then it carries no authority over the record.
Provenance
proposed by: reviewer:will-blair
actor type: human
created at: 2026-05-29
target type: finding
Affected finding
Scalable oversight approaches (iterated amplification, recursive reward modeling, debate) provide frameworks for human oversight of superhuman tasks, but they assume the honest strategy can simulate the AI system for exponentially many steps, an assumption that breaks for sufficiently advanced models.
vf_491436508804de41
The diff is computed against current frontier state, the same diff a reviewer reads before deciding.
The live diff is computed from a writable workspace. This frontier is read-only here, so the change is shown as recorded without a recomputed before / after.
Canopus treats material changes like reviewable requests: one bounded packet, check output, reviewer decision, and accepted frontier effect.
The AI alignment evaluations frontier receives a reviewable edit: a source, finding, caveat, replication, evaluation, or proof-affecting change.
The packet names affected record objects, evidence, rationale, reviewer-facing fields, and expected proof impact.
Schema, provenance, benchmark, contradiction, and proof checks decide whether the request is ready to read.
A steward accepts, rejects, caveats, revises, or retracts the request under an inspectable identity.
Only the accepted event mutates frontier state. Atlases, constellations, and search update from that record state.
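The flow described above (packet, checks, reviewer decision, accepted effect) can be sketched in a few lines of Python. This is an illustrative model only, not Canopus code: the names `Proposal`, `ReviewEvent`, `Frontier`, `run_checks`, `review`, and `apply_event` are hypothetical, the checks are a small stand-in for the schema and provenance checks, and the reviewer identity is a plain string rather than a real signature.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    """The five outcomes a steward can record for a request."""
    ACCEPT = "accept"
    REJECT = "reject"
    CAVEAT = "caveat"
    REVISE = "revise"
    RETRACT = "retract"


@dataclass(frozen=True)
class Proposal:
    """One bounded packet: what would change, why, and who proposed it."""
    target_id: str
    reason: str
    finding_type: str
    confidence: float
    proposed_by: str


@dataclass(frozen=True)
class ReviewEvent:
    """A reviewer decision recorded under an inspectable identity."""
    proposal: Proposal
    decision: Decision
    reviewer: str  # stand-in for a signed identity (assumption)


@dataclass
class Frontier:
    """Hypothetical frontier state: accepted findings keyed by id."""
    findings: dict = field(default_factory=dict)


def run_checks(p: Proposal) -> list:
    """Return a list of problems; an empty list means ready to read."""
    problems = []
    if not p.target_id:
        problems.append("missing target id")
    if not 0.0 <= p.confidence <= 1.0:
        problems.append("confidence outside [0, 1]")
    if not p.proposed_by:
        problems.append("missing provenance")
    return problems


def review(p: Proposal, decision: Decision, reviewer: str) -> ReviewEvent:
    """Record a decision; refuse packets that fail the checks."""
    problems = run_checks(p)
    if problems:
        raise ValueError("not ready to read: " + "; ".join(problems))
    return ReviewEvent(proposal=p, decision=decision, reviewer=reviewer)


def apply_event(frontier: Frontier, event: ReviewEvent) -> Frontier:
    """Only an accepted event writes the proposal into frontier state."""
    if event.decision is Decision.ACCEPT:
        frontier.findings[event.proposal.target_id] = event.proposal
    return frontier


proposal = Proposal(
    target_id="vf_491436508804de41",
    reason="Manual finding added to frontier state",
    finding_type="theoretical",
    confidence=0.81,
    proposed_by="reviewer:will-blair",
)
frontier = Frontier()

# A rejected request leaves the frontier untouched.
apply_event(frontier, review(proposal, Decision.REJECT, "reviewer:example"))
rejected_is_absent = proposal.target_id not in frontier.findings

# An accepted request is the only path that mutates state.
apply_event(frontier, review(proposal, Decision.ACCEPT, "reviewer:example"))
accepted_is_present = proposal.target_id in frontier.findings
```

Keeping `review` and `apply_event` separate mirrors the distinction in the text: a decision is always recorded as an event, but only the accepted event changes what the frontier contains.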