proposed
reason
Manual finding added to frontier state
finding type
computational
proposed confidence
0.30
confidence basis
operator-supplied frontier prior; review required
e24/24 · finding.noted · reviewer:will-blair · 2026-06-10 · 6c12→d02f
Reviewable change
BENCHMARK META (MiniF2F). MiniF2F is a set of ~488 olympiad and textbook formal-math problems (AMC/AIME/IMO + MATH), ported to Lean/Isabelle/HOL-Light/Metamath and split into valid/test sets. KNOWN TRUST ISSUE: multiple incompatible versions exist (the original 2021 release, miniF2F-v2, and the 'miniF2F Revisited' cleanup with corrected or changed statements), so pass-rates reported across papers are version-ambiguous unless the exact version and split are pinned. STATE: dataset-version hazard, not a model claim.
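The pinning requirement can be made concrete with a small guard. This is a minimal Python sketch that assumes each reported result carries explicit version and split labels. The `PassRateReport` record, the `comparable`/`compare` helpers, and version labels such as "original-2021" or "revisited" are hypothetical illustrations, not identifiers from any MiniF2F release.

```python
from dataclasses import dataclass


# Hypothetical record of one reported MiniF2F result; all field names are illustrative.
@dataclass(frozen=True)
class PassRateReport:
    system: str
    dataset_version: str  # assumed free-form label, e.g. "original-2021", "v2", "revisited"
    split: str            # "valid" or "test"
    solved: int
    total: int

    def __post_init__(self) -> None:
        # Reject impossible counts early so pass_rate is always well defined.
        if self.total <= 0 or not 0 <= self.solved <= self.total:
            raise ValueError(f"invalid counts: {self.solved}/{self.total}")

    @property
    def pass_rate(self) -> float:
        return self.solved / self.total


def comparable(a: PassRateReport, b: PassRateReport) -> bool:
    """True only when both results pin the same dataset version and split."""
    return a.dataset_version == b.dataset_version and a.split == b.split


def compare(a: PassRateReport, b: PassRateReport) -> float:
    """Return the pass-rate difference a - b, refusing version- or split-ambiguous comparisons."""
    if not comparable(a, b):
        raise ValueError(
            f"not comparable: ({a.dataset_version}, {a.split}) vs ({b.dataset_version}, {b.split})"
        )
    return a.pass_rate - b.pass_rate
```

The guard raises an error instead of returning a number. A caller therefore has to settle the version question before any cross-paper comparison is recorded.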
accept gate
2 of 4 on record
timeline
vpr_ce433e03bf245f79 · Manual finding added to frontier state · null→7bfbae3a
vev_5a33eaff97407ac8 · Manual finding added to frontier state
proposed
provenance
proposed by
reviewer:will-blair
actor type
human
created at
2026-06-10
target type
finding
affected
vf_cf89ac0f36e62089
Read-only frontier; diff not recomputed.
AI-for-science benchmark state receives a reviewable source, finding, caveat, replication, evaluation, or proof-affecting edit.
The packet names affected record objects, evidence, rationale, reviewer-facing fields, and expected proof impact.
Schema, provenance, benchmark, contradiction, and proof checks decide whether the request is ready to read.
A steward accepts, rejects, caveats, revises, or retracts the request under an inspectable identity.
Only the accepted event mutates frontier state. Atlases, constellations, and search update from that record state.
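The lifecycle above can be sketched as a small state holder. Every decision is logged with the steward's identity, but only an accept writes to frontier state. This is a minimal Python sketch under assumed names (`ChangeRequest`, `Frontier`, `Decision`, `decide`). It is not the platform's actual API, and it collapses the five checks into a single hypothetical `checks_passed` flag.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    # Steward outcomes named in the lifecycle description.
    ACCEPT = "accept"
    REJECT = "reject"
    CAVEAT = "caveat"
    REVISE = "revise"
    RETRACT = "retract"


@dataclass
class ChangeRequest:
    # Hypothetical request shape: which record it targets and the proposed new value.
    request_id: str
    target_id: str
    new_value: str
    # Stand-in for the schema/provenance/benchmark/contradiction/proof checks.
    checks_passed: bool = False


@dataclass
class Frontier:
    records: dict[str, str] = field(default_factory=dict)
    # Append-only decision log: (request_id, decision, steward identity).
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def decide(self, req: ChangeRequest, decision: Decision, steward: str) -> None:
        if not req.checks_passed:
            raise ValueError(f"{req.request_id} is not ready: checks have not passed")
        # Every decision is recorded under an inspectable identity...
        self.log.append((req.request_id, decision.value, steward))
        # ...but only an accepted event mutates frontier state.
        if decision is Decision.ACCEPT:
            self.records[req.target_id] = req.new_value
```

In this sketch, derived views such as atlases or search would be rebuilt from `records`, never from pending or rejected requests.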