evidence boundary
AI-for-science benchmark state
- id
- vfr_efc649fd772a1ff1
- license
- CC-BY-4.0
- findings
- 12
- accepted core
- 12
- contested
- 0
- links
- 0
- sources
- 1
- evidence
- 12
- avg conf
- 0.30
e24/24 · finding.noted · reviewer:will-blair · 2026-06-10 · 6c12→d02f
Evidence atom
BENCHMARK CLAIM (miniF2F): HyperTree Proof Search (HTPS, Lample et al.) REPORTS a miniF2F pass rate obtained with a learned, AlphaZero-inspired proof search over hypertrees. VERIFICATION STATE: author-reported; specific to the search budget and miniF2F version used. NOT re-run here. Open obligation: re-run at the stated budget on a pinned split.
- id
- vea_4ac62ba55a4b8dbc
- frontier
- AI-for-science benchmark state
- source
- vs_066123dd29a9c5b4
- finding
- vf_9a454a597ddee070
finding binding
bound · computational
source binding
source-bound · manual finding
vs_066123dd29a9c5b4
review context
unverified · 2 events
2 reviewable changes and 0 evaluation records target this atom or its bound objects.
extraction method
manual_curation
support relation
unknown
condition refs
vcnd_f925e73d7e12806e
caveats
- missing evidence locator
Review, event, and evaluation records
4 events
vev_03b2b7f5e7e0be96 · finding.asserted · Manual finding added to frontier state
reviewer:will-blair · 2026-06-10
vev_4e2e2a5f25a8e28f · finding.noted · HARDENING (benchmark-state): label_provenance=attested (records, not reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.
reviewer:will-blair · 2026-06-10
reviewable changes
vpr_66758152772dd461 · finding.note · HARDENING (benchmark-state): label_provenance=attested (records, not reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.
applied · agent:hardening-2026-06-10 · 2026-06-10
vpr_8ebb01be4aedad3b · finding.add · Manual finding added to frontier state
applied · reviewer:will-blair · 2026-06-10
evaluations
No evaluation rows are attached.