Vela

AI-for-science benchmark state

constellation seal · derived from vfr_efc649fd772a1ff1
id: vfr_efc649fd772a1ff1
license: CC-BY-4.0
findings: 12
accepted core: 12
contested: 0
links: 0
sources: 1
evidence: 12
avg conf: 0.30

used by 0 · replayed by 1 producer · second seat open

e24/24 · finding.noted · reviewer:will-blair · 2026-06-10 · 6c12→d02f

Evidence atom

BENCHMARK META (MiniF2F). MiniF2F is ~488 olympiad/textbook formal-math problems (AMC/AIME/IMO + MATH), ported to Lean/Isabelle/HOL-Light/Metamath, split valid/test. KNOWN TRUST ISSUE: multiple incompatible versions exist (original 2021, miniF2F-v2, and the 'miniF2F Revisited' cleanup with corrected/changed statements), so pass-rates across papers are version-ambiguous unless the exact split is pinned. STATE: dataset-version hazard, not a model claim.

id: vea_107364fe31419d2d
frontier: AI-for-science benchmark state
source: vs_066123dd29a9c5b4
finding: vf_cf89ac0f36e62089

evidence boundary

unknown

computational

finding binding

bound

computational

source binding

source-bound

manual finding

vs_066123dd29a9c5b4

review context

unverified

2 events

2 reviewable changes and 0 evaluation records target this atom or its bound objects.

statement

BENCHMARK META (MiniF2F). MiniF2F is ~488 olympiad/textbook formal-math problems (AMC/AIME/IMO + MATH), ported to Lean/Isabelle/HOL-Light/Metamath, split valid/test. KNOWN TRUST ISSUE: multiple incompatible versions exist (original 2021, miniF2F-v2, and the 'miniF2F Revisited' cleanup with corrected/changed statements), so pass-rates across papers are version-ambiguous unless the exact split is pinned. STATE: dataset-version hazard, not a model claim.
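
The version hazard in the statement can be made concrete with a small guard. The following is a minimal sketch, not part of the record: the class, field names, and version labels are hypothetical, and the rows are placeholders rather than real reported results. It treats two pass-rates as comparable only when both pin the same dataset version, split, and formal system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportedPassRate:
    """One pass-rate as reported by a paper (all field names are hypothetical)."""
    source: str
    pass_rate: float                     # fraction of problems solved, 0.0 to 1.0
    version: Optional[str] = None        # e.g. "original-2021", "miniF2F-v2"
    split: Optional[str] = None          # "valid" or "test"
    formal_system: Optional[str] = None  # e.g. "Lean", "Isabelle"

    def pin(self):
        """Return the (version, split, formal_system) pin, or None if any part is missing."""
        key = (self.version, self.split, self.formal_system)
        return None if None in key else key


def comparable(a: ReportedPassRate, b: ReportedPassRate) -> bool:
    """Two results are comparable only when both are fully pinned to the same pin."""
    return a.pin() is not None and a.pin() == b.pin()


# Placeholder rows for illustration only; these are not real reported results.
a = ReportedPassRate("paper-A", 0.50, "original-2021", "test", "Lean")
b = ReportedPassRate("paper-B", 0.55, "miniF2F-v2", "test", "Lean")
c = ReportedPassRate("paper-C", 0.60, split="test", formal_system="Lean")

print(comparable(a, a))  # same pin on both sides
print(comparable(a, b))  # different dataset version
print(comparable(a, c))  # version not stated by paper-C
```

Refusing the comparison when any part of the pin is missing is the conservative choice here: an unstated version is treated the same as a mismatched one.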

extraction method

manual_curation

support relation

unknown

condition refs

vcnd_ac78fb246103bc8c

caveats

  • missing evidence locator

Review, event, and evaluation records

4

events

  • vev_5a33eaff97407ac8 · finding.asserted

    Manual finding added to frontier state

    reviewer:will-blair · 2026-06-10

  • vev_f032a45ff0886024 · finding.noted

    HARDENING (benchmark-state): label_provenance=attested (records-not-reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.

    reviewer:will-blair · 2026-06-10

reviewable changes

  • vpr_2ad76a3dce783d96 · finding.note

    HARDENING (benchmark-state): label_provenance=attested (records-not-reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.

    agent (machine actor, no signing key) · applied · agent:hardening-2026-06-10 · 2026-06-10

  • vpr_ce433e03bf245f79 · finding.add

    Manual finding added to frontier state

    applied · reviewer:will-blair · 2026-06-10
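
The capping rule in the HARDENING note can be sketched as a small function. This is an illustration under stated assumptions, not Vela's implementation: the record names only 'unverified', 'verified', and the `label_provenance=attested` value, so the ladder ordering, the intermediate rung "reviewed", and the function name are hypothetical.

```python
# Hypothetical ladder ordering, lowest to highest. Only 'unverified' and
# 'verified' appear in the record; 'reviewed' is an assumed middle rung.
TRUST_LADDER = ["unverified", "reviewed", "verified"]


def effective_trust(requested: str, label_provenance: str,
                    has_deterministic_rederivation: bool) -> str:
    """Apply the cap: attested labels stay below 'verified' until rederived."""
    rank = TRUST_LADDER.index(requested)
    if label_provenance == "attested" and not has_deterministic_rederivation:
        # Highest rung strictly below 'verified'.
        cap = TRUST_LADDER.index("verified") - 1
        rank = min(rank, cap)
    return TRUST_LADDER[rank]


# An attested record cannot reach 'verified' without a rederivation.
print(effective_trust("verified", "attested", False))
# Once a deterministic rederivation exists, the cap no longer applies.
print(effective_trust("verified", "attested", True))
```

The cap only lowers a requested level. It never raises one, so a record already marked 'unverified' stays there.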

evaluations

No evaluation rows are attached.

statement.registered · agent:claude-proxy · 4 days

renders the record as of vev_e73c9b6c · 1,355 events · hub