Vela

AI-for-science benchmark state

constellation seal · derived from vfr_efc649fd772a1ff1
id: vfr_efc649fd772a1ff1
license: CC-BY-4.0
findings: 12
accepted core: 12
contested: 0
links: 0
sources: 1
evidence: 12
avg conf: 0.30

used by 0 · replayed by 1 producer · second seat open

e24/24 · finding.noted · reviewer:will-blair · 2026-06-10 · 6c12→d02f

Reviewable change

verified: A frozen deterministic verifier re-checked the claim and passed.
accepted

BENCHMARK CLAIM (MiniF2F) — DeepSeek-Prover-V1.5 REPORTS a leading miniF2F-test pass rate under a large sampling budget (RMaxTS). VERIFICATION STATE: author-reported; model weights public; eval harness in the paper; dataset version = the team's stated split. NOT independently re-run in this frontier. Open obligation: pin the split, re-run the released checkpoint, audit train/test contamination of the formal statements.
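
Two of the open obligations above, pinning the split and auditing train/test contamination, could be approached along the following lines. This is a minimal sketch under stated assumptions, not the team's harness or any Vela tooling: the function names, the whitespace-only normalization rule, and the representation of each split as a list of formal-statement strings are all hypothetical.

```python
import hashlib


def normalize(statement: str) -> str:
    """Collapse whitespace runs so formatting differences do not hide a match."""
    return " ".join(statement.split())


def split_fingerprint(statements: list[str]) -> str:
    """Order-independent SHA-256 digest that pins the exact contents of a split."""
    digests = sorted(
        hashlib.sha256(normalize(s).encode("utf-8")).hexdigest() for s in statements
    )
    return hashlib.sha256("\n".join(digests).encode("utf-8")).hexdigest()


def exact_overlap(train: list[str], test: list[str]) -> set[str]:
    """Return normalized test statements that also appear in the training set."""
    train_set = {normalize(s) for s in train}
    return {normalize(s) for s in test} & train_set


# Hypothetical toy statements, not taken from miniF2F.
train = ["theorem a (x : Nat) :  x + 0 = x", "theorem b : 1 + 1 = 2"]
test = ["theorem a (x : Nat) : x + 0 = x", "theorem c : 2 + 2 = 4"]

leaked = exact_overlap(train, test)
pinned = split_fingerprint(test)
```

Exact matching after normalization only catches verbatim reuse. Renamed theorems or reordered hypotheses would pass this check, so a real audit would likely need a stronger equivalence test.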

id: vpr_f3a3a73919f9eb51
frontier: AI-for-science benchmark state
kind: finding.add
created: 2026-06-10
findings: +1
state: null → b3817600

accept gate

2 of 4 on record
signature: reviewer:will-blair · key 4892f938
chain: null → b3817600
witness: no verifier attachment on record for this target
grade: in state · unreviewed
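
One way to read the "2 of 4 on record" summary is as a count of gate items that carry a recorded artifact. The sketch below assumes, as an interpretation rather than documented Vela behavior, that the signature and chain count as on record while the missing witness and the unreviewed grade do not. The `GateItem` type and its `on_record` flag are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateItem:
    name: str
    value: str
    on_record: bool  # assumption: whether this item counts toward the gate


# Values copied from the accept gate; the on_record flags are an interpretation.
gate = [
    GateItem("signature", "reviewer:will-blair · key 4892f938", True),
    GateItem("chain", "null → b3817600", True),
    GateItem("witness", "no verifier attachment on record for this target", False),
    GateItem("grade", "in state · unreviewed", False),
]


def gate_summary(items: list[GateItem]) -> str:
    """Summarize how many gate items are on record."""
    recorded = sum(item.on_record for item in items)
    return f"{recorded} of {len(items)} on record"


print(gate_summary(gate))  # 2 of 4 on record
```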

timeline

  1. 2026-06-10 · propose · proposed · finding.add · reviewer:will-blair · vpr_f3a3a73919f9eb51 · Manual finding added to frontier state
  2. 2026-06-10 · accept · finding.asserted · reviewer:will-blair · null → b3817600 · vev_b396d3a2727ae019 · Manual finding added to frontier state

proposed

reason: Manual finding added to frontier state
finding type: computational
proposed confidence: 0.30
confidence basis: operator-supplied frontier prior; review required

provenance

proposed by: reviewer:will-blair
actor type: human
created at: 2026-06-10
target type: finding

BENCHMARK CLAIM (MiniF2F) — DeepSeek-Prover-V1.5 REPORTS a leading miniF2F-test pass rate under a large sampling budget (RMaxTS). VERIFICATION STATE: author-reported; model weights public; eval harness in the paper; dataset version = the team's stated split. NOT independently re-run in this frontier. Open obligation: pin the split, re-run the released checkpoint, audit train/test contamination of the formal statements.

vf_55068262f49df0ab

Diff

Read-only frontier; diff not recomputed.

Review chain

  1. 01 request

    Change request

    AI-for-science benchmark state receives a reviewable source, finding, caveat, replication, evaluation, or proof-affecting edit.
  2. 02 packet

    Diff packet

    The packet names affected record objects, evidence, rationale, reviewer-facing fields, and expected proof impact.
  3. 03 checks

    Check output

    Schema, provenance, benchmark, contradiction, and proof checks decide whether the request is ready to read.
  4. 04 review

    Reviewer decision

    A steward accepts, rejects, caveats, revises, or retracts the request under an inspectable identity.
  5. 05 accepted

    Accepted event

    Only the accepted event mutates frontier state. Atlases, constellations, and search update from that record state.
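
The rule that only the accepted event mutates frontier state can be illustrated with a small event-application sketch. The `Frontier` class, the `apply_event` function, and the stage names as plain strings are hypothetical and chosen for illustration; they are not Vela's actual data model.

```python
from dataclasses import dataclass, field

# Stage names taken from the five steps of the review chain.
STAGES = ("request", "packet", "checks", "review", "accepted")


@dataclass
class Frontier:
    findings: list[str] = field(default_factory=list)


def apply_event(frontier: Frontier, stage: str, finding: str) -> Frontier:
    """Return the frontier after an event; only 'accepted' changes it."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if stage != "accepted":
        return frontier  # earlier stages are reviewable but leave state untouched
    return Frontier(findings=[*frontier.findings, finding])


state = Frontier()
for stage in STAGES:
    state = apply_event(state, stage, "vf_55068262f49df0ab")

print(len(state.findings))  # 1
```

Walking one finding through all five stages adds it exactly once, which matches the `findings +1` recorded for this change.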

finding.noted · reviewer:will-blair · 1 day

renders the record as of vev_d199cb2e · 1,338 events · hub
