Vela

AI-for-science benchmark state

constellation seal · derived from vfr_efc649fd772a1ff1
id: vfr_efc649fd772a1ff1
license: CC-BY-4.0
findings: 12
accepted core: 12
contested: 0
links: 0
sources: 1
evidence: 12
avg conf: 0.30

used by 0 · replayed by 1 producer · second seat open

e24/24 · finding.noted · reviewer:will-blair · 2026-06-10 · 6c12→d02f

Finding bundle

BENCHMARK CLAIM (MiniF2F) — DeepSeek-Prover-V1.5 REPORTS a leading miniF2F-test pass rate under a large sampling budget (RMaxTS). VERIFICATION STATE: author-reported; model weights public; eval harness in the paper; dataset version = the team's stated split. NOT independently re-run in this frontier. Open obligation: pin the split, re-run the released checkpoint, audit train/test contamination of the formal statements.
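
Two of the open obligations above, pinning the split and auditing train/test overlap of the formal statements, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the frontier's or the paper's procedure: the function names (`normalize`, `split_fingerprint`, `contaminated`) are hypothetical, the statements are toy examples invented here, and exact matching after whitespace normalization only catches verbatim duplicates, not paraphrased or renamed theorems.

```python
import hashlib


def normalize(statement: str) -> str:
    """Collapse whitespace so trivially reformatted statements compare equal."""
    return " ".join(statement.split())


def split_fingerprint(statements: list[str]) -> str:
    """Order-independent SHA-256 over the normalized statements of a split."""
    digests = sorted(
        hashlib.sha256(normalize(s).encode("utf-8")).hexdigest() for s in statements
    )
    return hashlib.sha256("\n".join(digests).encode("utf-8")).hexdigest()


def contaminated(train: list[str], test: list[str]) -> list[str]:
    """Return test statements whose normalized form also appears in train."""
    seen = {normalize(s) for s in train}
    return [s for s in test if normalize(s) in seen]


# Toy Lean-style statements, invented for illustration only.
train = ["theorem a (x : Nat) : x + 0 = x", "theorem b (x : Nat) : 0 + x = x"]
test = ["theorem  b (x : Nat) :  0 + x = x", "theorem c (x y : Nat) : x + y = y + x"]

print(split_fingerprint(test))   # 64-character hex digest that pins this split
print(contaminated(train, test))  # statements that leak from train into test
```

Recording the fingerprint alongside the finding would let a later re-run confirm it evaluated the same set of statements, regardless of file order.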

id: vf_55068262f49df0ab
frontier: AI-for-science benchmark state
version: 1
confidence: 0.30

no incoming links yet

file

/frontier/benchmark-state#e=17 · scrub position · after_hash afc19db3f0499f21…
vf_55068262f49df0ab · benchmark-state · https://vela-site-next.fly.dev/frontier/benchmark-state#e=17
raw json · vf_55068262f49df0ab (2.7 KB)
{
 "annotations": [
  {
   "author": "reviewer:will-blair",
   "id": "ann_a90a9a26863b2dc8",
   "text": "HARDENING (benchmark-state): label_provenance=attested (records-not-reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.",
   "timestamp": "2026-06-10T23:01:45.084566+00:00"
  }
 ],
 "assertion": {
  "direction": null,
  "entities": [],
  "relation": null,
  "text": "BENCHMARK CLAIM (MiniF2F) — DeepSeek-Prover-V1.5 REPORTS a leading miniF2F-test pass rate under a large sampling budget (RMaxTS). VERIFICATION STATE: author-reported; model weights public; eval harness in the paper; dataset version = the team's stated split. NOT independently re-run in this frontier. Open obligation: pin the split, re-run the released checkpoint, audit train/test contamination of the formal statements.",
  "type": "computational"
 },
 "conditions": {
  "age_group": null,
  "cell_type": null,
  "clinical_trial": false,
  "concentration_range": null,
  "duration": null,
  "human_data": false,
  "in_vitro": false,
  "in_vivo": false,
  "species_unverified": [],
  "species_verified": [],
  "text": "Manually added finding; requires evidence review before scientific use."
 },
 "confidence": {
  "basis": "operator-supplied frontier prior; review required",
  "extraction_confidence": 1,
  "kind": "frontier_epistemic",
  "method": "expert_judgment",
  "score": 0.3
 },
 "created": "2026-06-10T06:50:55.829210+00:00",
 "evidence": {
  "effect_size": null,
  "evidence_spans": [],
  "method": "manual state transition",
  "model_system": "",
  "p_value": null,
  "replicated": false,
  "replication_count": null,
  "sample_size": null,
  "species": null,
  "type": "computational"
 },
 "flags": {
  "contested": false,
  "declining": false,
  "gap": true,
  "gravity_well": false,
  "negative_space": false,
  "retracted": false
 },
 "id": "vf_55068262f49df0ab",
 "links": [],
 "previous_version": null,
 "provenance": {
  "authors": [
   {
    "name": "reviewer:will-blair",
    "orcid": null
   }
  ],
  "citation_count": null,
  "doi": null,
  "extraction": {
   "extracted_at": "2026-06-10T06:50:55.829198+00:00",
   "extractor_version": "vela/0.691.0",
   "method": "manual_curation",
   "model": null,
   "model_version": null
  },
  "journal": null,
  "openalex_id": null,
  "pmc": null,
  "pmid": null,
  "review": {
   "corrections": [],
   "reviewed": false,
   "reviewed_at": null,
   "reviewer": null
  },
  "source_type": "expert_assertion",
  "title": "manual finding",
  "year": null
 },
 "updated": null,
 "version": 1
}
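
The record above carries several machine-readable signals that it is not yet ready for scientific use (`reviewed: false`, empty `evidence_spans`, `replicated: false`, `gap: true`). As a rough sketch, a consumer could collect them as follows. The helper `open_obligations` and its reason strings are hypothetical and not part of any Vela API; the dictionary is a trimmed copy of fields shown in the raw JSON.

```python
def open_obligations(record: dict) -> list[str]:
    """List reasons a finding record is not yet ready for scientific use."""
    reasons = []
    if not record["provenance"]["review"]["reviewed"]:
        reasons.append("not reviewed")
    if not record["evidence"]["evidence_spans"]:
        reasons.append("no evidence spans")
    if not record["evidence"]["replicated"]:
        reasons.append("not replicated")
    if record["flags"]["gap"]:
        reasons.append("flagged as gap")
    return reasons


# Trimmed copy of the fields shown in the raw JSON above.
record = {
    "confidence": {"score": 0.3},
    "evidence": {"evidence_spans": [], "replicated": False},
    "flags": {"gap": True, "contested": False, "retracted": False},
    "provenance": {"review": {"reviewed": False}},
}

print(open_obligations(record))
```

For this record all four checks fire, which is consistent with the low 0.3 confidence score and the "review required" basis.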

Unsealed — 0 attachment(s) on record, awaiting independent verification.

0 attachments · 0 distinct checker actors · 0 methods

blame · custody trail

produced by: reviewer:will-blair · finding.asserted · 2026-06-10 · vev_b396d3a2727ae019
checked by: no verifier attachment on record
accepted by: no accept signed

history · 2 events

record state

frontier-owned

Review status

claimed: no verifier run, no signed judgment · unreviewed

finding statement

finding type

computational

No entity list is declared.

evidence

source-bound

1 atom

computational · manual state transition

proof impact

packet context

2 events

2 reviewable changes and 0 evaluation records are attached to this finding id.

evidence

method

manual state transition

evidence type

computational

conditions

species_unverified: []
species_verified: []
text: Manually added finding; requires evidence review before scientific use.

provenance

source title

manual finding

authors

reviewer:will-blair

Source records

1

Evidence atoms

1
  • vea_2ac0e43858a68cb9 · computational · unknown

    BENCHMARK CLAIM (MiniF2F) — DeepSeek-Prover-V1.5 REPORTS a leading miniF2F-test pass rate under a large sampling budget (RMaxTS). VERIFICATION STATE: author-reported; model weights public; eval harness in the paper; dataset version = the team's stated split. NOT independently re-run in this frontier. Open obligation: pin the split, re-run the released checkpoint, audit train/test contamination of the formal statements.

    vs_066123dd29a9c5b4 · manual_curation

Typed links

0

outgoing

No outgoing links.

incoming

No incoming links.

Review, event, and evaluation records

4

events

  • vev_804497a5a8fbe4a0 · finding.noted

    HARDENING (benchmark-state): label_provenance=attested (records-not-reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.

    reviewer:will-blair · 2026-06-10

  • vev_b396d3a2727ae019 · finding.asserted

    Manual finding added to frontier state

    reviewer:will-blair · 2026-06-10
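
The hardening note in the events above says that attested label provenance caps a record below 'verified' until a deterministic rederivation exists. One way to picture that rule is a ceiling applied to whatever trust level is requested. The sketch below is an assumption throughout: the rung names and their order are hypothetical (the source names only 'verified' and the `attested` provenance label), and `"rederived"` is an invented placeholder for a non-attested provenance value.

```python
from enum import IntEnum


class Trust(IntEnum):
    # Hypothetical rung names and order; only 'verified' and the
    # 'attested' provenance label appear in the source note.
    CLAIMED = 0
    ATTESTED = 1
    REPLICATED = 2
    VERIFIED = 3


def capped_trust(requested: Trust, label_provenance: str) -> Trust:
    """Cap a requested trust level when labels are attested, not rederived."""
    if label_provenance == "attested":
        # Highest rung strictly below VERIFIED.
        ceiling = Trust(Trust.VERIFIED - 1)
        return min(requested, ceiling)
    return requested


print(capped_trust(Trust.VERIFIED, "attested").name)
print(capped_trust(Trust.VERIFIED, "rederived").name)
```

Expressing the cap as a `min` against a ceiling means lower requests pass through unchanged, and only a request for the top rung is reduced.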

reviewable changes

  • vpr_f3a3a73919f9eb51 · finding.add

    Manual finding added to frontier state

    applied · reviewer:will-blair · 2026-06-10

  • vpr_fb5a71c197133639 · finding.note

    HARDENING (benchmark-state): label_provenance=attested (records-not-reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.

    applied · agent:hardening-2026-06-10 (machine actor, no signing key) · 2026-06-10

evaluations

No evaluation record targets this finding id.

statement.registered · agent:claude-proxy · 4 days

renders the record as of vev_e73c9b6c · 1,355 events · hub
