AI-for-science benchmark state

id: vfr_efc649fd772a1ff1
license: CC-BY-4.0
findings: 12
accepted core: 12
contested: 0
links: 0
sources: 1
evidence: 12
avg conf: 0.30

used by 0 · replayed by 1 producer · second seat open

e24/24 · finding.noted · reviewer:will-blair · 2026-06-10 · 6c12→d02f

Reviewable change

add a finding

accepted

BENCHMARK META (ProteinGym). ProteinGym benchmarks variant-effect prediction against deep mutational scanning (DMS) assays: a substitution benchmark (~217 assays) and an indel benchmark, with zero-shot and supervised tracks, scored by Spearman correlation (and AUC/MCC). KNOWN TRUST ISSUE: v1.0 vs v1.1 differ in assay set and splits; zero-shot vs supervised numbers are not comparable; MSA-dependent methods vary with the MSA pipeline. STATE: dataset-version + track-conflation hazard.
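
The hazard named in this finding can be made concrete with a small sketch. It assumes scores are reported per method together with a dataset version, a track, and a benchmark; the `ReportedScore` record and the `comparable` guard are hypothetical names for illustration, not part of ProteinGym or of this frontier's tooling. Spearman correlation is computed here as the Pearson correlation of average ranks, using only the standard library. The MSA pipeline is left out of the comparison key for brevity.

```python
from dataclasses import dataclass
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(predicted), average_ranks(measured)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


@dataclass(frozen=True)
class ReportedScore:
    """Hypothetical record of one reported benchmark number."""
    method: str
    version: str    # e.g. "v1.0" or "v1.1"
    track: str      # "zero-shot" or "supervised"
    benchmark: str  # "substitution" or "indel"
    score: float    # e.g. mean Spearman across assays


def comparable(a, b):
    """Two numbers are comparable only if version, track, and benchmark match."""
    return (a.version, a.track, a.benchmark) == (b.version, b.track, b.benchmark)
```

A leaderboard or meta-analysis built on such records would call `comparable` before ranking two methods against each other and refuse the comparison when it returns False.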

id: vpr_d3f3228bb463c2d9
frontier: AI-for-science benchmark state
kind: finding.add
created: 2026-06-10
findings: +1
state: null → bc813b05

accept gate

2 of 4 on record

✓ signature: reviewer:will-blair · key 4892f938
✓ chain: null → bc813b05
— witness: no verifier attachment on record for this target
— grade: in state · unreviewed

timeline

2026-06-10 · propose · proposed · finding.add · reviewer:will-blair · vpr_d3f3228bb463c2d9 · Manual finding added to frontier state
2026-06-10 · accept · finding.asserted · reviewer:will-blair · null → bc813b05 · vev_f17f5a864754e2a0 · Manual finding added to frontier state

proposed

reason

Manual finding added to frontier state

finding type

computational

proposed confidence

0.30

confidence basis

operator-supplied frontier prior; review required

provenance

proposed by

reviewer:will-blair

actor type

human

created at

2026-06-10

target type

finding

affected

vf_ec4bb8feca206bf2

Diff

Read-only frontier; diff not recomputed.

Review chain

01 request · Change request
AI-for-science benchmark state receives a reviewable source, finding, caveat, replication, evaluation, or proof-affecting edit.

02 packet · Diff packet
The packet names affected record objects, evidence, rationale, reviewer-facing fields, and expected proof impact.

03 checks · Check output
Schema, provenance, benchmark, contradiction, and proof checks decide whether the request is ready to read.

04 review · Reviewer decision
A steward accepts, rejects, caveats, revises, or retracts the request under an inspectable identity.

05 accepted · Accepted event
Only the accepted event mutates frontier state. Atlases, constellations, and search update from that record state.
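
The rule that only the accepted event mutates frontier state can be sketched as a small state holder. This is an illustration under stated assumptions, not the tool's implementation: the `Frontier` class, its `apply` method, and the truncated SHA-256 hash are all hypothetical, and the real hashing scheme behind transitions such as null → bc813b05 is not described in this record.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frontier:
    """Hypothetical frontier state: a list of finding ids plus a state hash."""
    findings: List[str] = field(default_factory=list)
    state: Optional[str] = None  # None stands in for "null" before any event

    def apply(self, stage: str, finding_id: str) -> bool:
        """Mutate state only for the accepted stage; other stages are no-ops."""
        if stage != "accepted":
            return False
        previous = self.state or "null"
        # Assumed scheme: hash the previous state together with the finding id.
        digest = hashlib.sha256(f"{previous}:{finding_id}".encode()).hexdigest()
        self.findings.append(finding_id)
        self.state = digest[:8]  # short hash for display, purely illustrative
        return True
```

Passing the stages request, packet, checks, and review through `apply` leaves the frontier untouched; only the accepted stage appends the finding and advances the state hash.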
