AI-for-science benchmark state

id: vfr_efc649fd772a1ff1
license: CC-BY-4.0
findings: 12
accepted core: 12
contested: 0
links: 0
sources: 1
evidence: 12
avg conf: 0.30

used by 0 · replayed by 1 producer · second seat open

e24/24 · finding.noted · reviewer:will-blair · 2026-06-10 · 6c12→d02f

Finding bundle

BENCHMARK CLAIM (ProteinGym) — ProteinNPT (non-parametric transformer, supervised track) REPORTS gains by attending across labelled neighbours. VERIFICATION STATE: author-reported; SUPERVISED — not comparable to zero-shot numbers; depends on the cross-validation split. NOT re-run here. Open obligation: re-run under the official supervised CV split; never compare against zero-shot rows.
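The comparability rule in this claim (supervised rows are never compared against zero-shot rows, and supervised rows depend on the cross-validation split) can be sketched as a small guard. This is a minimal illustration, not part of the Vela record or of ProteinGym tooling: the `BenchmarkRow` type, the track labels, the split names `split_A`/`split_B`, and the baseline model names are all hypothetical placeholders, and no metric values are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkRow:
    model: str
    track: str               # "supervised" or "zero_shot" (hypothetical labels)
    cv_split: Optional[str]  # CV split name; None for zero-shot rows


def comparable(a: BenchmarkRow, b: BenchmarkRow) -> bool:
    """Return True only if two rows may appear in the same comparison."""
    # Supervised and zero-shot rows are never comparable.
    if a.track != b.track:
        return False
    # Supervised results depend on the CV split, so the splits must match.
    if a.track == "supervised" and a.cv_split != b.cv_split:
        return False
    return True


# Placeholder rows; the split and baseline names are assumptions.
npt = BenchmarkRow("ProteinNPT", "supervised", "split_A")
same_split = BenchmarkRow("baseline_supervised", "supervised", "split_A")
other_split = BenchmarkRow("baseline_supervised", "supervised", "split_B")
zero_shot = BenchmarkRow("baseline_zero_shot", "zero_shot", None)

print(comparable(npt, same_split))   # True
print(comparable(npt, other_split))  # False
print(comparable(npt, zero_shot))    # False
```

A guard of this kind would sit in front of any table or ranking that mixes rows, so that a mismatch is rejected rather than silently ranked.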

id: vf_41030d44f59eae22
frontier: AI-for-science benchmark state
version: 1
confidence: 0.30

no incoming links yet

file

/frontier/benchmark-state#e=16 · scrub position · after_hash dd9be8a780d7ab98…

vf_41030d44f59eae22 · benchmark-state · https://vela-site-next.fly.dev/frontier/benchmark-state#e=16

raw json · vf_41030d44f59eae22 (2.6 KB)

machine twin · index.json

{
 "annotations": [
  {
   "author": "reviewer:will-blair",
   "id": "ann_ab3ecb7585481522",
   "text": "HARDENING (benchmark-state): label_provenance=attested (records-not-reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.",
   "timestamp": "2026-06-10T23:01:45.061196+00:00"
  }
 ],
 "assertion": {
  "direction": null,
  "entities": [],
  "relation": null,
  "text": "BENCHMARK CLAIM (ProteinGym) — ProteinNPT (non-parametric transformer, supervised track) REPORTS gains by attending across labelled neighbours. VERIFICATION STATE: author-reported; SUPERVISED — not comparable to zero-shot numbers; depends on the cross-validation split. NOT re-run here. Open obligation: re-run under the official supervised CV split; never compare against zero-shot rows.",
  "type": "computational"
 },
 "conditions": {
  "age_group": null,
  "cell_type": null,
  "clinical_trial": false,
  "concentration_range": null,
  "duration": null,
  "human_data": false,
  "in_vitro": false,
  "in_vivo": false,
  "species_unverified": [],
  "species_verified": [],
  "text": "Manually added finding; requires evidence review before scientific use."
 },
 "confidence": {
  "basis": "operator-supplied frontier prior; review required",
  "extraction_confidence": 1,
  "kind": "frontier_epistemic",
  "method": "expert_judgment",
  "score": 0.3
 },
 "created": "2026-06-10T06:50:55.944113+00:00",
 "evidence": {
  "effect_size": null,
  "evidence_spans": [],
  "method": "manual state transition",
  "model_system": "",
  "p_value": null,
  "replicated": false,
  "replication_count": null,
  "sample_size": null,
  "species": null,
  "type": "computational"
 },
 "flags": {
  "contested": false,
  "declining": false,
  "gap": true,
  "gravity_well": false,
  "negative_space": false,
  "retracted": false
 },
 "id": "vf_41030d44f59eae22",
 "links": [],
 "previous_version": null,
 "provenance": {
  "authors": [
   {
    "name": "reviewer:will-blair",
    "orcid": null
   }
  ],
  "citation_count": null,
  "doi": null,
  "extraction": {
   "extracted_at": "2026-06-10T06:50:55.944101+00:00",
   "extractor_version": "vela/0.691.0",
   "method": "manual_curation",
   "model": null,
   "model_version": null
  },
  "journal": null,
  "openalex_id": null,
  "pmc": null,
  "pmid": null,
  "review": {
   "corrections": [],
   "reviewed": false,
   "reviewed_at": null,
   "reviewer": null
  },
  "source_type": "expert_assertion",
  "title": "manual finding",
  "year": null
 },
 "updated": null,
 "version": 1
}
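As a reading aid, the sketch below loads a trimmed subset of the record above and lists the fields that signal it is not yet ready for scientific use. The field names and values are copied from the JSON; the helper `blocking_reasons` and the choice of which fields count as blockers are assumptions made for illustration, not a documented Vela rule.

```python
import json

# Trimmed subset of the record above; values copied from the raw JSON.
RECORD_SUBSET = """
{
  "id": "vf_41030d44f59eae22",
  "confidence": {"score": 0.3, "method": "expert_judgment"},
  "evidence": {"evidence_spans": [], "replicated": false},
  "flags": {"gap": true, "contested": false, "retracted": false},
  "provenance": {"review": {"reviewed": false, "reviewer": null}}
}
"""


def blocking_reasons(record: dict) -> list:
    """Hypothetical helper: collect reasons the record still needs work."""
    reasons = []
    if not record["provenance"]["review"]["reviewed"]:
        reasons.append("unreviewed")
    if not record["evidence"]["evidence_spans"]:
        reasons.append("no evidence spans")
    if not record["evidence"]["replicated"]:
        reasons.append("not replicated")
    if record["flags"]["gap"]:
        reasons.append("gap flag set")
    return reasons


record = json.loads(RECORD_SUBSET)
print(record["id"], blocking_reasons(record))
# vf_41030d44f59eae22 ['unreviewed', 'no evidence spans', 'not replicated', 'gap flag set']
```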

Unsealed — 0 attachment(s) on record, awaiting independent verification.

0 attachments · 0 distinct checker actors · 0 methods

blame · custody trail

produced by: reviewer:will-blair · finding.asserted · 2026-06-10 · vev_4869af225af70848

checked by: no verifier attachment on record

accepted by: no accept signed

history · 2 events

e11/24 · finding.asserted · reviewer:will-blair · 2026-06-10 · →779efff7bd
e16/24 · finding.noted · reviewer:will-blair · 2026-06-10 · →dd9be8a780

record state

frontier-owned

Review status

unreviewed

finding statement

finding type

computational

No entity list is declared.

evidence

source-bound

1 atom

computational · manual state transition

proof impact

packet context

2 events

2 reviewable changes and 0 evaluation records are attached to this finding id.

evidence

method

manual state transition

evidence type

computational

conditions

species_unverified: []
species_verified: []
text: Manually added finding; requires evidence review before scientific use.

provenance

source title

manual finding

authors

reviewer:will-blair

Source records

source record

declared

manual finding

vs_066123dd29a9c5b4

title:manual finding

manual_curation

Evidence atoms

vea_3818a2b502e64c42 · computational · unknown
BENCHMARK CLAIM (ProteinGym) — ProteinNPT (non-parametric transformer, supervised track) REPORTS gains by attending across labelled neighbours. VERIFICATION STATE: author-reported; SUPERVISED — not comparable to zero-shot numbers; depends on the cross-validation split. NOT re-run here. Open obligation: re-run under the official supervised CV split; never compare against zero-shot rows.
vs_066123dd29a9c5b4 · manual_curation

Typed links

outgoing

No outgoing links.

incoming

No incoming links.

Review, event, and evaluation records

events

vev_4869af225af70848 · finding.asserted
Manual finding added to frontier state
reviewer:will-blair · 2026-06-10
vev_a73023eb43fa7387 · finding.noted
HARDENING (benchmark-state): label_provenance=attested (records-not-reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.
reviewer:will-blair · 2026-06-10
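The hardening note packs its fields into free text as `key=value` pairs and states one rule: attested label provenance keeps the record below 'verified' until a deterministic rederivation exists. A minimal sketch of reading those fields and applying that single rule follows. The function names and the regular expression are assumptions, and the other rungs of the trust ladder are not modelled because the record does not describe them.

```python
import re

NOTE = (
    "HARDENING (benchmark-state): label_provenance=attested "
    "(records-not-reruns; ground truth is an answer key, not a "
    "frozen-verifier rederivation), valid_as_of=2026-06-10, "
    "model_cutoff=unknown. Under the trust ladder, attested label "
    "provenance caps this record below 'verified' until a deterministic "
    "rederivation exists."
)


def parse_hardening(note: str) -> dict:
    """Extract key=value pairs; values are runs of word characters or hyphens."""
    return dict(re.findall(r"(\w+)=([\w-]+)", note))


def blocked_by_attested_cap(fields: dict, has_rederivation: bool) -> bool:
    """True when the attested-provenance cap still keeps the record below 'verified'."""
    return fields.get("label_provenance") == "attested" and not has_rederivation


fields = parse_hardening(NOTE)
print(fields)
# {'label_provenance': 'attested', 'valid_as_of': '2026-06-10', 'model_cutoff': 'unknown'}
print(blocked_by_attested_cap(fields, has_rederivation=False))  # True
print(blocked_by_attested_cap(fields, has_rederivation=True))   # False
```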

reviewable changes

vpr_adea2a2f9e4ba533 · finding.add
Manual finding added to frontier state
applied · reviewer:will-blair · 2026-06-10
vpr_edef714318aa82be · finding.note
HARDENING (benchmark-state): label_provenance=attested (records-not-reruns; ground truth is an answer key, not a frozen-verifier rederivation), valid_as_of=2026-06-10, model_cutoff=unknown. Under the trust ladder, attested label provenance caps this record below 'verified' until a deterministic rederivation exists.
applied · agent:hardening-2026-06-10 · 2026-06-10

evaluations

No evaluation record targets this finding id.
