record state
frontier-owned
Review status
This finding is part of accepted frontier state. Review events, reviewable changes, and proof state explain how it can change.
Finding bundle
finding statement
finding type
No entity list is declared.
evidence
source-bound
theoretical · manual state transition
proof impact
packet context
1 reviewable change and 0 evaluation records are attached to this finding ID.
Evidence and conditions
method
manual state transition
evidence type
theoretical
conditions
Provenance
source title
Benchmark Data Contamination Survey; Frontier Model Performance Gap studies
authors
reviewer:will-blair
Benchmark data contamination affects 16-91% of test sets across major LLMs, with models achieving high benchmark scores while failing 72% of real-world task executions.
vs_7cca77c270387400 · manual_curation
outgoing
vf_3ea1bb869e1c5f9b · Contaminated benchmarks inflate safety assessments, masking alignment problems that behavioral evals miss
incoming
No incoming links.
events
vev_06aa9f6b7d19e3fc · finding.asserted · Manual finding added to frontier state
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_f81e53314fc2d8ab · finding.add · Manual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation record targets this finding ID.