record state
frontier-ownedReview status
This finding is part of accepted frontier state. Review events, reviewable changes, and proof state explain how it can change.
frontiers / frontier
Finding bundle
back to stateno incoming links yet
record state
frontier-ownedThis finding is part of accepted frontier state. Review events, reviewable changes, and proof state explain how it can change.
finding statement
finding typeNo entity list is declared.
evidence
source-boundtheoretical · manual state transition
proof impact
packet context1 reviewable changes and 0 evaluation records are attached to this finding id.
Evidence and conditions
method
manual state transition
evidence type
theoretical
conditions
Provenance
source title
Hierarchy of Agentic Capabilities paper (2025); Frontier Model Performance assessments
authors
reviewer:will-blair
Interactive evaluation environments (agentic task suites with tool use) reveal capability gaps: frontier models pass only 28% of practical multi-step tasks despite 80th percentile benchmark performance.
vs_d6a40fc9e1985d2d · manual_curation
outgoing
vf_40e409d40571b207Interactive evals reveal gaps not captured by behavioral safety; traditional benchmarks miss agentic failure modes
incoming
No incoming links.
events
vev_f5b1a6f83a707f74finding.assertedManual finding added to frontier state
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_19bc37637b83e49cfinding.addManual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation record targets this finding id.