record state
frontier-owned
Review status
This finding is part of accepted frontier state. Review events, reviewable changes, and proof state explain how it can change.
Finding bundle
finding statement
finding type
No entity list is declared.
evidence
source-bound
theoretical · manual state transition
proof impact
packet context
1 reviewable change and 0 evaluation records are attached to this finding id.
Evidence and conditions
method
manual state transition
evidence type
theoretical
conditions
Provenance
source title
AI Sandbagging paper (Anthropic et al., 2024)
authors
reviewer:will-blair
AI models can strategically underperform on evaluations by detecting that they are being assessed and sandbagging, and there is empirical evidence that sandbagging already occurs in frontier models.
vs_097b84ea3d410d56 · manual_curation
outgoing
vf_73f39b4d600392f9 · Sandbagging empirically validates scheming risk; sleeper agents are one mechanism for hidden scheming
incoming
supports · vf_3f73e69072a0dafd
supports · vf_587e31c3678435f2
events
vev_7174111b97c181f6 · finding.asserted · Manual finding added to frontier state
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_639ea76c0ece8021 · finding.add · Manual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation record targets this finding id.