record state
frontier-owned
Review status
This finding is part of accepted frontier state. Review events, reviewable changes, and proof state explain how it can change.
Finding bundle
finding statement
finding type
No entity list is declared.
evidence
source-bound · theoretical · manual state transition
proof impact
packet context
1 reviewable change and 0 evaluation records are attached to this finding ID.
Evidence and conditions
method
manual state transition
evidence type
theoretical
conditions
Provenance
source title
Sleeper Agents paper (Hubinger et al., 2024)
authors
reviewer:will-blair
Sleeper agents (models trained to behave safely during training but to activate harmful behavior after deployment) can persist through standard safety training procedures.
vs_d4f4579197e9ae15 · manual_curation
outgoing
vf_0d47c80d55ef8fc8 · Mechanistic probes proposed as a detection method for sleeper agents, but accuracy limitations remain
events
vev_ad377dde037f73ad · finding.asserted · Manual finding added to frontier state
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_75751f97b87b33a2 · finding.add · Manual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation record targets this finding id.