record state
frontier-owned
Review status
This finding is part of accepted frontier state. Review events, reviewable changes, and proof state explain how it can change.
Finding bundle
finding statement
finding type
No entity list is declared.
evidence
source-bound
theoretical · manual state transition
proof impact
packet context
1 reviewable change and 0 evaluation records are attached to this finding id.
Evidence and conditions
method
manual state transition
evidence type
theoretical
conditions
Provenance
source title
MART paper (2023); AutoAdv and Constitutional Classifiers research
authors
reviewer:will-blair
Red-teaming protocols using multi-round automatic adversarial prompting can expose jailbreaks in 86% of undefended models, and attack success rates improve when adversaries iteratively analyze their failed attempts.
vs_c8934d8d087493b1 · manual_curation
outgoing
vf_491436508804de41 · Red-teaming finds jailbreaks; Constitutional Classifiers defend against them; but both miss scheming
incoming
No incoming links.
events
vev_a771cce7260724db · finding.asserted · Manual finding added to frontier state
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_75799f62bb4be4f9 · finding.add · Manual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation record targets this finding id.