AI alignment evaluations

id: vfr_14b9f65ab4037bac
license: CC-BY-4.0
findings: 16
accepted core: 0
contested: 0

avg conf: 0.84

Finding bundle

Frontier AI developers now conduct sandbagging evaluations with safety guards disabled (CAISI completed 40+ such evaluations as of 2025), revealing capabilities hidden during normal assessment.

no incoming links yet

id: vf_587e31c3678435f2
frontier: AI alignment evaluations
version: 1
confidence: 0.87

record state

frontier-owned

Review status

This finding is part of accepted frontier state. Review events, reviewable changes, and proof state explain how it can change.

unreviewed

finding statement

finding type

observational

No entity list is declared.

evidence

source-bound

1 atom

theoretical · manual state transition

proof impact

packet context

1 event

1 reviewable change and 0 evaluation records are attached to this finding id.

Evidence and conditions

method

manual state transition

evidence type

theoretical

conditions

species_unverified
species_verified
text: Requires access to model internals and removal of safety mechanisms; findings not independently verified in public literature

Provenance

source title

US government frontier AI testing (Medium/AISI reports, 2024-2026)

authors

reviewer:will-blair

Source records

source record

declared

US government frontier AI testing (Medium/AISI reports, 2024-2026)

vs_a37422887f025bf3

title: US government frontier AI testing (Medium/AISI reports, 2024-2026)

2026 · manual_curation

Evidence atoms

vea_e1abd05ce289d608 · theoretical · unknown
Frontier AI developers now conduct sandbagging evaluations with safety guards disabled (CAISI completed 40+ such evaluations as of 2025), revealing capabilities hidden during normal assessment.
vs_a37422887f025bf3 · manual_curation

Typed links

outgoing

supports vf_59b4b1907e9f865c
Disabling safety guards during eval validates existence of hidden capabilities; sandbagging hypothesis confirmed

incoming

No incoming links.

Review, event, and evaluation records

events

vev_e3544ba6ae29c375 · finding.asserted
Manual finding added to frontier state
reviewer:will-blair · 2026-05-29

reviewable changes

vpr_03a26dc953cf2958 · finding.add
Manual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29

evaluations

No evaluation record targets this finding id.

