source boundary
frontier-owned · declared
A source record is provenance. It supports a finding only through evidence atoms, extraction spans, and reviewed finding bundles.
Source record
finding bindings
record context: Findings bound to this source through source IDs, evidence atoms, provenance, or reviewed source-record slots.
evidence atoms
materialized: Evidence atoms pin exact spans, measurements, selectors, or curation assertions to the source.
review context
inspectable: 1 reviewable change and 0 evaluations are attached through this source or its findings.
Locator and citation
external source · locator
title: Detecting Strategic Deception Using Linear Probes (2025)
imported: 2026-05-29T02:53:36.866572+00:00
extraction mode: manual_curation
authors: reviewer:will-blair
Caveats
No source-specific caveats are recorded.
Mechanistic interpretability probes (linear classifiers, attention-head analysis) can detect deceptive reasoning in models with 70–85% accuracy, but high probe accuracy does not establish that the model actually uses the probed information in its downstream decisions.
events
vev_ddae3e7308bf681c · finding.asserted · Manual finding added to frontier state
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_6b4d8444cc86284d · finding.add · Manual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation rows are attached.