source boundary
frontier-owned · declared
A source record is provenance. It supports a finding only through evidence atoms, extraction spans, and reviewed finding bundles.
Source record
finding bindings
record context
Findings bound to this source through source ids, evidence atoms, provenance, or reviewed source-record slots.
evidence atoms
materialized
Evidence atoms pin exact spans, measurements, selectors, or curation assertions to the source.
review context
inspectable
1 reviewable change and 0 evaluations are attached through this source or its findings.
Locator and citation
locator
title: Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework (2024); Transformer Circuit Faithfulness Metrics are not Robust (2024)
imported
2026-05-29T02:57:20.737683+00:00
extraction mode
manual_curation
authors
reviewer:will-blair
Caveats
No source-specific caveats are recorded.
methodological · vf_d3dd34cd06e3d5ce
Why do circuit faithfulness metrics (KL divergence, logit difference) fail to detect cooperative inhibition heads that individually score near-zero on attribution but prove critical for behavior, and what principled attribution metric would catch such non-additive interactions?
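The non-additivity this question points at can be illustrated with a toy sketch. It is not the method of either cited paper. It assumes a made-up behavior function `logit_diff` and hypothetical head names (`mover`, `inhib_a`, `inhib_b`), where either inhibition head alone is enough to keep the behavior intact. Single-head ablation then scores both inhibitors at zero, while a pairwise interaction term (joint ablation effect minus the sum of the individual effects) exposes them.

```python
from itertools import combinations

# Hypothetical head names for a toy circuit; not taken from any real model.
HEADS = frozenset({"mover", "inhib_a", "inhib_b"})


def logit_diff(active):
    """Toy behavior score for a set of active heads (an assumption, not real data)."""
    base = 3.0 if "mover" in active else 0.0
    # Redundant inhibition: either inhibitor alone is enough to suppress
    # the wrong token, so the contribution is an OR, not a sum.
    boost = 2.0 if ("inhib_a" in active or "inhib_b" in active) else 0.0
    return base + boost


def ablation_effect(ablated):
    """Drop in the behavior score when the given heads are removed together."""
    return logit_diff(HEADS) - logit_diff(HEADS - set(ablated))


def pairwise_interaction(a, b):
    """Joint ablation effect minus the sum of the two single-head effects."""
    return ablation_effect({a, b}) - ablation_effect({a}) - ablation_effect({b})


single = {h: ablation_effect({h}) for h in sorted(HEADS)}
pairs = {(a, b): pairwise_interaction(a, b)
         for a, b in combinations(sorted(HEADS), 2)}

print(single)
print(pairs)
```

In this toy setting both inhibitors have a single-head effect of zero, yet the pair carries a nonzero interaction term. An attribution score that only sums per-head effects would miss exactly this case. Exhaustive pairwise ablation scales quadratically in the number of heads, so a practical metric would need sampling or pruning.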
events
vev_a861131b43c2f3a2 · finding.asserted · Manual finding added to frontier state
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_6a5c8894b2da8c78 · finding.add · Manual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation rows are attached.