source boundary
frontier-owned · synthetic
A source record is provenance. It supports a finding only through evidence atoms, extraction spans, and reviewed finding bundles.
Source record
finding bindings
record context
Findings bound to this source through source ids, evidence atoms, provenance, or reviewed source-record slots.
evidence atoms
materialized
Evidence atoms pin exact spans, measurements, selectors, or curation assertions to the source.
review context
inspectable
13 reviewable changes and 0 evaluations are attached through this source or its findings.
Locator and citation
locator
title:mechinterp causal sweep
imported
2026-05-29T02:11:33.017095+00:00
extraction mode
manual_curation
authors
agent:replicator
Caveats
computational · vf_59285c747f083c24
computational · vf_d6c317a8ced07e36
computational · vf_3595f5e61d02769d
computational · vf_da0f52be04ee54bc
computational · vf_1565bc7a58ce0108
computational · vf_3e68a029a451f01f
computational · vf_4547fdc4c89640a3
computational · vf_b9b16eba11ee2668
computational · vf_f41d996b7c3aea03
The set of 6 induction heads in distilgpt2 (top heads L3H10, L3H2, L4H6, L4H9, L3H11, L5H9) is causally necessary for in-context repetition: group mean-ablation raises repeat-prediction loss by Delta=8.401 nats (per-seed deltas 9.30, 9.00, 8.48, 7.55, 7.67, all positive), versus ~0 for a control head set.
The set of 15 induction heads in gpt2 (top heads L7H10, L5H1, L5H5, L6H9, L7H2) is causally necessary for in-context repetition: group mean-ablation raises repeat-prediction loss by Delta=8.354 nats (per-seed deltas 8.56, 8.40, 8.96, 8.13, 7.72, all positive), versus ~0 for a control head set.
The set of 18 induction heads in pythia-160m (top heads L4H6, L8H2, L4H10, L4H8, L5H0) is causally necessary for in-context repetition: group mean-ablation raises repeat-prediction loss by Delta=11.520 nats (per-seed deltas 8.81, 8.55, 13.17, 10.99, 16.08, all positive), versus ~0 for a control head set.
Individual induction heads in gpt2 are highly redundant: single-head mean-ablation deltas are tiny relative to the group delta of 8.354 nats (L7H10 = 0.0011, ~0%; L5H5 = 0.053, 0.6%), with the sole partial exception L5H1 = 0.714 (8.5%). The group delta is 11.7x the largest single-head delta (8.354 / 0.714), so causal necessity is a property of the ensemble, not of any single head.
Individual induction heads in gpt2-medium are highly redundant: the top-3 attention-scoring heads contribute single-ablation deltas of only 0.048 (L9H9), 0.011 (L11H1), and 0.024 (L18H5) against a 29-head group delta of 8.586. Each is under 0.6% of the group effect, indicating a densely distributed representation across the ensemble.
Induction-head redundancy is not universal across scale: in pythia-70m individual heads remain load-bearing, with single-ablation deltas of 0.61 (L3H6, 12% of group), 0.58 (L3H5, 12%), and 1.22 (L3H1, 25%) against a group delta of 4.949. This indicates a cooperative rather than purely redundant circuit, in contrast to the heavy redundancy seen in the larger GPT2-family and pythia-160m models.
The set of 5 induction heads in pythia-70m (top heads L3H6, L3H5, L3H1, L3H0, L4H7) is causally necessary for in-context repetition: group mean-ablation raises repeat-prediction loss by Delta=4.949 nats (per-seed deltas 4.03, 3.85, 5.22, 5.54, 6.11, all positive), versus ~0 for a control head set.
The set of 29 induction heads in gpt2-medium (top heads L9H9, L11H1, L18H5, L7H2, L6H1) is causally necessary for in-context repetition: group mean-ablation raises repeat-prediction loss by Delta=8.586 nats (per-seed deltas 8.53, 8.61, 8.84, 8.55, 8.41, all positive), versus ~0 for a control head set.
Across all five transformers (gpt2, distilgpt2, pythia-70m, pythia-160m, gpt2-medium), the in-context-repetition circuit is universal and stereotyped by depth: duplicate-token heads emerge shallowest (layers 0-1), previous-token heads at early-to-mid depth, and induction heads in the middle-to-deep band (~30-90% relative depth); the induction-head group is causally necessary in every model (group-ablation delta 4.95-11.52 nats, all per-seed deltas positive).
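The arithmetic behind the findings above can be re-derived from the numbers they quote. The following is a minimal sketch, not part of the source record: the dictionary and function names are hypothetical, and it assumes each reported group delta is the plain mean of the five per-seed deltas and that each percentage is a single-head delta divided by the reported group delta.

```python
from statistics import mean

# Per-seed group-ablation deltas (nats), copied from the findings above.
PER_SEED = {
    "distilgpt2": [9.30, 9.00, 8.48, 7.55, 7.67],
    "gpt2": [8.56, 8.40, 8.96, 8.13, 7.72],
    "pythia-160m": [8.81, 8.55, 13.17, 10.99, 16.08],
    "pythia-70m": [4.03, 3.85, 5.22, 5.54, 6.11],
    "gpt2-medium": [8.53, 8.61, 8.84, 8.55, 8.41],
}

# Group deltas as reported in the findings (nats).
REPORTED = {
    "distilgpt2": 8.401,
    "gpt2": 8.354,
    "pythia-160m": 11.520,
    "pythia-70m": 4.949,
    "gpt2-medium": 8.586,
}

# Single-head mean-ablation deltas quoted in the redundancy findings.
SINGLE = {
    "gpt2": {"L7H10": 0.0011, "L5H5": 0.053, "L5H1": 0.714},
    "gpt2-medium": {"L9H9": 0.048, "L11H1": 0.011, "L18H5": 0.024},
    "pythia-70m": {"L3H6": 0.61, "L3H5": 0.58, "L3H1": 1.22},
}


def group_mean(model: str) -> float:
    """Mean of the per-seed deltas (assumed to be how the group delta is formed)."""
    return mean(PER_SEED[model])


def single_share(model: str, head: str) -> float:
    """Single-head delta as a fraction of the reported group delta."""
    return SINGLE[model][head] / REPORTED[model]


def group_to_largest_single(model: str) -> float:
    """Ratio of the reported group delta to the largest quoted single-head delta."""
    return REPORTED[model] / max(SINGLE[model].values())


if __name__ == "__main__":
    for model in PER_SEED:
        print(model, round(group_mean(model), 3), REPORTED[model])
    for model, heads in SINGLE.items():
        for head in heads:
            print(model, head, f"{single_share(model, head):.1%}")
    print("gpt2 group / largest single:", round(group_to_largest_single("gpt2"), 1))
```

Under these assumptions the recomputed means agree with the reported group deltas to within rounding of the two-decimal per-seed values. The check says nothing about variance across seeds or tasks, which is what the reviewer caveats ask for.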
events
vev_c9c362a7a45bedd9 · finding.asserted · Manual finding added to frontier state
agent:replicator · 2026-05-29
vev_44524ec6e670d2b3 · finding.asserted · Manual finding added to frontier state
agent:replicator · 2026-05-29
vev_765881c2115c6939 · finding.asserted · Manual finding added to frontier state
agent:replicator · 2026-05-29
vev_30c1fa3ecc8020e4 · finding.asserted · Manual finding added to frontier state
agent:replicator · 2026-05-29
vev_59e3ee2d193ef0e5 · finding.asserted · Manual finding added to frontier state
agent:replicator · 2026-05-29
vev_8b931d0335cf69e3 · finding.asserted · Manual finding added to frontier state
agent:replicator · 2026-05-29
vev_0032cf66707fb0c7 · finding.asserted · Manual finding added to frontier state
agent:replicator · 2026-05-29
vev_803223185cc9c144 · finding.asserted · Manual finding added to frontier state
agent:replicator · 2026-05-29
vev_0e2ce4e5ffa52f6c · finding.asserted · Manual finding added to frontier state
agent:replicator · 2026-05-29
vev_430f9a5805d874ce · finding.caveated · Verifier caution: The mean-ablation methodology is sound and the causal claim (11.7x redundancy) is valid for the repeated-token task tested. However, framing this as general induction-head redundancy overstates the evidence. The finding is task-specific; individual heads may contribute meaningful
reviewer:will-blair · 2026-05-29
vev_3412dc672cbe0d15 · finding.caveated · Verifier caution: Same methodology as finding 1; the <0.6% contribution claim is valid for the repeated-token task. However, this finding lacks error bars, significance testing (p-values), or confidence intervals. The per-seed deltas are not reported, making it impossible to assess noise. Addition
reviewer:will-blair · 2026-05-29
vev_e6556beb3ac94e15 · finding.caveated · Verifier caution: The scaling hypothesis, that redundancy is not universal and smaller models show load-bearing heads, is interesting but unverified. The finding presents the pythia-70m vs pythia-160m difference as fact based on a single task (repeated tokens). The 25% vs <1% split suggests a real e
reviewer:will-blair · 2026-05-29
vev_833001ee0ddf660c · finding.caveated · Verifier caution: The core claim, that a group of induction heads is causally necessary across 5 models, is valid and well-supported by the repeated-token ablation. However, the language significantly overstates the scope: 'universal' and 'stereotyped by depth' imply properties across tasks, archite
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_5eddea1462fb2cf2 · finding.add · Manual finding added to frontier state
applied · agent:replicator · 2026-05-29
vpr_fe64d90d4d96b934 · finding.add · Manual finding added to frontier state
applied · agent:replicator · 2026-05-29
vpr_500c879be1223a5f · finding.add · Manual finding added to frontier state
applied · agent:replicator · 2026-05-29
vpr_ba42905897b2b7f5 · finding.add · Manual finding added to frontier state
applied · agent:replicator · 2026-05-29
vpr_11b4fe7cdad6afab · finding.add · Manual finding added to frontier state
applied · agent:replicator · 2026-05-29
vpr_c91d1e8994ff28db · finding.add · Manual finding added to frontier state
applied · agent:replicator · 2026-05-29
vpr_f36e647df1a9cba4 · finding.add · Manual finding added to frontier state
applied · agent:replicator · 2026-05-29
vpr_fa16be3ca10509c9 · finding.add · Manual finding added to frontier state
applied · agent:replicator · 2026-05-29
vpr_23db2dbfd9b1e369 · finding.add · Manual finding added to frontier state
applied · agent:replicator · 2026-05-29
vpr_38c52c6f5a9ae2d8 · finding.caveat · Verifier caution: The mean-ablation methodology is sound and the causal claim (11.7x redundancy) is valid for the repeated-token task tested. However, framing this as general induction-head redundancy overstates the evidence. The finding is task-specific; individual heads may contribute meaningful
applied · reviewer:will-blair · 2026-05-29
vpr_438beac177da1cb4 · finding.caveat · Verifier caution: Same methodology as finding 1; the <0.6% contribution claim is valid for the repeated-token task. However, this finding lacks error bars, significance testing (p-values), or confidence intervals. The per-seed deltas are not reported, making it impossible to assess noise. Addition
applied · reviewer:will-blair · 2026-05-29
vpr_cbe6ab24fc274f43 · finding.caveat · Verifier caution: The scaling hypothesis, that redundancy is not universal and smaller models show load-bearing heads, is interesting but unverified. The finding presents the pythia-70m vs pythia-160m difference as fact based on a single task (repeated tokens). The 25% vs <1% split suggests a real e
applied · reviewer:will-blair · 2026-05-29
vpr_3906611ccc2f44e6 · finding.caveat · Verifier caution: The core claim, that a group of induction heads is causally necessary across 5 models, is valid and well-supported by the repeated-token ablation. However, the language significantly overstates the scope: 'universal' and 'stereotyped by depth' imply properties across tasks, archite
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation rows are attached.