# Ablating Induction Heads Leads to an Increase in Local Repetition

> Source: <https://www.lesswrong.com/posts/nTdcXg7jtLwtvcts7/ablating-induction-heads-leads-to-an-increase-in-local>
> Published: 2026-05-30 06:00:33+00:00

*This post is intended as a brief overview of an independent research project in mechanistic interpretability. I am open to feedback, criticism, and any thoughts on the work.*

This project started as an exploration of mechanistic interpretability and an opportunity to learn some of its tools. During this exploration, I noticed a recurring pattern in the output of the GPT-2 model I was working with. For the input prompt “The capital of India is”, the original model's first predicted token was “home”, which changed to “India” when the model's [induction heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) were ablated. More generally, the model with its induction heads ablated appeared to produce more repetitive output than the original model. This project investigates that hypothesis.

To give a sense of the structure of this post: I first summarize the main results and link to the paper and the code, and then explain the motivation, methodology, results, and context of the experiments.

**Summary of Results:**

1) Induction head ablation was shown to increase local repetition in the model’s output. This was validated rigorously using activation patching and an experiment comparing change in local repetition due to random head ablation and induction head ablation.

2) A non-linear relation was found between the number of top induction heads ablated and the change in local repetition. Local repetition peaks in the range of 3-5 top head ablations before dropping.

3) Entropy was investigated as a possible causal factor, with mixed results. While per-head ablation decreased entropy, ablating the heads together increased overall entropy. This global increase is not consistent with the increase in local repetition.

4) Existing findings about the effect of induction head ablations not being additive were validated both in the context of local repetition and entropy.

5) Most of the primary experiments were performed on the GPT-2 small language model. To investigate the generalizability of the hypothesis, the experiments were repeated on the GPT-2 medium model, where similar findings were observed.

To read the paper click [here](https://github.com/arjun041008/Induction_Head_MechInter/blob/main/paper.pdf). If you prefer to read the code directly, click [here](https://drive.google.com/drive/folders/1CciJYwYHkYHQQyThZB8USLMc4PG0dRXv?usp=sharing).

**Context and Motivations:**

Induction heads are attention heads in the transformer architecture that cause models to reinforce repeated semantic contexts seen previously in the text. For example, suppose a transformer sees the sentence:

Ram went to the market. Mohan went to the store. Ram went to the

It’s the induction head, or more precisely a circuit of induction heads, that is likely to contribute significantly to predicting that the next token is "market." The motivation for ablating the induction heads in this project was to determine the effect on token diversity of ablating the top induction heads. When the increase in local repetition was first noticed, it was slightly counterintuitive, as one would expect ablating these heads to increase token diversity rather than decrease it.

The output of these induction-head-ablated models was somewhat akin to the output of older-generation natural language processing models like LSTMs, or even earlier generations of GPT, which occasionally went into [degenerate repetitive failure modes](https://stackoverflow.com/questions/47618285/why-my-lstm-model-is-repeating-the-previous-values). This led to the hypothesis that induction heads, in addition to maintaining inductive behavior, also enable a model to generalize.
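The pattern completion in the Ram/Mohan example can be mimicked with a toy lookup. This is only an illustration of the "seen this context before, copy what followed" behavior, not how attention heads compute it; the function name `induction_guess` and the `max_context` parameter are made up for this sketch.

```python
def induction_guess(tokens, max_context=4):
    """Toy stand-in for induction-style copying (hypothetical helper).

    Finds the longest earlier match (up to `max_context` tokens) of the
    current suffix and returns the token that followed it, or None.
    """
    for k in range(min(max_context, len(tokens) - 1), 0, -1):
        suffix = tokens[-k:]
        # Scan earlier positions from most recent to oldest.
        for start in range(len(tokens) - k - 1, -1, -1):
            if tokens[start:start + k] == suffix:
                return tokens[start + k]
    return None


text = "Ram went to the market . Mohan went to the store . Ram went to the"
tokens = text.split()  # whitespace split; a real model uses subword tokens

print(induction_guess(tokens))                 # matches "Ram went to the"
print(induction_guess(tokens, max_context=1))  # matches only "the"
```

With the longer context the lookup returns "market", while matching on the single token "the" returns the more recent "store", which is one way to see why the example needs more than a one-token match.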

**Experiments Performed and Results:**

**Defining Metrics**: The first step before testing the hypothesis was to define metrics to measure local repetition. For this task, we define two metrics: 1) Local Repetition Rate (LRR) and 2) [Shannon Entropy](https://www.sciencedirect.com/topics/engineering/shannon-entropy) of the final token probabilities. The former is a direct marker of the model's predictions, while the latter measures a possible causal factor. To provide some intuition about LRR, one can think of it as a metric that counts the number of repeated tokens for a particular window size (20 in our case), loops over this operation for multiple window sizes, and computes an average. The formal definition is stated in terms of *N*, the window size, and *w*, the previously seen tokens.
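As a rough illustration of the two metrics, here is a minimal sketch. It assumes one plausible reading of LRR (the fraction of generated tokens that already occurred among the preceding `window` tokens); the exact definition used in the paper may differ. The function names are my own for this sketch.

```python
import math


def local_repetition_rate(tokens, window=20):
    """Fraction of tokens that already occurred among the preceding
    `window` tokens. ASSUMED reading of LRR, not the paper's exact formula."""
    if len(tokens) < 2:
        return 0.0
    repeats = 0
    for i in range(1, len(tokens)):
        recent = tokens[max(0, i - window):i]  # at most `window` earlier tokens
        if tokens[i] in recent:
            repeats += 1
    return repeats / (len(tokens) - 1)


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Token ids here are arbitrary illustrative values.
print(local_repetition_rate([7, 3, 7, 3, 7, 3]))   # highly repetitive sequence
print(local_repetition_rate([1, 2, 3, 4, 5, 6]))   # no repeated tokens
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # uniform over 4 tokens
```

Under this reading, a higher LRR means more of the output is made of recently seen tokens, and a lower entropy means the model's next-token distribution is more concentrated.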

**Initial Baseline**: The first experiment looked at different categories of prompts and measured the difference in LRR between the original model and the ablated model. We found a highly statistically significant correlation between original and ablated LRR; however, there was no significant difference between prompt categories.

**Experiment to Compare LRR:** This experiment compared the effect of ablating the top-6 induction heads with that of ablating randomly chosen non-induction heads. The aim was to determine whether ablating any head increases LRR or whether the increase is specific to ablating induction heads. The average LRR differential was 0.3065 for induction head ablation, compared to -0.0170 for non-induction head ablation. To support a causal mechanistic claim, [activation patching](https://arxiv.org/pdf/2404.15255) was performed.

This technique involves patching activations from the original model into the model whose induction heads were ablated, in order to determine which component of the model architecture causes the local repetition. The effect of replacing corrupted activations with original activations, measured by the change in model behavior (in particular LRR), was formalized as a recovery rate. The mean recovery rate was 0.965, which indicates that the induction heads are responsible for the difference in LRR.
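The control condition can be sketched as follows. This is a minimal illustration, assuming heads are identified by `(layer, head)` pairs and using the 12 layers and 12 heads per layer of GPT-2 small. The induction head indices and LRR values below are placeholders, not the heads or results from the project, and the helper names are hypothetical.

```python
import random
from statistics import mean

N_LAYERS, N_HEADS = 12, 12  # GPT-2 small: 12 layers, 12 heads per layer


def sample_control_heads(induction_heads, k, seed=0):
    """Pick k random (layer, head) pairs that are NOT induction heads."""
    excluded = set(induction_heads)
    pool = [(layer, head)
            for layer in range(N_LAYERS)
            for head in range(N_HEADS)
            if (layer, head) not in excluded]
    return random.Random(seed).sample(pool, k)


def mean_lrr_change(baseline_lrr, ablated_lrr):
    """Average per-prompt change in LRR (ablated minus baseline)."""
    return mean(a - b for a, b in zip(ablated_lrr, baseline_lrr))


# Placeholder indices, NOT the heads identified in the project.
placeholder_induction_heads = [(1, 0), (2, 3), (4, 7), (6, 1), (8, 5), (9, 9)]
control_heads = sample_control_heads(placeholder_induction_heads, k=6)
print(control_heads)

# Made-up per-prompt LRR values, only to show how the differential is computed.
print(mean_lrr_change([0.10, 0.20], [0.40, 0.60]))
```

Matching the number of control heads to the number of induction heads ablated keeps the comparison about which heads are removed rather than how many.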
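One common way to express such a recovery rate is as the fraction of the ablation-induced LRR gap that patching closes. The sketch below assumes that form; the post does not give the exact formula used, so treat `recovery_rate` and its arguments as hypothetical.

```python
def recovery_rate(clean_lrr, ablated_lrr, patched_lrr):
    """Fraction of the ablation-induced LRR gap closed by patching.

    ASSUMED form: (ablated - patched) / (ablated - clean).
    1.0 means patching fully restores the original LRR; 0.0 means no effect.
    """
    gap = ablated_lrr - clean_lrr
    if gap == 0:
        return float("nan")  # ablation changed nothing, so recovery is undefined
    return (ablated_lrr - patched_lrr) / gap


# Illustrative numbers only, not results from the project.
print(recovery_rate(clean_lrr=0.10, ablated_lrr=0.40, patched_lrr=0.10))
print(recovery_rate(clean_lrr=0.10, ablated_lrr=0.40, patched_lrr=0.40))
```

Under this definition, a mean value close to 1 says that restoring the original activations at the patched components removes almost all of the extra repetition introduced by the ablation.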

**Per-Head vs. Global Ablation Effects:** Many of the experiments clearly demonstrated that the effects of individual head ablations are not additive. In the context of LRR, individually ablating most heads (including induction heads) did not yield any significant difference in LRR, as seen in the figure below. However, there was a significant difference in LRR when multiple induction heads were ablated together. While not a new finding, this reinforces the existing idea that circuits are more important than individual heads or transformer blocks. The same pattern appeared in the context of entropy: entropy increased (albeit very little) under global ablation but decreased significantly under individual head ablation. We attribute the contrast between higher entropy and greater repetition under global ablation more to computational factors than to any mechanistic cause.

*Heatmap of difference in Entropy and LRR as a result of individual head ablations*

Another point to notice in the given heatmaps is that some attention heads in layer 0 have a disproportionate effect on both entropy and LRR differences. This can be attributed in part to a later breakdown in transformer circuitry caused by ablation in early layers.

**A Non-Linear Relationship:** To better understand the effect of induction head ablations, an experiment was conducted to measure the change in LRR as the number of ablated induction heads increases. As seen in the figure below, the change in LRR peaks after ablating the top-4 induction heads and then decreases. One hypothesis is that successive ablation first increases local repetition, as expected, after which further ablation makes the model's output so random that no coherent patterns, including local repetition, are observed.

*Number of Induction Heads Ablated vs. Change in LRR*

**Generalizing Across Models:** All of the experiments described so far were performed on the [GPT-2 small](https://medium.com/data-science/how-to-interpret-gpt2-small-76e0536a588a) model, which contains 124 million parameters. To understand whether this hypothesis might generalize to larger language models, it was tested on the [GPT-2 medium](https://huggingface.co/openai-community/gpt2-medium) model, which contains 355 million parameters. The mean LRR in this trial was lower than in the earlier trials on GPT-2 small; however, it remained statistically significant. This aligns with the expectation that as model size increases, the effect of an individual mechanistic phenomenon, while still present, will diminish.

**Conclusion:**

Overall, the study presents an interesting hypothesis about the effect of induction head ablation on local repetition, which needs to be investigated further. Some areas for future exploration include mechanistically determining the cause of the non-linear behavior and determining whether it generalizes to other model behaviors beyond repetition. One way to go about this might be to use attribution patching to determine the scores of various heads. This study also does not analyze the role of multi-layer perceptrons (MLPs) in the transformer circuit; it’s possible that they also play a role in causing local repetition, which needs to be investigated further. Future work might also retest the hypothesis on a larger sample of prompts and on a greater number of LLMs.
