{"slug": "exploration-of-a-dna-sequencing-basecaller-using-activation-patching", "title": "Exploration of a DNA Sequencing Basecaller using Activation Patching", "summary": "A researcher applied activation patching to a DNA sequencing basecaller, a deep learning model, to understand its internal mechanisms, finding MLP dominance in early and late layers and concentrated self-attention activity in middle layers. The work aims to improve sequencing accuracy for pathogen surveillance and explore universality in deep learning models.", "body_md": "*This write-up of an undergraduate project is my first LW post, written with two objectives:*\n\n*a) gathering feedback on the project and the post, if more experienced authors are willing, and*\n\n*b) sharing the results of a mech interp exploration on a non-LLM model (specifically, a DNA basecaller) in case the idea interests anyone.*\n\n*Apologies in advance for any inconveniences and mistakes, and thank you for your understanding.*\n\n**Summary**\n\nAs a first AI Safety/mech interp learning project, I tried applying activation patching to a DNA sequencing basecaller. A basecaller is a deep learning model that converts time-series electrical signals into a sequence of DNA bases. While looking at data related to errors in DNA sequences, I found three trends: overall MLP dominance (especially in the earlier and later layers), greater self-attention activity in the middle layers, and higher activations concentrated in specific attention heads.\n\nThis experiment interested me because basecallers are part of the modern DNA sequencing pipeline, and more accurate sequencing could support work towards pathogen-agnostic surveillance systems. Although this technology is highly accurate, some difficulties remain. Examples include counting repeated bases and performing better on species represented in the training data than on others. Mech interp seems like a potential path to understanding these systematic challenges. 
Additionally, from a mech interp perspective, finding patterns similar to those seen in LLMs would be evidence for universality and could add to insights about the general behavior of deep learning models.\n\nThere are key limitations that prevent conclusive insights from my work. Still, I’d love to know if anyone more experienced can glean anything interesting, or can say whether this is a worthwhile direction for future research. The [full paper](https://drive.google.com/file/d/1cqdEWf3xB1rUcSQ5CsdD_FrkICirZYSM/view?usp=sharing) and [codebase](https://github.com/KnockbackNemo/Bonito_Activation_Patching_Homopolymers) are also available online for anyone curious.\n\n**Self-attention vs MLP recovery and degradation scores for the repeated-base (homopolymer) error group.** *Higher scores indicate that patching the component shifted the model's confidence further toward the target output (the clean output when denoising, the corrupted output when noising). A score of 1 indicates that patching induced full recovery/degradation to the target signal, and a score of 0 indicates that patching produced no change. Scores are computed using the maximum change in logit difference across all signal timesteps.*\n\n**Background acknowledgements**\n\nI’m a recently graduated Electrical and Computer Engineering major interested in pivoting into AI Safety, but I am in no way an expert in technical AI safety, mech interp, or bioinformatics. I repurposed my undergraduate thesis into this project as a way to try out AI safety research, so I approached it primarily as a learning and exploration experience. It was my first and biggest project in several fields where I am inexperienced. I acknowledge that there may be serious holes in my methods and that the conclusions or results may be outright wrong. 
That being said, I really enjoyed the experience and would be grateful to read or discuss any comments on this work.\n\n**Methods**\n\nActivation patching swaps the activations of one section of a model between two nearly identical inputs, to test how much that section affects the output. The aim is to isolate the sections responsible for certain behaviors by asking, “If we swap in the activations of section 1 to pretend the model saw input B instead of input A, does that give us output B?” In my case, the behavior in question was correctly counting the length of repeated DNA sequences (homopolymers). An example test was, “Suppose I have an input 5 bases long, and I swap in the activations of section 1 to pretend the model saw an input 6 bases long. If the whole model then predicts 6 bases, section 1 is probably important to this task.” For each pair, activations were swapped in both directions. For example, this tested which sections could be patched to “recover” 5 bases from a corrupted input of 6, and which could “degrade” the model’s prediction for an input of 5 into thinking it saw 6. I was not able to isolate trends related to this specific behavior, and can only present general findings from applying activation patching. Even so, this question explains the reasoning behind the rest of the setup.\n\nTo generate data, clean and corrupt pairs of raw nanopore sequencer data were created and split into two groups. One group came from repeating (homopolymer) DNA sequences and the other from nonrepeating (non-homopolymer) DNA sequences. The patching needed similar pairs of signals with small, localized differences. To create them, raw reads were taken from the [multi_fast5_zip_v0.pod5](https://github.com/nanoporetech/pod5-file-format/blob/master/test_data/multi_fast5_zip_v0.pod5) file in the Oxford Nanopore Technologies (ONT) POD5 repo. These reads were then artificially altered (by injecting noise or dampening the signal) to create clean (original) and corrupt (altered) pairs. 
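Below is a minimal sketch of the two-direction patching procedure described above. It runs on a hypothetical toy model, not Bonito. The toy model has a residual stream over a few signal timesteps, a toy 'attention' component that mixes neighboring timesteps, and a toy 'MLP' that acts on each timestep alone. The sketch patches one component at a time in both directions. Denoising runs the corrupt input with clean activations spliced in; noising does the reverse. It reports a normalized score based on one possible reading of the post's metric: the maximum change in logit difference across timesteps, divided by the maximum change between the two unpatched runs. The toy model, the component names, and the `run`/`patch_score` helpers are all illustrative assumptions. The actual project used nnsight on the real model, and this sketch does not use nnsight's API.

```python
# Hypothetical toy model for illustrating activation patching.
# Assumption: the layers, component names, readout, and score definition
# are illustrative stand-ins, not Bonito's architecture or the author's code.

N_LAYERS = 2


def attn(h):
    # Toy 'attention': each timestep reads half of the previous timestep.
    return [0.0] + [0.5 * h[t - 1] for t in range(1, len(h))]


def mlp(h):
    # Toy 'MLP': a per-timestep ReLU with no mixing across timesteps.
    return [max(0.0, v - 1.0) for v in h]


def run(signal, patch=None):
    # patch: optional (layer, component, cached_output) to splice in.
    # Returns per-timestep logit differences and a cache of activations.
    h = list(signal)
    cache = {}
    for layer in range(N_LAYERS):
        for name, fn in (('attn', attn), ('mlp', mlp)):
            out = fn(h)
            if patch is not None and patch[:2] == (layer, name):
                out = list(patch[2])
            cache[(layer, name)] = out
            h = [a + b for a, b in zip(h, out)]  # add to residual stream
        if patch is not None and patch[:2] == (layer, 'resid'):
            h = list(patch[2])  # replace the whole residual stream
        cache[(layer, 'resid')] = list(h)
    # Toy readout: the final residual value at each timestep stands in for
    # the logit difference (clean answer minus corrupt answer).
    return h, cache


def patch_score(source, receiver, layer, component, denoising=True):
    # Denoising: source=clean, receiver=corrupt (a recovery score).
    # Noising: source=corrupt, receiver=clean (a degradation score).
    target_ld, source_cache = run(source)
    base_ld, _ = run(receiver)
    cached = source_cache[(layer, component)]
    patched_ld, _ = run(receiver, (layer, component, cached))
    # Orient the logit difference so that movement toward the target is positive.
    sign = 1.0 if denoising else -1.0
    full = max(sign * (t - b) for t, b in zip(target_ld, base_ld))
    if full <= 0.0:
        raise ValueError('clean and corrupt runs do not differ')
    change = max(sign * (p - b) for p, b in zip(patched_ld, base_ld))
    return change / full


if __name__ == '__main__':
    clean = [1.0, 1.0, 1.0]
    corrupt = [1.0, 0.0, 1.0]  # hypothetical dampened sample at t=1
    for layer in range(N_LAYERS):
        for comp in ('attn', 'mlp', 'resid'):
            rec = patch_score(clean, corrupt, layer, comp, denoising=True)
            deg = patch_score(corrupt, clean, layer, comp, denoising=False)
            print(layer, comp, round(rec, 3), round(deg, 3))
```

Patching the full residual stream ('resid') at the final layer gives a score of exactly 1.0 in both directions. This is the same kind of sanity check as the final-layer result described in the Results section. The single-component patches give partial scores, which is the signal used to compare MLP and self-attention contributions.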
Difficulties in creating a dataset led to key limitations, discussed later. Homopolymers were chosen as the feature of interest because the repeated bases create a signal plateau, and basecallers commonly struggle to count the bases within it.\n\n**Dataset generation process.** *A dataset of 49 clean/corrupt pairs (35 homopolymer and 14 non-homopolymer) was created by finding natural DNA sequencer reads with and without repeated bases and corrupting them via noise injection or signal dampening. Different corruptions were tested experimentally to find DNA sequences and corruptions that would create single-base errors in the decoded string.*\n\nThe model tested was the open-source [ONT Bonito](https://github.com/nanoporetech/bonito) 5.2.0 SUP model (version dna_r10.4.1_e8.2_400bps_sup@v5.2.0). It uses a CNN followed by an 18-layer transformer and a conditional random field (CRF) to transform the raw time series of sensed electrical current into a DNA sequence.\n\nUsing [nnsight](https://nnsight.net/), I patched the model across all 18 layers in both the noising and denoising directions for three levels of granularity:\n\nResults are compared across sections, layers, and the two groups: homopolymer and non-homopolymer sequences.\n\n**Results**\n\nWhole-layer patching produced large recovery/degradation scores, suggesting that the method was patching the model correctly. Activity across layers seemed to follow a pattern consistent with other deep learning models. The MLPs played a greater role than the self-attention mechanism, though this gap narrowed at points in the middle layers, where self-attention activity appeared to increase. Denoising and noising results were generally similar, but not always. Finally, attention-head activity appeared to be concentrated in a few heads, while the other heads contributed much less. 
The homopolymer and non-homopolymer error groups did not differ enough to draw meaningful conclusions, though this could be due to the unbalanced dataset or the method of corrupting data.\n\n**Whole-transformer-block patching results by layer.** *Patching at the final layer resulted in exact recovery/degradation, serving as a sanity check that activation patching targeted the correct section of the model. It is unknown why results from the single-base region error group consistently show higher scores than the homopolymer group. This trend persists across all tests.*\n\nResults use a recovery score (for denoising) and a degradation score (for noising). A score of 0 represents no change from the unpatched run, and 1 represents complete recovery or degradation. Scores above 1 indicate overshooting, meaning even greater confidence in the changed output than the target run shows. Negative scores indicate greater confidence in the wrong/original output. Scores are computed from the output logits, which score each window of five bases together with the next base (for example, the transition AAAAA → [A]AAAAC). Each score takes the maximum change across all patched timesteps.\n\nThe results appear consistent with a common pattern in deep learning models, where context from surrounding tokens is incorporated more heavily in the middle layers. One hypothesis concerns the spikes in middle-layer attention activity. They could reflect the complexity of the introduced corruptions: more complicated than the basic patterns the MLPs may be recognizing early in the model, but not at the level of detail handled in the final stages. The method appears to give meaningful results, and activity is concentrated in specific attention heads. Together, these suggest that it may be possible to find circuits performing functions related to systematic errors.\n\n**Combined results across all components in the homopolymer dataset group.** *Recovery and degradation scores are shown for activation patching at the layer, MLP, self-attention, and individual attention head levels. Scores shown are the maximum across all timesteps.*\n\n**Limitations**\n\nKey issues include\n\n**Questions**\n\nAny other thoughts on the design, process, write-up, etc., would also be greatly valued!\n\n**Future work**\n\nImprovements to this work that would help solidify its findings include\n\n*Disclosure: I used AI to help review and edit this post. Many thanks to my thesis advisors; all mistakes are my own.*", "url": "https://wpnews.pro/news/exploration-of-a-dna-sequencing-basecaller-using-activation-patching", "canonical_source": "https://www.lesswrong.com/posts/mxA7584MuZeBBFgaz/exploration-of-a-dna-sequencing-basecaller-using-activation", "published_at": "2026-06-13 01:48:30+00:00", "updated_at": "2026-06-13 02:21:33.375215+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "ai-safety", "ai-research"], "entities": ["Bonito", "KnockbackNemo"], "alternates": {"html": "https://wpnews.pro/news/exploration-of-a-dna-sequencing-basecaller-using-activation-patching", "markdown": "https://wpnews.pro/news/exploration-of-a-dna-sequencing-basecaller-using-activation-patching.md", "text": "https://wpnews.pro/news/exploration-of-a-dna-sequencing-basecaller-using-activation-patching.txt", "jsonld": "https://wpnews.pro/news/exploration-of-a-dna-sequencing-basecaller-using-activation-patching.jsonld"}}