{"slug": "fooling-around-with-encrypted-reasoning-blobs", "title": "Fooling around with encrypted reasoning blobs", "summary": "A hobbyist developer discovered that OpenAI and Anthropic's reasoning LLM APIs transmit encrypted copies of the model's raw chain-of-thought reasoning data to client applications, rather than keeping it entirely server-side. The encrypted reasoning blobs, which appear as authenticated ciphertext in Base64-encoded JSON fields, trigger recognizable API errors when tampered with, revealing that the data has security implications. The developer spent 20 hours and 5 million Codex tokens investigating the phenomenon, earning certification as an OpenAI \"cyber researcher\" in the process.", "body_md": "This is a quick post I wanted to write about a “hobby project” I spent a weekend on. It has little to do with real cryptography, and mostly doesn’t expose a particularly exciting vulnerability. But it did teach me a lot about frontier LLM APIs and coding agents. It also got me certified as an OpenAI “cyber researcher” which is something that doesn’t happen every day.\n\nIn any case, please keep your expectations low. Who knows, perhaps someone else will find something exciting to do with this.\n\n### Where were you when you first discovered encrypted reasoning?\n\nLast week I decided it’d be fun to set up an OpenClaw agent. I still don’t know why I did this. I have no use for another AI in my life, and I realized this fact almost immediately after setting it up. But configuring the agent to talk to Claude exposed me to something way more interesting: *I got a cool error*. The kind of error that cryptographers can’t resist:\n\nThis intrigued me. What in the world was a signature doing in an LLM’s “thinking” block? Why would thinking blocks be signed in the first place? 
And if the thinking blocks are signed, then *that means tampering with thinking blocks must have security implications.* And there went my weekend.\n\nAfter twenty hours and about 5 million Codex tokens, I wasn’t much smarter. But I’d *learned* a few things.\n\nFirst, the basics. You probably know that most LLM providers expose an API so you can write apps that talk to the model. For Claude, this is called the [Messages](https://platform.claude.com/docs/en/build-with-claude/working-with-messages) API, while OpenAI calls it [Responses](https://developers.openai.com/api/reference/responses/overview). These APIs handle the ordinary tasks you’d expect an application to need from an LLM. They (1) allow you to set an application-level “instructions” (or ‘system’) prompt. They let you (2) provide ordinary textual prompts and get back responses from the LLM. They also (3) provide bookkeeping, like the number of tokens you’ve used.\n\nFor reasoning LLMs, they do one more thing that I did not know about, and it is central to the error message above: they send you the contents of the model’s hidden “*reasoning*” or “*thinking*” fields. Note that this data is not the stuff you see on ChatGPT when you ask it a question: those strings are merely summaries. The model’s actual reasoning (called “chain-of-thought”, CoT) is normally kept private and held back by the server.\n\nHowever, when you use the APIs, things are slightly different: for various reasons (which we’ll get into below), an *encrypted copy of the raw CoT reasoning data is actually sent down to the application.*\n\nIf you’re like me, you should now have three questions: *how*, *why*, and *so what*?\n\nThe *how* is the easiest to answer: for both providers, “thinking”/“reasoning” blocks are sent down to the client as JSON. Each contains a blob of Base64-encoded stuff. 
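To make that concrete, here is a minimal Python sketch of what a client can do with these blocks without any keys. It pulls out the Base64 field, measures its decoded length, and flips a single bit, which is the kind of tampering that the APIs reject. The sample items are hand-written stand-ins with placeholder bytes, not real API output. The field names (`encrypted_content` on OpenAI `reasoning` items, `signature` on Anthropic `thinking` blocks) follow a reading of the public API docs and should be treated as assumptions. The helpers `blob_field`, `blob_length`, and `flip_bit` are hypothetical. The sketch also assumes the standard Base64 alphabet; if a provider uses the URL-safe variant, swap in `base64.urlsafe_b64decode` and `base64.urlsafe_b64encode`.

```python
import base64

# Hand-written stand-ins for API output: placeholder bytes, not real ciphertext.
# Field names follow a reading of the public API docs; treat them as assumptions.
openai_item = {
    'type': 'reasoning',
    'summary': [],
    'encrypted_content': base64.b64encode(bytes(range(48))).decode('ascii'),
}
anthropic_block = {
    'type': 'thinking',
    'thinking': 'summarized reasoning text',
    'signature': base64.b64encode(bytes(range(96))).decode('ascii'),
}


def blob_field(item):
    # Hypothetical helper: return the opaque Base64 field for either block type.
    if item['type'] == 'reasoning':
        return item['encrypted_content']
    if item['type'] == 'thinking':
        return item['signature']
    raise ValueError('not a reasoning or thinking block: ' + item['type'])


def blob_length(item):
    # Decoded size in bytes, one of the few properties a client can observe.
    return len(base64.b64decode(blob_field(item)))


def flip_bit(b64_blob, bit_index):
    # Flip one bit of the decoded blob and re-encode it; the length is unchanged.
    raw = bytearray(base64.b64decode(b64_blob))
    raw[bit_index // 8] ^= 1 << (bit_index % 8)
    return base64.b64encode(bytes(raw)).decode('ascii')


print(blob_length(openai_item), blob_length(anthropic_block))  # 48 96
```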
The API [documentation](https://developers.openai.com/api/docs/guides/reasoning?example=planning) informs us that this data contains opaque reasoning, and that you’re not meant to look at it; you’re just supposed to ship it back to the server on the next turn. However, if you’re willing to ignore that instruction, you’ll see just a bit more.\n\nThe contents of the blocks are slightly different between providers, but the core of each is a random-looking string that appears to be an authenticated ciphertext. You don’t need to be Sherlock Holmes to deduce this. First, it grows and shrinks depending on how hard the model thinks. And second, tampering with any of the ciphertext-looking data produces a recognizable API error.\n\nHere’s what OpenAI’s reasoning blocks look like:\n\nAnd here’s Anthropic’s much more complicated version:\n\nThe *why* part of this is more involved. Why ship this data to the client? Doesn’t the provider already have your reasoning data?\n\nThe answer is *sort of*. Although the server has access to reasoning state while producing a response, API conversations are not always implemented as persistent sessions. In stateless, [zero-retention](https://community.openai.com/t/how-does-zero-data-retention-work/1272712), tool-loop, or client-managed conversation modes, the client application is expected to carry the transcript forward. Encrypted reasoning artifacts let the provider return hidden model state to the client in a form the client can’t read or modify, but can later replay so the provider can verify/decrypt it and continue a reasoning process.\n\n### So what?\n\nThis brings us to the $10 question. We have opaque, encrypted blobs. Should we care about them?\n\nInitially the answer seems to be *no*: this data is unreadable, and tampering with any bit of it produces an angry rejection message from the server. So on the one hand, it seems like this data is really unavailable to us. 
On the other hand: *model reasoning is a big deal!* These strings are the literal internal monologue of the model. They might influence the way the model processes later data we send it. More practically: when someone goes to this much trouble to cryptographically protect something, my experience is that they usually have a good reason.\n\nAnd I think the providers *do* have a good reason. A hint comes from [this OpenAI post](https://openai.com/index/learning-to-reason-with-llms/) from 2024, which introduced the first “o1” reasoning model:\n\nIn other words: it’s possible that these blobs contain sensitive information that the model otherwise wouldn’t share with us. That makes them really tempting to mess with. Unfortunately, the cryptography mostly seems to protect them. Although we can *look* at the blocks, none of the fields they contain seem readable or malleable. Believe me, I tried.\n\nBut that doesn’t mean we should quit; it just means we need to try other things. There are still two directions worth checking:\n\n- **Replays**. Can we replay encrypted blobs back *in the wrong order*, or even in the wrong session (worse: a whole different *account*), and will the model accept them as valid reasoning that it made?\n- **Side channels**. While we can’t see what’s in the encrypted blobs, we can learn some metadata about them. For example, we can see how long they are. These side channels don’t need to involve the cryptography itself: we might also learn how many tokens the model spent making them, or time how long it took to produce them.\n\nThanks to the magic of coding agents, I was able to test every permutation of these concerns. I won’t claim the results are dramatic; nobody is going to win huge bug bounties on them (I tried). But the general answer for both cases seems to be: *yes, these possibilities are both real*.\n\n### Can we replay? 
Yes, we can.\n\nAs I mentioned above, any attempt to directly tamper with reasoning/thinking blocks always produces an error from the API endpoint. However, this only applies to tampering. A few experiments reveal that we can *replay* unmodified older reasoning blocks with no visible error at all.\n\nNot only can we replay within sessions, but the same idea also seems to work across different sessions. It even applies to sessions running in *different accounts*. That is: when we obtain reasoning blobs from a session running under one OpenAI or Anthropic account, we can replay them against a session in a different account altogether. For OpenAI specifically, we can even replay blobs across different models. (The Claudes got fussy about this.)\n\nAt a cryptographic level, this tells us something very simple: the providers are probably using a single global key to encrypt and authenticate all reasoning data sent to the client. This might matter if you’re using the providers’ zero-data retention mode, since it means that everyone’s reasoning data is escrowed under one (not frequently changing) key, rather than protected per-account.\n\nThe use of a global key also raises a possible new threat model. If you’re an application that uses an API to expose a “chat” interface to malicious parties, you need to be careful that they can’t inject JSON into your chat stream. If they can, a bad guy might inject *their own* JSON-formatted reasoning blobs into the conversation. This could cause the model to behave in unpredictable ways. *So sanitize your chat inputs!*\n\nOf course, just because the LLM providers accept replayed blocks doesn’t mean too much. It strongly indicates that decryption was successful, but not that the model actually *saw* or cogitated over the decrypted data. To use GPT 5.5’s favored language, the replayed blobs may be *accepted* but not *semantically active.*\n\nTo find out whether replayed blobs are semantically active, I ran a lot of experiments using Codex. 
(So many that at one point Codex literally forced me to stop and visit an OpenAI [cyber trusted access](https://chatgpt.com/cyber) website where I had to upload pictures of my driver’s license in order to keep going.)\n\nWhat I learned for my trouble is that the way replayed blocks are processed varies wildly between models. Most of the time, replays of encrypted blocks just get quietly absorbed by the model. But every now and then, the model will output something that shows it is obviously reading what those blocks contain. For example, here’s GPT 5.5:\n\nSo this proves that encrypted blocks are, indeed, *semantically active.* But it doesn’t actually prove that we can do much with them. And believe me, I tried.\n\nThis was mostly a disappointing project. I tried to convince the model to think about *really, really sensitive* secrets, while also trying to convince another session that it wanted to dump the same data as cooperatively as possible. What I came away with was some evidence that the data was being placed into the encrypted blocks if I asked the model to think about it. But if I also instructed the model to *not output the data to the user*, it mostly held to that instruction, even when I replayed the blocks to new sessions.\n\nI remain convinced that all kinds of sensitive data can be written in there if you ask the model to think about it, and that there’s some secret incantation that would get the models to produce it. But I’m not able to prove it. Part of the reason I’m writing this post is to scrape it off my plate so someone else can try.\n\nI won’t try to convince you that this is a world-beating security result. In fact, all I’m really showing you is that “stuff I can make the model say in plaintext might also get encrypted.” But if that data can include *platform secrets*, that might get more interesting. 
More on that later.\n\n### Can we use reasoning blocks to learn secrets?\n\nSo while replaying reasoning blocks doesn’t seem to give us what we want, replay is not the only way to extract secrets. A second question is whether we can use metadata related to the reasoning blocks to actually learn things that the model isn’t supposed to tell us.\n\nWhile we can’t directly read reasoning blocks, we can learn *something* about them: we can see how long they are. We can also observe related signals like “how many tokens did the model write”. OpenAI even gives us a special field called `reasoning_tokens`. If we’re a user consuming chat data without direct access to the API, we might still be able to measure the raw time it takes the model to respond.\n\nAn obvious question is: given these signals, can we use them as a kind of side channel to extract secret data?\n\nHere’s an example. Imagine that a model’s application prompt (“instructions”) contains a secret, along with strict instructions that *it must never tell the user this secret directly*. This secret could be a single 0/1 bit, or a byte, or a longer string. We can verify that the model respects these instructions, and won’t output the data visibly, no matter how nicely we ask it. (Note: I’m not a jailbreak expert; maybe [this guy](https://x.com/elder_plinius?lang=en) will have better luck!)\n\nNow consider the following experiment:\n\n- A malicious user asks the model to reason about the secret bit (or one specific bit of a longer secret). If the bit is 0, perform simple computation **A**. If it’s 1, perform extremely complex computation **B**.\n- Although the two computations are very different, we can ensure that their *visible* output reveals nothing about the secret. So the model is not revealing its instructions if it follows this request.\n\nIn all cases, the visible output will be the same. But note that *within reasoning blocks* the model is allowed to think about the secret bit. 
Since computation **A** is much simpler than computation **B**, one value of the bit will produce a lot less reasoning than the other. This shows up in several places: the size of the encrypted thinking blocks, the token counts, and even wall-clock response times.\n\nThe trick now is simply to calibrate the system and classify these responses based on whether reasoning blobs were “short” or “long”, which tells us whether the bit was 0 or 1. The results look something like this:\n\nOf course, an attacker who only has access to a chat interface might not have access to the encrypted blob, so they would have to get this signal some other way. You can get a very similar signal just by measuring how long it takes the model to return a response.\n\nSo the summary here is not so much “encrypted blobs can leak useful information”, *although they do*. It’s that reasoning itself can be leaky. Simply reasoning over secret data can potentially leak useful information to a clever attacker.\n\n### Can we use this side channel to extract platform secrets, such as the models’ top secret system prompts?\n\nOnce I found this side channel I got really excited. Sure, it’s slow, but maybe we could use it to slowly chisel out the models’ top secret instruction prompts, like the one that says “[don’t talk about Goblins.](https://www.bbc.com/news/articles/c5y9wen5z8ro)” This would be painful but simple: just ask true/false questions about the first letter, then the second letter, and so on.\n\nAt this point I had to stop using Codex and Claude Code because they both just plain refused to help me extract confidential information, even after checking my ID and taking a lock of my hair. I was forced to switch to OpenCode using Kimi 2.6, which had no ethical qualms about my security research.\n\nUnfortunately, most of the destruction was my own. I won’t go into the nightmare of model hallucinations that followed. 
I’ll just say that I learned a few things:\n\n- Neither GPT 5x nor Claude actually *has* a system prompt when you’re using API mode.\n- But they’re both happy to tell you they have one!\n- Moreover, they will happily invent plausible ones if you really push them to.\n- Kimi 2.6 is also happy to tell you you’re a genius who just invented the Internet each time this happens. Inevitably your experimental results will turn out to have been totally bogus, but at least Kimi will be very disappointed on your behalf.\n\nSo TL;DR, while I was able to extract application-specific secrets that did exist, I wasn’t able to extract model prompts that don’t. Moreover, I didn’t feel quite ambitious enough to begin pounding on ChatGPT or Claude’s public web interface (where they certainly do). So for the moment I’m just going to call this a *maybe*. I think model providers should think hard about this reasoning data, and they should make sure it doesn’t leak things they don’t want it to.\n\n### What could providers do about this?\n\nI reported both results to OpenAI and Anthropic via their bug bounty programs. OpenAI said my report was unreproducible. I sent them my scripts, but by then it was too late. Anthropic quite reasonably told me they don’t see any security implications in side channels or replays, but they might alter their developer documentation to warn application developers to be more careful. I think that’s a fine decision (except for the part about trusting application developers), even if I want to believe there could be more here. Either way, I took those responses as permission to write this post.\n\nI still don’t think model providers should write this stuff entirely off.\n\nAs far as what model providers can do, there’s the easy stuff and the hard stuff. First: both providers should *improve their key management*. If you think reasoning state is worth encrypting, then properly encrypt it. It should not be replayable across sessions or accounts. 
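One standard way to get that property is to bind each blob to the context it was issued in, so that it only verifies in the same account and session. The following Python sketch is an assumption about how a provider *could* do this, not a description of what OpenAI or Anthropic actually deploy. It derives a per-account key from a hypothetical master key and computes an HMAC-SHA256 tag over the session ID and the blob, using only the standard library. HMAC provides authentication, not confidentiality; a production design would more likely use an AEAD cipher such as AES-GCM with the account and session IDs as associated data. All names (`MASTER_KEY`, `account_key`, `seal`, `open_sealed`) are hypothetical. Binding a turn counter the same way would also reject out-of-order replays within a session.

```python
import hashlib
import hmac
import os

# Hypothetical provider-held secret (random here, purely for illustration).
MASTER_KEY = os.urandom(32)
TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def account_key(account_id):
    # Derive a per-account key so one account's blobs never verify under another's.
    return hmac.new(MASTER_KEY, b'reasoning-blob-key:' + account_id.encode(),
                    hashlib.sha256).digest()


def _tag(account_id, session_id, blob):
    # Length-prefix the session ID so a (session, blob) pair cannot be re-split.
    sid = session_id.encode()
    msg = len(sid).to_bytes(4, 'big') + sid + blob
    return hmac.new(account_key(account_id), msg, hashlib.sha256).digest()


def seal(account_id, session_id, blob):
    # Append a tag that commits to the account, the session, and the blob bytes.
    return blob + _tag(account_id, session_id, blob)


def open_sealed(account_id, session_id, sealed):
    # Reject blobs that were modified or replayed into a different account/session.
    blob, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(account_id, session_id, blob)):
        raise ValueError('reasoning blob rejected: tampered or replayed out of context')
    return blob
```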
While I can’t tell you exactly what bad things might happen, I think you’re better off patching holes *before* you see the water coming through them.\n\nThe side channel results aren’t fixed by patches to the encryption protocol. They’re more fundamental to the way models work: if I can convince a model to do secret-dependent reasoning, then there is almost certain to be leakage. If someone figures out how to exploit this, the best I can offer is that models will need to apply policy gates before they even reason about things. Unfortunately, this seems like it might have some real downsides, because “apply policy gate” itself often requires reasoning.\n\nThis stuff makes me grateful I’m just a cryptographer and I don’t have to think about this sort of problem.", "url": "https://wpnews.pro/news/fooling-around-with-encrypted-reasoning-blobs", "canonical_source": "https://blog.cryptographyengineering.com/2026/05/29/fooling-around-with-encrypted-reasoning-blobs/", "published_at": "2026-05-29 10:06:47+00:00", "updated_at": "2026-05-29 10:17:00.079356+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "ai-research", "ai-safety"], "entities": ["OpenAI", "Claude", "OpenClaw", "Codex"], "alternates": {"html": "https://wpnews.pro/news/fooling-around-with-encrypted-reasoning-blobs", "markdown": "https://wpnews.pro/news/fooling-around-with-encrypted-reasoning-blobs.md", "text": "https://wpnews.pro/news/fooling-around-with-encrypted-reasoning-blobs.txt", "jsonld": "https://wpnews.pro/news/fooling-around-with-encrypted-reasoning-blobs.jsonld"}}