{"slug": "8086-microcode-disassembled", "title": "8086 microcode disassembled", "summary": "The article describes the process of extracting and disassembling the microcode ROMs from high-resolution photographs of the Intel 8086 and 8088 dies, revealing the internal instruction sequences of these processors. Key findings include differences in interrupt handling between the two chips and inaccuracies in the microcode listings from US patent 4363091, particularly in string instructions where the patent's version could fail to progress under interrupt storms or debugging. The author provides the resulting disassembly and disassembler code online.", "body_md": "Recently I realised that, as part of his 8086 reverse-engineering series, Ken Shirriff had posted online a [high resolution photograph of the 8086 die with the metal layer removed](https://www.righto.com/2020/06/a-look-at-die-of-8086-processor.html). This was something I had been looking for for some time, in order to extract and disassemble the 8086 microcode. I had previously found very high resolution photos of the die with the metal layer intact, but only half of the bits of the microcode ROM were readable. Ken also posted a [high resolution photograph of the microcode ROM of the 8088](https://twitter.com/kenshirriff/status/1278713580351700993), which is very similar but not identical. I was very curious to know what the differences were.\n\nI used [bitract](https://github.com/SiliconAnalysis/bitract) to extract the bits from the two main microcode ROMs, and also from the translation ROM, which maps opcode bit patterns onto positions within the main microcode ROM.\n\nThe microcode is partially documented in [US patent 4363091](https://patents.google.com/patent/US4363091). In particular, that patent has source listings for several microcode routines. Within these, there are certain patterns of parts of instructions which I was able to find in the ROM dump. 
This allowed me to figure out how the bit patterns in the ROM correspond to the operands and opcodes of the microcode instruction set, in a manner similar to cracking a monoalphabetic substitution cipher. My resulting disassembly of the microcode ROM can be found [here](https://www.reenigne.org/misc/8086_microcode.zip) and the code for my disassembler is [on github](https://github.com/reenigne/reenigne/blob/master/8088/8086_microcode).\n\nThis disassembly has answered many questions I had about the 8088 and 8086. The remainder of this post contains the answers to these questions and other interesting things I found in the microcode.\n\n**What are the microcode differences between the 8086 and the 8088?**\n\nThe differences are in the interrupt handling code. I think it comes down to the fact that the 8086 does two special bus accesses to acknowledge an interrupt (one to tell the PIC that it is ready to service the interrupt, the second to fetch the interrupt number for the IRQ that needs to be serviced). These are word-sized accesses for some reason, so the 8088 would break them into four accesses instead of two. This would confuse the PIC, so the 8088 does a single access instead and relies on the BIU to split the access into two. The other changes seem to be fallout related to that.\n\n**Are the microcode listings in US4363091 accurate?**\n\nMostly. There are differences, however (which added some complexity to the deciphering process). The differences are in the string instructions. 
For example, the \"STS\" (STOSB/STOSW) instruction in the patent is:\n\n```\nCR  S      D      Type  a     b     F\n-------------------------------------\n0   IK     IND    7     F1    1\n1   (M)    OPR    6     w     DA,BL\n2   IND    IK     0     F1    0\n3                 4     none  RNI\n```\n\nIn the actual CPU, this has become:\n\n``` php\n0   IK    -> IND       7   F1    RPTS\n1   M     -> OPR       6   w     DA,BL\n2   IND   -> IK        0   NF1      5\n3   SIGMA -> tmpc      5   INT   RPTI\n4   tmpc  -> BC        0   NZ       1\n5                      4   none  RNI\n```\n\nThe arrow isn't a difference - I just put that in my disassembly to emphasize the direction of data movement in the \"move\" part of the microcode instructions. Likewise, the \"F1 1\" in the patent listing is the same as the \"F1 RPTS\" in my disassembly - I have replaced subroutine numbers with names to make it easier to read.\n\nThe version in the patent does a check for pending interrupts in the \"RPTS\" routine, before it processes any iterations of the string. This means that if there is a continuous \"storm\" of interrupts, the string instruction will make no progress. The version in the CPU corrects this, and checks for interrupts on line 3, after it has done the store, allowing it to progress. This was probably not a situation that was expected to occur in normal operation (in fact, I seem to recall crashing my 8088 and 8086 machines by having interrupts happen too rapidly to be serviced). The change was most likely done to accommodate debugging with the trap flag (which essentially means that there is always an interrupt pending when the trap flag is set). Without this change, code that used the repeated string instructions would not have progressed under the debugger.\n\n**How many different instructions does the 8086 have, according to the microcode? 
What are they?**\n\nThe CPU has 60 instructions, and they're in a fairly logical sort of order:\n\n(Numbers are: number of opcodes handled, size of top-level microcode routine.)\n\n``` php\nMOV rm<->r     4  3\nLEA            1  1\nalu rm<->r    32  4\nalu rm,i       4  5\nMOV rm,i       2  4\nalu r,i       16  4\nMOV r,i       16  3\nPUSH rw        8  4\nPUSH sr        4  4\nPUSHF          1  4\nPOP rw         8  3\nPOP sr         4  3\nPOPF           1  3\nPOP rmw        1  6\nCBW            1  2\nCWD            1  7\nMOV A,[i]      2  4\nMOV [i],A      2  4\nCALL cd        1  4\nCALL cw        1  8\nXCHG AX,rw     8  3\nrot rm,1       2  3\nrot rm,CL      2  8\nTEST rm,r      2  3\nTEST A,i       2  4\nSALC           1  3\nXCHG rm,r      2  5\nIN A,ib        2  4\nOUT ib,A       2  4\nIN A,DX        2  2\nOUT DX,A       2  2\nRET            2  4\nRETF           2  2\nIRET           1  4\nRET/RETF iw    4  4\nJMP cw/JMP cb  2  6\nJMP cd         1  7\nJcond         32  3\nMOV rmw<->sr   2  2\nLES            1  4\nLDS            1  4\nWAIT           1  9 (discontinuous)\nSAHF           1  4\nLAHF           1  2\nESC            8  1\nXLAT           1  5\nSTOS           2  6 (discontinuous)\nCMPS/SCAS      4 13 (discontinuous)\nMOVS/LODS      4 11 (discontinuous)\nJCXZ           1  5 (discontinuous)\nLOOPNE/LOOPE   2  5\nLOOP           1  4\nDAA/DAS        2  4\nAAA/AAS        2  8\nAAD            1  4\nAAM            1  6\nINC/DEC rw    16  2\nINT ib         1  2\nINTO           1  4\nINT 3          1  3\n```\n\nThe discontinuous instructions were most likely broken up because they had bug fixes making them too long for their original slots. Similarly \"POP rmw\" appears to have been shortened by at least 3 instructions as there is a gap after it. 
Moving code around after it's been written (and updating all the far jump/call locations) would probably have been tricky.\n\n**Which instructions, if any, are not handled by the microcode?**\n\nThere is no microcode for the segment override prefixes (CS:, SS:, DS: and ES:). Nor for the other prefixes (REP, REPNE and LOCK), nor the instructions CLC, STC, CLI, STI, CLD, STD, CMC, and HLT. The \"group\" opcodes 0xf6, 0xf7, 0xfe and 0xff do not have top level microcode instructions. So none of the instructions with 0xf in the high nybble of the opcode are initially handled by the microcode. Most of these instructions are very simple and probably better done by random logic. HLT is a little surprising - I really thought I'd find a microcode loop for that one since it only seems to check for interrupts every other cycle.\n\nThe group instructions are decoded slightly differently but the microcode routines handling them break down as follows:\n\n```\nINC/DEC rm        3\nPUSH rm           4\nNOT rm            3\nNEG rm            3\nCALL FAR rm       8\nCALL rm           8\nTEST rm,i         4\nJMP rm            2\nJMP FAR rm        4\nIMUL/MUL rmb      8\nIMUL/MUL rmw      8\nIDIV/DIV rmb      8\nIDIV/DIV rmw      8\n```\n\nThen there are various subroutines and tail calls (listed in translation.txt). Highlights:\n\n- interrupt handling (16 microinstructions)\n- sign handling for multiply and divide, flags for multiply (32)\n- effective address computation (16)\n- reset routine (sets CS=0xffff, DS=ES=SS=FLAGS=PC=0) (6)\n\n**Does the microcode contain any \"junk code\" that doesn't do anything?**\n\nIt seems to! While most of the unused parts of the ROM (64 instructions) are filled with zeroes, there are a few parts which aren't. 
The following instructions appear right at the end of the ROM:\n\n``` php\nA     -> tmpa      5   INT   FARCALL2      011100011.0110\n[  5] -> [ a]      5   UNC   INTR     F    011100011.0111\n```\n\nThere doesn't appear to be any way for execution to reach these instructions. This code saves AL to tmpa (which doesn't appear to then be used at all) and then does either an interrupt or (if an interrupt is pending) a far call. In the interrupt case it also does a move between a source and a destination that aren't used anywhere else (and hence I have no idea what they are). This makes me wonder if there was at one point a plan for something like an \"INT AL\" instruction. With the x86 instruction set we ended up with, such a thing has to be done using self-modifying code, a table of INT instructions, or faking the operation of INT in software.\n\nThe following code is also inaccessible and appears to do something with the low byte of the last offset read from or written to, and the carry flag:\n\n``` php\nIND   -> tmpaL     1   LRCY  tmpc     F      01010?10?.1010\n```\n\nNo idea what that could be for - nothing else in the microcode treats the IND register as two separate bytes.\n\n**Are there any parts of the microcode that are still not understood?**\n\nWhen the WAIT instruction finishes in the non-interrupt case (i.e. by the -TEST pin going active to signal that the 8087 has completed an instruction) the microcode sequence finishes using this sequence:\n\n```\n                   4   [ 1]  none\n                   4   none  RNI\n```\n\nI don't know what the \"[ 1]\" does - it isn't used anywhere else.\n\nThere is also a bit (shown as \"Q\" in the listings) which does not have an obvious function for \"type 6\" (bus IO) operations. This Q bit is only set for \"W\" (write) operations, and is differentiated in the listing by write operations without it being shown in lower case (\"w\"). There seems to be no pattern as to which writes use this bit. 
The string move instructions use it, as does the stack push for the flags when an interrupt occurs, and the push of the segment for a far call or interrupt (but not the offset). It would make sense if this bit was used to distinguish between memory and port IO bus accesses, but the CPU seems to have another mechanism for this (most likely the group decode ROM, which I have not decoded as there are too many unknowns about what its inputs and outputs are).\n\n**Are there any places where the microcode could have been improved to speed up the CPU?**\n\nDespite many of the instructions seeming to execute quite ponderously by the standards of later CPUs, the microcode appears to be very tightly written and I didn't find many opportunities for improvement. If the MOVS/LODS opcode was split up into separate microcode routines for LODS and MOVS, the LODS routine could avoid a conditional jump and execute 1 cycle faster. But there is only room for that because of the \"POP rmw\" shortening, which may have happened quite late in the development cycle (especially if it was a functional bug fix rather than an optimisation - optimisations might not have met the bar at that point).\n\nThere may be places where prefetching could be suspended earlier before a jump, but it's not quite so obvious that that would be an optimisation. Especially if the \"suspend\" operation is synchronous, and waits for the BIU to complete the current prefetch cycle before continuing the microcode program. And especially if that would make the microcode routine longer.\n\nIt would of course be possible to make improvements if the random logic is changed as well. 
The NEC V20 and V30 implement the same instructions at a generally lower number of cycles per instruction, but they have 63,000 transistors instead of 29,000 so probably have a much larger proportion of random logic to microcode.\n\n**Does the microcode have any hidden features, opcodes or easter eggs that have not yet been documented?**\n\nIt does! Using the REP or REPNE prefix with a MUL or IMUL instruction negates the product. Using the REP or REPNE prefix with an IDIV instruction negates the quotient. As far as I know, nobody has discovered these before (or at least documented them).\n\nSigned multiplication and division work by negating negative inputs and then negating the output if exactly one of the inputs was negative. That means that the CPU needs to remember one bit of state (whether or not to negate the output) across the multiplication and division algorithms. But these algorithms use all three temporary registers, and the internal counter, and the ALU (so the bit can't be put in the internal carry flag for example). I was scratching my head about where that bit might be kept. I was also scratching my head about why the multiplication and division algorithms check the F1 (\"do we have a REP prefix?\") flag. Then I realised that these puzzles cancel each other out - the CPU flips the F1 flag for each negative sign in the multiply/divide inputs! There's already a microcode instruction to check for that, so the 8086's designers just needed to add an instruction to flip it.\n\nI was thinking the microcode instruction might set the F1 flag instead of flipping it - that would mean that you could get a (probably negated) \"absolute value\" operation (almost) for free with a multiply. But an almost-free negation is pretty good too - REP is a byte cheaper than \"NEG AX\", and with 16-bit multiplies the savings are even greater (it eliminates a NEG AX / ADC DX, 0 / NEG DX sequence). 
Still small compared to the multiply, but a savings nonetheless.\n\nI contemplated using this in a demoscene production as another \"we break all your emulators\" moment, but multiplication and division on the 8086 and 8088 CPUs is sufficiently slow to be of limited use for demos.\n\nThe F1ZZ microcode instruction (which controls whether the REPE/REPNE SCAS/CMPS sequences terminate early) is also used in the LOOPE and LOOPNE instructions. Which made me wonder if one of the REP prefixes would also reverse the sense of the test. However, neither prefix seems to have any effect on these instructions.\n\n**Update 2nd January 2023**\n\nI've made a new version of the disassembly [here](https://www.reenigne.org/misc/8086_microcode_v2.zip) incorporating some changes from the comments below. I have transcribed the group ROM, got rid of \"NWB\", added the RNI flag to W microinstructions, and changed XZC to ADC.", "url": "https://wpnews.pro/news/8086-microcode-disassembled", "canonical_source": "https://www.reenigne.org/blog/8086-microcode-disassembled/", "published_at": "2020-09-03 17:47:51+00:00", "updated_at": "2026-05-23 13:08:06.967389+00:00", "lang": "en", "topics": ["semiconductor", "research", "hardware"], "entities": ["Ken Shirriff", "8086", "8088", "US patent 4363091", "bitract", "github"], "alternates": {"html": "https://wpnews.pro/news/8086-microcode-disassembled", "markdown": "https://wpnews.pro/news/8086-microcode-disassembled.md", "text": "https://wpnews.pro/news/8086-microcode-disassembled.txt", "jsonld": "https://wpnews.pro/news/8086-microcode-disassembled.jsonld"}}