Notes on the Pentium's microcode circuitry The article explains that the Pentium processor uses microcode, a layer of software underneath machine instructions, to control its circuitry. The Pentium's microcode ROM contains 4,608 micro-instructions, each 90 bits long, stored in a grid of transistors across two banks. The data is physically encoded by the presence or absence of transistors in the silicon, which can be visually extracted by removing the metal and polysilicon layers. Most people think of machine instructions as the fundamental steps that a computer performs. However, many processors have another layer of software underneath: microcode. With microcode, instead of building the processor's control circuitry from complex logic gates, the control logic is implemented with code known as microcode, stored in the microcode ROM. To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode. In this post, I examine the microcode ROM in the original Pentium, looking at the low-level circuitry. The photo below shows the Pentium's thumbnail-sized silicon die under a microscope. I've labeled the main functional blocks. The microcode ROM is highlighted at the right. If you look closely, you can see that the microcode ROM consists of two rectangular banks, one above the other. The image below shows a closeup of the two microcode ROM banks. Each bank provides 45 bits of output; together they implement a micro-instruction that is 90 bits long. Each bank consists of a grid of transistors arranged into 288 rows and 720 columns. The microcode ROM holds 4608 micro-instructions, 414,720 bits in total. At this magnification, the ROM appears featureless, but it is covered with horizontal wires, each just 1.5 µm thick. The ROM's 90 output lines are collected into a bundle of wires between the banks, as shown above. The detail shows how six of the bits exit from the banks and join the bundle. This bundle exits the ROM to the left, travels to various parts of the chip, and controls the chip's circuitry. The output lines are in the chip's top metal layer M3 : the Pentium has three layers of metal wiring with M1 on the bottom, M2 in the middle, and M3 on top. The Pentium has a large number of bits in its micro-instruction, 90 bits compared to 21 bits in the 8086. Presumably, the Pentium has a "horizontal" microcode architecture, where the microcode bits correspond to low-level control signals, as opposed to "vertical" microcode, where the bits are encoded into denser micro-instructions. I don't have any information on the Pentium's encoding of microcode; unlike the 8086, the Pentium's patents don't provide any clues. The 8086's microcode ROM holds 512 micro-instructions, much less than the Pentium's 4608 micro-instructions. This makes sense, given the much greater complexity of the Pentium's instruction set, including the floating-point unit on the chip. The image below shows a closeup of the Pentium's microcode ROM. For this image, I removed the three layers of metal and the polysilicon layer to expose the chip's underlying silicon. The pattern of silicon doping is visible, showing the transistors and thus the data stored in the ROM. If you have enough time, you can extract the bits from the ROM by examining the silicon and seeing where transistors are present. Before explaining the ROM's circuitry, I'll review how an NMOS transistor is constructed. A transistor can be considered a switch between the source and drain, controlled by the gate. The source and drain regions green consist of silicon doped with impurities to change its semiconductor properties, forming N+ silicon. These regions are visible in the photo above. The gate consists of a layer of polysilicon red , separated from the silicon by a very thin insulating oxide layer. Whenever polysilicon crosses active silicon, a transistor is formed. Bits are stored in the ROM through the pattern of transistors in the grid. The presence or absence of a transistor stores a 0 or 1 bit.1 The closeup below shows eight bits of the microcode ROM. There are four transistors present and four gaps where transistors are missing. Thus, this part of the ROM holds four 0 bits and four 1 bits. For the diagram below, I removed the three metal layers and the polysilicon to show the underlying silicon. I colored doped active silicon regions green, and drew in the horizontal polysilicon lines in red. As explained above, a transistor is created if polysilicon crosses doped silicon. Thus, the contents of the ROM are defined by the pattern of silicon regions, which creates the transistors. The horizontal silicon lines are used as wiring to provide ground to the transistors, while the horizontal polysilicon lines select one of the rows in the ROM. The transistors in that row will turn on, pulling the associated output lines low. That is, the presence of a transistor in a row causes the output to be pulled low, while the absence of a transistor causes the output line to remain high. The diagram below shows the silicon, polysilicon, and bottom metal M1 layers. I removed the metal from the left to reveal the silicon and polysilicon underneath, but the pattern of vertical metal lines continues there. As shown earlier, the silicon pattern forms transistors. Each horizontal metal line has a connection to ground through a metal line not shown . The horizontal polysilicon lines select a row. When polysilicon lines cross doped silicon, the gate of a transistor is formed. Two transistors may share the drain, as in the transistor pair on the left. The vertical metal wires form the outputs. The circles are contacts between the metal wire and the silicon of a transistor.2 Short metal jumpers connect the polysilicon lines to the metal layer above, which will be described next. The image below shows the upper left corner of the ROM. The yellowish metal lines are the top metal layer M3 , while the reddish metal lines are the middle metal layer M2 . The thick yellowish M3 lines distribute ground to the ROM. Underneath the horizontal M3 line, a horizontal M2 line also distributes ground. The grids of black dots are numerous contacts between the M3 line and the M2 line, providing a low-resistance connection. The M2 line, in turn, connects to vertical M1 ground lines underneath—these wide vertical lines are faintly visible. These M1 lines connect to the silicon, as shown earlier, providing ground to each transistor. This illustrates the complexity of power distribution in the Pentium: the thick top metal M3 is the primary distribution of +5 volts and ground through the chip, but power must be passed down through M2 and M1 to reach the transistors. The other important feature above is the horizontal metal lines, which help distribute the row-select signals. As shown earlier, horizontal polysilicon lines provide the row-select signals to the transistors. However, polysilicon is not as good a conductor as metal, so long polysilicon lines have too much resistance. The solution is to run metal lines in parallel, periodically connected to the underlying polysilicon lines and reducing the overall resistance. Since the vertical metal output lines are in the M1 layer, the horizontal row-select lines run in the M2 layer so they don't collide. Short "jumpers" in the M1 layer connect the M2 lines to the polysilicon lines. To summarize, each ROM bank contains a grid of transistors and transistor vacancies to define the bits of the ROM. The ROM is carefully designed so the different layers—silicon, polysilicon, M1, and M2—work together to maximize the ROM's performance and density. As the Pentium executes an instruction, it provides the address of each micro-instruction to the microcode ROM. The Pentium holds this address—the micro-address—in the Microcode Address Register MAR . The MAR is a 13-bit register located above the microcode ROM. The diagram below shows the Microcode Address Register above the upper ROM bank. It consists of 13 bits; each bit has multiple latches to hold the value as well as any pushed subroutine micro-addresses. Between bits 7 and 8, some buffer circuitry amplifies the control signals that go to each bit's circuitry. At the right, drivers amplify the outputs from the MAR, sending the signals to the row drivers and column-select circuitry that I will discuss below. To the left of the MAR is a 32-bit register that is apparently unrelated to the microcode ROM, although I haven't determined its function. The outputs from the Microcode Address Register select rows and columns in the microcode ROM, as I'll explain below. Bits 12 through 7 of the MAR select a block of 8 rows, while bits 6 through 4 select a row in this block. Bits 3 through 0 select one column out of each group of 16 columns to select an output bit. Thus, the microcode address controls what word is provided by the ROM. Several different operations can be performed on the Microcode Address Register. When executing a machine instruction, the MAR must be loaded with the address of the corresponding microcode routine. I haven't determined how this address is generated. As microcode is executed, the MAR is usually incremented to move to the next micro-instruction. However, the MAR can branch to a new micro-address as required. The MAR also supports microcode subroutine calls; it will push the current micro-address and jump to the new micro-address. At the end of the micro-subroutine, the micro-address is popped so execution returns to the previous location. The MAR supports three levels of subroutine calls, as it contains three registers to hold the stack of pushed micro-addresses. The MAR receives control signals and addresses from standard-cell logic located above the MAR. Strangely, in Intel's published floorplans for the Pentium, this standard-cell logic is labeled as part of the branch prediction logic, which is above it. However, carefully tracing the signals from the standard-cell logic shows that is connected to the Microcode Address Register, not the branch predictor. As explained above, each ROM bank has 288 rows of transistors, with polysilicon lines to select one of the rows. To the right of the ROM is circuitry that activates one of these row-select lines, based on the micro-address. Each row matches a different 9-bit address. A straightforward implementation would use a 9-input AND gate for each row, matching a particular pattern of 9 address bits or their complements. However, this implementation would require 576 very large AND gates, so it is impractical. Instead, the Pentium uses an optimized implementation with one 6-input AND gate for each group of 8 rows. The remaining three address bits are decoded once at the top of the ROM. As a result, each row only needs one gate, detecting if its group of eight rows is selected and if the particular one of eight is selected. The schematic above shows the circuitry for a group of eight rows, slightly simplified.3 At the top, three address bits are decoded, generating eight output lines with one active at a time. The remaining six address bits are inverted, providing the bit and its complement to the decoding circuitry. Thus, the 9 bits are converted into 20 signals that flow through the decoders, a large number of wires, but not unmanageable. Each group of eight rows has a 6-input AND gate that matches a particular 6-bit address, determined by which inputs are complemented and which are not.4 The NAND gate and inverter at the left combine the 3-bit decoding and the 6-bit decoding, activating the appropriate row. Since there are up to 720 transistors in each row, the row-select lines need to be driven with high current. Thus, the row-select drivers use large transistors, roughly 25 times the size of a regular transistor. To fit these transistors into the same vertical spacing as the rest of the decoding circuitry, a tricky packing is used. The drivers for each group of 8 rows are packed into a 3×3 grid, except the first column has two drivers since there are 8 drivers in the group, not 9 . To avoid a gap, the drivers in the first column are larger vertically and squashed horizontally. The schematic below shows the multiplexer circuit that selects one of 16 columns for a microcode output bit. The first stage has four 4-to-1 multiplexers. Next, another 4-to-1 multiplexer selects one of the outputs. Finally, a BiCMOS driver amplifies the outpu