{"slug": "adding-a-softcore-to-snestang-part-1", "title": "Adding a Softcore to SNESTang - part 1", "summary": "The article explains the addition of a softcore CPU to the SNESTang 0.3 FPGA project to handle I/O tasks like USB and file systems, which previously consumed valuable FPGA logic space when implemented in Verilog. The chosen softcore, PicoRV32, uses SDRAM for main memory and stores its firmware in SPI flash, solving the problem of limited block RAM on the Tang Primer 25K board. This approach frees up FPGA resources for game logic while maintaining ease of use for the end user.", "body_md": "Adding a Softcore to SNESTang - part 1\nIn the recently released SNESTang 0.3, a softcore-based I/O system was added to enhance the menu system and file system support. Let us explore how it works. Part one of this article discusses why the softcore is necessary, which CPU was chosen, and how it works with the SDRAM.\nA bit of background first. My FPGA cores, NESTang and SNESTang, used to be completely standalone, with everything written in Verilog. This kept things simple and easy to install and update for the end user, which is one of the goals of the projects. However, some functions, such as USB controllers, the menu system and file systems, are much easier to implement on a CPU than in a hardware description language. Projects like MiSTer and MIST use separate ARM processors to handle these tasks; MiSTer even runs a full Linux OS on its ARM. Unfortunately, the Sipeed Tang FPGA boards that NESTang/SNESTang run on do not have a processor chip for this purpose (at least not one that is currently available to developers). For this reason, we have resorted to implementing things like FAT32 in Verilog. 
Although these implementations work, the downside is that as we add more I/O features, they take up precious FPGA logic space that should be reserved for game logic.\nIntroducing a softcore CPU could solve these problems. All of this I/O functionality, no matter how complex, could then be implemented in firmware that takes up no FPGA logic. But there is one major challenge: softcore memory space. Most FPGA softcore examples use FPGA block RAM (BRAM) for softcore memory. Unfortunately, in our case block RAM is in short supply. The Tang Primer 25K has only 126KB of block RAM, 90% of which is already used by the core itself, and the remaining 10% is not enough. There is also the problem of where to put the program. Most examples simply store the program in HDL arrays or block RAM, so we are back to consuming either logic or BRAM.\nOur solution is to use SDRAM as the softcore's main memory and to store the firmware in the SPI flash that holds the FPGA bitstream. This turns out to work great, and it is also relatively easy for the end user: no extra add-on boards or hard-to-use software tools are involved.\nAs for the softcore itself, PicoRV32 was chosen for its small size and easy-to-control memory interface. The core runs in RV32I mode with no interrupt support. The total area of the softcore plus the SD, OSD and UART logic is about 2000 LUTs, which is smaller than the previous Verilog SD implementation. I also experimented with the excellent FemtoRV32 by Bruno Levy, which was very helpful as well.\nNow for the more in-depth details.\nSDRAM to the rescue\nSDRAM is a natural place to put the softcore memory. Basically all retro-gaming oriented FPGA boards have it, including the Tang boards, and it is spacious compared to FPGA block RAM. Here, two 32MB SDRAM chips are available to us, and a few MB will probably be enough for our firmware for the foreseeable future. The only problem left is that we need a way to share the SDRAM between the gaming core and the softcore CPU. 
So down the rabbit hole we go...\nOur SDRAM is 16 bits wide, with 32MB of total space divided into 4 banks. As described in the SNESTang design notes, the SDRAM controller that we wrote for the SNES core provides two access channels (or \"ports\"). Channel 0 serves everything on the S-CPU (SNES CPU) side: the cartridge ROM, WRAM and BSRAM. Channel 1 serves the ARAM of the S-SMP (the audio processor). These two channels work in parallel and are implemented with SDRAM bank interleaving. This way, they appear to the S-CPU and S-SMP as separate memory chips, as in the original SNES hardware. The SDRAM runs at 6x the main logic clock speed, with the following fixed sequence of operations:\n\n| fclk_# | CPU | ARAM |\n|---|---|---|\n| 0 | RAS1 | |\n| 1 | CAS1 | DATA2 |\n| 2 | | RAS2/Refresh |\n| 3 | | |\n| 4 | DATA1 | CAS2 |\n| 5 | | |\n\nA memory access begins with a RAS (row activation) SDRAM command, followed by a CAS (column access) command. Finally, if the access is a read, the memory sends back DATA. Note how accesses from the two channels overlap (\"interleave\") with each other. This works because they access different banks of the chip. The row/column addresses and data are registered by the memory's per-bank circuitry during the accesses.\nNow we need to add softcore accesses to the (already busy) mix. One way to do this is to add more bank interleaving. We actually already plan to add another channel for the SNES core, so in total we would use all four banks and provide four access channels. The main issue with interleaving many banks is that it requires a high memory clock speed, which tends to make things unstable. I played with the 4-channel idea and came up with several sequencing schemes. Unfortunately, none of them worked reliably for me, so I was kinda stuck.\nIf we look at our problem carefully, however, there is one property we can exploit: unlike the SNES core, the softcore does not have tight timing requirements. It is fine if a softcore CPU instruction is delayed for a few cycles. So one idea is to let it tag along with one of the existing channels, i.e. 
share access at a lower priority. As long as the existing channel is not 100% busy, the softcore will run fine, just a bit more slowly. That is the solution we ended up with. The new access sequences look like this:\n```text\nNormal schedule Delayed write\nfclk_# CPU/RV ARAM 3rd CPU/RV ARAM 3rd\n0 RAS RAS\n1 RAS\n2 CAS READ\n3 CAS RAS\n4\n5 DATA DATA\n6 DATA WRITE\n7\n```\nFirst, the access sequence is longer, at 8 cycles, because it already includes slots for a third channel (not yet in use). That means the memory now runs at 86.4MHz (8 × 10.8MHz). I have not been able to run the memory above 100MHz, so this is close to our limit. Second, there are two schedules (\"normal\" and \"delayed write\") for the accesses. I will not go into detail about why this is necessary, as it has more to do with supporting three channels than with sharing the first one. Suffice it to say that SDRAM timing is kinda tricky, and the details are buried deep in the datasheets...\nBack to the main topic: the softcore's low-priority access is implemented roughly like this:\n```verilog\nmodule sdram_snes(\n    ...\n    input      [22:1] rv_addr,  // softcore halfword (16-bit) address\n    input             rv_rd,\n    input             rv_wr,\n    output reg        rv_wait,  // Softcore request is not serviced this cycle\n    ...\n);\n\nalways @(posedge fclk) begin\n    ...\n    if (cycle[0]) begin\n        if (cpu_rd | cpu_wr) begin          // SNES CPU has priority on the shared channel\n            ...                             // RAS for cpu access\n            rv_wait <= 1'b1;                // softcore is preempted and must retry\n        end else if (rv_rd | rv_wr) begin   // CPU idle: the slot goes to the softcore\n            ...                             // RAS for softcore access\n            rv_wait <= 1'b0;\n        end\n    end\n    ...\nend\n```\nBasically, we added an extra output, `rv_wait`, which goes high whenever the SNES CPU preempts the RV softcore's memory access. This signal is then fed to PicoRV32's memory interface, instructing it to keep retrying the access until it finally succeeds.\nHow big is the performance impact? Directory loading in the menu is slightly slower while a game is running, and that is all I have noticed. So I would say this is a good solution. It is scalable enough to allow up to 8MB of memory space for the softcore. That is one bank of the SDRAM (32MB / 4 banks), which is exactly what the 22-bit halfword address `rv_addr[22:1]` can reach. 
It is also quite general: for other gaming cores, as long as there is one SDRAM channel that is not 100% utilized, we can attach the softcore to it.\nIn the next part of this article, we will discuss where the firmware program is stored and how the softcore boot process works.\nContinue to part 2.", "url": "https://wpnews.pro/news/adding-a-softcore-to-snestang-part-1", "canonical_source": "https://nand2mario.github.io/posts/2024/softcore_for_fpga_gaming/", "published_at": "2024-02-03 00:00:00+00:00", "updated_at": "2026-05-23 15:56:59.037842+00:00", "lang": "en", "topics": ["hardware", "open-source", "semiconductor", "developer-tools"], "entities": ["SNESTang", "NESTang", "MiSTer", "MIST", "Sipeed Tang", "Verilog", "FAT32", "ARM"], "alternates": {"html": "https://wpnews.pro/news/adding-a-softcore-to-snestang-part-1", "markdown": "https://wpnews.pro/news/adding-a-softcore-to-snestang-part-1.md", "text": "https://wpnews.pro/news/adding-a-softcore-to-snestang-part-1.txt", "jsonld": "https://wpnews.pro/news/adding-a-softcore-to-snestang-part-1.jsonld"}}