I've been experimenting since 2019 with ways to minimize RAM usage for tiny MLP inference on microcontrollers. [0]
This project is the result of that exploration: a fully static-allocation approach to MLP inference in ANSI C that uses a simple 2-slot ring buffer to keep memory usage predictable and extremely low while staying fast.
I believe this is close to the practical lower bound on RAM usage for MLP inference on a general-purpose CPU without sacrificing speed or adding runtime complexity.
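To make the 2-slot idea concrete, here is a minimal sketch, not the project's actual code. It assumes dense layers with row-major weights and a ReLU activation; `RB_MAX_WIDTH`, `RbLayer`, `rb_slots` and `rb_forward` are hypothetical names chosen for illustration. Each layer reads from one slot and writes into the other, then the roles swap, so activation storage stays fixed at two buffers of the widest layer's size.

```c
#include <stddef.h>

/* Assumption: no layer (input included) is wider than this. */
#define RB_MAX_WIDTH 4

/* Hypothetical layer description; weights are row-major (out x in). */
typedef struct {
    size_t in;
    size_t out;
    const float *w;
    const float *b;
} RbLayer;

/* The only activation storage: two statically allocated slots. */
static float rb_slots[2][RB_MAX_WIDTH];

static float rb_relu(float x)
{
    return x > 0.0f ? x : 0.0f;
}

/* Runs all layers and returns a pointer to the slot holding the outputs. */
static const float *rb_forward(const RbLayer *layers, size_t n,
                               const float *input)
{
    size_t l, i, j;
    size_t cur = 0;

    /* Copy the input into slot 0. */
    for (i = 0; i < layers[0].in; i++)
        rb_slots[cur][i] = input[i];

    for (l = 0; l < n; l++) {
        size_t nxt = cur ^ 1; /* the other slot */
        for (j = 0; j < layers[l].out; j++) {
            float acc = layers[l].b[j];
            for (i = 0; i < layers[l].in; i++)
                acc += layers[l].w[j * layers[l].in + i] * rb_slots[cur][i];
            rb_slots[nxt][j] = rb_relu(acc);
        }
        cur = nxt; /* the slot just written becomes the next input */
    }
    return rb_slots[cur];
}
```

In this sketch the activation RAM is `2 * RB_MAX_WIDTH * sizeof(float)` regardless of network depth, and no allocator is involved at runtime.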
A more aggressive approach I've used before is to allocate and free memory for each layer-to-layer pair during inference, but that adds overhead and can cause fragmentation if not used carefully. [1]
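For contrast, a rough sketch of the per-pair allocation idea, under the same assumptions (dense layers, row-major weights, ReLU). This is not the code behind `REDUCE_RAM_DELETE_OUTPUTS`; `HeapLayer` and `heap_forward` are hypothetical names. Only the current layer's input and output buffers are alive at any moment, so the peak is the largest in-plus-out pair, at the cost of one `malloc`/`free` per layer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layer description; weights are row-major (out x in). */
typedef struct {
    size_t in;
    size_t out;
    const float *w;
    const float *b;
} HeapLayer;

/* Returns a heap buffer with the final outputs (caller frees it),
 * or NULL if n is 0 or an allocation fails. */
static float *heap_forward(const HeapLayer *layers, size_t n,
                           const float *input)
{
    size_t l, i, j;
    float *cur;
    float *nxt;

    if (n == 0)
        return NULL;

    cur = (float *)malloc(layers[0].in * sizeof(float));
    if (cur == NULL)
        return NULL;
    memcpy(cur, input, layers[0].in * sizeof(float));

    for (l = 0; l < n; l++) {
        /* Allocate exactly what this layer's output needs. */
        nxt = (float *)malloc(layers[l].out * sizeof(float));
        if (nxt == NULL) {
            free(cur);
            return NULL;
        }
        for (j = 0; j < layers[l].out; j++) {
            float acc = layers[l].b[j];
            for (i = 0; i < layers[l].in; i++)
                acc += layers[l].w[j * layers[l].in + i] * cur[i];
            nxt[j] = acc > 0.0f ? acc : 0.0f; /* ReLU */
        }
        free(cur); /* the input of this pair is no longer needed */
        cur = nxt;
    }
    return cur;
}
```

The repeated allocations of varying size are where the overhead and fragmentation risk come from, which is the trade-off the static two-slot layout avoids.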
Curious how it compares to other minimal inference implementations people have seen (or built). Feedback and edge cases welcome. Hope you like it. Have fun. <3
[0]: [https://github.com/GiorgosXou/NeuralNetworks#-research](https://github.com/GiorgosXou/NeuralNetworks#-research)
[1]: look for REDUCE_RAM_DELETE_OUTPUTS in the source of [0]
Comments URL: [https://news.ycombinator.com/item?id=48318304](https://news.ycombinator.com/item?id=48318304)