19:38
2026-05-29
github.com
large-language-models
Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA
A developer has released tiny-vllm, a high-performance LLM inference engine written in C++ and CUDA that serves as a smaller sibling to the vLLM project. The open-source repository includes both the f…