Hey all,
I built slopsome.com to answer the question I kept re-deriving by hand: will model X run on GPU Y at quant Q with a Z-token context, and how fast?
It’s a search engine for LLM + GPU stats: a VRAM fit calculator (fits in VRAM / with offload / multi-GPU / won’t fit, plus estimated tok/s), real measured throughput, and side-by-side comparisons of open-weight and API models (params, quant sizes, min VRAM, benchmarks, cost). Built for the GGUF / llama.cpp / Ollama / vLLM crowd. Free, no signup, sourced data (no invented numbers).
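For anyone curious how a fit check like this can work in principle, here is a minimal back-of-envelope sketch. It is not slopsome's actual method (the site uses sourced and measured data). It assumes VRAM need is roughly quantized weights + fp16 KV cache + a fixed per-GPU runtime overhead, and that decode speed is bounded by memory bandwidth divided by weight size. The names `Model`, `classify_fit`, and `est_decode_tok_s`, the 1 GB overhead, and the 4.5 bits/weight figure are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Model:
    params_b: float         # total parameters, in billions
    bits_per_weight: float  # effective bits/weight for the quant (assumed value)
    n_layers: int
    n_kv_heads: int
    head_dim: int


def weights_gb(m: Model) -> float:
    # Quantized weight size in GB (1 GB = 1e9 bytes here)
    return m.params_b * 1e9 * m.bits_per_weight / 8 / 1e9


def kv_cache_gb(m: Model, ctx: int, bytes_per_elem: int = 2) -> float:
    # K and V per layer, n_kv_heads * head_dim values per token each (fp16 by default)
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * ctx * bytes_per_elem / 1e9


def classify_fit(m: Model, ctx: int, vram_per_gpu_gb: float, n_gpus: int = 1,
                 ram_gb: float = 0.0, overhead_gb: float = 1.0) -> str:
    base = weights_gb(m) + kv_cache_gb(m, ctx)
    if base + overhead_gb <= vram_per_gpu_gb:
        return "fits in VRAM"
    # Assumption: every GPU pays its own runtime overhead
    total = base + overhead_gb * n_gpus
    if n_gpus > 1 and total <= vram_per_gpu_gb * n_gpus:
        return "multi-GPU"
    if total <= vram_per_gpu_gb * n_gpus + ram_gb:
        return "with offload"
    return "won't fit"


def est_decode_tok_s(m: Model, bandwidth_gb_s: float) -> float:
    # Rough upper bound: each generated token streams all weights once
    return bandwidth_gb_s / weights_gb(m)


# Illustrative, roughly Llama-3-shaped configs at an assumed ~4.5 bits/weight
small = Model(params_b=8.0, bits_per_weight=4.5, n_layers=32, n_kv_heads=8, head_dim=128)
large = Model(params_b=70.0, bits_per_weight=4.5, n_layers=80, n_kv_heads=8, head_dim=128)

print(classify_fit(small, ctx=8192, vram_per_gpu_gb=8))               # fits in VRAM
print(classify_fit(large, ctx=8192, vram_per_gpu_gb=24, n_gpus=2))    # multi-GPU
print(classify_fit(large, ctx=8192, vram_per_gpu_gb=16, n_gpus=2, ram_gb=32))  # with offload
print(classify_fit(large, ctx=8192, vram_per_gpu_gb=8))               # won't fit
```

Real runtimes add compute buffers, uneven layer splits across GPUs, and quant-specific overheads, so treat this as a first-order estimate only.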
There’s also an open read-only API and a small HF Space demo. Feedback is very welcome: wrong numbers, missing models/GPUs, features you’d want.
Try the demo Space: "slopsome — Will It Fit?" by NexAIGuy on Hugging Face.

Great! Can you add multi-GPU support? E.g. I have 2x 5060 Ti GPUs.