03:35
2026-06-03
dev.to
large-language-models
Fitting WhisperX large-v3 + a 24B LLM on one 3090: a reproducible context-capping recipe
A developer ran WhisperX large-v3 (7.7 GB) and a 24B-parameter LLM (Devstral Small 2) simultaneously on a single 24 GB RTX 3090 by reducing the LLM's context window from 40,960 to 8,19…