Three ways to shrink an LLM: scale the salient weights, compensate for rounding error with second-order math, or train with ternary weights so the matmul reduces to additions and subtractions.
A clear, side-by-side comparison with examples — part of Rudrite Research.
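To make the third idea concrete, here is a minimal sketch of why ternary weights turn a matrix-vector product into additions and subtractions. It is an illustration under stated assumptions, not the article's method: the function names `ternary_quantize` and `ternary_matvec` are hypothetical, the mean-absolute-value scale is an assumed choice, and the weights are rounded after the fact rather than trained as ternary.

```python
def ternary_quantize(weights, eps=1e-8):
    """Round a weight matrix to {-1, 0, +1} with one shared scale.

    Assumption: the scale is the mean absolute weight. The source does not
    specify a scaling rule; this is only one plausible choice.
    """
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) + eps
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def ternary_matvec(ternary, scale, x):
    """Compute scale * (T @ x) with no multiplications inside the inner loop."""
    out = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi      # weight +1: add the activation
            elif t == -1:
                acc -= xi      # weight -1: subtract the activation
            # weight 0: skip the activation entirely
        out.append(acc * scale)  # one multiply per output row
    return out


# Toy example with made-up numbers.
weights = [[0.9, -0.05, -1.2],
           [0.1, 0.8, -0.7]]
x = [2.0, 3.0, 5.0]

ternary, scale = ternary_quantize(weights)
y = ternary_matvec(ternary, scale, x)
print(ternary)  # [[1, 0, -1], [0, 1, -1]]
print(y)
```

The only multiplication left is the single rescale per output row, so the inner loop needs adders but no multipliers.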
source & further reading
research.rudrite.com — original article
Shuchen Xue
Json Zhou
Yunze Man