13:45
2026-05-29
arxiv.org
large-language-models
Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding
Researchers have developed Cassandra, an algorithm-hardware co-designed self-speculative decoding framework that enables lossless acceleration of reasoning large language models on edge devices withou…