Easton Man's Channel
13:21 · May 5, 2025 · Mon
Matrix-vector multiplication implemented in off-the-shelf DRAM for low-bit LLMs
https://arxiv.org/abs/2503.23817
arXiv.org
MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM...
General matrix-vector multiplication (GeMV) remains a critical latency bottleneck in large language model (LLM) inference, even with quantized low-bit models. Processing-Using-DRAM (PUD), an...