Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs | Easton Man's Channel

13:21 · May 5, 2025 · Mon

Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs https://arxiv.org/abs/2503.23817

MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM...

General matrix-vector multiplication (GeMV) remains a critical latency bottleneck in large language model (LLM) inference, even with quantized low-bit models. Processing-Using-DRAM (PUD), an...
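For readers new to the term, here is a minimal sketch of what a low-bit GeMV computes. It assumes symmetric per-row integer quantization, a common scheme chosen here only for illustration and not necessarily the one used in the paper. The function names `quantize_rows` and `gemv_lowbit` are hypothetical, and nothing here models the in-DRAM execution that MVDRAM proposes. Each weight element is loaded once and used in a single multiply-accumulate, so GeMV does very little arithmetic per byte fetched. That is the usual reason it is limited by memory bandwidth rather than by compute.

```python
def quantize_rows(W, bits):
    """Hypothetical helper: symmetric per-row quantization of a float matrix
    to signed `bits`-bit integers, returning (integer rows, per-row scales)."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed weights
    q_rows, scales = [], []
    for row in W:
        m = max(abs(w) for w in row)
        s = m / qmax if m > 0 else 1.0  # scale maps the largest |w| to qmax
        # Round to the nearest integer level and clip to [-qmax, qmax].
        q_rows.append([max(-qmax, min(qmax, round(w / s))) for w in row])
        scales.append(s)
    return q_rows, scales


def gemv_lowbit(q_rows, scales, x):
    """Hypothetical helper: y = diag(scales) @ Q @ x with integer weights Q
    and float activations x. Every weight is read once for one MAC."""
    y = []
    for q_row, s in zip(q_rows, scales):
        acc = 0.0
        for q, xi in zip(q_row, x):
            acc += q * xi               # one multiply-accumulate per weight
        y.append(s * acc)               # rescale back to float range
    return y


W = [[0.9, -0.3, 0.1], [0.05, 0.6, -0.45]]
x = [1.0, 2.0, -1.5]
q, s = quantize_rows(W, bits=4)
print(gemv_lowbit(q, s, x))             # close to the float product W @ x
```

Lowering `bits` shrinks the bytes fetched per weight at the cost of a larger rounding error, which is the trade-off low-bit LLM inference relies on.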
