#1
MoE Performance, Quantization, and Backend Stability Fixes
35 issues · 4.5 demand score · Performance
Users are reporting critical issues with Mixture of Experts (MoE) model performance, including significant decode throughput regressions, quantization-related accuracy problems with newer models such as Gemma 4 and Qwen3, and CUDA/ROCm backend stability issues that cause crashes and hangs. Fixing these is essential for running large-scale MoE deployments reliably and efficiently. A measurement sketch for confirming the throughput regressions follows.
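Decode throughput regressions like those reported here are easiest to confirm with a small measurement harness run before and after an upgrade. The sketch below streams a completion from an OpenAI-compatible serving endpoint and reports generated tokens per second for the decode phase only (excluding prefill). It is a minimal illustration, not taken from the issues above: the endpoint URL, the model name, and the one-token-per-streamed-chunk approximation are all assumptions to adjust for your own deployment.

```python
# Minimal sketch: measure decode throughput (tokens/s) against an
# OpenAI-compatible serving endpoint. URL and model are placeholders;
# point them at your own MoE deployment.
import json
import time

import requests

BASE_URL = "http://localhost:8000/v1/completions"  # assumed endpoint
MODEL = "Qwen/Qwen3-30B-A3B"  # example MoE model; substitute your own


def decode_throughput(prompt: str, max_tokens: int = 256) -> float:
    """Stream a completion and return generated tokens per second,
    timed from the first streamed chunk to the last, so prefill /
    time-to-first-token is excluded from the measurement."""
    resp = requests.post(
        BASE_URL,
        json={
            "model": MODEL,
            "prompt": prompt,
            "max_tokens": max_tokens,
            "stream": True,
        },
        stream=True,
        timeout=300,
    )
    resp.raise_for_status()

    first_chunk_time = None
    last_chunk_time = None
    n_chunks = 0
    for line in resp.iter_lines():
        # Server-sent events arrive as lines prefixed with "data: ".
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        if chunk["choices"][0].get("text"):
            now = time.perf_counter()
            if first_chunk_time is None:
                first_chunk_time = now
            last_chunk_time = now
            n_chunks += 1  # assumes ~1 token per streamed chunk

    if n_chunks < 2:
        raise RuntimeError("not enough streamed tokens to measure")
    # Inter-token rate over the decode phase.
    return (n_chunks - 1) / (last_chunk_time - first_chunk_time)


if __name__ == "__main__":
    tps = decode_throughput("Explain mixture-of-experts routing briefly.")
    print(f"decode throughput: {tps:.1f} tok/s")
```

Running this against the old and new server versions with identical prompts and sampling settings gives a like-for-like decode-rate comparison that sidesteps prefill and batching noise.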