#1
MoE Performance, Quantization, and Backend Stability Fixes
35 issues · 4.5 demand score · Performance
Users are reporting critical issues with Mixture of Experts (MoE) model performance, including significant decode throughput regressions, quantization-related accuracy problems with newer models such as Gemma 4 and Qwen3, and CUDA/ROCm backend stability issues that cause crashes and hangs. Fixing these is essential for running large-scale MoE deployments reliably and efficiently. A measurement sketch for confirming the throughput regressions follows.
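Decode throughput regressions like those reported here are easiest to confirm with a small measurement harness run before and after an upgrade. The sketch below streams a completion from an OpenAI-compatible serving endpoint and reports generated tokens per second for the decode phase only (excluding prefill). It is a minimal illustration, not taken from the issues above: the endpoint URL, the model name, and the one-token-per-streamed-chunk approximation are all assumptions to adjust for your own deployment.

```python
# Minimal sketch: measure decode throughput (tokens/s) against an
# OpenAI-compatible serving endpoint. URL and model are placeholders;
# point them at your own MoE deployment.
import json
import time

import requests

BASE_URL = "http://localhost:8000/v1/completions"  # assumed endpoint
MODEL = "Qwen/Qwen3-30B-A3B"  # example MoE model; substitute your own


def decode_throughput(prompt: str, max_tokens: int = 256) -> float:
    """Stream a completion and return generated tokens per second,
    timed from the first streamed chunk to the last, so prefill /
    time-to-first-token is excluded from the measurement."""
    resp = requests.post(
        BASE_URL,
        json={
            "model": MODEL,
            "prompt": prompt,
            "max_tokens": max_tokens,
            "stream": True,
        },
        stream=True,
        timeout=300,
    )
    resp.raise_for_status()

    first_chunk_time = None
    last_chunk_time = None
    n_chunks = 0
    for line in resp.iter_lines():
        # Server-sent events arrive as lines prefixed with "data: ".
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        if chunk["choices"][0].get("text"):
            now = time.perf_counter()
            if first_chunk_time is None:
                first_chunk_time = now
            last_chunk_time = now
            n_chunks += 1  # assumes ~1 token per streamed chunk

    if n_chunks < 2:
        raise RuntimeError("not enough streamed tokens to measure")
    # Inter-token rate over the decode phase.
    return (n_chunks - 1) / (last_chunk_time - first_chunk_time)


if __name__ == "__main__":
    tps = decode_throughput("Explain mixture-of-experts routing briefly.")
    print(f"decode throughput: {tps:.1f} tok/s")
```

Running this against the old and new server versions with identical prompts and sampling settings gives a like-for-like decode-rate comparison that sidesteps prefill and batching noise.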