vllm
https://github.com/vllm-project/vllm
Language: Python
A high-throughput and memory-efficient inference and serving engine for LLMs
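
For orientation, here is a minimal offline-inference sketch using vLLM's public Python API. The model name is an arbitrary small example, not anything prescribed by this page; any Hugging Face model that fits on the local GPU would do.

```python
# Minimal vLLM offline-inference sketch.
# "facebook/opt-125m" is an illustrative choice of model, assumed
# available from Hugging Face and small enough for the local GPU.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

llm = LLM(model="facebook/opt-125m")  # loads weights and allocates KV cache
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```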
Open issues:
- [Quant] add CompressedTensorsW8A8Mxfp8 for linear and MoE layers
- [ROCm] Enable fused_silu_mul_block_quant on ROCm
- [Bug] Add e_score_correction_bias to SKIP_TENSORS
- [Transformers v5] Fix Ernie4_5_VLMoeForConditionalGeneration rope_theta config
- [Feature]: General LL GEMMs with PDL Support
- [LongCat flash] Fix `ZeroExpertFusedMoE` missing `select_experts()` in router and MTP fix
- [7/n] libtorch stable ABI
- [XPU][CI] Add misc cases on Intel GPU in CI
- [Bugfix] Fix test mocks after SM100 restriction in #38730
- [XPU] Fix MoE hang in test_external_lb_dp by handling restricted device visibility