vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
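For context, a minimal sketch of vLLM's offline inference API; the model name is only an example, and any supported Hugging Face model would work:

```python
from vllm import LLM, SamplingParams

# Load a model; vLLM handles batching and paged KV-cache memory internally.
llm = LLM(model="facebook/opt-125m")

# Sampling parameters applied to every prompt in the batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "The capital of France is",
    "vLLM is",
]

# generate() batches all prompts in one call for high throughput.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```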
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're a real pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
25 Subscribers
Add a CodeTriage badge to vllm
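A sketch of the badge markdown for a project README, assuming CodeTriage's standard badge URL pattern for this repo:

```markdown
[![Open Source Helpers](https://www.codetriage.com/vllm-project/vllm/badges/users.svg)](https://www.codetriage.com/vllm-project/vllm)
```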
Help out
- Issues
- [feat] Support modelopt_mixed for Turing and Ampere via Marlin
- [Bugfix][cmake] fix FP4 ARCH for CUDA>=13.0
- [ROCm][Perf] Add AITER MLA prefill kernel for dense MLA backend
- Rocm72 py311 d12
- [ROCm][CI] Move skipped tests out of run-amd-test.sh
- [Bug]: Unsharded model cannot be loaded
- fix(metrics): Prometheus counter crash on negative prompt tokens with external KV transfer
- fix(nixl): Handshake race when same-node workers re-register with new engine IDs
- [Bugfix] Fix MoE routed input transform when using DeepEP LL
- [Bugfix] Fix GGUF parameter mapping for Transformers v5 fused MoE experts
- Docs
- Python not yet supported