vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Issues
- Consolidate AWQ quantization into single awq_marlin.py file
- Revert "[MoE Refactor] Mxfp4 oracle rebased" (#37128)
- [Bugfix] Add size guard to `make_copy_and_call` and improve tests
- [MoE] Move DEEP_GEMM into experts/ subdirectory
- [Feature]: Consolidate GPTQ Quantization
- Revert "[Frontend] Remove librosa from audio dependency" (#37058)
- [CI/Build] Resolve a dependency deadlock when installing the test dependencies used in CI
- [Model] Add GGUF support for Qwen3.5 hybrid models
- [Feature]: Unify MoE "Oracles" with Class Structure
- [Bugfix] Share MLA decode output buffer across layers to fix OOM