vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
30 Subscribers
Help out
- Issues
- [ROCM] Fix the AITER FA SWA Decode Path
- [Bugfix] Dereference $ref and flatten anyOf in tool schemas before chat templates
- [RFC]: Does SimpleCPUOffloadConnector have plans to support disk/ssd?
- [Feature]: Support PD disaggregation / KV transfer for hybrid SSM/GDN models such as Qwen3.5-397B-A17B-W8A8
- [Tracking][NUMA] Replace hard-coded Granite Rapids PCT detection with a generic, root-free path
- [Rust Frontend] Support OpenAI n choices
- [Feature]: Support for orthrus
- [Frontend] Add KV cache runtime state retrieval endpoint
- [SpecDec + Reasoning] Fix race condition when <channel|> reasoning-end
- [PD] NixlPush mode with scheduler step alignment
- Docs
- Python not yet supported