vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
29 Subscribers
Help out
- Issues
  - [Bugfix] Add missing f-string prefixes in assert messages
  - Fix "Dynamo bytecode transform" span start time calculation
  - [Feature] Migrate DP Supervisor from Python to Rust
  - Handle chat template render validation errors
  - fix(qwen3-asr): respect language parameter in get_generation_prompt
  - fix(structured_output): pass new_token_ids to should_advance() to fix MTP spec-decode off-by-one
  - [Bugfix][MoE][SpecDecode] Fix Qwen3.5/3.6 MTP loader for pre-fused expert checkpoints
  - Fix DP drafter hang near max model length
  - [KV Connector][Mooncake] Add store group semantics
  - fix: [Transformers v5] Base model and LoRA used in test has incorrect `tok...
- Docs
  - Python not yet supported