vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
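For context when triaging, here is a minimal offline-inference sketch using vLLM's Python API; the model ID and sampling values are illustrative, not recommendations:

```python
from vllm import LLM, SamplingParams

# Load a model (any Hugging Face model ID works; this one is just an example).
llm = LLM(model="facebook/opt-125m")

# Example sampling settings.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts in one call.
outputs = llm.generate(["Hello, my name is"], params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```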
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
26 Subscribers
Add a CodeTriage badge to vllm
Help out
- Issues
- [Bug]: Forced tool_choice crashes with AssertionError when reasoning_parser consumes all content
- [Bug]: Worse EAGLE3 acceptance rates on MRV2
- [AMD][CI Failure][Tracker] Static dashboard tracker for current CI failures
- [SpecDecode] Allow draft-specific attention backend and KV dtype
- [SpecDecode] Add local argmax helper for Llama Eagle3 drafts
- [Core][WIP] Check for GPU<->CPU sync during CI
- [Gemma4][Bugfix]: add missing get_expert_mapping
- [Bugfix] Remove duplicate size_k divisibility check in get_moe_wna16_block_config
- [ROCm] Enable SimpleCPUOffloadConnector on ROCm
- [Test][Bugfix] Fix LRUCache.touch() state corruption and expand test coverage
- Docs
- Python not yet supported