vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Issues
- [Bugfix] Fix hybrid KV manager for quantized per-token-head KV cache
- Revert "Fix MoE backend selection for LoRA (unquantized MoE)" (#40273)
- [Feat][KVConnector] Prepend offloaded blocks on offloading complete for lazy mode in simple cpu offloader
- [ROCm] Allow Triton MXFP4 MoE support checks on gfx11xx
- [Bug]: MoRI Connector hangs at >=128 concurrency
- [Docs] [Misc] add sig list table in community governance process
- [Bug]: MTP draft head TP allgather deadlock under sustained long-context load (GLM-5.1-FP8)
- [Kernel][Bugfix] Marlin W4A16: pad sub-tile output dims on load
- fix: remove redundant None default in dict.get() calls
- [Doc] Fix CLI help examples: remove phantom --help=listgroup and --help=page modes