vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
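For context, vLLM exposes a Python API for offline batched inference in addition to its serving endpoints. The sketch below is a minimal quickstart-style example; the model name and sampling settings are illustrative placeholders, not anything prescribed by this page.

```python
# Minimal offline-inference sketch using vLLM's Python API.
# The model name and sampling values are placeholders for illustration.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# LLM loads the model weights and manages KV-cache memory for high-throughput generation.
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, output.outputs[0].text)
```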
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
14 Subscribers
Add a CodeTriage badge to vllm
Help out
- Issues
- [Bugfix] Fix shellcheck hook find command syntax error
- [Bug]: Gemma3/SiglipVisionEmbeddings embedding output differs from the transformers implementation due to custom Conv2d
- [P/D] Prefill compute optimizations with bi-directional KV cache transfers between P and D nodes
- [Feature] Support LoRA MoE for bitsandbytes quantization
- Fix gpt-oss Harmony token leaks in tool names and streaming content #32587
- [Bug]: Invalid base64-encoded string for audio input
- [ROCm][DeepSeekV3.2][Perf] Further DeepSeek-V3.2 optimizations on vLLM
- [Quantization][Deprecation] Remove PTPC FP8
- [Bugfix] Fix MoE Model DP+TP with NaiveAll2AllManager Bug
- [MLA][DeepSeek] Add VLLM_MLA_FP8_PROJ to force FP8 for MLA q_b_proj layer
- Docs (Python not yet supported)