vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
31 Subscribers
Help out
- Issues
  - [Kernel] TD operand loads for batched MoE GEMM (moe_mmk) on XPU
  - test: run MoRIIO layout geometry on CPU
  - [rust][tool-parser] Add step3 tool parser
  - fix: make NCCL collectives eager break points in breakable cudagraph
  - [Bugfix][ROCm] Preserve MoE weight padding for unquantized Triton path
  - [Quantization][INC] Support hybrid INT4+FP8 AutoRound checkpoints via maybe_update_config
  - [Profiler] Add execution trace capture to torch profiler config
  - [Bug]: Changing VLLM_CPU_KVCACHE_SPACE drops Qwen 3.5 accuracy on AMD EPYC CPU
  - [Frontend] Add TLS support with certificate/key files
  - [BugFix] Report correct cached_tokens for disaggregated prefill
- Docs
  - Python not yet supported