vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
29 Subscribers
Help out
- Issues
- [Bug]: In_proj_ba of GDN in Qwen3Next use MergeColumnParallelLinear may cause accuracy decrease?
- fix(distributed): resolve inference failure in cpu_worker
- [Bug]: Docker Build Failure for Dockerfile.nightly_pytorch
- [Bug]: PD disaggregation for SSM models requires `--no-async-scheduling` when TP>1
- [Installation]: Build vllm from source fail
- [Bug] gpt-oss-120b + P-EAGLE speculative decoding causes openai_harmony parse errors and severe chat latency regression
- [RFC][XPU]: Enable Intel XPU CI for vLLM
- [Bug][ARM CPU] Build/Runtime error: no matching function for call to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 4>::VecMask(int&)’ when serving Qwen3-VL-8B-Instruct
- [Scheduler][WIP] Try to reduce preemption
- [Installation]: PyPI release blocks installation on enterprise systems: xgrammar==0.1.29 blocked by security scanners (CVE-2026-25048)
- Docs
- Python not yet supported