vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
30 Subscribers
Help out
- Issues
  - [NIXL] Add CacheLayout meta-tensor abstraction for descriptor generation
  - [XPU][CI] add intel xpu cases for nightly CI
  - [Bug]: [v0.22] Crash when calling API to inference to a GGUF model
  - Refactor RMSNorm vectorized launch checks
  - [Bugfix] Add @register_fake for dsv3_fused_a_gemm
  - [Installation]: hint: `fastsafetensors` (v0.3.2) was included because `vllm` (v0.22.1rc1.dev123+g0e2b13103.d20260603) depends on `fastsafetensors`
  - [Multimodal] Add Qwen3-VL video loader
  - [Feature]: Streaming input for VLM models
  - [Bugfix] Fix NixlEPAll2AllManager's dependency on --enable-elastic-ep to function
  - [XPU] skip UT test_with_ngram_gpu_spec_decoding
- Docs
  - Python not yet supported