vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
29 Subscribers
Help out
- Issues
  - [Benchmark] Add iteration benchmark with server-side step stats, trac…
  - [LoRA] Support FP8 LoRA E2E inference-dense model
  - [ROCm][Perf] Skip head repeat_interleave for AITER MLA decode with BF16 KV cache
  - [Doc] Add comprehensive --speculative-config documentation
  - Speculative/MTP draft config appears to drop target --hf-overrides (breaks long-context YaRN/RoPE extension)
  - [Bugfix][Core] Preserve target hf_overrides in MTP draft config
  - [Usage]: Failed to run Qwen3 Eagle3 speculate
  - fix marlin fp4 kernel N-dimension alignment
  - [Kernel] Add Llama4 Router GEMM kernel
  - [CI] Update model registry with real HF model IDs for CI testing
- Docs
  - Python not yet supported