vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
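For context on what the project provides, here is a minimal sketch of vLLM's offline inference API; the model name, prompts, and sampling settings are illustrative and not part of this page:

```python
from vllm import LLM, SamplingParams

# Prompts to complete; the model name below is illustrative -- any
# Hugging Face causal LM supported by vLLM can be substituted.
prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# LLM() loads the model and manages KV-cache memory for
# high-throughput batched generation.
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```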
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're a real pro, receive undocumented methods or classes instead and supercharge your commit history.
Python not yet supported
13 Subscribers
Add a CodeTriage badge to vllm
Help out
- Issues
- [CI Failure]: mi325_1: ROCm GPT-OSS Eval
- [CI Failure]: tests/integration/test_rl.py: RuntimeError: operator torchvision::nms does not exist
- [Bug]: LoRA adapters with mismatched module name prefixes silently produce base-model output
- [Bug]: Set env ROCP_TOOL_ATTACH=1 caused vllm server stopped
- add log of raw request when crashing
- [Bug]: [CPU Backend] Whisper W8A8 CPU utilization very low on Arm CPU
- [Feature]: Enable CUDA graph capture for Eagle speculator prefill
- fix(v1-worker): improve dcp assertion with backend fallback hint (#28407)
- [Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8)
- Minor cleanup for Voxtral
- Docs
- Python not yet supported