vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
- Issues
- Rocm72 py311 d12
- [ROCm][CI] Move skipped tests out of run-amd-test.sh
- [Bug]: Unsharded model cannot be loaded
- fix(metrics): Prometheus counter crash on negative prompt tokens with external KV transfer
- fix(nixl): Handshake race when same-node workers re-register with new engine IDs
- [Bugfix] Fix MoE routed input transform when using DeepEP LL
- [Bugfix] Fix GGUF parameter mapping for Transformers v5 fused MoE experts
- [Bug]: Gemma 4 torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors
- [Bug]: Potential misalignment between qwen3.5 chat template and recommended tool parser
- [Bug]: Gemma 4 E4B weight loading fails `Gemma4ClippableLinear` parameter `input_max` not recognized