vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
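For context, a minimal sketch of vLLM's offline batch-inference Python API; the model name, prompts, and sampling values here are illustrative, not taken from this page:

```python
# Minimal offline-inference sketch using vLLM's Python API.
# Model name and sampling parameters are illustrative choices.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# LLM() loads the model and sets up vLLM's paged KV-cache memory manager.
llm = LLM(model="facebook/opt-125m")

# generate() batches all prompts together for high-throughput inference.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```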
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue in this project that needs help, along with instructions on how to triage it.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're a real pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
15 Subscribers
Help out
- Issues
- [New Model]: ByteDance-Seed/BAGEL-7B-MoT
- [Feature]: Qwen3 Models GGUF Support
- [Misc] Feat/trtllm structured output bench test fixes and updates
- [Usage]: gpt-oss-120b tool calls (see the client sketch after this list)
- [Spec-decode] Refactor cudagraphs for spec-decode; support uniform_alignment of cudagraph sizes.
- [Quantization] Support pre-load online quantization for compressed-tensors W8A8 channel-wise schema
- Exclude .git from Docker build to improve layer cache hits
- [RFC]: Kernel Library Restructure / Packaging Split (addressing long build times)
- Add option to disable weakref conversion for last piecewise cudagraph in a module
- Add a flag to use FusedMoE kernel in compressed quantization
- Docs
- Python not yet supported
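For context on the gpt-oss-120b tool-call item above, a minimal client-side sketch, assuming a vLLM OpenAI-compatible server is already running (e.g. started with `vllm serve`) with tool-call parsing enabled; the endpoint, tool schema, and prompt are illustrative, not taken from this page:

```python
# Hedged sketch: exercising tool calls against a vLLM OpenAI-compatible
# server. The base_url, tool definition, and prompt are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool schema in the standard OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decides to call the tool, the structured call arrives here.
print(resp.choices[0].message.tool_calls)
```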