vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
30 Subscribers
Help out
- Issues
- [ROCm] Enable AITER and FP8 inference on GFX120x
- [BugFix] Fix Mistral3 multimodal offline example
- [Kernel] Warm up hybrid GDN/Mamba/MRoPE kernels
- [Bugfix] Add SWIGLUSTEP and RELU2 activation support to CPU fused MoE
- [Bugfix] Enable audio transcription endpoint for Gemma 4
- [Bugfix] Set use_mha=False in EagleMistralLarge3Model.__init__
- [New Model]: Ovis2_6_NextForCausalLM (AIDC-AI/Ovis2.6-80B-A3B)
- [XPU] Add W8A8 FP8 linear kernel with multi-granularity quant support
- [Core] Support opt-in custom logits processors with speculative decoding
- [perf] [xgrammar] Import structural tag from `xgrammar`'s builtin structural tag templates
- Docs
- Python not yet supported