vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
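For a sense of what the engine does, here is a minimal offline-inference sketch using vLLM's Python API; the model name and sampling settings are only illustrative choices, not anything prescribed by the project.

```python
from vllm import LLM, SamplingParams

# Load a model and generate completions for a small batch of prompts.
# "facebook/opt-125m" is just an example; any supported Hugging Face model works.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "The capital of France is",
    "vLLM is a library for",
]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each result carries the original prompt and one or more generated candidates.
    print(output.prompt, "->", output.outputs[0].text)
```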
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
14 Subscribers
Add a CodeTriage badge to vllm
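The badge is a Markdown snippet that links back to the project's CodeTriage page. Assuming the standard CodeTriage badge URL pattern, it would look like this:

```markdown
[![Open Source Helpers](https://www.codetriage.com/vllm-project/vllm/badges/users.svg)](https://www.codetriage.com/vllm-project/vllm)
```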
Help out
- Issues
  - [Bugfix] fix core engine monitor error condition
  - [Tracking Issue][Performance]: Speculative decoding performance/QoL improvements
  - [Performance]: Timecost on Qwen2.5VL with multi images
  - [Bugfix] Fix logprobs to support rank sequences and empty slices
  - [Bugfix][PD] correct prefill instance removal bug in examples/disagg_proxy_demo.py
  - Fix: Handle NoneType quant_config in FusedMoE LoRA injection
  - [ROCm][fusion] enable ROCm rms_norm pattern matching in qk_norm_rope fusion
  - [Bug]: llama4 AttributeError: 'dict' object has no attribute 'model_type'
  - [Bugfix] Handle layer name inconsistencies in pipeline parallel training
  - Support Deepseekv32 chat
- Docs
  - Python not yet supported