vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
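For context, here is a minimal offline-inference sketch using vLLM's Python API; `LLM` and `SamplingParams` are the library's documented entry points, and the model name and sampling values are illustrative, not part of this page:

```python
# Minimal vLLM offline-inference sketch; model choice is illustrative.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]

# Sampling settings; the values here are arbitrary examples.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# Loading the model also preallocates the paged KV cache that vLLM
# uses for memory-efficient, high-throughput inference.
llm = LLM(model="facebook/opt-125m")

# generate() batches all prompts together for throughput.
for output in llm.generate(prompts, params):
    print(output.prompt, output.outputs[0].text)
```

The same engine also powers online serving: `vllm serve <model>` starts an OpenAI-compatible HTTP server.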
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're a real pro, receive undocumented methods or classes instead and supercharge your commit history.
Python not yet supported
14 Subscribers
Add a CodeTriage badge to vllm
Help out
- Issues
  - [CI] Add non-eager test-case for SharedStorageConnector
  - [Bug]: Failed to launch example for Intel Arc Pro B60
  - [Bugfix] fix core engine monitor error condition
  - [Tracking Issue][Performance]: Speculative decoding performance/QoL improvements
  - [Performance]: Time cost on Qwen2.5VL with multiple images
  - [Bugfix] Fix logprobs to support rank sequences and empty slices
  - [Bugfix][PD] correct prefill instance removal bug in examples/disagg_proxy_demo.py
  - Fix: Handle NoneType quant_config in FusedMoE LoRA injection
  - [ROCm][fusion] enable ROCm rms_norm pattern matching in qk_norm_rope fusion
  - [Bug]: llama4 AttributeError: 'dict' object has no attribute 'model_type'
- Docs
  - Python not yet supported