vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
31 Subscribers
Help out
- Issues
- fix: update WNA16 Marlin MoE fake signature
- fix: make FusedMoE expert-map logging meta safe
- [Bugfix] V1: clear stale allowed_token_ids mask in InputBatch.condense
- Support Hybrid&Mamba in kv transfer
- [DSV4]add sequence parallel support for DSV4
- [Bugfix][DeepSeek-V4] Append generation prompt when last message is system
- [ROCm]: Docker/CMake build support for gfx1103 (Radeon 780M / RDNA3 APU)
- [Performance] Reduce DeepEP LL batched Marlin activation overhead
- DO NOT MERGE
- kv_offload: eviction-triggered store for OffloadingConnector
- Docs
- Python not yet supported