vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
31 Subscribers
Help out
- Issues
  - [Bugfix][DeepSeek-V4] Append generation prompt when last message is system
  - [ROCm]: Docker/CMake build support for gfx1103 (Radeon 780M / RDNA3 APU)
  - [Performance] Reduce DeepEP LL batched Marlin activation overhead
  - DO NOT MERGE
  - kv_offload: eviction-triggered store for OffloadingConnector
  - fix: [Bug] V1 InputBatch condense can leak stale allowed_token_ids mask to recycled row
  - Kimi K2.5/2.6 LoRA adapter loading
  - [Frontend] Add hardware device retrieval endpoint
  - [Bugfix]: preserve DeepSeek V4 ubatch metadata for DBO prefills
  - [Bugfix] Align MLA indexer block table with MTP speculative decode