vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
31 Subscribers
Help out
- Issues
- [RFC]: Move trainer-side weight transfer logic out of `vllm`
- [Bugfix][MLA] Fix LSE log-base mismatch in DCP + FlashInfer MLA decode
- [Bug]: V1 structured outputs: a malformed grammar request after a valid one crashes EngineCore
- [Core] Improve startup failure diagnostics for early subprocess exits
- [WIP][Model Runner V2][Spec Decode] CUDA graph rejection sampling
- fix: update WNA16 Marlin MoE fake signature
- fix: make FusedMoE expert-map logging meta safe
- [Bugfix] V1: clear stale allowed_token_ids mask in InputBatch.condense
- Support Hybrid&Mamba in kv transfer
- [DSV4]add sequence parallel support for DSV4
- Docs
- Python not yet supported