vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
31 Subscribers
Help out
- Issues
  - fix: [Bug] V1 InputBatch condense can leak stale allowed_token_ids mask to recycled row
  - Kimi K2.5/2.6 LoRA adapter loading
  - [Frontend] Add hardware device retrieval endpoint
  - [Bugfix]: preserve DeepSeek V4 ubatch metadata for DBO prefills
  - [Bugfix] Align MLA indexer block table with MTP speculative decode
  - [Test][KVConnector] Add test coverage and benchmark for on_new_request() hook
  - [Bugfix] Skip cancelled requests in async tokenizer
  - fix: add buffer-length check in hf3fs_utils.cpp
  - [Bugfix][Tool Parser] Handle non-finite numbers in coerce_to_schema_type
  - [Frontend] Warn when VLLM_PLUGINS names a plugin we never discovered
- Docs
  - Python not yet supported