vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
31 Subscribers
Help out
- Issues
- [Test][KVConnector] Add test coverage and benchmark for on_new_request() hook
- fix: add buffer-length check in hf3fs_utils.cpp
- [Frontend] Warn when VLLM_PLUGINS names a plugin we never discovered
- [Frontend] Add OpenAIBaseModel.get_extra_fields() public accessor
- [Bugfix] Fix crash with PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True set and custom_allreduce enabled
- [Bug]:
- Add Chimera edit-program example showing 1.8x coding-task speedup
- [Bugfix] Split mixed reasoning/content streaming deltas
- [ROCm][Perf] DSv3.2: fuse indexer Q-RoPE+quant + K-norm/RoPE/quant/cache
- [ROCm][DSV4] Use aiter mHC pre/post as the default ROCm path
- Docs