vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
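For context when triaging, vLLM's offline entry point is the `LLM` class. A minimal sketch of the quickstart-style usage is shown below; the model name and sampling settings are illustrative choices, not part of this page.

```python
# Minimal offline-inference sketch with vLLM.
# The model name below is only an illustrative choice.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="facebook/opt-125m")  # weights are downloaded on first use
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each RequestOutput carries the prompt and its generated completions.
    print(output.prompt, "->", output.outputs[0].text)
```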
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're a real pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
13 Subscribers
Add a CodeTriage badge to vllm
Help out
- Issues
- [Feature]: Improve the chat method of the offline interface to bring it closer to the online service's functionality (see the sketch after this list)
- [WIP][ROCm] Add AITER hipblaslt preshuffled gemm kernel
- [Bug]: Load format `runai_streamer_sharded` rejects tunable parameters in `model_loader_extra_config`
- [Bug]: microsoft/phi-2 fails on aarch64
- [InputProcessor] Make external request_id optional
- [Feature]: Skip language model weight load for encoder instance in E-P-D
- [Bug]: Kimi-K2-Thinking tool calls fail on self-hosted vLLM
- [Bug]: Inconsistency between `InternVL3_5-1B` and `InternVL3_5-1B-HF` when running vLLM offline inference
- [CI] Test target determination using LLM
- [Usage]: How do people typically use vllm/tests?
- Docs: Python not yet supported
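As a reference point for the offline-chat feature request above, the `LLM` class already exposes a `chat()` method that applies the model's chat template before generating, similar to what the OpenAI-compatible server does per request. The sketch below assumes a chat-tuned model; the specific model name and sampling settings are illustrative only.

```python
# Hedged sketch of vLLM's offline chat interface.
# The model name below is only an illustrative choice.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what vLLM does in one sentence."},
]

# LLM.chat() formats the messages with the model's chat template and
# returns RequestOutput objects, like LLM.generate() does for raw prompts.
outputs = llm.chat(messages, SamplingParams(temperature=0.2, max_tokens=64))
print(outputs[0].outputs[0].text)
```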