vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
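For readers unfamiliar with the project, a minimal offline inference sketch using vLLM's Python API looks roughly like this (the model name is only an example; any Hugging Face model that vLLM supports works):

```python
from vllm import LLM, SamplingParams

# Example model; substitute any Hugging Face model that vLLM supports.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "The capital of France is",
    "High-throughput LLM serving requires",
]

# generate() batches the prompts internally and returns one RequestOutput per prompt.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```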
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
13 Subscribers
Help out
- Issues
- [Bug]: Kimi-K2-Thinking tool calls fail on self-hosted vLLM
- [Bug]: Inconsistency between `InternVL3_5-1B` and `InternVL3_5-1B-HF` when running vLLM offline inference
- [CI] Test target determination using LLM
- [Usage]: How do people typically use vllm/tests?
- fix(gguf): Ensure Gemma2 configs have hidden_act for backward compatibility
- fix(gemma2): Add quant_config to embedding layer for GGUF support
- [Bug]: Speculative decoding crashes on PP>1 because self.drafter is missing
- [Usage]: vllm serve setup issues on B300
- [Usage]: How to load KV cache data into a local file
- [Usage]: How can I use the local pre-compiled wheel of vllm
- Docs: Python not yet supported