vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
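For context, a minimal sketch of offline inference with vLLM's Python API; the model id and sampling values below are illustrative placeholders, not project recommendations:

```python
# Minimal offline-inference sketch using vLLM's Python API.
# The model id and sampling settings are illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")            # any Hugging Face-compatible model id
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.outputs[0].text)                  # generated continuation for each prompt
```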
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
13 Subscribers
Add a CodeTriage badge to vllm
Help out
- Issues
  - fix(gguf): Ensure Gemma2 configs have hidden_act for backward compatibility
  - fix(gemma2): Add quant_config to embedding layer for GGUF support
  - [Bug]: Speculative decode crashes on PP>1 because self.drafter is missing
  - [Usage]: vllm serve setup issues on B300
  - [Usage]: How to load KV cache data into a local file
  - [Usage]: How can I use a local pre-compiled wheel of vllm?
  - [Feature]: Support for GGUF qwen3vl models
  - Fix Kimi K2 thinking model nvfp4 vocab size
  - [Installation]: Problems deploying the DeepSeek-V3.2 repository on H200
  - [Usage]: missing dsml token "| DSML | " with DeepSeek-V3.2 tool calls
- Docs
  - Python not yet supported