vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
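To give a sense of what the engine does, here is a minimal offline-inference sketch using vLLM's `LLM` and `SamplingParams` classes; the model name `facebook/opt-125m` and the prompt are illustrative assumptions, not anything this page prescribes.

```python
# Minimal sketch: offline batch inference with vLLM.
# Assumes vllm is installed (pip install vllm) and a GPU is available.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]  # illustrative prompt
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# Model name is an illustrative choice; weights download on first run.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, output.outputs[0].text)
```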
Help out
- Issues
- support firered_aed_l model
- [Perf] Support Flashinfer trtllm tinygemm_bf16 router gemm for GPT-OSS
- [Model] Implement LoRA support for Qwen3ASRForConditionalGeneration
- [Bugfix] Handle truncate_prompt_tokens in Harmony (GPT-OSS) path
- [Core] Add register_model() to KVConnectorBase_V1 for CacheBlend
- [Bugfix] AsyncLLM: Add the ability to specify the pooling_task
- Automatically add links to API docs for matching strings in docs
- [Bugfix] fix NoneType error in KV cache transfer with NCCL connector for DeepSeek
- [Bugfix] [Frontend] Responses API, fix merging of message and tool call
- [Models][GDN] Prevent D2H sync in `ChunkGatedDeltaRule`
- Docs (Python not yet supported)