vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
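
To ground the description, here is a minimal sketch of offline inference with vLLM's documented Python API (`LLM` and `SamplingParams`); the model name is an arbitrary placeholder, and any Hugging Face-compatible model path works:

```python
# Minimal offline-inference sketch using vLLM's Python API.
# "facebook/opt-125m" is an arbitrary example model, not a requirement.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # loads weights and allocates the KV cache
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() accepts a batch of prompts; vLLM schedules them with
# continuous batching for high throughput.
outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```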
Help out
Issues
- [ROCm][Perf] Add MXFP4 linear method and enable shared expert fusion
- [MoE] Unify MoE oracles with class structure
- [Feature] Add track_token_ids for efficient selective token logprobs tracking
- [UX] Logging - Improve Startup Error Logs
- Readability cleanup for wvSplitK reductions
- [Bug]: V1 engine core deadlocks under concurrent load (fp8 + prefix caching + Qwen3.5)
- [Bug] Garbage output for long prompts after #35216
- [Bug]: Missing logprobs for `<tool_call>` in streaming chat completions
- [Bug]: Mooncake Connector: Decode nodes stuck in WAITING_FOR_REMOTE_KVS after Prefill node restart
- [Bug] prompt_logprobs causes livelock with IsHybrid models (Qwen3.5) in DP mode
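
Several of the issues above involve per-token log probabilities (selective token tracking, missing logprobs in streaming, and a `prompt_logprobs` livelock). For context, this is a minimal sketch of how logprobs are requested through vLLM's documented `SamplingParams`; the model name is again an arbitrary placeholder:

```python
# Sketch: requesting top-k logprobs for generated and prompt tokens.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(
    max_tokens=16,
    logprobs=5,         # top-5 logprobs for each generated token
    prompt_logprobs=5,  # top-5 logprobs for each prompt token
)
out = llm.generate(["Hello, world"], params)[0]

# Each generated position carries a dict of candidate token id -> Logprob.
for token_logprobs in out.outputs[0].logprobs:
    print(token_logprobs)
```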