vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
31 Subscribers
Help out
- Issues
  - (security) Enforce allowed_tools at execution time for Responses API
  - Revert "[Render] Add `/derender` endpoints for disaggregated postprocessing" (#43606)
  - fix: cache bad_words tokenization to avoid 'Already borrowed' errors under concurrency
  - [Core] Expose engine pause/resume state as prometheus metrics
  - [v1] Initialize InputBatch in initialize_kv_cache instead of __init__
  - [Bugfix] Bounds-check moe_permute reverse-map write (#45492)
  - [Bug]: minimax M3MXFP8 with mtp can not start success
  - [Security][Rust Frontend] Add input validation to gRPC and HTTP stop_token_ids
  - [Feature]: Canvas-aware structured outputs / guided decoding for diffusion language models
  - [RFC] RL CI Matrix for vLLM: Behavioral + Physical + Protocol Coverage
- Docs
  - Python not yet supported