vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
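The one-line description above is all the page gives; as quick orientation, here is a minimal sketch of vLLM's offline inference API. The model ID and sampling settings are illustrative choices, not taken from this page:

```python
# Minimal sketch of vLLM's offline inference API.
# The model ID and sampling values below are example choices.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")            # any Hugging Face model ID
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.outputs[0].text)                  # generated continuation
```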
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
21 Subscribers
Help out
- Issues
  - [BugFix] Fix mm_encoder_only init for qwen3 vl moe model
  - [Core] Add Helix (Context + Tensor) Parallelism
  - [Core] Configurable encoder compute and cache budget
  - [RFC]: DeepSeek-R1 MoE offload
  - [Bug] IndexError: list index out of range in chat_completion_stream_generator with --tool-call-parser=mistral during streaming tool calls
  - [Usage]: AssertionError: collective_rpc should not be called on follower node
  - [Usage]: How to serve quantized Qwen3-Reranker-8B
  - Pass modality information in embed_multimodal
  - [Bug]: GLM47 Tool Call Bug
  - Add test for Nemotron Nano with LoRA adapters
- Docs
  - Python not yet supported