vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
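For a sense of what the project does, here is a minimal offline-inference sketch using vLLM's Python API. The model name and sampling settings are illustrative placeholders, not tied to any issue listed below.

```python
from vllm import LLM, SamplingParams

# Load a model for offline batched inference.
# "facebook/opt-125m" is an example model, chosen only for small size.
llm = LLM(model="facebook/opt-125m")

# Sampling settings here are arbitrary example values.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["What is PagedAttention?"], params)
for out in outputs:
    print(out.prompt, out.outputs[0].text)
```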
Issues
- Skip reasoning parsing when using `continue_final_message`
- [6/n] Migrate activation, GPTQ, GGUF, and non-CUTLASS W8A8 kernels to the libtorch stable ABI
- [GDN] Fuse all preprocessing into one kernel for the chunked stage
- Change `trust_remote_code` default in test runners
- [Fix] Align MoRIIO registration format with vLLM router and handle de…
- Fix P2P request ID mismatch
- [Transformers v5] Fix NemotronParse `image_size` tuple unpacking
- Splitting MLA attention Triton kernel
- [vLLM IR][RMSNorm] Port RMSNormGated to vLLM IR Ops
- Fix sarvam forward compatibility with transformers v5
Docs
- Python not yet supported