vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
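For orientation, vLLM is used either through its OpenAI-compatible server or directly from Python. A minimal offline-inference sketch, assuming vLLM is installed and using an example model name:

```python
from vllm import LLM, SamplingParams

# Load a model and run batched generation on vLLM's memory-efficient engine.
llm = LLM(model="facebook/opt-125m")  # example model; any supported HF causal LM works
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```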
Help out
Issues
- [Usage]: Is there a way to inject the grammar into the docker directly
- [Feature]: include reasoning tokens in /v1/messages Anthropic endpoint if model supports it
- [Bug]: set bias=False for weights_proj in deepseek_v2
- [CI Failure]: deepseek-ai/deepseek-vl2-tiny `CUBLAS_STATUS_EXECUTION_FAILED`
- [Feature]: Improve the chat method for offline interface to make it closer to the functionality of online service (see the `LLM.chat` sketch after this list)
- [WIP][ROCm] Add AITER hipblaslt preshuffled gemm kernel
- [Bug]: Load format `runai_streamer_sharded` rejects tunable parameters in `model_loader_extra_config`
- [Bug]: microsoft/phi-2 fails on aarch64
- [InputProcessor] Make external request_id optional
- [Feature]: Skip language model weight load for encoder instance in E-P-D
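The chat-method feature request above refers to vLLM's offline `LLM.chat` API. For context, a minimal sketch of how it is used today, assuming a recent vLLM release (the model name is only an example):

```python
from vllm import LLM, SamplingParams

# chat() applies the model's chat template before generating,
# mirroring the online /v1/chat/completions endpoint.
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # example instruct model
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is vLLM?"},
]
outputs = llm.chat(messages, SamplingParams(temperature=0.7, max_tokens=128))
print(outputs[0].outputs[0].text)
```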
Docs
- Docs triage is not yet supported for Python projects.