vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
30 Subscribers
Help out
- Issues
  - [Bugfix][Tool Parser] Handle non-finite numbers in coerce_to_schema_type
  - [Frontend] Warn when VLLM_PLUGINS names a plugin we never discovered
  - [Frontend] Add OpenAIBaseModel.get_extra_fields() public accessor
  - [Bugfix] Fix crash with PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True set and custom_allreduce enabled
  - [Rust Frontend] Support continuous_usage_stats stream option
  - [Bug]:
  - Add Chimera edit-program example showing 1.8x coding-task speedup
  - [Bugfix] Split mixed reasoning/content streaming deltas
  - [AMD][Bugfix][Quantization] Honor fused-name match in is_layer_skipped
  - [TurboQuant] Add MTP spec-decode routing
- Docs: Python not yet supported