text-generation-inference
https://github.com/huggingface/text-generation-inference
Python
Large Language Model Text Generation Inference
- Issues
- Gemma3: CUDA error: an illegal memory access was encountered
- Separate Health and Ready Endpoints
- New metric `tgi_kv_cache_usage`
- Error when launching Magistral-Small-2506
- Neuron backend may select wrong batch size for cached model
- ModuleNotFoundError: No module named 'punica_sgmv'
- Support for gpt-oss-120b and gpt-oss-20b models
- feat: expose GPU energy consumption (mJ) in responses
- Infinite tool call loop: `HuggingFaceModel` and `text-generation-inference`
- How to use prefix caching