text-generation-inference
https://github.com/huggingface/text-generation-inference
Python
Large Language Model Text Generation Inference
Python not yet supported (1 subscriber)
Help out
- Issues
- Vision encoder warmup fails with CPU vs CUDA mismatch (F.linear() input_tensor on CPU, weight on CUDA)
- PermissionError: [Errno 13] Permission denied: '/data' when deploying TGI on Hugging Face Spaces
- OpenTelemetry support for http endpoints
- Retrieve the correct cached model batch size in Neuron config checker for Neuron Backend
- Faster (dynamic) grammar compilation
- Deploying Gemma-3-1b-it with NVIDIA GPU P2000 - gets error
- Add dedicated CPU-only Dockerfile and update documentation for CPU/…
- It seems no one is maintaining this project.
- Endpoint failed to start due to ShardFailed on hugging face inference endpoint
- Gemma3: CUDA error: an illegal memory access was encountered.