text-generation-inference
https://github.com/huggingface/text-generation-inference
Python
Large Language Model Text Generation Inference
- Issues
- how do I adjust the logging level when launching via the docker container?
- Use pre-built FA2, vllm, quantization kernels in the dockerfiles
- `docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data -e HUGGING_FACE_HUB_TOKEN={your_token} ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard` fails with: "Unable to find image 'ghcr.io/huggingface/text-generation-inference:latest' locally. latest: Pulling from huggingface/text-generation-inference. docker: no matching manifest for linux/arm64/v8 in the manifest list entries. See 'docker run --help'."
- Add completion route to client and add stop parameter where it's missing
- Cannot use Inference Endpoint: UnprocessableEntityError: Error code: 422 - {'error': 'Template error: template not found', 'error_type': 'template_error'}
- llama3-70B-Instruct-AWQ causing CUDA error: an illegal memory access was encountered
- Refactor layers.
- TGI 2.0.2 encounters "CUDA is not available"
- Install error when installing the vllm package
- Mistral7b takes 4 times its size in VRAM on A100
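The logging-level question above can likely be addressed through an environment variable. The TGI launcher is a Rust binary, and Rust services built on `tracing`/`env_logger` conventionally read the `RUST_LOG` environment variable; this sketch assumes TGI follows that convention, so treat the variable name as an assumption rather than confirmed behavior:

```shell
# Hedged sketch: assumes the TGI launcher honors the conventional Rust
# RUST_LOG environment variable. Pass it into the container with -e,
# alongside the flags already shown in the issues above.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $volume:/data \
  -e RUST_LOG=debug \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model
```

Values such as `info`, `debug`, or per-module filters (`text_generation_launcher=debug`) are the usual `RUST_LOG` syntax; the container launch itself is unchanged.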
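The "no matching manifest for linux/arm64/v8" error reported above indicates the `latest` image is not published for arm64 hosts (e.g. Apple Silicon). A hedged workaround, assuming nothing beyond standard Docker CLI behavior, is to force the amd64 image via `--platform`; note that this runs under emulation, which is slow, and CUDA GPU passthrough generally does not work that way:

```shell
# Hedged sketch: force pulling the amd64 image on an arm64 host.
# Emulation via --platform is slow and --gpus will generally not work
# under it; running on a native x86_64 machine is the practical fix.
docker run --platform linux/amd64 --shm-size 1g -p 8080:80 \
  -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model
```

This at least makes the pull succeed for CPU-only experimentation; for GPU inference on that image, an x86_64 host is required.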