sentencepiece
https://github.com/google/sentencepiece
C++
Unsupervised text tokenizer for Neural Network-based text generation.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
C++ not yet supported
0 Subscribers
Help out
- Issues
- feat: add riscv64 to Linux wheel build matrix
- sentencepiece sample_encode_fuzzer: sanitize nbest_size/alpha to reduce degenerate paths and timeouts
- Bump the github-actions group with 3 updates
- 32-bit Windows library is packaged into AMD64 package
- Mislabeled asset for v0.2.1
- Suppress the warnings of python module
- Python package does not have license
- Use official abseil-cpp
- 1156 Update license in setup.cfg
- With unigram algorithm, constant piece at the end of each sentence does not become a token
- Docs
- C++ not yet supported