sentencepiece
https://github.com/google/sentencepiece
C++
Unsupervised text tokenizer for Neural Network-based text generation.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
C++ not yet supported
0 Subscribers
Help out
- Issues
  - protobuf version not compatible
  - undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs
  - Support for s3 files
  - Can't view sample code.
  - Tutorial to train a cross-language model with sentencepiece
  - trainer_interface.cc(356) LOG(WARNING) Empty string found. removed
  - Problems when training machine translation with spm
  - De-normalization in decode(...)
  - Conda installation
  - Guidance on how to implement subword sampling at train time
- Docs
  - C++ not yet supported