deepspeed
https://github.com/microsoft/deepspeed
Python
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
17 Subscribers
Help out
- Issues
- [BUG]Backward through the graph twice in training
- [BUG] training bug
- [BUG]`assert param.ds_status == ZeroParamStatus.AVAILABLE, param.ds_summary()` when training deepspeed-chat step3 with ZeRO3 and a larger `generation_batches`
- When using zero stage 3 for model training, loading custom parameters failed and the model parameter size was 0.
- [REQUEST] During inference, support passing `past_key_values` even if `input_ids.shape[-1] >= 2`
- [BUG] StarCoder inference not working with AutoTP
- [BUG] process exits with return code=-6 during training with bf16 optimizer
- [BUG] Deepspeed inference time distribution and max tokens
- [BUG] ZeRO Stage 2 seems to train MoE models incorrectly
- Fixed bug with hybrid engine generation when inference_tp_size > 1
- Docs
- Python not yet supported