
deepspeed 0.18.6
DeepSpeed library
DeepSpeed is a deep learning optimization library for Python, developed by Microsoft and designed to scale training across multiple GPUs and servers. According to the project, it can train models with billions or even trillions of parameters while keeping system throughput high and scaling efficiently to thousands of GPUs.
DeepSpeed is particularly useful for training and inference of large language models, and it falls under the category of Machine Learning Frameworks and Libraries. It works with PyTorch and offers system innovations such as the Zero Redundancy Optimizer (ZeRO), 3D parallelism, and model parallelism to make training large models efficient.
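ZeRO is normally switched on through the DeepSpeed configuration, which is a JSON document or a Python dict. The following is a minimal sketch of such a config, not a recommended setup: the helper name make_zero_config, the batch sizes, and the file name ds_config.json are placeholders chosen for illustration, while the keys themselves (train_micro_batch_size_per_gpu, gradient_accumulation_steps, bf16, zero_optimization.stage) are standard DeepSpeed config fields.

```python
import json
from pathlib import Path


def make_zero_config(stage: int = 2, micro_batch: int = 4, grad_accum: int = 8) -> dict:
    """Build a small DeepSpeed-style config dict (hypothetical helper).

    The numeric defaults are illustrative placeholders, not tuned values.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2 or 3")
    return {
        # Per-GPU batch size for one forward/backward pass.
        "train_micro_batch_size_per_gpu": micro_batch,
        # Number of micro-batches accumulated before an optimizer step.
        "gradient_accumulation_steps": grad_accum,
        # Train in bfloat16 (assumes hardware that supports it).
        "bf16": {"enabled": True},
        # ZeRO stage: 1 partitions optimizer states, 2 adds gradients,
        # 3 adds the parameters themselves.
        "zero_optimization": {"stage": stage},
    }


if __name__ == "__main__":
    config = make_zero_config(stage=2)
    # Assumed output path; any location readable by the training job works.
    Path("ds_config.json").write_text(json.dumps(config, indent=2))
    print(json.dumps(config, indent=2))
```

A config like this is typically handed to deepspeed.initialize through its config argument, or referenced from the training script's command line, depending on how the script is written.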
The deepspeedai/DeepSpeed repo was created 6 years ago and the last code push was 19 hours ago.
The project is very popular, with 41,630 GitHub stars.
How to Install deepspeed
You can install deepspeed using pip:
pip install deepspeed
or add it to a project with poetry:
poetry add deepspeed
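After installing, one quick way to confirm which version ended up in the environment is to query the package metadata. This is a minimal sketch using only the Python standard library; the helper name installed_version is an illustration and not part of deepspeed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "deepspeed") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("deepspeed is not installed in this environment")
    else:
        print(f"deepspeed {found} is installed")
```

DeepSpeed also installs a ds_report command, which prints a more detailed report of the environment and of which optional ops are compatible with it.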
Package Details
- Author: DeepSpeed Team
- License: Apache Software License 2.0
- Homepage: http://deepspeed.ai
- PyPI: https://pypi.org/project/deepspeed/
- Documentation: https://deepspeed.readthedocs.io
- GitHub Repo: https://github.com/microsoft/DeepSpeed
Errors
A list of common deepspeed errors.
Code Examples
Here are some deepspeed code examples and snippets.
GitHub Issues
The deepspeed package has 1,270 open issues on GitHub:
- [BUG] ZeRO-3: zero.GatheredParameters([multiple params], modifier_rank=None) + in-place slice touch triggers assert not param.ds_active_sub_modules in free_param()
- [Bugfix] Resolve Rank index out of range during BWD when sp_size < world_size in Ulysses
- fix: Ensure full gradient reduction for Muon with reduce_scatter
- [BUG] Cross-partition parameters incorrectly updated when using ZeRO-1/ZeRO-2 with reduce_scatter=true and Muon optimizer
- Support custom partitioning patterns for AutoTP
- Enable shm_comm support for arm
- [Draft] Muon Optimizer Support for ZeRO3
- [BUG] ZenFlow Stage 3 with full_warm_up_rounds=0 fails due to missing complete_column_offset attribute
- Fix bf16 dtype mismatch in ZeRO-3 with zero_quantized_weights
- Fix Muon optimizer conflict with gradient clipping in ZeRO 1/2
- [BUG] ZeRO-3 with zero_quantized_weights=true incorrectly casts bf16 inputs to fp16, causing BERT training failure
- [REQUEST] Python types?
- Fix: ZenFlow Adam integration for updated PyTorch backward flow (#7759)
- [BUG][Deepcompile] OOM during DeepCompile pre-pass eager node-by-node profiling (FX Interpreter) due to decomposed cross_entropy materializing huge intermediates
- Introduce all_reduce_hook to support gradient aggregation across replica groups.







