deepspeed 0.18.6

DeepSpeed library

DeepSpeed is a Python package developed by Microsoft that provides a deep learning optimization library designed to scale across multiple GPUs and servers. It is capable of training models with billions or even trillions of parameters, achieving excellent system throughput and efficiently scaling to thousands of GPUs.

DeepSpeed is particularly useful for training and running inference on large language models, and it is categorized under Machine Learning Frameworks and Libraries. It is designed to work with PyTorch and offers system innovations such as the Zero Redundancy Optimizer (ZeRO), 3D parallelism, and model parallelism to enable efficient training of large models.

Stars: 41630, Watchers: 41630, Forks: 4723, Open Issues: 1270

The deepspeedai/DeepSpeed repo was created 6 years ago and the last code push was 19 hours ago.
The project is very popular, with 41630 GitHub stars.

How to Install deepspeed

You can install deepspeed using pip:

pip install deepspeed

Or add it to a project with Poetry:

poetry add deepspeed
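
After installing, it can help to confirm which version the environment actually resolved. The sketch below uses only the Python standard library; the helper name installed_version is illustrative and is not part of deepspeed. DeepSpeed also ships a ds_report command that prints a fuller environment report.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "deepspeed") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"deepspeed: {found or 'not installed'}")
```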

Package Details

Author: DeepSpeed Team
License: Apache Software License 2.0
Homepage: http://deepspeed.ai
PyPI: https://pypi.org/project/deepspeed/
Documentation: https://deepspeed.readthedocs.io
GitHub Repo: https://github.com/microsoft/DeepSpeed

Code Examples

Here are some deepspeed code examples and snippets.
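
As a starting point, the sketch below builds a small DeepSpeed configuration that enables ZeRO stage 2 and writes it to a JSON file. The helper name build_ds_config, the file name ds_config.json, and all numeric values are illustrative assumptions. The keys used (train_batch_size, train_micro_batch_size_per_gpu, gradient_accumulation_steps, zero_optimization.stage, bf16.enabled) follow DeepSpeed's config schema as commonly documented, but check the documentation for the version you install. The code needs only the standard library, so it runs without a GPU.

```python
import json
from pathlib import Path


def build_ds_config(micro_batch_per_gpu: int, grad_accum_steps: int,
                    world_size: int, zero_stage: int = 2) -> dict:
    """Build a minimal DeepSpeed-style config dict (illustrative values only)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        # DeepSpeed expects train_batch_size to equal
        # micro batch per GPU * gradient accumulation steps * number of GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": {"stage": zero_stage},
        "bf16": {"enabled": True},
    }


if __name__ == "__main__":
    config = build_ds_config(micro_batch_per_gpu=4, grad_accum_steps=8, world_size=2)
    Path("ds_config.json").write_text(json.dumps(config, indent=2))
    print(config["train_batch_size"])  # 64
```

A dict or file like this is typically handed to deepspeed.initialize through its config argument; the training script itself is outside the scope of this sketch.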

GitHub Issues

The deepspeed package has 1270 open issues on GitHub, including:

  • [BUG] ZeRO-3: zero.GatheredParameters([multiple params], modifier_rank=None) + in-place slice touch triggers assert not param.ds_active_sub_modules in free_param()
  • [Bugfix] Resolve Rank index out of range during BWD when sp_size < world_size in Ulysses
  • fix: Ensure full gradient reduction for Muon with reduce_scatter
  • [BUG] Cross-partition parameters incorrectly updated when using ZeRO-1/ZeRO-2 with reduce_scatter=true and Muon optimizer
  • Support custom partitioning patterns for AutoTP
  • Enable shm_comm support for arm
  • [Draft] Muon Optimizer Support for ZeRO3
  • [BUG] ZenFlow Stage 3 with full_warm_up_rounds=0 fails due to missing complete_column_offset attribute
  • Fix bf16 dtype mismatch in ZeRO-3 with zero_quantized_weights
  • Fix Muon optimizer conflict with gradient clipping in ZeRO 1/2
  • [BUG] ZeRO-3 with zero_quantized_weights=true incorrectly casts bf16 inputs to fp16, causing BERT training failure
  • [REQUEST] Python types?
  • Fix: ZenFlow Adam integration for updated PyTorch backward flow (#7759)
  • [BUG][Deepcompile] OOM during DeepCompile pre-pass eager node-by-node profiling (FX Interpreter) due to decomposed cross_entropy materializing huge intermediates
  • Introduce all_reduce_hook to support gradient aggregation across replica groups.

Related Packages & Articles

datasets 4.5.0

HuggingFace community-driven open-source library of datasets

transformers 5.2.0

Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

thinc 9.1.1

A refreshing functional take on deep learning, compatible with your favorite libraries

nlp 0.4.0

HuggingFace/NLP is an open library of NLP datasets.

keras 3.13.2

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. The core data structures of Keras are layers and models. The philosophy is to keep simple things simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code via subclassing).