deepspeed 0.15.2
DeepSpeed library
DeepSpeed is a Python package developed by Microsoft that provides a deep learning optimization library designed to scale across multiple GPUs and servers. It is capable of training models with billions or even trillions of parameters, achieving excellent system throughput and efficiently scaling to thousands of GPUs.
DeepSpeed is particularly useful for training and inference of large language models, and it falls under the category of Machine Learning Frameworks and Libraries. It is designed to work with PyTorch and offers system innovations such as the Zero Redundancy Optimizer (ZeRO), 3D parallelism, and model parallelism to enable efficient training of large models.
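The snippet below is a minimal sketch of what that looks like in practice: a PyTorch model wrapped with deepspeed.initialize and a ZeRO stage 2 config. The TinyNet model and the config values are illustrative assumptions rather than recommended settings, and a real run is normally launched with the deepspeed launcher across one or more GPUs.

```python
# Minimal sketch: wrapping a PyTorch model with DeepSpeed and ZeRO stage 2.
# TinyNet and the config values below are illustrative placeholders.
import torch
import deepspeed

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(1024, 10)

    def forward(self, x):
        return self.fc(x)

ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # shard optimizer states and gradients
    "fp16": {"enabled": True},          # mixed-precision training
}

model = TinyNet()
# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler)
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    x = torch.randn(32, 1024, device=engine.device, dtype=torch.half)
    y = torch.randint(0, 10, (32,), device=engine.device)
    loss = torch.nn.functional.cross_entropy(engine(x), y)
    engine.backward(loss)  # engine handles loss scaling and gradient partitioning
    engine.step()          # optimizer step plus gradient clearing
```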
The microsoft/DeepSpeed repo was created 4 years ago and the last code push was yesterday. The project is extremely popular, with a mind-blowing 35,068 GitHub stars!
How to Install deepspeed
You can install deepspeed using pip:
pip install deepspeed
or add it to a project with poetry:
poetry add deepspeed
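To confirm the package is importable after installation (a quick sanity check, not an official step):

```python
import deepspeed

# Prints the installed version, e.g. 0.15.2
print(deepspeed.__version__)
```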
Package Details
- Author: DeepSpeed Team
- License: Apache Software License 2.0
- Homepage: http://deepspeed.ai
- PyPI: https://pypi.org/project/deepspeed/
- Documentation: https://deepspeed.readthedocs.io
- GitHub Repo: https://github.com/microsoft/DeepSpeed
Errors
A list of common deepspeed errors.
Code Examples
Here are some deepspeed
code examples and snippets.
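For example, a DeepSpeed engine can save and resume training checkpoints. The sketch below assumes a toy linear model, a made-up "checkpoints/" directory, and a made-up tag; it is an illustration, not an official example.

```python
# Sketch: checkpointing with a DeepSpeed engine. The model, config values,
# checkpoint directory, and tag are illustrative placeholders.
import torch
import deepspeed

model = torch.nn.Linear(512, 512)
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config={
        "train_batch_size": 8,
        "optimizer": {"type": "AdamW", "params": {"lr": 3e-4}},
        "zero_optimization": {"stage": 1},  # partition optimizer states
    },
)

# Save model weights, optimizer state, and ZeRO partitions under a tag.
engine.save_checkpoint("checkpoints/", tag="step_1000")

# Resume later: returns (load_path, client_state); load_path is None if the
# tag was not found under the checkpoint directory.
load_path, client_state = engine.load_checkpoint("checkpoints/", tag="step_1000")
```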
GitHub Issues
The deepspeed package has 1109 open issues on GitHub.
- [BUG] matmul_ext_update_autotune_table atexit error
- [BUG] Unexpected calculations at backward pass with ZeRO-Infinity SSD offloading
- update ut/doc for glm/codegen
- Multi-node and multi-GPU fine-tuning error: ncclInternalError
- Zero Stage-2 Frozen Layers [BUG]
- [PROBLEM] P2p recv waiting for data will cause other threads under the same process to be unable to perform any operations
- Spread layers more uniformly when using partition_uniform
- Issue with DeepSpeed Inference - Multiple Processes for Model Loading and Memory Allocation
- [BUG] CPU Adam failing
- [BUG] Cannot increase batch size more than 1 with ZeRO-Infinity SSD offloading
- [REQUEST] please provide clear working installation guide
- load linear layer weight with dtype from ckpt
- [QNA] How can i choose adam between fused and cpu?
- Refactor autoTP inference for HE
- [BUG] No runnable example for MoE / PR-MoE GPT inference