auto-gptq 0.7.1


An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Stars: 4127, Watchers: 4127, Forks: 425, Open Issues: 236

The AutoGPTQ/AutoGPTQ repo was created 1 year ago and the last code push was 21 hours ago.
The project is popular, with 4127 GitHub stars.

How to Install auto-gptq

You can install auto-gptq using pip:

pip install auto-gptq

Or add it to a project with Poetry:

poetry add auto-gptq
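
After either command, it can be useful to confirm which version actually landed in the environment. The following is a minimal sketch using only the Python standard library; the helper name installed_version is an illustrative choice, not part of auto-gptq.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent.

    installed_version is a hypothetical helper name used for illustration.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("auto-gptq")
    if version is None:
        print("auto-gptq is not installed in this environment")
    else:
        print(f"auto-gptq {version} is installed")
```

If the reported version differs from the one listed at the top of this page, the environment is probably pinned to a different release.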

Package Details

GitHub Repo: AutoGPTQ/AutoGPTQ


A list of common auto-gptq errors.

Code Examples

Here are some auto-gptq code examples and snippets.
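
As a starting point, here is a hedged sketch of the basic quantize-and-save flow, modeled on the usage pattern in the project's README for the 0.7.x line. The function name quantize_and_save, the model id, the output directory, and the calibration sentence are all assumptions for illustration. Running it downloads model weights and typically needs a CUDA-capable GPU, and a single calibration sentence is far too little for good quantization quality.

```python
def quantize_and_save(model_id, out_dir, calibration_texts, bits=4, group_size=128):
    """Quantize a causal LM with auto-gptq and save the result to out_dir.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    # Imported lazily so this module can be loaded without the heavy dependencies.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

    # Calibration examples: tokenized text with input_ids and attention_mask.
    examples = [tokenizer(text) for text in calibration_texts]

    quantize_config = BaseQuantizeConfig(
        bits=bits,              # target weight precision
        group_size=group_size,  # number of weights sharing quantization parameters
        desc_act=False,         # disabling this usually speeds up inference
    )

    # Load the full-precision model, run GPTQ on it, then write the quantized weights.
    model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
    model.quantize(examples)
    model.save_quantized(out_dir)
    return out_dir


if __name__ == "__main__":
    # Model id and output directory are placeholders; substitute your own.
    quantize_and_save(
        "facebook/opt-125m",
        "opt-125m-4bit-128g",
        ["auto-gptq is an easy-to-use model quantization library."],
    )
```

A model saved this way can later be loaded for inference with AutoGPTQForCausalLM.from_quantized, passing the output directory and a target device.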

GitHub Issues

The auto-gptq package has 236 open issues on GitHub

  • 2 bit quant Quip
  • xformers integration
  • Will AutoGPTQ support Lora traning for llama2?
  • Apple silicon cannot install the AutoGPT
  • Llama2-70b to autogptq error.
  • [Feature] Modular quantization using AutoGPTQ
  • Add exllama q4 kernel
  • [BUG] Auto_GPTQ is not recognized in python script
  • [BUG] gibberish text inside last oobabooga/text-generation-webui
  • [BUG] RuntimeError: expected scalar type Half but found Float when I try to load LoRA to AutoGPTQ base model
  • Llama 2 70B (with GQA) + inject_fused_attention = "Not enough values to unpack (expected 3, got 2)"
  • [FEATURE] Merge Peft Adapter to base model
  • CUDA extension are not installing
  • [BUG] desc_act is not support for BaiCuan Model?
  • [BUG]torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 18163 is not positive-definite).

See more issues on GitHub

Related Packages & Articles

optimum 1.21.2

Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from Hardware Partners and interface with their specific functionality.

farm-haystack 1.26.2

LLM framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data.

deeplake 3.9.14

Deep Lake is a Database for AI powered by a unique storage format optimized for deep-learning and Large Language Model (LLM) based applications. It simplifies the deployment of enterprise-grade LLM-based products by offering storage for all data types (embeddings, audio, text, videos, images, pdfs, annotations, etc.), querying and vector search, data streaming while training models at scale, data versioning and lineage for all workloads, and integrations with popular tools such as LangChain, LlamaIndex, Weights & Biases, and many more.

beir 2.0.0

A Heterogeneous Benchmark for Information Retrieval