ocrmypdf 16.5.0

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

Stars: 13849, Watchers: 13849, Forks: 1007, Open Issues: 114

The ocrmypdf/OCRmyPDF repository was created 10 years ago, and the last code push was 3 weeks ago.
The project is very popular, with 13849 GitHub stars.

How to Install ocrmypdf

You can install ocrmypdf using pip:

pip install ocrmypdf

or add it to a project with Poetry:

poetry add ocrmypdf

Package Details

Author
None
License
MPL-2.0
Homepage
None
PyPI:
https://pypi.org/project/ocrmypdf/
Documentation:
https://ocrmypdf.readthedocs.io/
GitHub Repo:
https://github.com/ocrmypdf/OCRmyPDF

Classifiers

  • Scientific/Engineering/Image Recognition
  • Text Processing/Indexing
  • Text Processing/Linguistic

Code Examples

Here are some ocrmypdf code examples and snippets.
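
As a minimal sketch, the snippet below assembles an ocrmypdf command line in Python and runs it with subprocess. The helper name build_ocrmypdf_command is hypothetical and used only for illustration; the --language, --deskew and --skip-text options are taken from the OCRmyPDF command-line interface. It assumes the ocrmypdf executable and a Tesseract installation are available on the system.

```python
import shutil
import subprocess
import sys


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng",
                           deskew=False, skip_text=False):
    """Return the argument list for one ocrmypdf invocation.

    This helper is hypothetical (not part of ocrmypdf); it only
    assembles command-line arguments and does not run anything.
    """
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        # Straighten pages that were scanned at a slight angle.
        cmd.append("--deskew")
    if skip_text:
        # Leave pages that already contain text untouched.
        cmd.append("--skip-text")
    cmd += [input_pdf, output_pdf]
    return cmd


def main(argv):
    # Expect exactly two arguments: input PDF and output PDF.
    if len(argv) != 3:
        print("usage: python ocr_example.py INPUT.pdf OUTPUT.pdf", file=sys.stderr)
        return
    if shutil.which("ocrmypdf") is None:
        print("ocrmypdf executable not found on PATH", file=sys.stderr)
        return
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocrmypdf_command(argv[1], argv[2], deskew=True), check=True)


if __name__ == "__main__":
    main(sys.argv)
```

Several Tesseract languages can be combined with a plus sign, for example language="eng+deu", provided the corresponding language packs are installed.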

GitHub Issues

The ocrmypdf package has 114 open issues on GitHub, including:

  • [Feature]: Switch to remove images?
  • [Bug]: pdfa-image-compression=auto behaviour violates the principle of least surprise w.r.t. lossy/lossless optimisations
  • Confused about --unpaper-args
  • [Bug]: PDF/A-3B files generated with a widely used commercial encoder generate garbage OCR content
  • [Feature]: OCR on pages with multiple text rotations
  • Since many users do not know how to set up the environment, we bundled the required environment on top of OCRmyPDF and built a desktop app with Electron [Electron version of OCRmyPDF]
  • [BUG] Frequently seeing Syntax Error (91811): Too few (2) args to 'cm' operator
  • [BUG] 'DecompressionBombError' on a ACM PDF - need resolution limit on high DPI
  • [BUG] Bold font in PDF is replaced by black bars
  • [BUG] ghostscript fails due to small resolution value
  • Snap package shouldn't ship all of the Tesseract OCR language files
  • Only generate text files without generating PDF files
  • Feature Request: GPU OCR pipeline e.g. via EasyOCR
  • extra space in the result pdf when the input pdf is in Chinese
  • Azure ocr with ocrmypdf

Related Packages & Articles

easyocr 1.7.2

End-to-End Multi-Lingual Optical Character Recognition (OCR) Solution

deeplake 3.9.26

Deep Lake is a Database for AI powered by a unique storage format optimized for deep-learning and Large Language Model (LLM) based applications. It simplifies the deployment of enterprise-grade LLM-based products by offering storage for all data types (embeddings, audio, text, videos, images, pdfs, annotations, etc.), querying and vector search, data streaming while training models at scale, data versioning and lineage for all workloads, and integrations with popular tools such as LangChain, LlamaIndex, Weights & Biases, and many more.

pyvips 2.2.3

binding for the libvips image processing library, API mode

rpa 1.50.0

RPA for Python is a Python package for RPA (robotic process automation)