pdfminer.six 20240706

PDF parser and analyzer

08-10-2021

Stars: 5877, Watchers: 5877, Forks: 928, Open Issues: 245

The pdfminer/pdfminer.six repository was created 10 years ago, and the last code push was 2 months ago.
The project is popular, with 5877 GitHub stars.

How to Install pdfminer-six

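You can install pdfminer-six using pip. The package is published on PyPI as pdfminer.six, and pip treats both spellings as the same project: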

pip install pdfminer-six

Or add it to a project with Poetry:

poetry add pdfminer-six
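
To confirm that the install worked, one option is to ask Python for the installed version. The sketch below uses only the standard library. The helper name installed_version is made up for this example, and it assumes the distribution is registered under the name pdfminer.six, as the PyPI URL suggests.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="pdfminer.six"):
    """Return the installed version string of dist_name, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "pdfminer.six is not installed")
```

Returning None instead of raising keeps the check easy to use in setup scripts that only need a yes/no answer.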

Package Details

Author: Yusuke Shinyama + Philippe Guglielmetti
License: MIT
Homepage: https://github.com/pdfminer/pdfminer.six
PyPI: https://pypi.org/project/pdfminer.six/
GitHub Repo: https://github.com/pdfminer/pdfminer.six

Classifiers

Text Processing

Errors

A list of common pdfminer-six errors.

Code Examples

Here are some pdfminer-six code examples and snippets.
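
As a starting point, the sketch below pulls all text out of a PDF with extract_text from pdfminer.high_level, the library's high-level entry point. The wrapper name pdf_to_text is a placeholder invented for this illustration and is not part of the package.

```python
from pathlib import Path


def pdf_to_text(pdf_path):
    """Return the text of a PDF file as one string (illustrative wrapper)."""
    path = Path(pdf_path)
    if not path.is_file():
        # Fail early with a clear message before touching the parser.
        raise FileNotFoundError(f"no such PDF: {path}")
    # Imported lazily so the missing-file check works even where
    # pdfminer.six is not installed.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))
```

Calling pdf_to_text with the path of a real PDF should return its text as a single string. Reading order and spacing depend on the document's layout, so the output may need cleanup for multi-column pages.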

GitHub Issues

The pdfminer-six package has 245 open issues on GitHub, including:

Add extras_require in setup.py for PIL, and raise error if not installed when needing PIL
encodingdb.name2unicode(name: str) -> str can't handle type1 font diff like: 2, /'MT110', /'MT50',…
reading order is not quite right formultiple columns in one page
Same sentence is printed three times for a specific PDF file when using pdf2txt
extract images including their textual Figure number/title etc located below the image in a pdf. ie a margin around the image to be captured as well.
extract heading and section headers from pdf…cant acheive this now
Prefer logging to warning
Fix regression in page layout that sometimes returned text lines out of order
Text out of order with pdfminer 20201018
getting lots of (cid:#) instead of readable text
Question: Negative bbox coordinate (x1)
split a multi-page pdf file into multiple pdf files
list index out of range at self.cmap.add_cid2unichr(s1+i, code[i])
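
One of the titles above concerns (cid:#) placeholders, which pdfminer.six emits in place of a glyph it cannot map to Unicode. A rough way to gauge how badly a document is affected is to count those markers in the extracted text. The sketch below is plain Python and does not call the library. The function names and the sample string are invented for this illustration.

```python
import re

# Matches placeholders such as "(cid:12)".
CID_MARKER = re.compile(r"\(cid:\d+\)")


def count_cid_markers(text):
    """Count the "(cid:N)" placeholders found in extracted text."""
    return len(CID_MARKER.findall(text))


def strip_cid_markers(text):
    """Remove the placeholders, leaving the readable text untouched."""
    return CID_MARKER.sub("", text)


if __name__ == "__main__":
    sample = "Total (cid:3)(cid:17) due: 42"  # hypothetical extracted text
    print(count_cid_markers(sample))  # 2
    print(strip_cid_markers(sample))
```

A high count usually points at a font without a usable Unicode mapping, so stripping the markers hides the symptom and does not recover the missing characters.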

See more issues on GitHub

Related Packages & Articles

pdfkit 1.0.0

Wkhtmltopdf python wrapper to convert html to pdf using the webkit rendering engine and qt

pdf2image 1.17.0

A wrapper around the pdftoppm and pdftocairo command line tools to convert PDF to a PIL Image list.

oletools 0.60.2

Python tools to analyze security characteristics of MS Office and OLE files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), for Malware Analysis and Incident Response #DFIR

nbconvert 7.16.4

Converting Jupyter Notebooks (.ipynb files) to other formats. Output formats include asciidoc, html, latex, markdown, pdf, py, rst, script. nbconvert can be used both as a Python library (import nbconvert) or as a command line tool (invoked as jupyter nbconvert ...).
