pdfminer.six 20240706
PDF parser and analyzer
Stars: 5877, Watchers: 5877, Forks: 928, Open Issues: 245
The pdfminer/pdfminer.six repo was created 10 years ago, and the last code push was 2 months ago.
The project is very popular, with 5877 GitHub stars.
How to Install pdfminer.six
You can install pdfminer.six using pip:
pip install pdfminer.six
or add it to a project with Poetry:
poetry add pdfminer.six
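To confirm which release pip or Poetry actually installed, a small check with the standard library's `importlib.metadata` can help. This is a minimal sketch; the helper name `installed_version` is hypothetical and is not part of pdfminer.six.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "pdfminer.six") -> Optional[str]:
    """Hypothetical helper: return the installed version of `dist`, or None if absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# pdfminer.six uses date-style version strings (this page describes 20240706).
print(installed_version())
```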
Package Details
- Author
- Yusuke Shinyama + Philippe Guglielmetti
- License
- MIT
- Homepage
- https://github.com/pdfminer/pdfminer.six
- PyPI
- https://pypi.org/project/pdfminer.six/
- GitHub Repo
- https://github.com/pdfminer/pdfminer.six
Classifiers
- Text Processing
Code Examples
Here are some pdfminer.six code examples and snippets.
GitHub Issues
The pdfminer.six package has 245 open issues on GitHub, including:
- Add extras_require in setup.py for PIL, and raise error if not installed when needing PIL
- encodingdb.name2unicode(name: str) -> str can't handle type1 font diff like: 2, /'MT110', /'MT50',…
- reading order is not quite right for multiple columns in one page
- Same sentence is printed three times for a specific PDF file when using pdf2txt
- extract images including their textual Figure number/title etc located below the image in a pdf. ie a margin around the image to be captured as well.
- extract heading and section headers from pdf… can't achieve this now
- Prefer logging to warning
- Fix regression in page layout that sometimes returned text lines out of order
- Text out of order with pdfminer 20201018
- getting lots of (cid:#) instead of readable text
- Question: Negative bbox coordinate (x1)
- split a multi-page pdf file into multiple pdf files
- list index out of range at self.cmap.add_cid2unichr(s1+i, code[i])