
pdfminer.six 20221105
PDF parser and analyzer
Stars: 4045, Watchers: 4045, Forks: 790, Open Issues: 128
The pdfminer/pdfminer.six repo was created 8 years ago and was last updated 13 hours ago.
The project is very popular, with 4045 GitHub stars.
How to Install pdfminer.six
You can install pdfminer.six using pip
pip install pdfminer.six
or add it to a project with poetry
poetry add pdfminer.six
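After installing, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the name `pdfminer.six`, as in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pdfminer.six") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pdfminer.six is not installed")
    else:
        print(f"pdfminer.six {found} is installed")
```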
Package Details
- Author: Yusuke Shinyama + Philippe Guglielmetti
- License: MIT/X
- Homepage: https://github.com/pdfminer/pdfminer.six
- PyPI: https://pypi.org/project/pdfminer.six/
- GitHub Repo: https://github.com/pdfminer/pdfminer.six
Classifiers
- Text Processing
Code Examples
Here are some pdfminer.six code examples and snippets.
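As a starting point, the sketch below pulls plain text out of a PDF with `extract_text` from `pdfminer.high_level`. The wrapper name `pdf_to_text` and the file name `example.pdf` are hypothetical. The import sits inside the function, so the module still loads where pdfminer.six is missing.

```python
from pathlib import Path


def pdf_to_text(path: str, max_pages: int = 0) -> str:
    """Extract plain text from a PDF file.

    max_pages=0 asks pdfminer.six to process every page.
    """
    if not Path(path).is_file():
        raise FileNotFoundError(path)

    # Imported lazily (a choice made for this sketch) so that the
    # missing-file check above works even without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return extract_text(path, maxpages=max_pages)


if __name__ == "__main__":
    # "example.pdf" is a placeholder; point this at a real file.
    try:
        print(pdf_to_text("example.pdf", max_pages=1))
    except FileNotFoundError as missing:
        print(f"No such PDF: {missing}")
```

The same text extraction is also available from the command line through the `pdf2txt.py` script that ships with the package.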
GitHub Issues
The pdfminer.six package has 128 open issues on GitHub:
- Add extras_require in setup.py for PIL, and raise error if not installed when needing PIL
- encodingdb.name2unicode(name: str) -> str can't handle type1 font diff like: 2, /'MT110', /'MT50',…
- reading order is not quite right for multiple columns in one page
- Same sentence is printed three times for a specific PDF file when using pdf2txt
- extract images including their textual Figure number/title etc located below the image in a pdf. ie a margin around the image to be captured as well.
- extract heading and section headers from pdf…can't achieve this now
- Prefer logging to warning
- Fix regression in page layout that sometimes returned text lines out of order
- Text out of order with pdfminer 20201018
- getting lots of (cid:#) instead of readable text
- Question: Negative bbox coordinate (x1)
- split a multi-page pdf file into multiple pdf files
- list index out of range at self.cmap.add_cid2unichr(s1+i, code[i])
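Several of the issues above concern extracted text that contains `(cid:#)` placeholders instead of readable characters. The sketch below estimates how much of an extracted string consists of such placeholders, which may help flag documents whose fonts could not be mapped to Unicode. It is plain standard-library Python, not part of pdfminer.six; the name `cid_share` is hypothetical, and it assumes placeholders take the exact form `(cid:N)` with a decimal number.

```python
import re

# Matches placeholders such as "(cid:42)" (assumed format, see above).
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def cid_share(text: str) -> float:
    """Return the fraction of glyphs in `text` that are (cid:N) placeholders.

    Each placeholder counts as one glyph; every other non-whitespace
    character counts as one readable glyph.
    """
    n_cid = len(CID_PATTERN.findall(text))
    without_cids = CID_PATTERN.sub("", text)
    readable = len(re.sub(r"\s+", "", without_cids))
    total = n_cid + readable
    return n_cid / total if total else 0.0


if __name__ == "__main__":
    sample = "ab (cid:3)(cid:4)"
    print(f"{cid_share(sample):.0%} of glyphs are unmapped")
```

A high share suggests the problem lies in the PDF's font encoding rather than in the layout analysis, so adjusting layout parameters is unlikely to help.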