Extracting information from PDF documents
PDFMiner is an open-source tool for extracting text from PDF documents. Unlike other PDF-related tools, it focuses entirely on obtaining and analyzing text data. It reports the exact location of text on a page, along with other layout information such as fonts and lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML), and an extensible PDF parser that can be used for purposes other than text analysis.
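To make the idea of text location concrete, here is a minimal sketch of reading text blocks together with their bounding boxes. It assumes the maintained `pdfminer.six` fork is installed (`pip install pdfminer.six`) and uses its `extract_pages` helper and `LTTextContainer` layout class. The `TextBox` tuple and the `iter_text_boxes` and `format_box` helpers are hypothetical names introduced only for this illustration; they are not part of the library.

```python
from typing import Iterator, NamedTuple


class TextBox(NamedTuple):
    """A block of text and its bounding box (hypothetical helper type)."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def iter_text_boxes(pdf_path: str) -> Iterator[TextBox]:
    """Yield every text container in the PDF with its position on the page."""
    # Imported lazily so this module still loads when pdfminer.six is absent.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_number, page_layout in enumerate(extract_pages(pdf_path), start=1):
        for element in page_layout:
            # Skip figures, lines, and other non-text layout objects.
            if isinstance(element, LTTextContainer):
                # bbox is (x0, y0, x1, y1) in PDF points, measured from the
                # bottom-left corner of the page.
                x0, y0, x1, y1 = element.bbox
                yield TextBox(page_number, x0, y0, x1, y1,
                              element.get_text().strip())


def format_box(box: TextBox) -> str:
    """Render one text box as a single readable line."""
    return (f"p{box.page} ({box.x0:.1f}, {box.y0:.1f})-"
            f"({box.x1:.1f}, {box.y1:.1f}): {box.text}")
```

A typical use would be `for box in iter_text_boxes("sample.pdf"): print(format_box(box))`, where `sample.pdf` is a placeholder path. Keeping the formatting separate from the extraction makes it easy to swap in other output, such as CSV rows or JSON records.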