PyPDF2 is a Python library for processing PDF files. It can read a document's page count and title, and merge multiple pages. In this tutorial, we will show how to extract text from PDF pages. You can do this by following the steps below.
Install PyPDF2
pip install PyPDF2
Import library
import PyPDF2
Open a pdf file
file = r'F:\google-pdf\1664-Apress.Pro.dotNET.4.Parallel.Programming.in.CSharp.May.2010.pdf'
pdfFileObject = open(file, 'rb')
Get a pdf reader object
pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
Get a pdf page object
pageObject = pdfReader.getPage(0)
Page indexes start at 0, so getPage(0) returns the first page of the pdf file. In this tutorial, we read only this first page.
Extract text from pdf page object
print(pageObject.extractText())
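The steps above read only page 0. As a minimal sketch, assuming the same PdfFileReader interface used in this tutorial (numPages, getPage() and extractText()), we could loop over a range of pages like this. The helper name extract_pages and its start and stop parameters are our own illustration and are not part of PyPDF2.

```python
def extract_pages(pdf_reader, start=0, stop=None):
    """Return the text of pages start..stop-1 as a list of strings.

    pdf_reader is assumed to behave like the PdfFileReader object
    created earlier: it has numPages and getPage(index).
    """
    total = pdf_reader.numPages
    # Clamp the requested range to the pages that actually exist.
    start = max(start, 0)
    if stop is None or stop > total:
        stop = total

    texts = []
    for index in range(start, stop):
        page = pdf_reader.getPage(index)   # page indexes start at 0
        texts.append(page.extractText())
    return texts


# Usage, with the pdfReader object from the earlier step:
#   for text in extract_pages(pdfReader, 0, 3):
#       print(text)
```

Returning a list keeps the text of each page separate, so we can still tell which page a piece of text came from.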
Close the pdf file object
pdfFileObject.close()
Then you will see the text extracted from the first page.
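The same steps can also be written more compactly. The sketch below is not the code from this tutorial. It assumes a newer PyPDF2 release (2.x or 3.x), where the reader class is named PdfReader, pages are reached through reader.pages, and text comes from page.extract_text(). To our knowledge, the older camelCase names used above were removed in PyPDF2 3.0, so this form may be needed on a fresh install. The file is opened in a with block, so it is closed even if extraction fails. The function name read_first_page and the reader_factory parameter are hypothetical; the parameter exists only so the function can be tried without a real pdf file.

```python
def read_first_page(path, reader_factory=None):
    """Return the text of the first page of the pdf file at path."""
    if reader_factory is None:
        # Assumption: PyPDF2 2.x or later, which provides PdfReader.
        from PyPDF2 import PdfReader
        reader_factory = PdfReader

    # The with block closes the file for us, replacing the explicit close().
    with open(path, "rb") as pdf_file:
        reader = reader_factory(pdf_file)
        first_page = reader.pages[0]        # page indexes start at 0
        # Extract the text while the file is still open.
        return first_page.extract_text()


# Usage (the path is a placeholder for your own file):
#   print(read_first_page(r"F:\google-pdf\example.pdf"))
```

For pages that contain only scanned images, the returned text may be empty, because there is no text layer to extract.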