PyMuPDF can raise RuntimeError: cycle in page tree when you iterate over a PDF page by page. In this tutorial, we will show you how to locate the broken page and handle the error.
Example Code:
import sys
import fitz

pdf = "F:\\114848.pdf"
doc = fitz.open(pdf)
for page in doc:
    text = page.getText("text")
    html_text = page.getText("html")
    #print(text)
    #print(html_text)
Running this code raises RuntimeError: cycle in page tree.
Locate the error page
page_num = 0
for page in doc:
    page_num += 1
    print(page_num)
    text = page.getText("text")
    html_text = page.getText("html")
The output shows that the last page number printed before the error is 110.
Checking the PDF file, we find that page 110 itself is fine. The problem is the next page, 111, which is empty.
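The search for the broken page can also be done in code. The sketch below is a hypothetical helper, not part of PyMuPDF: it steps through any iterable by hand and reports the 1-based number of the first page that fails to load. It assumes the error is raised while the next page is being fetched, which is consistent with page 110 printing normally and page 111 being the broken one.

```python
def first_unreadable_page(pages):
    """Return (page_number, error) for the first page that fails to load.

    page_number is 1-based. Returns (None, None) if every page loads.
    """
    iterator = iter(pages)
    loaded = 0  # number of pages fetched successfully so far
    while True:
        try:
            next(iterator)
        except StopIteration:
            # Reached the end without any failure.
            return None, None
        except Exception as exc:
            # The failure happened while fetching page number loaded + 1.
            return loaded + 1, exc
        loaded += 1
```

With the document opened as in the example above, calling first_unreadable_page(doc) would be expected to return 111 together with the RuntimeError, assuming the file behaves as described here.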
To handle this error, we can wrap the loop in a try/except statement, so the script stops cleanly at the broken page instead of crashing.
The fixed code is shown below:
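page_num = 0  # reset the counter before iterating again
try:
    for page in doc:
        page_num += 1
        print(page_num)
        text = page.getText("text")
        html_text = page.getText("html")
        #print(text)
        #print(html_text)
except Exception as e:
    # Iteration stops at the first broken page; the error message is printed
    print(e)
print("end")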
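Because the try/except wraps the whole loop, iteration ends at the first broken page and any later pages are never read. A possible variant is to load pages one at a time by index and catch the error per page. The sketch below makes some assumptions: collect_page_results and extract_pdf_text are hypothetical names, the PyMuPDF calls use the newer snake_case API (page_count, load_page, get_text) rather than the getText spelling used above, and it is not guaranteed that pages after a broken one can still be loaded from a damaged page tree.

```python
def collect_page_results(page_count, read_page):
    """Run read_page(index) for every page, recording failures instead of stopping.

    read_page is any callable that returns the page content or raises.
    Returns (results, failures): results maps index -> content,
    failures is a list of (index, error message) tuples.
    """
    results = {}
    failures = []
    for index in range(page_count):
        try:
            results[index] = read_page(index)
        except Exception as exc:  # the error in this tutorial is a RuntimeError
            failures.append((index, str(exc)))
    return results, failures


def extract_pdf_text(path):
    """Assumed wiring to PyMuPDF; needs the library installed."""
    import fitz  # imported lazily so the helper above works without PyMuPDF

    doc = fitz.open(path)
    return collect_page_results(
        doc.page_count,
        lambda index: doc.load_page(index).get_text("text"),
    )
```

Page indexes here are zero-based, so the empty page 111 from this tutorial would appear in failures as index 110 if it is the only page that fails to load.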