Python can split a large pdf file into several smaller ones, and it can also merge several small pdf files into one. In this tutorial, we will introduce how to split and merge pdf files using the python pymupdf library.
Preliminary
You should install the python pymupdf library first.
pip install pymupdf
Open a source pdf file
To split or merge pdf files, you should open a source pdf first. In python pymupdf, we can open one like this:
import sys
import fitz  # the import name of pymupdf

file = '231420-digitalimageforensics.pdf'
try:
    doc = fitz.open(file)
except Exception as e:
    print(e)
    sys.exit(1)  # stop here, because doc does not exist if opening failed

page_count = doc.pageCount  # named doc.page_count in newer pymupdf releases
print(page_count)
Run this code and you will find that the source document (231420-digitalimageforensics.pdf) has 199 pages in total.
Then we can split some pages off the source pdf into a new pdf.
To split or merge pdf files in pymupdf, we can use the Document.insertPDF() function (named Document.insert_pdf() in newer pymupdf releases).
insertPDF(docsrc, from_page=-1, to_page=-1, start_at=-1, rotate=-1, links=True, annots=True)
This function selects some pages from docsrc and inserts them into the pdf document it is called on.
The index of pages in a pdf document
In python pymupdf, page indexes start at 0, which means a valid page index is in [0, total_page - 1].
This is very important if you plan to select some pages from a source pdf file.
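Since we usually talk about pages by their printed number (starting at 1), a small helper can make the conversion explicit. This is only a sketch: to_page_index is a hypothetical helper name, not a function of pymupdf.

```python
def to_page_index(page_number, total_pages):
    """Convert a 1-based page number to the 0-based index pymupdf expects."""
    if not 1 <= page_number <= total_pages:
        raise ValueError(f"page {page_number} is outside 1..{total_pages}")
    return page_number - 1


# In a 199-page document, the first page is index 0 and the last is index 198.
print(to_page_index(1, 199))    # 0
print(to_page_index(199, 199))  # 198
```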
Important parameters explained
docsrc: the source pdf document, from which we select the pages [from_page, to_page].
For example, [from_page = 3, to_page = 5] means we will select 3 pages (page 4, page 5, page 6) from the source pdf.
from_page: int, the index of the first page to select in docsrc.
to_page: int, the index of the last page to select in docsrc. Notice that this page is also selected.
start_at: int, this parameter determines where the pages from docsrc are inserted in the destination pdf.
For example, start_at = 1 means we will insert the pages from docsrc between page index 0 and page index 1 of the destination pdf file.
Meanwhile, start_at should be smaller than the total page count of the destination pdf file. The default value, -1, appends the pages at the end.
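To see how from_page, to_page and start_at work together, we can model the two documents as plain python lists. This is only an illustration, not the pymupdf implementation: insert_pages is a hypothetical name, and the model assumes that negative values mean "first page", "last page" and "append at the end". It does not cover other cases such as out-of-range indexes.

```python
def insert_pages(dest, src, from_page=-1, to_page=-1, start_at=-1):
    """Model the page selection and placement described above with lists.

    dest and src are lists of page labels. A new list is returned and the
    inputs are not modified.
    """
    if from_page < 0:
        from_page = 0                      # assumed default: first source page
    if to_page < 0:
        to_page = len(src) - 1             # assumed default: last source page
    selected = src[from_page:to_page + 1]  # to_page itself is included
    if start_at < 0:
        start_at = len(dest)               # assumed default: append at the end
    return dest[:start_at] + selected + dest[start_at:]


src = [f"src-{i}" for i in range(10)]  # source page indexes 0..9
dest = ["dest-0", "dest-1"]
print(insert_pages(dest, src, from_page=3, to_page=5, start_at=1))
# ['dest-0', 'src-3', 'src-4', 'src-5', 'dest-1']
```

The three selected pages land between the old page index 0 and page index 1 of the destination.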
For example:
doc2 = fitz.open("new-doc-1.pdf")  # the destination pdf
# copy page indexes 3 to 5 of doc so that they start at page index 1 of doc2
doc2.insertPDF(doc, from_page = 3, to_page = 5, start_at = 1)
doc2.save("new-doc-4.pdf")
This code selects 3 pages (page indexes 3 to 5) from 231420-digitalimageforensics.pdf. It then inserts these pages after the first page of new-doc-1.pdf and saves the result as a new pdf document, new-doc-4.pdf.
In this way, the same code both splits pages off a pdf file and merges two pdf files into a new one.
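If you only want to split, the same function can copy page ranges into new, empty documents. Below is a minimal sketch, assuming a newer pymupdf release where the function is spelled insert_pdf() and the page total is page_count. The names chunk_ranges, split_pdf and the "part" file prefix are hypothetical and only used for illustration.

```python
def chunk_ranges(page_count, chunk_size):
    """Return inclusive (from_page, to_page) index pairs covering all pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, page_count) - 1)
        for start in range(0, page_count, chunk_size)
    ]


def split_pdf(source_path, chunk_size, prefix="part"):
    """Write consecutive pieces of source_path, chunk_size pages each."""
    import fitz  # pymupdf; imported here so chunk_ranges works without it

    src = fitz.open(source_path)
    paths = []
    ranges = chunk_ranges(src.page_count, chunk_size)
    for n, (first, last) in enumerate(ranges, start=1):
        part = fitz.open()  # a new, empty pdf
        part.insert_pdf(src, from_page=first, to_page=last)
        path = f"{prefix}-{n}.pdf"
        part.save(path)
        part.close()
        paths.append(path)
    src.close()
    return paths


# Page ranges for a 199-page document split into pieces of 50 pages.
print(chunk_ranges(199, 50))
# [(0, 49), (50, 99), (100, 149), (150, 198)]
```

Calling split_pdf('231420-digitalimageforensics.pdf', 50) would then write part-1.pdf to part-4.pdf in the current directory.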
Hi
What should I do if I want to create one pdf from the strings of pdfs generated from svg using the drawToString method?
It is not easy to create one pdf from strings or text. In pymupdf, you can use page.insertTextbox().
I suggest that you create an html file from the strings and convert it to pdf.
Here is a tutorial:
https://www.tutorialexample.com/a-simple-guide-to-convert-html-to-pdf-in-python-python-tutorial/
Moreover, you can set the page size of the html page. For example, if you want A4, you can use Paper.css:
https://www.tutorialexample.com/set-web-page-a4-size-with-paper-css-css-framework/
Thanks a lot
Sorry, I wasn't clear before. I meant a file string, i.e. the string we receive from the renderPDF.drawToString method. I want to create one pdf file by merging all those pdfs from their binary strings. Is it possible?
You can read this page:
https://github.com/pymupdf/PyMuPDF/blob/affb1353678f700fcfaf7da79d89e11743a0573a/examples/svg-logo.py
Here
pdfbytes = renderPDF.drawToString(drawing) # turn SVG to PDF image
src = fitz.open("pdf", pdfbytes) # open SVG as a PDF
Then you can operate on src as a pdf file.
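A rough sketch of merging several such byte strings into one file could look like this. It assumes a newer pymupdf release (insert_pdf(), page_count, and fitz.open() with the stream and filetype keywords). merge_pdf_bytes is a hypothetical helper name, and pdf_byte_strings stands for the list of results you collected from renderPDF.drawToString.

```python
def merge_pdf_bytes(pdf_byte_strings, output_path):
    """Append every in-memory pdf to one document, save it, return its page total."""
    if not pdf_byte_strings:
        raise ValueError("need at least one pdf byte string")

    import fitz  # pymupdf; imported after the cheap input check

    merged = fitz.open()  # a new, empty pdf
    for data in pdf_byte_strings:
        src = fitz.open(stream=data, filetype="pdf")  # open a pdf from memory
        merged.insert_pdf(src)  # defaults: all pages, appended at the end
        src.close()
    merged.save(output_path)
    page_total = merged.page_count
    merged.close()
    return page_total
```

For example, merge_pdf_bytes([pdfbytes1, pdfbytes2], "merged.pdf") would write both drawings into merged.pdf, one after the other.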
Okay I will try this.
Thank you so much for helping.
It worked. You are a great man. Thanks for helping. God bless you 🙂