Python Split and Merge PDF with PyMuPDF: A Complete Guide

By | April 15, 2020


Python can split a big pdf file into several small ones, and it can also merge several small pdf files into a big one. In this tutorial, we will introduce how to split and merge pdf files using the python PyMuPDF library.

Preliminary

You should install the python PyMuPDF library first. In python code, it is imported as fitz.

pip install pymupdf

Open a source pdf file

To split or merge pdf files, you should open a source pdf first. In python PyMuPDF, we can open a pdf file like this:

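import sys

import fitz  # PyMuPDF

file = '231420-digitalimageforensics.pdf'
try:
    doc = fitz.open(file)
except Exception as e:
    print(e)
    sys.exit(1)  # stop here: without a document there is nothing to count

page_count = doc.page_count  # doc.pageCount in older PyMuPDF versions
print(page_count)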

Run this code, and you will find that the source document (231420-digitalimageforensics.pdf) has 199 pages in total.

Then we can split some pages from the source pdf into a new pdf.

To split or merge pdf files in PyMuPDF, we can use the Document.insertPDF() function (called Document.insert_pdf() in newer PyMuPDF versions).

insertPDF(docsrc, from_page=-1, to_page=-1, start_at=-1, rotate=-1, links=True, annots=True)

This function selects pages from docsrc and inserts them into the document it is called on (the destination pdf).

The index of pages in a pdf document

In python PyMuPDF, page indices start at 0, which means a valid page index lies in [0, total_page - 1].

This is very important when you select pages from a source pdf file.

Important parameters explained

docsrc: the source pdf document, from which we select the pages [from_page, to_page].

For example, [from_page = 3, to_page = 5] selects 3 pages (page 4, page 5 and page 6) from the source pdf.

from_page: int, the index of the first page to select in docsrc.

to_page: int, the index of the last page to select in docsrc. Note that this page is also selected.
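The index arithmetic above can be checked without any pdf at all. This is a minimal sketch in plain python; page_range_to_indices and pages_selected are hypothetical helpers (not PyMuPDF functions) that convert 1-based page numbers, as shown in a pdf viewer, into from_page/to_page values and list the indices an inclusive range covers.

```python
def page_range_to_indices(first_page: int, last_page: int) -> tuple:
    """Hypothetical helper: 1-based page numbers -> 0-based (from_page, to_page)."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return first_page - 1, last_page - 1


def pages_selected(from_page: int, to_page: int) -> list:
    """0-based indices covered by [from_page, to_page]; to_page is included."""
    return list(range(from_page, to_page + 1))


start, end = page_range_to_indices(4, 6)
print(start, end)                  # 3 5
print(pages_selected(start, end))  # [3, 4, 5]
```

The + 1 in range() is the whole point: python ranges exclude their end, while to_page is included.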

start_at: int, the position in the destination pdf at which the pages from docsrc are inserted.

For example, start_at = 1 means the pages from docsrc are inserted between page index 0 and page index 1 of the destination pdf file.

Meanwhile, start_at should be smaller than the total number of pages in the destination pdf file. With the default value start_at = -1, the pages are appended at the end.
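To make the placement rule concrete, here is a plain-list model of where the copied pages end up. It is only an illustration under the rules described above (inclusive to_page, start_at = -1 appends), not the real PyMuPDF call; model_insert and the page labels are hypothetical.

```python
def model_insert(dest: list, src: list, from_page: int, to_page: int,
                 start_at: int = -1) -> list:
    """Hypothetical list model of the page order after insertPDF()/insert_pdf()."""
    chosen = src[from_page:to_page + 1]  # to_page is inclusive
    if start_at < 0:  # default -1: append at the end
        return dest + chosen
    # the first copied page becomes page index start_at in the destination
    return dest[:start_at] + chosen + dest[start_at:]


dest = ["D1", "D2", "D3"]
src = ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]
print(model_insert(dest, src, 3, 5, start_at=1))
# ['D1', 'S4', 'S5', 'S6', 'D2', 'D3']
```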

For example:

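import fitz  # PyMuPDF
doc = fitz.open("231420-digitalimageforensics.pdf")  # source pdf
doc2 = fitz.open("new-doc-1.pdf")  # destination pdf
# insert page indices 3-5 of doc after the first page of doc2 (insertPDF() in older versions)
doc2.insert_pdf(doc, from_page=3, to_page=5, start_at=1)
doc2.save("new-doc-4.pdf")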

This code selects 3 pages (page indices 3 to 5) from 231420-digitalimageforensics.pdf, inserts them after the first page of new-doc-1.pdf, and saves the result as a new pdf document, new-doc-4.pdf.

So this single call both splits pages out of one pdf file and merges two pdf files into a new one.

7 thoughts on “Python Split and Merge PDF with PyMuPDF: A Complete Guide”

  1. Arthure Morgan

    Hi

    What should I do if I want to create one pdf from the strings of pdfs generated from svg using the drawToString method?

    1. admin Post author

      It is not easy to create one pdf from strings or text. In pymupdf, you can use page.insertTextbox().
      I suggest that you create an html file from the strings and convert it to pdf.
      Here is a tutorial:
      https://www.tutorialexample.com/a-simple-guide-to-convert-html-to-pdf-in-python-python-tutorial/
      Moreover, you can set the page size of the html page. For example, if you want A4 pages, you can use paper.css:
      https://www.tutorialexample.com/set-web-page-a4-size-with-paper-css-css-framework/

  2. Arthur Morgan

    Sorry, I wasn't clear before. I meant a file string, i.e. the string we receive from the renderPDF.drawToString method. I want to create a pdf file by merging all those pdfs from their binary strings. Is it possible?
