Python Split and Merge PDF with PyMuPDF: A Complete Guide

By | April 15, 2020


Python can split a large PDF file into several smaller ones and merge several small PDF files into a larger one. In this tutorial, we will show how to split and merge PDF files using the Python PyMuPDF library.

Preliminary

You should install the Python PyMuPDF library first:

pip install pymupdf

Open a source pdf file

To split or merge PDF files, you first need to open a source PDF. With PyMuPDF, you can open one like this:

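import sys
import fitz  # PyMuPDF

file = '231420-digitalimageforensics.pdf'
try:
    doc = fitz.open(file)
except Exception as e:
    # Stop here: without an open document, the code below cannot run.
    print(e)
    sys.exit(1)

# Total number of pages (called doc.pageCount in older PyMuPDF releases).
page_count = doc.page_count
print(page_count)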

Running this code shows that the source document (231420-digitalimageforensics.pdf) has 199 pages.

Next, we can copy some pages from the source PDF into a new PDF.

To split or merge PDF files in PyMuPDF, we can use the Document.insertPDF() method (renamed Document.insert_pdf() in newer PyMuPDF releases).

insertPDF(docsrc, from_page=-1, to_page=-1, start_at=-1, rotate=-1, links=True, annots=True)

This method copies a range of pages from docsrc into the document it is called on.

The index of pages in a pdf document

In PyMuPDF, page indexes start at 0, so valid page indexes are in [0, total_page - 1].

Keep this in mind whenever you select pages from a source PDF file: page index n is the (n + 1)-th page of the document.
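People usually quote page numbers starting from 1, so it can help to convert them before calling PyMuPDF. The helper below is only a minimal sketch: the name to_zero_based is hypothetical and is not part of PyMuPDF. It turns a 1-based, inclusive page range into the 0-based from_page and to_page values and checks them against the page count.

```python
def to_zero_based(first_page, last_page, page_count):
    """Convert a 1-based inclusive page range to 0-based (from_page, to_page).

    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    if not 1 <= first_page <= last_page <= page_count:
        raise ValueError(
            f"pages {first_page}-{last_page} are outside 1-{page_count}"
        )
    return first_page - 1, last_page - 1


# Pages 4 to 6 of a 199-page document map to indexes 3 to 5.
from_page, to_page = to_zero_based(4, 6, 199)
print(from_page, to_page)        # 3 5
print(to_page - from_page + 1)   # 3 (number of pages selected)
```

Because to_page is inclusive, the number of selected pages is to_page - from_page + 1.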

Important parameters explained

docsrc: the source PDF document, from which we select the pages in the range [from_page, to_page].

For example, [from_page = 3, to_page = 5] selects 3 pages (page 4, page 5 and page 6) from the source PDF.

from_page: int, the index of the first page to copy from docsrc.

to_page: int, the index of the last page to copy from docsrc. Note that this page is included in the selection.

start_at: int, the position in the destination PDF at which the pages from docsrc are inserted.

For example, start_at = 1 means the pages from docsrc are inserted between page index 0 and page index 1 of the destination PDF.

Meanwhile, start_at should be smaller than the total number of pages in the destination PDF.
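To see where the pages end up, the three parameters can be modeled with plain Python lists. This is only a mental model, not PyMuPDF code: the function model_insert and the page labels are hypothetical, explicit non-negative values are assumed for from_page and to_page, and the default start_at = -1 is modeled as appending at the end, which is how the PyMuPDF documentation describes it.

```python
def model_insert(dest_pages, src_pages, from_page, to_page, start_at=-1):
    """Return the page order after an insert, using lists as stand-ins for PDFs.

    Hypothetical model for illustration; it does not call PyMuPDF.
    """
    selected = src_pages[from_page:to_page + 1]  # to_page is inclusive
    if start_at == -1:
        return dest_pages + selected             # -1: append at the end
    # The first copied page becomes page index start_at in the destination.
    return dest_pages[:start_at] + selected + dest_pages[start_at:]


dest = ["D0", "D1", "D2"]
src = ["S0", "S1", "S2", "S3", "S4", "S5", "S6"]

print(model_insert(dest, src, 3, 5, start_at=1))
# ['D0', 'S3', 'S4', 'S5', 'D1', 'D2']
```

The three source pages land between destination index 0 and the page that used to be at index 1.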

For example:

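# Open the destination PDF; doc is the source document opened earlier.
doc2 = fitz.open("new-doc-1.pdf")
# Copy source pages 3..5 (inclusive) so they start at index 1 of doc2.
# Older PyMuPDF releases name this method insertPDF.
doc2.insert_pdf(doc, from_page=3, to_page=5, start_at=1)
doc2.save("new-doc-4.pdf")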

This code selects 3 pages (page indexes 3 to 5) from 231420-digitalimageforensics.pdf and inserts them after the first page of new-doc-1.pdf. The result is saved as a new PDF document, new-doc-4.pdf.

In this way, the same method both splits a PDF file (by copying only some of its pages) and merges two PDF files into a new one.
