# A Simple Guide to Detecting Corrupted or Incomplete PDF Files in Python – Python Tutorial

By | September 19, 2019

When we process PDF files with Python, we should first check whether each file is complete or corrupted. In this tutorial, we will introduce a simple way to detect this. You can use the example from this tutorial in your own application.

## Some features of complete PDF files

PDF file 1.

This PDF file ends with NUL bytes, and the last line contains many of them.

The second-to-last line contains %%EOF.

There is also a %%EOF in the middle of this file.

PDF file 2.

This PDF file also ends with NUL, but the last line contains only a single NUL byte.

The second-to-last line also contains %%EOF.

PDF file 3.

This PDF file ends with an unknown symbol. However, the second-to-last line still contains %%EOF.

PDF file 4.

This pdf file ends with %%EOF.

Next, check the start of the PDF file.

PDF file 5.
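The observations above can be reproduced on your own files with a small helper. This is only a minimal sketch: the function name `describe_pdf_edges` and the 1024-byte `window` are assumptions chosen for illustration. It reports whether %PDF appears near the start, how many %%EOF markers appear near the end, and how many NUL bytes the file ends with.

```python
from pathlib import Path


def describe_pdf_edges(path, window=1024):
    """Return a few facts about the first and last `window` bytes of a file.

    Hypothetical helper for inspection only; it does not validate the PDF.
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as fin:
        head = fin.read(window)
        # Jump to the last `window` bytes, but never before the file start.
        fin.seek(max(0, size - window))
        tail = fin.read(window)
    return {
        "size": size,
        "has_pdf_header": b"%PDF" in head,
        "eof_markers_in_tail": tail.count(b"%%EOF"),
        # Number of NUL bytes at the very end of the file (within the window).
        "trailing_nul_bytes": len(tail) - len(tail.rstrip(b"\x00")),
        "last_bytes": tail[-16:],
    }
```

Calling `describe_pdf_edges("some.pdf")` on a few known-good and known-broken files is a quick way to see which of these patterns your own PDF files follow.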

Based on these files, a complete PDF file has the following features:

1. The PDF file ends with %%EOF or NUL.

2. The file contains at least one %%EOF marker.

3. The start of the file contains %PDF.

We can create a Python function to detect whether a PDF file is complete.

    import os

    def isFullPdf(f):
        # files smaller than 1 KB are treated as incomplete
        size = os.path.getsize(f)
        if size < 1024:
            return False
        with open(f, 'rb') as fin:
            # start content: the first 1024 bytes
            fin.seek(0, 0)
            start_content = fin.read(1024).decode("ascii", 'ignore')
            # end content: the last 1024 bytes
            fin.seek(-1024, 2)
            end_content = fin.read(1024).decode("ascii", 'ignore')
        # the start of the file must contain %PDF
        start_flag = start_content.count('%PDF') > 0
        # the end of the file contains %%EOF
        if start_flag and end_content.count('%%EOF') > 0:
            return True
        # or the file ends with a NUL byte
        eof = bytes([0]).decode("ascii")
        if start_flag and end_content.endswith(eof):
            return True
        return False

I have tested this function on more than 1,000 PDF files, and it works well.
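To run this kind of check over a whole folder, something like the following sketch could be used. The names `looks_complete` and `find_incomplete_pdfs` are hypothetical, and the compact checker restates the three rules from this tutorial so that the example is self-contained. Two things differ from `isFullPdf` and are assumptions of this sketch: files shorter than 1024 bytes are not rejected outright, and an optional `strict` flag drops the NUL rule, since a truncated file that happens to end with a zero byte would otherwise pass.

```python
from pathlib import Path


def looks_complete(path, window=1024, strict=False):
    """Heuristic check: %PDF near the start, %%EOF (or a final NUL) near the end."""
    path = Path(path)
    size = path.stat().st_size
    with open(path, "rb") as fin:
        head = fin.read(window)
        # Read the last `window` bytes without seeking before the file start.
        fin.seek(max(0, size - window))
        tail = fin.read(window)
    if b"%PDF" not in head:
        return False
    if b"%%EOF" in tail:
        return True
    # Loose rule from the tutorial: a trailing NUL byte also counts,
    # unless the caller asked for the stricter behaviour.
    return (not strict) and tail.endswith(b"\x00")


def find_incomplete_pdfs(folder, strict=False):
    """Return the .pdf files under `folder` that fail the heuristic, sorted by path."""
    return sorted(
        p for p in Path(folder).rglob("*.pdf") if not looks_complete(p, strict=strict)
    )
```

For example, `find_incomplete_pdfs("downloads")` would list the files worth downloading again. Keep in mind that this is a heuristic on the first and last bytes only; it cannot tell whether the content in between is valid.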