Understand Content-Encoding: br and Decompress String – Python Web Crawler Tutorial

By | November 7, 2019

When you crawl web pages, an HTTP response may carry the header Content-Encoding: br, which means the response body was compressed with the Brotli algorithm. In this tutorial, we introduce this compression format and show how to decompress it in Python.


What is Content-Encoding: br?

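br is the content-coding token for Brotli, a lossless compression format specified in RFC 7932. A server that sends Content-Encoding: br has compressed the response body with Brotli, so the client must decompress the body before it can read the page.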
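
As a minimal sketch of how a crawler might check this header: the value can list several comma-separated codings, and the tokens are case-insensitive. The helper names content_codings and is_brotli are hypothetical and are not part of any library.

```python
def content_codings(header_value):
    """Split a Content-Encoding header value into lowercase coding tokens."""
    if not header_value:
        return []
    return [token.strip().lower() for token in header_value.split(",") if token.strip()]


def is_brotli(header_value):
    """Return True if 'br' is one of the codings applied to the body."""
    return "br" in content_codings(header_value)


print(is_brotli("br"))               # True
print(is_brotli("gzip"))             # False
print(content_codings("gzip, BR"))   # ['gzip', 'br']
```

In a real crawler, header_value would come from the response headers of whichever HTTP client you use.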

Next, we show how to decompress data that was compressed with the Brotli algorithm.

Preliminaries

pip install brotlipy

Load library

import brotli

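Create a string that will be compressed by the Brotli algorithm

# brotli works on bytes, so encode the text first
# (avoid naming a variable str, which would shadow the built-in type)
text = "this is a test tutorial"
data = text.encode("utf-8")

Compress the string with the Brotli algorithm

compress_str = brotli.compress(data)
print(compress_str)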

The output is:

b'\x0b\x0b\x80this is a test tutorial\x03'

Decompress string

decompress_str = brotli.decompress(compress_str)

Print the decompressed string; because Brotli is lossless, it is identical to the original

print(decompress_str.decode('utf-8'))

The output is:

this is a test tutorial
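
In a crawler, this decompression step usually depends on the Content-Encoding header of the response. The sketch below shows one possible way to wire that up. The function name decode_body and its parameters are hypothetical, and it assumes the header names a single coding. A gzip branch from the standard library is included only so the example runs even where brotli is not installed.

```python
import gzip


def decode_body(body, content_encoding=None, charset="utf-8"):
    """Decompress a response body according to its Content-Encoding, then decode it to text."""
    coding = (content_encoding or "").strip().lower()
    if coding == "br":
        import brotli  # imported lazily so the other branches work without it
        body = brotli.decompress(body)
    elif coding == "gzip":
        body = gzip.decompress(body)
    elif coding not in ("", "identity"):
        # This sketch does not handle other codings or chained codings.
        raise ValueError("unsupported Content-Encoding: " + coding)
    return body.decode(charset)


# Demonstration with the standard-library gzip branch and with no encoding at all.
print(decode_body(gzip.compress(b"this is a test tutorial"), "gzip"))
print(decode_body(b"this is a test tutorial"))
```

With the brotli module installed, decode_body(compress_str, "br") should return the original text for the compress_str value created earlier in this tutorial.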
