HTTP response headers can help us debug errors when we are crawling a site. You can inspect these headers in your browser, but that does not work well for a Python crawler that needs them at run time. In this tutorial, we will show how to get HTTP response headers dynamically using Python.
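
As a rough illustration of the idea, the sketch below reads the headers with the standard-library urllib.request. The helper name get_response_headers is our own and is not taken from the tutorial, which may use a different library.

```python
import urllib.request


def get_response_headers(url, timeout=10):
    """Return the response headers for url as a plain dict."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.headers behaves like an email.message.Message,
        # so items() yields (name, value) pairs.
        return dict(response.headers.items())
```

Calling get_response_headers with the URL of the page you are crawling returns entries such as Content-Type, which are often the first thing to check when a request misbehaves.
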
The MD5 value of a string is very useful when you plan to save the string into a database, where it can be used as a primary key. In this tutorial, we will introduce how to use Python to generate the MD5 value of a Python string.
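
A minimal sketch using the standard-library hashlib module; the function name md5_of_string is only illustrative.

```python
import hashlib


def md5_of_string(text, encoding="utf-8"):
    """Return the hexadecimal MD5 digest of text."""
    # hashlib works on bytes, so the string is encoded first.
    return hashlib.md5(text.encode(encoding)).hexdigest()


print(md5_of_string("hello"))  # 5d41402abc4b2a76b9719d911017c592
```

The digest is always 32 hexadecimal characters, which makes it convenient as a fixed-length key.
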
After a Python crawler has fetched the content of a web page, you should decode its HTML entities before you save it into a database. In this tutorial, we will introduce how to encode and decode HTML entities in a Python string.
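
One way to do this is with the standard-library html module, as in the short sketch below. The sample string is made up for illustration.

```python
import html

raw = "<p>Tom & Jerry</p>"

# Encode: replace &, < and > (and quotes) with HTML entities.
encoded = html.escape(raw)
print(encoded)  # &lt;p&gt;Tom &amp; Jerry&lt;/p&gt;

# Decode: turn entities back into the characters they stand for.
decoded = html.unescape(encoded)
print(decoded)  # <p>Tom & Jerry</p>
```
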
Generating random integers, floats and strings is common in Python applications, for example to create passwords, choose delay times or initialize weights in deep learning. In this tutorial, we will write a simple example to generate them.
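
The following sketch shows one possible approach with the standard-library random and string modules. The helper random_string and the chosen ranges are assumptions made for the example.

```python
import random
import string


def random_string(length=12):
    """Return a random string of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choices(alphabet, k=length))


random_int = random.randint(1, 10)       # integer in [1, 10], both ends included
random_float = random.uniform(0.5, 2.0)  # float between 0.5 and 2.0, e.g. a delay in seconds
random_text = random_string(12)

print(random_int, random_float, random_text)
```

The random module is not designed for security, so for real passwords the standard-library secrets module is the safer choice.
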
Serializing a Python object allows us to save it into a database or transfer it over the internet; when we need to use it again, we can deserialize it back into a Python object. In this tutorial, we will introduce how to serialize and deserialize a Python object.
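
As one possible illustration, the standard-library pickle module can do the round trip; the tutorial itself may use a different serializer, and the record below is invented for the example.

```python
import pickle

record = {"url": "https://www.example.com/", "status": 200, "tags": ["a", "b"]}

# Serialize: turn the object into bytes that can be stored or sent.
payload = pickle.dumps(record)

# Deserialize: rebuild an equal (but separate) object from the bytes.
restored = pickle.loads(payload)

print(type(payload).__name__, restored == record)  # bytes True
```

Only unpickle data you trust, because loading a pickle can execute arbitrary code.
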
When you are crawling a web page, you may get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0. In this tutorial, we will introduce how to fix this error.
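
A frequent cause of this error is that the response body is gzip-compressed: gzip data starts with the bytes 0x1f 0x8b, and 0x8b is not valid UTF-8. Assuming that is the cause here, a sketch of a fix is to decompress before decoding. The helper name decode_body is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw, encoding="utf-8"):
    """Decode a response body, decompressing it first if it is gzip data."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)
```

Checking the magic bytes is a simple heuristic; a crawler can also look at the Content-Encoding response header to decide whether to decompress.
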
When you are crawling a web page, you may find that the HTTP response has a Content-Encoding of br, which means the page is compressed with the Brotli algorithm. In this tutorial, we will introduce this compression algorithm and show how to decompress the content.
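
The sketch below shows the general shape of handling a br response. It assumes the third-party brotli package (installed with pip install Brotli) is available; the helper name decompress_brotli_body is our own.

```python
def decompress_brotli_body(body, content_encoding):
    """Return body decompressed if the Content-Encoding header is 'br'."""
    if (content_encoding or "").strip().lower() != "br":
        # Not Brotli: hand the bytes back unchanged.
        return body
    # Assumption: the third-party 'brotli' package is installed.
    import brotli

    return brotli.decompress(body)
```

Importing brotli inside the function keeps the helper usable for uncompressed responses even when the package is missing.
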
Python 3 urllib is a package that helps us work with URLs. It contains four modules: urllib.request, urllib.error, urllib.parse and urllib.robotparser. urllib.request and urllib.parse are the most commonly used in Python applications. In this tutorial, we will introduce how to crawl a web page using Python 3 urllib.
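
To give a feel for the two most used modules, here is a small sketch that builds a URL with urllib.parse and fetches text with urllib.request. The helper names build_url and fetch_text and the example URL are illustrative assumptions.

```python
import urllib.parse
import urllib.request


def build_url(base, params):
    """Append URL-encoded query parameters to base."""
    return f"{base}?{urllib.parse.urlencode(params)}"


def fetch_text(url, timeout=10):
    """Download url and return its body decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


url = build_url("https://www.example.com/search", {"q": "python urllib", "page": 2})
print(url)  # https://www.example.com/search?q=python+urllib&page=2
```

Passing the resulting URL to fetch_text returns the page source as a string, assuming the server sends an uncompressed body.
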
In Python applications, we often get an object from another library without knowing which functions and variables it contains. In this tutorial, we will introduce a simple way to find the attributes and functions of a Python object.
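
One simple way is the built-in dir() function combined with callable(). The sketch below uses a made-up Page class and a hypothetical helper list_members to separate functions from variables.

```python
def list_members(obj):
    """Split the public names of obj into (functions, variables)."""
    functions, variables = [], []
    for name in dir(obj):  # dir() returns names in sorted order
        if name.startswith("__"):
            continue  # skip special methods such as __init__
        if callable(getattr(obj, name)):
            functions.append(name)
        else:
            variables.append(name)
    return functions, variables


class Page:
    """Toy class used only to demonstrate list_members."""

    def __init__(self):
        self.url = "https://www.example.com/"
        self.status = 200

    def fetch(self):
        return self.url


print(list_members(Page()))  # (['fetch'], ['status', 'url'])
```
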
To crawl web pages using Python, you should know what HTTP request headers are. In this tutorial, we briefly introduce them and show how to set them in your Python application.
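
As a sketch of setting request headers, the example below uses urllib.request.Request from the standard library. The header values and URL are placeholders chosen for illustration.

```python
import urllib.request

headers = {
    "User-Agent": "Mozilla/5.0 (compatible; example-crawler/0.1)",
    "Accept-Language": "en-US,en;q=0.9",
}

# Building a Request does not send anything yet; it only stores the headers.
request = urllib.request.Request("https://www.example.com/", headers=headers)

# Request stores header names via str.capitalize(), e.g. "User-agent".
print(request.get_header("User-agent"))
print(request.header_items())
```

Passing this request object to urllib.request.urlopen sends the headers along with the request.
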