"TypeError: cannot use a string pattern on a bytes-like object" occurs when you apply a regular expression compiled from a str pattern to a bytes object in Python. In this tutorial we will show you how to fix this error.
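You can reproduce the error without any network access. The sketch below uses a made-up HTML fragment and a hypothetical helper name, reproduce_error. It searches bytes input with a str pattern and returns the TypeError message:

```python
import re

def reproduce_error():
    """Search bytes with a str pattern and return the resulting error message."""
    pattern = re.compile(r'href="(.*?)"')   # str pattern
    data = b'<a href="/about">About</a>'    # bytes input (made-up sample)
    try:
        pattern.findall(data)
    except TypeError as exc:
        return str(exc)
    return None  # not reached: mixing str and bytes always raises

print(reproduce_error())
# prints: cannot use a string pattern on a bytes-like object
```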
Here is an example that opens a URL and reads the HTML content of the page.
import urllib.request

with urllib.request.urlopen('http://www.python.org/') as f:
    html = f.read()  # read() returns the raw response body as bytes
print(type(html))
We will get:
<class 'bytes'>
This means the html variable is of type bytes.
Next, we try to extract the links from it with a regular expression.
import re

webpage_regex = re.compile('<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)
links = webpage_regex.findall(html)  # html is still bytes here, so this raises TypeError
print(links)
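We will get the error:
TypeError: cannot use a string pattern on a bytes-like object
The error occurs because webpage_regex was compiled from a str pattern while html is bytes, and the re module cannot mix the two types. To fix it, we can decode html into a str.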
html = html.decode('utf-8')  # convert bytes to str
print(type(html))
Now the type of html is:
<class 'str'>
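Note that decode('utf-8') raises UnicodeDecodeError if the page contains bytes that are not valid UTF-8. When you read from urlopen, the charset declared by the server can usually be obtained with f.headers.get_content_charset(), which returns None if no charset is declared. The sketch below is a hypothetical helper named to_text that shows a more forgiving option. It assumes UTF-8 by default and uses errors='replace', so invalid bytes become the replacement character U+FFFD instead of raising:

```python
def to_text(data, encoding='utf-8'):
    """Return data as str; invalid byte sequences become U+FFFD instead of raising."""
    if isinstance(data, str):
        return data  # already text, nothing to decode
    return data.decode(encoding, errors='replace')

raw = b'caf\xe9 <a href="/menu">menu</a>'  # \xe9 is Latin-1 for 'é', not valid UTF-8
print(type(to_text(raw)))        # <class 'str'>
print(to_text(raw, 'latin-1'))   # café <a href="/menu">menu</a>
```

If you know the real encoding, passing it explicitly (as in the latin-1 call) gives the correct text rather than replacement characters.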
Now we can parse it with the regular expression.
webpage_regex = re.compile('<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)
links = webpage_regex.findall(html)  # html is now str, so the str pattern works
print(links)
The result is:
['http://browsehappy.com/', '#content', '#python-network', '/'
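Alternatively, if you prefer to keep the data as bytes, you can compile the pattern from a bytes literal (the rb'' prefix) so that pattern and input have the same type. A minimal sketch, using a made-up sample page instead of a live download:

```python
import re

# Made-up sample page content, already bytes (as returned by f.read())
html = b'<a href="http://example.com/">x</a> <A HREF=\'#top\'>y</A>'

# The rb'' prefix makes the pattern bytes, so it can search bytes input
webpage_regex = re.compile(rb'<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)
links = webpage_regex.findall(html)
print(links)  # [b'http://example.com/', b'#top']

# The matches are bytes too; decode them only if you need str
print([link.decode('utf-8') for link in links])  # ['http://example.com/', '#top']
```

With this approach you do not have to decode the whole page. The trade-off is that every match comes back as bytes.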