Fix Python Beautiful Soup Tag .string Is None: A Complete Guide – Python Tutorial

By | April 5, 2020

We often use the Python BeautifulSoup package to parse an HTML page and extract its tags. However, a tag's .string attribute often returns None. In this tutorial, we will use some examples to show you how to fix this problem.

Parse an HTML page with BeautifulSoup

Here is an example:

from bs4 import BeautifulSoup

html_content = '<html><div><span>Tutorial Example</span> https://www.tutorialexample.com</div></html>'

soup = BeautifulSoup(html_content, "html.parser")

Parse the HTML string and get all div tags:

tags = soup.find_all('div')

Output the content of each div tag:

for tag in tags:
    print(tag.string)

We plan to use the .string attribute to output the text in each div tag.

Run this Python code and you will get this result: None

Why does .string return None?

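In this example, the div tag has two children: a span tag and a piece of text. The .string attribute returns a value only when a tag has exactly one child, which is either a single piece of text or a single tag whose own .string is not None. In every other case it returns None.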

If the html is:

html_content = '<html><div>https://www.tutorialexample.com</div></html>'

There is no HTML tag inside the div tag, so

for tag in tags:
    print(tag.string)

The result will be: https://www.tutorialexample.com

Moreover, if the html is:

html_content = '<html><div><span>https://www.tutorialexample.com</span></div></html>'

There is only one span tag in each div, and that span contains only text. The result is also: https://www.tutorialexample.com

For this HTML:

html_content = '<html><div><span>Tutorial Example</span> <span>https://www.tutorialexample.com</span></div></html>'

There are two span tags in the div tag, so the .string of each div tag is None.
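The cases above can be checked side by side. The following is a minimal sketch, assuming the bs4 package is installed; the samples dictionary and the describe helper are hypothetical names introduced here for illustration, and the snippets are shortened versions of the HTML used in this tutorial. It counts the direct children of each div with .contents and reads its .string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample snippets, one per case discussed in this tutorial.
samples = {
    "text only": "<div>https://www.tutorialexample.com</div>",
    "one child tag": "<div><span>https://www.tutorialexample.com</span></div>",
    "tag plus text": "<div><span>Tutorial Example</span> https://www.tutorialexample.com</div>",
    "two child tags": "<div><span>Tutorial Example</span> <span>https://www.tutorialexample.com</span></div>",
}


def describe(html):
    """Return (number of direct children, .string) for the first div."""
    div = BeautifulSoup(html, "html.parser").find("div")
    return len(div.contents), div.string


for label, html in samples.items():
    count, string = describe(html)
    print(f"{label}: {count} direct child(ren), .string = {string!r}")
```

Only the first two samples have a single child, so only they should give a .string that is not None. Note that the whitespace between the two span tags counts as a child of its own.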

How to get the text in a div tag if .string is None?

We can use the .text attribute, which joins all the text inside the tag. Here is an example:

for tag in tags:
    print(tag.text)

The text in the div tag is:

Tutorial Example https://www.tutorialexample.com
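If you need more control over how the pieces of text are joined, get_text() and .stripped_strings may help. This is a small sketch using the first example from this tutorial; the " | " separator and the variable names joined and pieces are arbitrary choices made for illustration.

```python
from bs4 import BeautifulSoup

html_content = '<html><div><span>Tutorial Example</span> https://www.tutorialexample.com</div></html>'
soup = BeautifulSoup(html_content, "html.parser")
div = soup.find("div")

# get_text() joins every piece of text inside the tag. strip=True trims
# each piece, and separator sets what is placed between the pieces.
joined = div.get_text(separator=" | ", strip=True)

# stripped_strings yields each non-empty piece with whitespace trimmed.
pieces = list(div.stripped_strings)

print(joined)
print(pieces)
```

A list of pieces is convenient when you want to process the span text and the URL separately instead of as one string.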
