Improve NLTK Word Lemmatization with Parts-of-Speech – NLTK Tutorial

By | August 30, 2019

In the previous tutorial, we learned how to lemmatize a word in NLTK. However, the result is not perfect, because the lemmatizer treats every word as a noun unless it is told otherwise. In this tutorial, we will use each word's part-of-speech to improve it.


Before starting this tutorial, you should read these basic tutorials.

An introduction to word lemmatization in nltk

Implement Word Lemmatization with NLTK for Beginner – NLTK Tutorial

An introduction to nltk word part-of-speech tagging

A Simple Guide to NLTK Tag Word Parts-of-Speech – NLTK Tutorial

Improve nltk word lemmatization with word part-of-speech

Import libraries

import nltk
from nltk.stem import WordNetLemmatizer
from nltk import word_tokenize, pos_tag
from nltk.corpus import wordnet

Get the WordNet part-of-speech type of a word

def get_wordnet_pos(treebank_tag):
    # Map a Penn Treebank tag (for example 'VBN') to a WordNet part-of-speech constant
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    # Any other tag has no WordNet equivalent
    return None

In this function, we only handle nouns, verbs, adjectives and adverbs. You can extend it to handle other tags.
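One way to make the mapping easier to extend is a lookup table keyed on the first letter of the tag. The sketch below is only an illustration and not part of the original tutorial: `TAG_PREFIX_TO_WORDNET` and `get_wordnet_pos_from_table` are hypothetical names, and the single-letter strings stand in for the values of `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN` and `wordnet.ADV` so that it runs without the WordNet data.

```python
# Single-letter codes; these are the values of wordnet.ADJ, wordnet.VERB,
# wordnet.NOUN and wordnet.ADV in nltk.corpus.wordnet.
TAG_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def get_wordnet_pos_from_table(treebank_tag, default=None):
    """Map a Penn Treebank tag to a WordNet code using its first letter."""
    # treebank_tag[:1] is "" for an empty tag, which falls back to the default
    return TAG_PREFIX_TO_WORDNET.get(treebank_tag[:1], default)


for tag in ("JJ", "VBN", "NNS", "RB", "DT"):
    print(tag, "->", get_wordnet_pos_from_table(tag))
```

Adding support for another tag family then only needs a new entry in the table, and passing `default="n"` gives the same noun fallback that `lemmatize_sentence` applies.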

Lemmatize words based on their part-of-speech

def lemmatize_sentence(sentence):
    res = []
    lemmatizer = WordNetLemmatizer()
    # get word and its pos
    for word, pos in pos_tag(word_tokenize(sentence)):
        wordnet_pos = get_wordnet_pos(pos) or wordnet.NOUN
        res.append(lemmatizer.lemmatize(word, pos=wordnet_pos))

    return res

The key line of this function is:

lemmatizer.lemmatize(word, pos=wordnet_pos)

This call lemmatizes a word according to its part-of-speech.

Print the result


The result is: do

However, if you do not use the part-of-speech to improve lemmatization, you will get: done
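The call that produced these two outputs is not shown in this tutorial. The following is a minimal sketch of such a comparison, assuming the input was the single word "done" treated as a verb. That is a guess based on the outputs, not something the tutorial confirms. `compare_lemmas` and `run_with_nltk` are hypothetical helper names. `run_with_nltk` needs NLTK with the WordNet data installed.

```python
def compare_lemmas(words_with_pos, lemmatize):
    """Return (word, lemma without POS, lemma with POS) for each (word, pos) pair.

    `lemmatize` is any callable with the shape lemmatize(word, pos=...),
    such as WordNetLemmatizer().lemmatize.
    """
    return [(word, lemmatize(word), lemmatize(word, pos=pos))
            for word, pos in words_with_pos]


def run_with_nltk():
    """Print the comparison using NLTK's WordNet lemmatizer."""
    # Imported here so the helper above can be used without NLTK installed
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    # "v" is the value of wordnet.VERB; the input word "done" is an assumption
    for word, plain, with_pos in compare_lemmas([("done", "v")], lemmatizer.lemmatize):
        print(word, "| without pos:", plain, "| with pos:", with_pos)
```

Calling `run_with_nltk()` should show the same contrast as reported here, since `lemmatize` falls back to the noun part-of-speech when `pos` is not given.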
