Python - Stemming Algorithms
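In natural language processing, we often encounter two or more words that share a common root. For example, the words agreed, agreeing, and agreeable all share the root agree. A search involving any one of these words should treat them as the same word, namely the root. It is therefore essential to map each word to its root, also called its stem. The NLTK library provides methods that perform this mapping and output the stem of each word.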
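To make the idea of mapping related words to one key concrete, here is a minimal sketch that uses a few hand-written suffix rules. The names `SUFFIX_RULES`, `naive_stem`, and `group_by_stem` are hypothetical and exist only for this illustration. The rules are a toy and are not the algorithm used by any NLTK stemmer, which applies far more careful conditions.

```python
# Toy suffix rules (hypothetical, for illustration only): (suffix, replacement).
# Order matters: "eed" must be checked before the shorter "ed".
SUFFIX_RULES = [("eed", "ee"), ("able", ""), ("ing", ""), ("ed", "")]


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, unless the result would be too short."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= min_stem_len:
                return candidate
    return word


def group_by_stem(words):
    """Group words under their shared stem, as a search index might."""
    groups = {}
    for w in words:
        groups.setdefault(naive_stem(w), []).append(w)
    return groups


print(group_by_stem(["agreed", "agreeing", "agreeable", "head"]))
# {'agree': ['agreed', 'agreeing', 'agreeable'], 'head': ['head']}
```

A lookup for any of the three forms now lands in the same group. Real stemmers aim for the same effect with rule sets that cover far more of the language.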
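NLTK provides three of the most commonly used stemming algorithms: Porter, Lancaster, and Snowball. They give slightly different results. The example below applies all three to the same sentence and prints the stem each one produces.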
import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import SnowballStemmer

# word_tokenize needs NLTK's tokenizer data. If it is missing, download it once,
# for example with nltk.download('punkt'); newer NLTK releases may ask for 'punkt_tab'.

porter_stemmer = PorterStemmer()
lanca_stemmer = LancasterStemmer()
sb_stemmer = SnowballStemmer("english")

word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns"

# First, split the sentence into word tokens
nltk_tokens = nltk.word_tokenize(word_data)

# Next, find the stem of each token with each stemmer
print('***PorterStemmer****\n')
for w_port in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_port, porter_stemmer.stem(w_port)))

print('\n***LancasterStemmer****\n')
for w_lanca in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_lanca, lanca_stemmer.stem(w_lanca)))

print('\n***SnowballStemmer****\n')
for w_snow in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_snow, sb_stemmer.stem(w_snow)))
When we run the above program, we get the following output:
***PorterStemmer****

Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famou
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: hi
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: hi
Actual: subalterns || Stem: subaltern

***LancasterStemmer****

Actual: Aging || Stem: ag
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: fam
Actual: crime || Stem: crim
Actual: family || Stem: famy
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transf
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: on
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern

***SnowballStemmer****

Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famous
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern
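Because the three listings are long, the differences between the stemmers are easy to miss. The sketch below copies the stems from the output above into a table and keeps only the words on which the stemmers disagree. It does not call NLTK. The names `stems` and `disagreements` are hypothetical helpers introduced for this comparison.

```python
# Stems copied from the output listing above: word -> (Porter, Lancaster, Snowball).
stems = {
    "Aging": ("age", "ag", "age"),
    "head": ("head", "head", "head"),
    "of": ("of", "of", "of"),
    "famous": ("famou", "fam", "famous"),
    "crime": ("crime", "crim", "crime"),
    "family": ("famili", "famy", "famili"),
    "decides": ("decid", "decid", "decid"),
    "to": ("to", "to", "to"),
    "transfer": ("transfer", "transf", "transfer"),
    "his": ("hi", "his", "his"),
    "position": ("posit", "posit", "posit"),
    "one": ("one", "on", "one"),
    "subalterns": ("subaltern", "subaltern", "subaltern"),
}


def disagreements(table):
    """Keep only the words for which the three stemmers do not all agree."""
    return {word: s for word, s in table.items() if len(set(s)) > 1}


diff = disagreements(stems)
for word, (porter, lancaster, snowball) in diff.items():
    print(f"{word:<10} porter={porter:<9} lancaster={lancaster:<7} snowball={snowball}")
```

In this sample the stemmers disagree on seven of the thirteen distinct words. Lancaster cuts deepest (fam, crim, transf, on), while Porter and Snowball differ only on famous and his. Note also that many stems, such as famili and decid, are not dictionary words. What matters for search is that related forms map to the same stem.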