Tagging parts-of-speech#

This tutorials shows you how to do part-of-speech tagging in Flair, showcases univeral and language-specific models, and gives a list of all PoS models in Flair.

Language-specific parts-of-speech (PoS)#

Syntax is fundamentally language-specific, so each language has different fine-grained parts-of-speech. Flair offers models for many languages:

… in English#

For English, we offer several models trained over Ontonotes.

Use like this:

from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('pos')

# make a sentence
sentence = Sentence('Dirk went to the store.')

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

This should print:

Sentence[6]: "Dirk went to the store." → ["Dirk"/NNP, "went"/VBD, "to"/IN, "the"/DT, "store"/NN, "."/.]

This printout tells us for instance that “Dirk” is a proper noun (tag: NNP), and “went” is a past tense verb (tag: VBD).

Note

To better understand what each tag means, consult the tag specification of the Penn Treebank.

… in German#

from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('de-pos')

# make a sentence
sentence = Sentence('Dort hatte er einen Hut gekauft.')

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

This should print:

Sentence[7]: "Dort hatte er einen Hut gekauft." → ["Dort"/ADV, "hatte"/VAFIN, "er"/PPER, "einen"/ART, "Hut"/NN, "gekauft"/VVPP, "."/$.]

… in Ukrainian#

from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('pos-ukrainian')

# make a sentence
sentence = Sentence("Сьогодні в Знам’янці проживають нащадки поета — родина Шкоди.")

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

… in Arabic#

from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('ar-pos')

# make a sentence
sentence = Sentence('عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية  بالقاهرة .')

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

Tagging universal parts-of-speech (uPoS)#

Universal parts-of-speech are a set of minimal syntactic units that exist across languages. For instance, most languages will have VERBs or NOUNs.

We ship models trained over 14 langages to tag upos in multilingual text. Use like this:

from flair.nn import Classifier
from flair.data import Sentence

# load model
tagger = Classifier.load('pos-multi')

# text with English and German sentences
sentence = Sentence('George Washington went to Washington. Dort kaufte er einen Hut.')

# predict PoS tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

This should print (line breaks added for readability):

Sentence: "George Washington went to Washington . Dort kaufte er einen Hut ."

→ ["George"/PROPN, "Washington"/PROPN, "went"/VERB, "to"/ADP, "Washington"/PROPN, "."/PUNCT]

→ ["Dort"/ADV, "kaufte"/VERB, "er"/PRON, "einen"/DET, "Hut"/NOUN, "."/PUNCT]

However note that they were trained for a mix of European languages and therefore will not work for other languages.

List of POS Models#

We end this section with a list of all models we currently ship with Flair.

ID	Task	Language	Training Dataset	Accuracy	Contributor / Notes
‘pos’	POS-tagging	English	Ontonotes	98.19 (Accuracy)
‘pos-fast’	POS-tagging	English	Ontonotes	98.1 (Accuracy)	(fast model)
‘upos’	POS-tagging (universal)	English	Ontonotes	98.6 (Accuracy)
‘upos-fast’	POS-tagging (universal)	English	Ontonotes	98.47 (Accuracy)	(fast model)
‘pos-multi’	POS-tagging	Multilingual	UD Treebanks	96.41 (average acc.)	(12 languages)
‘pos-multi-fast’	POS-tagging	Multilingual	UD Treebanks	92.88 (average acc.)	(12 languages)
‘ar-pos’	POS-tagging	Arabic (+dialects)	combination of corpora
‘de-pos’	POS-tagging	German	UD German - HDT	98.50 (Accuracy)
‘de-pos-tweets’	POS-tagging	German	German Tweets	93.06 (Accuracy)	stefan-it
‘da-pos’	POS-tagging	Danish	Danish Dependency Treebank		AmaliePauli
‘ml-pos’	POS-tagging	Malayalam	30000 Malayalam sentences	83	sabiqueqb
‘ml-upos’	POS-tagging	Malayalam	30000 Malayalam sentences	87	sabiqueqb
‘pt-pos-clinical’	POS-tagging	Portuguese	PUCPR	92.39	LucasFerroHAILab for clinical texts
‘pos-ukrainian’	POS-tagging	Ukrainian	Ukrainian UD	97.93 (F1)	dchaplinsky

You choose which pre-trained model you load by passing the appropriate string to the Classifier.load() method.

A full list of our current and community-contributed models can be browsed on the model hub.

Table of Contents