Tagging parts-of-speech

This tutorial shows you how to do part-of-speech tagging in Flair, showcases universal and language-specific models, and gives a list of all PoS models in Flair.

Language-specific parts-of-speech (PoS)

Syntax is fundamentally language-specific, so each language has different fine-grained parts-of-speech. Flair offers models for many languages:

... in English

For English, we offer several models trained over Ontonotes.

Use like this:

from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('pos')

# make a sentence
sentence = Sentence('Dirk went to the store.')

# predict PoS tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

This should print:

Sentence[6]: "Dirk went to the store." → ["Dirk"/NNP, "went"/VBD, "to"/IN, "the"/DT, "store"/NN, "."/.]

This printout tells us for instance that "Dirk" is a proper noun (tag: NNP), and "went" is a past tense verb (tag: VBD).
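To work with the predictions programmatically rather than reading the printout, you can pull the labels off the tagged sentence. The helper below is a minimal sketch, not part of Flair: `token_tags` is a hypothetical name, and it assumes that each label returned by `sentence.get_labels(...)` exposes a `value` and a `data_point` with a `text` attribute, as in recent Flair releases.

```python
def token_tags(sentence, label_type):
    """Return (token text, tag) pairs for one label type on a tagged sentence.

    Hypothetical helper: assumes `sentence` behaves like a flair.data.Sentence
    that has already been passed to tagger.predict(), and that each label
    exposes `.value` and `.data_point.text`.
    """
    return [
        (label.data_point.text, label.value)
        for label in sentence.get_labels(label_type)
    ]
```

With the tagger and sentence from the example above, calling `token_tags(sentence, tagger.label_type)` should yield pairs such as `('Dirk', 'NNP')`, assuming the tagger exposes its label type through the `label_type` attribute.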

To better understand what each tag means, consult the tag specification of the Penn Treebank.
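As a quick reference, the sketch below glosses only the Penn Treebank tags that occur in the English example output. The dictionary is an illustrative subset written for this tutorial, not the full tag set, and `describe` is a hypothetical helper name.

```python
# Glosses for the Penn Treebank tags seen in the English example output.
# Illustrative subset only; consult the tag specification for the full set.
PTB_GLOSSES = {
    "NNP": "proper noun, singular",
    "VBD": "verb, past tense",
    "IN": "preposition or subordinating conjunction",
    "DT": "determiner",
    "NN": "noun, singular or mass",
    ".": "sentence-final punctuation",
}


def describe(tagged_tokens):
    """Attach a readable gloss to each (token, tag) pair."""
    return [
        (token, tag, PTB_GLOSSES.get(tag, "unknown tag"))
        for token, tag in tagged_tokens
    ]


# The (token, tag) pairs from the printout above, typed in by hand.
tagged = [("Dirk", "NNP"), ("went", "VBD"), ("to", "IN"),
          ("the", "DT"), ("store", "NN"), (".", ".")]

for token, tag, gloss in describe(tagged):
    print(f"{token:<6} {tag:<4} {gloss}")
```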

... in German

from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('de-pos')

# make a sentence
sentence = Sentence('Dort hatte er einen Hut gekauft.')

# predict PoS tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

This should print:

Sentence[7]: "Dort hatte er einen Hut gekauft." → ["Dort"/ADV, "hatte"/VAFIN, "er"/PPER, "einen"/ART, "Hut"/NN, "gekauft"/VVPP, "."/$.]

... in Ukrainian

from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('pos-ukrainian')

# make a sentence
sentence = Sentence("Сьогодні в Знам’янці проживають нащадки поета — родина Шкоди.")

# predict PoS tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

... in Arabic

from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('ar-pos')

# make a sentence
sentence = Sentence('عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة .')

# predict PoS tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

Tagging parts-of-speech in any language

Universal parts-of-speech are a set of minimal syntactic units that exist across languages. For instance, most languages will have VERBs or NOUNs.
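To illustrate how fine-grained tags relate to universal ones, the sketch below collapses the Penn Treebank tags from the English example ("Dirk went to the store.") into universal tags. The mapping is an illustrative subset written for this example and is not part of Flair. Such a mapping is lossy in general: `IN`, for instance, covers both prepositions and subordinating conjunctions, and only the prepositional reading is mapped here.

```python
# Illustrative subset of a Penn Treebank -> universal PoS mapping.
# Assumption: IN is treated as an adposition, which fits "to" in the example
# but not every use of IN.
PTB_TO_UPOS = {
    "NNP": "PROPN",
    "VBD": "VERB",
    "IN": "ADP",
    "DT": "DET",
    "NN": "NOUN",
    ".": "PUNCT",
}


def to_universal(tagged_tokens):
    """Map (token, fine-grained tag) pairs to (token, universal tag) pairs.

    Tags missing from the subset fall back to "X", the universal tag for
    words that fit no other category.
    """
    return [(token, PTB_TO_UPOS.get(tag, "X")) for token, tag in tagged_tokens]


english = [("Dirk", "NNP"), ("went", "VBD"), ("to", "IN"),
           ("the", "DT"), ("store", "NN"), (".", ".")]

print(to_universal(english))
```

The fine-grained distinction between, say, past-tense and other verb forms disappears in the universal tag set, which is what makes it comparable across languages.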

We ship models trained over 14 languages to tag universal parts-of-speech (upos) in multilingual text. Use like this:

from flair.nn import Classifier
from flair.data import Sentence

# load model
tagger = Classifier.load('pos-multi')

# text with English and German sentences
sentence = Sentence('George Washington went to Washington. Dort kaufte er einen Hut.')

# predict PoS tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

This should print (line breaks added for readability):

Sentence: "George Washington went to Washington . Dort kaufte er einen Hut ."

→ ["George"/PROPN, "Washington"/PROPN, "went"/VERB, "to"/ADP, "Washington"/PROPN, "."/PUNCT]

→ ["Dort"/ADV, "kaufte"/VERB, "er"/PRON, "einen"/DET, "Hut"/NOUN, "."/PUNCT]

Note, however, that the multilingual models were trained on a mix of European languages and will therefore not work for other languages.

List of POS Models

We end this section with a list of all models we currently ship with Flair.

| ID | Task | Language | Training Dataset | Accuracy | Contributor / Notes |
|---|---|---|---|---|---|
| 'pos' | POS-tagging | English | Ontonotes | 98.19 (Accuracy) | |
| 'pos-fast' | POS-tagging | English | Ontonotes | 98.1 (Accuracy) | (fast model) |
| 'upos' | POS-tagging (universal) | English | Ontonotes | 98.6 (Accuracy) | |
| 'upos-fast' | POS-tagging (universal) | English | Ontonotes | 98.47 (Accuracy) | (fast model) |
| 'pos-multi' | POS-tagging | Multilingual | UD Treebanks | 96.41 (average acc.) | (12 languages) |
| 'pos-multi-fast' | POS-tagging | Multilingual | UD Treebanks | 92.88 (average acc.) | (12 languages) |
| 'ar-pos' | POS-tagging | Arabic (+dialects) | combination of corpora | | |
| 'de-pos' | POS-tagging | German | UD German - HDT | 98.50 (Accuracy) | |
| 'de-pos-tweets' | POS-tagging | German | German Tweets | 93.06 (Accuracy) | stefan-it |
| 'da-pos' | POS-tagging | Danish | Danish Dependency Treebank | | AmaliePauli |
| 'ml-pos' | POS-tagging | Malayalam | 30000 Malayalam sentences | 83 | sabiqueqb |
| 'ml-upos' | POS-tagging | Malayalam | 30000 Malayalam sentences | 87 | sabiqueqb |
| 'pt-pos-clinical' | POS-tagging | Portuguese | PUCPR | 92.39 | LucasFerroHAILab for clinical texts |
| 'pos-ukrainian' | POS-tagging | Ukrainian | Ukrainian UD | 97.93 (F1) | dchaplinsky |

You choose which pre-trained model you load by passing the appropriate string to the load() method of the Classifier class.
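If your application handles several languages, one way to organize this is a small lookup from language code to model ID. The following is a sketch under stated assumptions: the model IDs are copied from the list above, while the language-code keys and the names `POS_MODEL_IDS`, `pos_model_id`, and `load_pos_tagger` are hypothetical conveniences, not part of Flair. The Flair import is done lazily inside the loader so that the lookup itself can be used without loading any model.

```python
# Model IDs taken from the list above. The language-code keys are an
# assumption of this sketch; Flair itself only sees the model ID string.
POS_MODEL_IDS = {
    "en": "pos",
    "de": "de-pos",
    "ar": "ar-pos",
    "da": "da-pos",
    "ml": "ml-pos",
    "uk": "pos-ukrainian",
}


def pos_model_id(lang, fast=False):
    """Return the model ID for a language code; raises KeyError if unknown."""
    model_id = POS_MODEL_IDS[lang]
    # Among the language-specific models above, only English lists a
    # '-fast' variant, so the flag is ignored for other languages.
    if fast and lang == "en":
        model_id += "-fast"
    return model_id


def load_pos_tagger(lang, fast=False):
    """Load the tagger for a language code (downloads the model if needed)."""
    from flair.nn import Classifier  # imported lazily on purpose

    return Classifier.load(pos_model_id(lang, fast))
```

For example, `pos_model_id("de")` returns `'de-pos'`, and `load_pos_tagger("de")` passes that string to `Classifier.load()`.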

A full list of our current and community-contributed models can be browsed on the model hub.