Tagging parts-of-speech
This tutorial shows you how to do part-of-speech (PoS) tagging in Flair, showcases universal and language-specific models, and gives a list of all PoS models in Flair.
Language-specific parts-of-speech (PoS)
Syntax is fundamentally language-specific, so each language has different fine-grained parts-of-speech. Flair offers models for many languages:
… in English
For English, we offer several models trained on Ontonotes.
Use them like this:
from flair.nn import Classifier
from flair.data import Sentence
# load the model
tagger = Classifier.load('pos')
# make a sentence
sentence = Sentence('Dirk went to the store.')
# predict PoS tags
tagger.predict(sentence)
# print sentence with predicted tags
print(sentence)
This should print:
Sentence[6]: "Dirk went to the store." → ["Dirk"/NNP, "went"/VBD, "to"/IN, "the"/DT, "store"/NN, "."/.]
This printout tells us for instance that “Dirk” is a proper noun (tag: NNP), and “went” is a past tense verb (tag: VBD).
Note
To better understand what each tag means, consult the tag specification of the Penn Treebank.
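As a quick reference for the example above, the tags in the printout can be looked up in a small glossary. The sketch below is only an illustration: the `PTB_GLOSSARY` dictionary and the `describe` helper are hypothetical names, not part of Flair, and the dictionary covers just the six tags that appear in the printout rather than the full Penn Treebank tag set.

```python
# Hypothetical helper, not part of Flair: a glossary for the six
# Penn Treebank tags that appear in the example printout above.
PTB_GLOSSARY = {
    "NNP": "proper noun, singular",
    "VBD": "verb, past tense",
    "IN": "preposition or subordinating conjunction",
    "DT": "determiner",
    "NN": "noun, singular or mass",
    ".": "sentence-final punctuation",
}


def describe(tagged_tokens):
    """Return one readable line per (token, tag) pair."""
    lines = []
    for token, tag in tagged_tokens:
        # Tags outside the small glossary are reported as unknown.
        meaning = PTB_GLOSSARY.get(tag, "unknown tag")
        lines.append(f"{token}\t{tag}\t{meaning}")
    return lines


# The (token, tag) pairs copied from the printout above.
example = [
    ("Dirk", "NNP"), ("went", "VBD"), ("to", "IN"),
    ("the", "DT"), ("store", "NN"), (".", "."),
]

for line in describe(example):
    print(line)
```

For any tag that is not in this small dictionary, the Penn Treebank specification remains the authoritative source.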
… in German
from flair.nn import Classifier
from flair.data import Sentence
# load the model
tagger = Classifier.load('de-pos')
# make a sentence
sentence = Sentence('Dort hatte er einen Hut gekauft.')
# predict PoS tags
tagger.predict(sentence)
# print sentence with predicted tags
print(sentence)
This should print:
Sentence[7]: "Dort hatte er einen Hut gekauft." → ["Dort"/ADV, "hatte"/VAFIN, "er"/PPER, "einen"/ART, "Hut"/NN, "gekauft"/VVPP, "."/$.]
… in Ukrainian
from flair.nn import Classifier
from flair.data import Sentence
# load the model
tagger = Classifier.load('pos-ukrainian')
# make a sentence
sentence = Sentence("Сьогодні в Знам’янці проживають нащадки поета — родина Шкоди.")
# predict PoS tags
tagger.predict(sentence)
# print sentence with predicted tags
print(sentence)
… in Arabic
from flair.nn import Classifier
from flair.data import Sentence
# load the model
tagger = Classifier.load('ar-pos')
# make a sentence
sentence = Sentence('عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة .')
# predict PoS tags
tagger.predict(sentence)
# print sentence with predicted tags
print(sentence)
Tagging parts-of-speech in any language
Universal parts-of-speech are a set of minimal syntactic units that exist across languages. For instance, most languages will have VERBs or NOUNs.
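To make the idea concrete, the fine-grained tags from the English and German printouts above can be collapsed onto universal categories. The following is a minimal sketch, assuming a hand-written mapping: `FINE_TO_UPOS` and `to_universal` are hypothetical names, the table covers only the tags seen in those two examples, and it is not a description of how Flair's models work internally.

```python
# Hypothetical mapping for illustration only; it covers just the tags
# seen in the English (Penn Treebank) and German (STTS) examples above.
FINE_TO_UPOS = {
    "ptb": {
        "NNP": "PROPN", "VBD": "VERB", "IN": "ADP",  # IN can also be SCONJ
        "DT": "DET", "NN": "NOUN", ".": "PUNCT",
    },
    "stts": {
        "ADV": "ADV", "VAFIN": "AUX", "PPER": "PRON", "ART": "DET",
        "NN": "NOUN", "VVPP": "VERB", "$.": "PUNCT",
    },
}


def to_universal(tags, tagset):
    """Map fine-grained tags to universal ones; 'X' marks unmapped tags."""
    mapping = FINE_TO_UPOS[tagset]
    return [mapping.get(tag, "X") for tag in tags]


english = ["NNP", "VBD", "IN", "DT", "NN", "."]
german = ["ADV", "VAFIN", "PPER", "ART", "NN", "VVPP", "$."]

print(to_universal(english, "ptb"))
print(to_universal(german, "stts"))
```

Both tag sequences end up in the same small vocabulary, which is what makes a shared tag set usable across languages.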
We ship models trained over 14 languages to tag upos in multilingual text. Use them like this:
from flair.nn import Classifier
from flair.data import Sentence
# load model
tagger = Classifier.load('pos-multi')
# text with English and German sentences
sentence = Sentence('George Washington went to Washington. Dort kaufte er einen Hut.')
# predict PoS tags
tagger.predict(sentence)
# print sentence with predicted tags
print(sentence)
This should print (line breaks added for readability):
Sentence: "George Washington went to Washington . Dort kaufte er einen Hut ."
→ ["George"/PROPN, "Washington"/PROPN, "went"/VERB, "to"/ADP, "Washington"/PROPN, "."/PUNCT]
→ ["Dort"/ADV, "kaufte"/VERB, "er"/PRON, "einen"/DET, "Hut"/NOUN, "."/PUNCT]
Note, however, that these models were trained on a mix of European languages and therefore will not work for other languages.
List of POS Models
We end this section with a list of all models we currently ship with Flair.
| ID | Task | Language | Training Dataset | Accuracy | Contributor / Notes |
|---|---|---|---|---|---|
| 'pos' | POS-tagging | English | Ontonotes | 98.19 (Accuracy) | |
| 'pos-fast' | POS-tagging | English | Ontonotes | 98.1 (Accuracy) | (fast model) |
| 'upos' | POS-tagging (universal) | English | Ontonotes | 98.6 (Accuracy) | |
| | POS-tagging (universal) | English | Ontonotes | 98.47 (Accuracy) | (fast model) |
| | POS-tagging | Multilingual | UD Treebanks | 96.41 (average acc.) | (12 languages) |
| | POS-tagging | Multilingual | UD Treebanks | 92.88 (average acc.) | (12 languages) |
| 'ar-pos' | POS-tagging | Arabic (+dialects) | combination of corpora | | |
| 'de-pos' | POS-tagging | German | UD German - HDT | 98.50 (Accuracy) | |
| 'de-pos-tweets' | POS-tagging | German | German Tweets | 93.06 (Accuracy) | |
| 'da-pos' | POS-tagging | Danish | | | |
| 'ml-pos' | POS-tagging | Malayalam | 30000 Malayalam sentences | 83 | |
| 'ml-upos' | POS-tagging | Malayalam | 30000 Malayalam sentences | 87 | |
| 'pt-pos-clinical' | POS-tagging | Portuguese | | 92.39 | LucasFerroHAILab for clinical texts |
| | POS-tagging | Ukrainian | | 97.93 (F1) | |
You choose which pre-trained model you load by passing the appropriate string to the Classifier.load() method.
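As a minimal sketch of that selection step, the helper below picks a model ID from a language code and returns token/tag pairs. The names `MODEL_BY_LANGUAGE`, `choose_model` and `tag_text` are hypothetical, the two-letter language keys are an assumption made for illustration, and only model IDs that appear in this tutorial are used. The sketch also assumes that `tagger.label_type` and `token.get_label(...)` behave as in recent Flair releases. Flair is imported inside `tag_text` so that `choose_model` can be tried without downloading a model.

```python
# Hypothetical lookup; keys are illustrative language codes, values are
# model IDs that appear in this tutorial.
MODEL_BY_LANGUAGE = {
    "en": "pos",
    "de": "de-pos",
    "uk": "pos-ukrainian",
    "ar": "ar-pos",
}


def choose_model(language):
    """Return the model ID for a language code, or raise for unknown codes."""
    try:
        return MODEL_BY_LANGUAGE[language]
    except KeyError:
        raise ValueError(f"no PoS model registered for {language!r}") from None


def tag_text(text, language):
    """Tag text with the model chosen for the language (downloads the model)."""
    # Imported lazily so choose_model() works without Flair installed.
    from flair.data import Sentence
    from flair.nn import Classifier

    tagger = Classifier.load(choose_model(language))
    sentence = Sentence(text)
    tagger.predict(sentence)
    # Assumption: the tagger exposes the label type it predicts.
    label_type = tagger.label_type
    return [(token.text, token.get_label(label_type).value) for token in sentence]


print(choose_model("de"))
```

Calling `tag_text('Dirk went to the store.', 'en')` would load the `pos` model on first use and return a list of `(token, tag)` tuples.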
A full list of our current and community-contributed models can be browsed on the model hub.