Train a sequence tagger
Sequence labeling models are used to model problems such as named entity recognition (NER) and part-of-speech (PoS) tagging.
This tutorial section shows you how to train state-of-the-art NER models and other taggers in Flair.
Training a named entity recognition (NER) model with transformers
For a state-of-the-art NER system, you should fine-tune transformer embeddings and use full document context (see our FLERT paper for details).
Use the following script:
from flair.datasets import CONLL_03
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
# 1. get the corpus
corpus = CONLL_03()
print(corpus)
# 2. what label do we want to predict?
label_type = 'ner'
# 3. make the label dictionary from the corpus
label_dict = corpus.make_label_dictionary(label_type=label_type, add_unk=False)
print(label_dict)
# 4. initialize fine-tuneable transformer embeddings WITH document context
embeddings = TransformerWordEmbeddings(model='xlm-roberta-large',
                                       layers="-1",
                                       subtoken_pooling="first",
                                       fine_tune=True,
                                       use_context=True,
                                       )
# 5. initialize bare-bones sequence tagger (no CRF, no RNN, no reprojection)
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type='ner',
                        use_crf=False,
                        use_rnn=False,
                        reproject_embeddings=False,
                        )
# 6. initialize trainer
trainer = ModelTrainer(tagger, corpus)
# 7. run fine-tuning
trainer.fine_tune('resources/taggers/sota-ner-flert',
                  learning_rate=5.0e-6,
                  mini_batch_size=4,
                  mini_batch_chunk_size=1,  # remove this parameter to speed up computation if you have a big GPU
                  )
As you can see, we use 'xlm-roberta-large' embeddings, enable fine-tuning and set use_context to True. We also deactivate the RNN, CRF and reprojection in the SequenceTagger: the transformer is powerful enough on its own that it does not need these components. We then fine-tune the model on the corpus with a very small learning rate.
This will give you state-of-the-art numbers similar to the ones reported in Schweter and Akbik (2021).
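Once fine-tuning has finished, you can load the saved model and tag new sentences. A minimal sketch, assuming the run above saved final-model.pt in the model folder:
from flair.data import Sentence
from flair.models import SequenceTagger

# load the fine-tuned model from the training run above
tagger = SequenceTagger.load('resources/taggers/sota-ner-flert/final-model.pt')

# create an example sentence and predict NER tags
sentence = Sentence('George Washington went to Washington.')
tagger.predict(sentence)

# print the detected entity spans
for entity in sentence.get_spans('ner'):
    print(entity)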
Training a named entity recognition (NER) model with Flair embeddings
As an alternative to fine-tuning a very large transformer, you can use a classic training setup without fine-tuning. In the classic setup, you train an LSTM-CRF on top of frozen embeddings. We typically use a 'stack' that combines Flair and GloVe embeddings:
from flair.datasets import CONLL_03
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
# 1. get the corpus
corpus = CONLL_03()
print(corpus)
# 2. what label do we want to predict?
label_type = 'ner'
# 3. make the label dictionary from the corpus
label_dict = corpus.make_label_dictionary(label_type=label_type, add_unk=False)
print(label_dict)
# 4. initialize embedding stack with Flair and GloVe
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]
embeddings = StackedEmbeddings(embeddings=embedding_types)
# 5. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type)
# 6. initialize trainer
trainer = ModelTrainer(tagger, corpus)
# 7. start training
trainer.train('resources/taggers/sota-ner-flair',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=150)
This will give you state-of-the-art numbers similar to the ones reported in Akbik et al. (2018). They are not quite as high as with fine-tuned transformers, but this setup requires less GPU memory and, depending on your hardware, may train faster overall.
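To get a feeling for what the stack produces, you can embed a sentence yourself and inspect the concatenated vectors. A quick sketch using the embeddings object from the script above (the dimensions assume Flair's 100-dimensional GloVe vectors and 2048-dimensional news Flair embeddings):
from flair.data import Sentence

sentence = Sentence('I love Berlin.')

# embed the sentence with the stack defined above
embeddings.embed(sentence)

# each token now carries one concatenated vector:
# 100 (GloVe) + 2048 (news-forward) + 2048 (news-backward) = 4196 dimensions
for token in sentence:
    print(token.text, token.embedding.shape)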
Training a part-of-speech tagger
If you want to train a part-of-speech model instead of NER, simply exchange the corpus and the label type:
from flair.datasets import UD_ENGLISH
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
# 1. get the corpus
corpus = UD_ENGLISH()
print(corpus)
# 2. what label do we want to predict?
label_type = 'upos'
# 3. make the label dictionary from the corpus
label_dict = corpus.make_label_dictionary(label_type=label_type)
print(label_dict)
# 4. initialize embeddings
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]
embeddings = StackedEmbeddings(embeddings=embedding_types)
# 5. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)
# 6. initialize trainer
trainer = ModelTrainer(tagger, corpus)
# 7. start training
trainer.train('resources/taggers/example-upos',
              learning_rate=0.1,
              mini_batch_size=32)
This script will give you the state-of-the-art accuracy reported in Akbik et al. (2018).
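You can also evaluate the trained tagger on the test split yourself. A short sketch using the tagger and corpus from the script above; the evaluate() call shown here matches recent Flair releases (older versions return a (result, loss) tuple instead of a single Result object):
# evaluate on the test split of UD_ENGLISH
result = tagger.evaluate(corpus.test, gold_label_type='upos', mini_batch_size=32)

# main_score holds the overall score; detailed_results breaks it down per tag
print(result.main_score)
print(result.detailed_results)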
Multi-dataset training
Now, let us train a single model that can PoS tag text in both English and German. To do this, we load both the English and German UD corpora and create a MultiCorpus object. We also use the new multilingual Flair embeddings for this task.
All the rest is the same as before:
from flair.data import MultiCorpus
from flair.datasets import UD_ENGLISH, UD_GERMAN
from flair.embeddings import FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
# 1. get the corpora - English and German UD
corpus = MultiCorpus([UD_ENGLISH(), UD_GERMAN()]).downsample(0.1)  # downsampled for this example; remove to train on the full data
# 2. what label do we want to predict?
label_type = 'upos'
# 3. make the label dictionary from the corpus
label_dict = corpus.make_label_dictionary(label_type=label_type)
print(label_dict)
# 4. initialize embeddings
embedding_types = [
    # we use multilingual Flair embeddings in this task
    FlairEmbeddings('multi-forward'),
    FlairEmbeddings('multi-backward'),
]
embeddings = StackedEmbeddings(embeddings=embedding_types)
# 5. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)
# 6. initialize trainer
trainer = ModelTrainer(tagger, corpus)
# 7. start training
trainer.train('resources/taggers/example-universal-pos',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=150,
              )
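Once training has finished, the same model tags text in either language. A minimal sketch, assuming the run above saved final-model.pt in the model folder:
from flair.data import Sentence
from flair.models import SequenceTagger

# load the multilingual tagger trained above
tagger = SequenceTagger.load('resources/taggers/example-universal-pos/final-model.pt')

# one English and one German sentence
sentences = [Sentence('I love Berlin.'), Sentence('Ich liebe Berlin.')]
tagger.predict(sentences)

for sentence in sentences:
    print(sentence)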
This gives you a multilingual model. Try experimenting with more languages!