Flair embeddings

Contextual string embeddings are powerful embeddings that capture latent syntactic-semantic information that goes beyond standard word embeddings. They differ in two key ways: (1) they are trained without any explicit notion of words and thus fundamentally model words as sequences of characters, and (2) they are contextualized by their surrounding text, meaning that the same word will have different embeddings depending on its contextual use.
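
To make the second point concrete, here is a toy sketch in plain Python. It is not Flair's language model: `toy_forward_states` and `toy_word_state` are hypothetical names, and the integer state is only a stand-in for the hidden state of a character-level model. It illustrates one thing: a representation read off after a word's last character depends on all the text that came before it.

```python
# Toy stand-in for a forward character-level model (NOT Flair's actual LM).
# The "state" after each character is a function of the whole prefix read so far.

BASE = 0x110000  # one more than the largest Unicode code point


def toy_forward_states(text):
    """Return one integer state per character, reading left to right."""
    state = 0
    states = []
    for ch in text:
        state = state * BASE + ord(ch)
        states.append(state)
    return states


def toy_word_state(text, word):
    """State after the last character of the first occurrence of `word`."""
    start = text.index(word)
    return toy_forward_states(text)[start + len(word) - 1]


river = toy_word_state("He sat on the river bank .", "bank")
money = toy_word_state("He went to the bank .", "bank")
again = toy_word_state("He went to the bank .", "bank")

print(river != money)  # True: same word, different left context
print(money == again)  # True: same word, same context
```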

With Flair, you can use these embeddings simply by instantiating the appropriate embedding class, same as standard word embeddings:

from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# init embedding
flair_embedding_forward = FlairEmbeddings('news-forward')

# create a sentence
sentence = Sentence('The grass is green .')

# embed words in sentence
flair_embedding_forward.embed(sentence)

You choose which embeddings you load by passing the appropriate string to the constructor of the FlairEmbeddings class. Currently, the following contextual string embeddings are provided (note: replace 'X' with either 'forward' or 'backward'):

| ID | Language | Embedding |
| --- | --- | --- |
| 'multi-X' | 300+ | JW300 corpus, as proposed by Agić and Vulić (2019). The corpus is licensed under CC-BY-NC-SA |
| 'multi-X-fast' | English, German, French, Italian, Dutch, Polish | Mix of corpora (Web, Wikipedia, Subtitles, News), CPU-friendly |
| 'news-X' | English | Trained with 1 billion word corpus |
| 'news-X-fast' | English | Trained with 1 billion word corpus, CPU-friendly |
| 'mix-X' | English | Trained with mixed corpus (Web, Wikipedia, Subtitles) |
| 'ar-X' | Arabic | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'bg-X' | Bulgarian | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'bg-X-fast' | Bulgarian | Added by @stefan-it: Trained with various sources (Europarl, Wikipedia or SETimes) |
| 'cs-X' | Czech | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'cs-v0-X' | Czech | Added by @stefan-it: LM embeddings (earlier version) |
| 'de-X' | German | Trained with mixed corpus (Web, Wikipedia, Subtitles) |
| 'de-historic-ha-X' | German (historical) | Added by @stefan-it: Historical German trained over Hamburger Anzeiger |
| 'de-historic-wz-X' | German (historical) | Added by @stefan-it: Historical German trained over Wiener Zeitung |
| 'de-historic-rw-X' | German (historical) | Added by @redewiedergabe: Historical German trained over 100 million tokens |
| 'es-X' | Spanish | Added by @iamyihwa: Trained with Wikipedia |
| 'es-X-fast' | Spanish | Added by @iamyihwa: Trained with Wikipedia, CPU-friendly |
| 'es-clinical-X' | Spanish (clinical) | Added by @matirojasg: Trained with Wikipedia |
| 'eu-X' | Basque | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'eu-v0-X' | Basque | Added by @stefan-it: LM embeddings (earlier version) |
| 'fa-X' | Persian | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'fi-X' | Finnish | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'fr-X' | French | Added by @mhham: Trained with French Wikipedia |
| 'he-X' | Hebrew | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'hi-X' | Hindi | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'hr-X' | Croatian | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'id-X' | Indonesian | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'it-X' | Italian | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'ja-X' | Japanese | Added by @frtacoa: Trained with 439M words of Japanese Web crawls (2048 hidden states, 2 layers) |
| 'nl-X' | Dutch | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'nl-v0-X' | Dutch | Added by @stefan-it: LM embeddings (earlier version) |
| 'no-X' | Norwegian | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'pl-X' | Polish | Added by @borchmann: Trained with web crawls (Polish part of CommonCrawl) |
| 'pl-opus-X' | Polish | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'pt-X' | Portuguese | Added by @ericlief: LM embeddings |
| 'sl-X' | Slovenian | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'sl-v0-X' | Slovenian | Added by @stefan-it: Trained with various sources (Europarl, Wikipedia and OpenSubtitles2018) |
| 'sv-X' | Swedish | Added by @stefan-it: Trained with Wikipedia/OPUS |
| 'sv-v0-X' | Swedish | Added by @stefan-it: Trained with various sources (Europarl, Wikipedia or OpenSubtitles2018) |
| 'ta-X' | Tamil | Added by @stefan-it |
| 'pubmed-X' | English | Added by @jessepeng: Trained with 5% of PubMed abstracts until 2015 (1150 hidden states, 3 layers) |
| 'de-impresso-hipe-v1-X' | German (historical) | In-domain data (Swiss and Luxembourgish newspapers) for CLEF HIPE Shared task. More information on the shared task can be found in this paper |
| 'en-impresso-hipe-v1-X' | English (historical) | In-domain data (Chronicling America material) for CLEF HIPE Shared task. More information on the shared task can be found in this paper |
| 'fr-impresso-hipe-v1-X' | French (historical) | In-domain data (Swiss and Luxembourgish newspapers) for CLEF HIPE Shared task. More information on the shared task can be found in this paper |
| 'am-X' | Amharic | Based on 6.5m Amharic text corpus crawled from different sources. See this paper and the official GitHub Repository for more information. |
| 'uk-X' | Ukrainian | Added by @dchaplinsky: Trained with UberText corpus. |

So, if you want to load embeddings from the German forward LM model, instantiate the class as follows:

flair_de_forward = FlairEmbeddings('de-forward')

And if you want to load embeddings from the Bulgarian backward LM model, instantiate the class as follows:

flair_bg_backward = FlairEmbeddings('bg-backward')
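
The 'X' placeholder convention used in the table can be captured in a small helper. This is a sketch in plain Python, not part of Flair: `resolve_flair_id` is a hypothetical name, and it only builds the ID string that you would then pass to the FlairEmbeddings constructor.

```python
def resolve_flair_id(template, direction):
    """Replace the 'X' placeholder in a table ID with 'forward' or 'backward'."""
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    parts = template.split("-")
    if "X" not in parts:
        raise ValueError(f"no 'X' placeholder in {template!r}")
    # swap only the part that is exactly 'X', keep everything else unchanged
    return "-".join(direction if part == "X" else part for part in parts)


print(resolve_flair_id("de-X", "forward"))         # de-forward
print(resolve_flair_id("bg-X", "backward"))        # bg-backward
print(resolve_flair_id("news-X-fast", "forward"))  # news-forward-fast
```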

We recommend combining both forward and backward Flair embeddings. Depending on the task, we also recommend adding standard word embeddings into the mix. So, our recommended StackedEmbeddings setup for most English tasks is:

from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings

# create a StackedEmbeddings object that combines glove and forward/backward flair embeddings
stacked_embeddings = StackedEmbeddings([
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

That's it! Now just use this embedding like all the other embeddings, i.e. call the embed() method over your sentences.

from flair.data import Sentence

sentence = Sentence('The grass is green .')

# just embed a sentence using the StackedEmbedding as you would with any single embedding.
stacked_embeddings.embed(sentence)

# now check out the embedded tokens.
for token in sentence:
    print(token)
    print(token.embedding)

Words are now embedded using a concatenation of three different embeddings. This combination often gives state-of-the-art accuracy.
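
As a rough illustration of what "concatenation" means here, the sketch below joins three per-token vectors end to end in plain Python. The vectors and their tiny sizes are made up for the example (real embeddings are much larger), and `concatenate` is a hypothetical helper rather than a Flair function.

```python
def concatenate(*vectors):
    """Join several per-token vectors end to end into one longer vector."""
    stacked = []
    for vector in vectors:
        stacked.extend(vector)
    return stacked


# made-up vectors for a single token, one per embedding in the stack
word_vec = [0.1, 0.2]           # stands in for the glove vector
forward_vec = [0.3, 0.4, 0.5]   # stands in for the news-forward vector
backward_vec = [0.6, 0.7, 0.8]  # stands in for the news-backward vector

token_vec = concatenate(word_vec, forward_vec, backward_vec)
print(len(token_vec))  # 8 = 2 + 3 + 3
```

The length of the stacked vector is the sum of the lengths of its parts, so adding an embedding to the stack makes every token vector longer.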

Pooled Flair embeddings

We also developed a pooled variant of the FlairEmbeddings. These embeddings differ in that they constantly evolve over time, even at prediction time (i.e. after training is complete). This means that the same words in the same sentence at two different points in time may have different embeddings.

PooledFlairEmbeddings manage a 'global' representation of each distinct word by pooling over all of its past occurrences. More details on how this works may be found in Akbik et al. (2019).
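
The idea of a pooled, per-word memory can be sketched in a few lines of plain Python. This is a simplified illustration, not the PooledFlairEmbeddings implementation: the class name `ToyPooledMemory`, the choice of mean pooling, and the two-dimensional vectors are all assumptions made for the example.

```python
class ToyPooledMemory:
    """Keeps a running mean vector for every distinct word seen so far."""

    def __init__(self):
        self.sums = {}    # word -> element-wise sum of its past vectors
        self.counts = {}  # word -> number of occurrences seen

    def update(self, word, vector):
        """Add one occurrence of `word` and return its pooled (mean) vector."""
        if word not in self.sums:
            self.sums[word] = [0.0] * len(vector)
            self.counts[word] = 0
        self.sums[word] = [s + v for s, v in zip(self.sums[word], vector)]
        self.counts[word] += 1
        n = self.counts[word]
        return [s / n for s in self.sums[word]]


memory = ToyPooledMemory()
first = memory.update("green", [1.0, 0.0])   # [1.0, 0.0]
second = memory.update("green", [0.0, 1.0])  # [0.5, 0.5]
print(first, second)
```

The pooled vector for "green" changes once a second occurrence arrives, which mirrors how the same word can receive different embeddings at two points in time. The dictionaries also gain an entry for every new distinct word.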

You can instantiate and use PooledFlairEmbeddings like any other embedding:

from flair.data import Sentence
from flair.embeddings import PooledFlairEmbeddings

# init embedding
flair_embedding_forward = PooledFlairEmbeddings('news-forward')

# create a sentence
sentence = Sentence('The grass is green .')

# embed words in sentence
flair_embedding_forward.embed(sentence)

Note that while we get some of our best results with PooledFlairEmbeddings, they are very memory-inefficient since they keep past embeddings of all words in memory. In many cases, regular FlairEmbeddings will be nearly as good but with much lower memory requirements.
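
A back-of-the-envelope estimate shows how the memory cost can add up. The numbers below are assumptions chosen for illustration (one stored vector of 32-bit floats per distinct word, a vector size of 2048, and 100,000 distinct words), not measurements of Flair, and `pooled_memory_bytes` is a hypothetical helper.

```python
def pooled_memory_bytes(distinct_words, vector_size, bytes_per_value=4):
    """Rough size of a store that holds one vector per distinct word."""
    return distinct_words * vector_size * bytes_per_value


estimate = pooled_memory_bytes(distinct_words=100_000, vector_size=2048)
print(estimate)                  # 819200000 bytes
print(round(estimate / 1e6, 1))  # 819.2 megabytes
```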