flair.embeddings.legacy#

Warning

All embeddings in flair.embeddings.legacy are considered deprecated. There is no guarantee that they still work, and we recommend using different embeddings instead.

class flair.embeddings.legacy.ELMoEmbeddings(model='original', options_file=None, weight_file=None, embedding_mode='all')#

Bases: TokenEmbeddings

Contextual word embeddings using a word-level LM, as proposed in Peters et al., 2018. ELMo word vectors can be constructed by combining layers in different ways. The default is to concatenate the top 3 layers of the LM.

property embedding_length: int#

Returns the length of the embedding vector.

use_layers_all(x)#
use_layers_top(x)#
use_layers_average(x)#
extra_repr()#

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
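
A minimal usage sketch, assuming the deprecated class still loads and its allennlp ELMo dependency is installed; the embedding_mode value "top" is inferred from use_layers_top above:

from flair.data import Sentence
from flair.embeddings.legacy import ELMoEmbeddings

# deprecated path; may fail if the allennlp ELMo dependency is missing
embedding = ELMoEmbeddings(model="original", embedding_mode="top")

sentence = Sentence("The grass is green .")
embedding.embed(sentence)

for token in sentence:
    print(token.text, token.embedding.shape)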

class flair.embeddings.legacy.CharLMEmbeddings(model, detach=True, use_cache=False, cache_directory=None)#

Bases: TokenEmbeddings

Contextual string embeddings of words, as proposed in Akbik et al., 2018.

__init__(model, detach=True, use_cache=False, cache_directory=None)#

Initializes contextual string embeddings using a character-level language model.

Parameters:
  • model (str) – model string, one of ‘news-forward’, ‘news-backward’, ‘news-forward-fast’, ‘news-backward-fast’, ‘mix-forward’, ‘mix-backward’, ‘german-forward’, ‘german-backward’, ‘polish-backward’, ‘polish-forward’ depending on which character language model is desired.

  • detach (bool) – if set to False, the gradient will propagate into the language model. This dramatically slows down training and often leads to worse results, so it is not recommended.

  • use_cache (bool) – if set to False, embeddings will not be written to file for later retrieval. This saves disk space but prevents re-use of previously computed embeddings that do not fit into memory.

  • cache_directory (Optional[Path]) – if cache_directory is not set, the cache is written to ~/.flair/embeddings; otherwise it is written to the provided directory.

Deprecated since version 0.4: Use ‘FlairEmbeddings’ instead.
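
Since CharLMEmbeddings is deprecated, a minimal sketch of the recommended replacement, FlairEmbeddings, using one of the model strings listed above:

from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# FlairEmbeddings replaces CharLMEmbeddings since version 0.4
embedding = FlairEmbeddings("news-forward")

sentence = Sentence("The grass is green .")
embedding.embed(sentence)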

train(mode=True)#

Set the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns:

self

Return type:

Module

property embedding_length: int#

Returns the length of the embedding vector.

class flair.embeddings.legacy.DocumentMeanEmbeddings(token_embeddings)#

Bases: DocumentEmbeddings

__init__(token_embeddings)#

The constructor takes a list of embeddings to be combined.

Deprecated since version 0.3.1: The functionality of this class has been moved to ‘DocumentPoolEmbeddings’.

property embedding_length: int#

Returns the length of the embedding vector.

embed(sentences)#

Add embeddings to every sentence in the given list of sentences. If embeddings are already added, updates only if embeddings are non-static.
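
A minimal sketch of the recommended replacement, DocumentPoolEmbeddings; WordEmbeddings('glove') and mean pooling are illustrative choices:

from flair.data import Sentence
from flair.embeddings import DocumentPoolEmbeddings, WordEmbeddings

# mean-pools token embeddings into a single document vector
document_embedding = DocumentPoolEmbeddings([WordEmbeddings("glove")], pooling="mean")

sentence = Sentence("The grass is green .")
document_embedding.embed(sentence)
print(sentence.embedding.shape)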

class flair.embeddings.legacy.DocumentLSTMEmbeddings(embeddings, hidden_size=128, rnn_layers=1, reproject_words=True, reproject_words_dimension=None, bidirectional=False, dropout=0.5, word_dropout=0.0, locked_dropout=0.0)#

Bases: DocumentEmbeddings

__init__(embeddings, hidden_size=128, rnn_layers=1, reproject_words=True, reproject_words_dimension=None, bidirectional=False, dropout=0.5, word_dropout=0.0, locked_dropout=0.0)#

The constructor takes a list of embeddings to be combined.

Parameters:
  • embeddings (list[TokenEmbeddings]) – a list of token embeddings

  • hidden_size – the number of hidden states in the LSTM

  • rnn_layers – the number of layers of the LSTM

  • reproject_words (bool) – whether to reproject the token embeddings in a separate linear layer before feeding them into the LSTM

  • reproject_words_dimension (Optional[int]) – output dimension of the reprojected token embeddings. If None, the input dimension is kept.

  • bidirectional (bool) – whether to use a bidirectional LSTM

  • dropout (float) – the dropout value to be used

  • word_dropout (float) – the word dropout value to be used; if 0.0, word dropout is not used

  • locked_dropout (float) – the locked dropout value to be used; if 0.0, locked dropout is not used.

Deprecated since version 0.4: The functionality of this class has been moved to ‘DocumentRNNEmbeddings’.

property embedding_length: int#

Returns the length of the embedding vector.

embed(sentences)#

Add embeddings to all sentences in the given list of sentences. If embeddings are already added, update only if embeddings are non-static.
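
A minimal sketch of the recommended replacement, DocumentRNNEmbeddings, which takes the same constructor arguments as listed above; the rnn_type argument is assumed to select an LSTM rather than the default GRU:

from flair.data import Sentence
from flair.embeddings import DocumentRNNEmbeddings, WordEmbeddings

# LSTM-based document embeddings, the successor of DocumentLSTMEmbeddings
document_embedding = DocumentRNNEmbeddings(
    [WordEmbeddings("glove")],
    hidden_size=128,
    rnn_layers=1,
    bidirectional=False,
    rnn_type="LSTM",  # assumed; the default RNN flavor is a GRU
)

sentence = Sentence("The grass is green .")
document_embedding.embed(sentence)
print(sentence.embedding.shape)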