flair.embeddings.document#

class flair.embeddings.document.TransformerDocumentEmbeddings(model='bert-base-uncased', layers='-1', layer_mean=False, is_token_embedding=False, **kwargs)#

Bases: DocumentEmbeddings, TransformerEmbeddings

onnx_cls#

alias of TransformerOnnxDocumentEmbeddings

__init__(model='bert-base-uncased', layers='-1', layer_mean=False, is_token_embedding=False, **kwargs)#

Bidirectional transformer embeddings of documents from various transformer architectures.

Parameters:
  • model (str) – name of transformer model (see https://huggingface.co/transformers/pretrained_models.html for options)

  • layers (str) – string indicating which layers to take for embedding (-1 is topmost layer)

  • cls_pooling – pooling strategy for combining token-level embeddings; options are ‘cls’, ‘max’ and ‘mean’.

  • layer_mean (bool) – If True, uses a scalar mix of layers as embedding

  • fine_tune – If True, allows transformers to be fine-tuned during training

  • is_token_embedding (bool) – If True, the embedding can also be used as a TokenEmbeddings.

  • **kwargs – Arguments propagated to flair.embeddings.transformer.TransformerEmbeddings.__init__()
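
A minimal usage sketch (the model name and example sentence are illustrative, not required values):

    from flair.data import Sentence
    from flair.embeddings import TransformerDocumentEmbeddings

    # instantiate with any Hugging Face model name
    embedding = TransformerDocumentEmbeddings('bert-base-uncased')

    # embed a sentence; the document vector is attached to the Sentence object
    sentence = Sentence('The grass is green.')
    embedding.embed(sentence)
    print(sentence.embedding.shape)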

classmethod create_from_state(**state)#
embeddings_name: str = 'TransformerDocumentEmbeddings'#
class flair.embeddings.document.DocumentPoolEmbeddings(embeddings, fine_tune_mode='none', pooling='mean')#

Bases: DocumentEmbeddings

__init__(embeddings, fine_tune_mode='none', pooling='mean')#

The constructor takes a list of embeddings to be combined.

Parameters:
  • embeddings (Union[TokenEmbeddings, List[TokenEmbeddings]]) – a list of token embeddings

  • fine_tune_mode (str) – if set to “linear”, a trainable linear layer is added; if set to “nonlinear”, a nonlinearity is added as well. Set this to make the pooling trainable.

  • pooling (str) – a string which can be any value from [‘mean’, ‘max’, ‘min’]
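
A minimal sketch of mean-pooling GloVe token embeddings into a single document vector (the choice of ‘glove’ is illustrative):

    from flair.data import Sentence
    from flair.embeddings import WordEmbeddings, DocumentPoolEmbeddings

    # pool token embeddings with the default 'mean' strategy
    glove = WordEmbeddings('glove')
    document_embeddings = DocumentPoolEmbeddings([glove], pooling='mean')

    sentence = Sentence('The grass is green.')
    document_embeddings.embed(sentence)

    # with fine_tune_mode='none', the pooled vector keeps the token embedding length
    print(sentence.embedding.shape)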

property embedding_length: int#

Returns the length of the embedding vector.

embed(sentences)#

Add embeddings to every sentence in the given list of sentences.

If embeddings are already added, updates only if embeddings are non-static.

extra_repr()#

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

classmethod from_params(params)#
Return type:

DocumentPoolEmbeddings

to_params()#
Return type:

Dict[str, Any]

embeddings_name: str = 'DocumentPoolEmbeddings'#
class flair.embeddings.document.DocumentTFIDFEmbeddings(train_dataset, vectorizer=None, **vectorizer_params)#

Bases: DocumentEmbeddings

__init__(train_dataset, vectorizer=None, **vectorizer_params)#

The constructor for DocumentTFIDFEmbeddings.

Parameters:
  • train_dataset (List[Sentence]) – the train dataset which will be used to construct a vectorizer

  • vectorizer (Optional[TfidfVectorizer]) – a precomputed vectorizer. If provided, train_dataset must be an empty list.

  • vectorizer_params – parameters given to Scikit-learn’s TfidfVectorizer constructor
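
A minimal sketch, assuming a small toy training corpus; min_df is one example of a keyword argument forwarded to scikit-learn’s TfidfVectorizer:

    from flair.data import Sentence
    from flair.embeddings import DocumentTFIDFEmbeddings

    # fit the TF-IDF vectorizer on a toy training corpus
    train = [Sentence('the grass is green'), Sentence('the sky is blue')]
    embedding = DocumentTFIDFEmbeddings(train, min_df=1)

    sentence = Sentence('the grass is blue')
    embedding.embed(sentence)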

property embedding_length: int#

Returns the length of the embedding vector.

embed(sentences)#

Add embeddings to every sentence in the given list of sentences.

classmethod from_params(params)#
Return type:

DocumentTFIDFEmbeddings

to_params()#
Return type:

Dict[str, Any]

embeddings_name: str = 'DocumentTFIDFEmbeddings'#
class flair.embeddings.document.DocumentRNNEmbeddings(embeddings, hidden_size=128, rnn_layers=1, reproject_words=True, reproject_words_dimension=None, bidirectional=False, dropout=0.5, word_dropout=0.0, locked_dropout=0.0, rnn_type='GRU', fine_tune=True)#

Bases: DocumentEmbeddings

__init__(embeddings, hidden_size=128, rnn_layers=1, reproject_words=True, reproject_words_dimension=None, bidirectional=False, dropout=0.5, word_dropout=0.0, locked_dropout=0.0, rnn_type='GRU', fine_tune=True)#

Instantiates an RNN that operates on token embeddings.

Parameters:
  • embeddings (List[TokenEmbeddings]) – a list of token embeddings

  • hidden_size – the size of the RNN hidden states

  • rnn_layers – the number of RNN layers

  • reproject_words (bool) – whether to reproject the token embeddings through a separate linear layer before feeding them into the RNN

  • reproject_words_dimension (Optional[int]) – output dimension of the token embedding reprojection. If None, the input dimension is kept.

  • bidirectional (bool) – whether to use a bidirectional RNN

  • dropout (float) – the dropout value to be used

  • word_dropout (float) – the word dropout value to be used; if 0.0, word dropout is not used

  • locked_dropout (float) – the locked dropout value to be used; if 0.0, locked dropout is not used

  • rnn_type (str) – ‘GRU’ or ‘LSTM’

  • fine_tune (bool) – if True, allows the embeddings to be fine-tuned during training.
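
A minimal sketch wrapping GloVe token embeddings in a bidirectional GRU (all hyperparameter values shown are illustrative):

    from flair.data import Sentence
    from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings

    glove = WordEmbeddings('glove')
    document_embeddings = DocumentRNNEmbeddings(
        [glove], hidden_size=128, bidirectional=True, rnn_type='GRU'
    )

    sentence = Sentence('The grass is green.')
    document_embeddings.embed(sentence)

    # the resulting length depends on hidden_size and directionality
    print(document_embeddings.embedding_length)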

property embedding_length: int#

Returns the length of the embedding vector.

_add_embeddings_internal(sentences)#

Add embeddings to all sentences in the given list of sentences.

If embeddings are already added, update only if embeddings are non-static.

to_params()#
classmethod from_params(params)#
Return type:

DocumentRNNEmbeddings

embeddings_name: str = 'DocumentRNNEmbeddings'#
class flair.embeddings.document.DocumentLMEmbeddings(flair_embeddings)#

Bases: DocumentEmbeddings
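
A minimal sketch, assuming the usual pairing with FlairEmbeddings character-level language models (‘news-forward’ is an illustrative choice):

    from flair.data import Sentence
    from flair.embeddings import FlairEmbeddings, DocumentLMEmbeddings

    # document embedding derived from a character-level language model
    document_embeddings = DocumentLMEmbeddings([FlairEmbeddings('news-forward')])

    sentence = Sentence('The grass is green.')
    document_embeddings.embed(sentence)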

property embedding_length: int#

Returns the length of the embedding vector.

get_names()#

Returns a list of embedding names.

In most cases, it is just a list with one item, namely the name of this embedding. But in some cases, the embedding is made up of different embeddings (StackedEmbeddings). Then, the list contains the names of all embeddings in the stack.

Return type:

List[str]

to_params()#
Return type:

Dict[str, Any]

classmethod from_params(params)#
Return type:

DocumentLMEmbeddings

embeddings_name: str = 'DocumentLMEmbeddings'#
class flair.embeddings.document.SentenceTransformerDocumentEmbeddings(model='bert-base-nli-mean-tokens', batch_size=1)#

Bases: DocumentEmbeddings

__init__(model='bert-base-nli-mean-tokens', batch_size=1)#

Instantiates a document embedding using embeddings from the sentence-transformers library.

Parameters:
  • model (str) – string name of a model from the SentenceTransformer class

  • batch_size (int) – number of sentences to be processed in one batch
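
A minimal sketch (requires the sentence-transformers package to be installed; the model name is the documented default):

    from flair.data import Sentence
    from flair.embeddings import SentenceTransformerDocumentEmbeddings

    embedding = SentenceTransformerDocumentEmbeddings('bert-base-nli-mean-tokens')

    sentence = Sentence('The grass is green.')
    embedding.embed(sentence)
    print(sentence.embedding.shape)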

property embedding_length: int#

Returns the length of the embedding vector.

classmethod from_params(params)#
Return type:

SentenceTransformerDocumentEmbeddings

to_params()#
Return type:

Dict[str, Any]

embeddings_name: str = 'SentenceTransformerDocumentEmbeddings'#
class flair.embeddings.document.DocumentCNNEmbeddings(embeddings, kernels=((100, 3), (100, 4), (100, 5)), reproject_words=True, reproject_words_dimension=None, dropout=0.5, word_dropout=0.0, locked_dropout=0.0, fine_tune=True)#

Bases: DocumentEmbeddings

__init__(embeddings, kernels=((100, 3), (100, 4), (100, 5)), reproject_words=True, reproject_words_dimension=None, dropout=0.5, word_dropout=0.0, locked_dropout=0.0, fine_tune=True)#

Instantiates a CNN that operates on token embeddings.

Parameters:
  • embeddings (List[TokenEmbeddings]) – a list of token embeddings

  • kernels – list of (number of kernels, kernel size) tuples

  • reproject_words (bool) – whether to reproject the token embeddings through a separate linear layer before feeding them into the CNN

  • reproject_words_dimension (Optional[int]) – output dimension of the token embedding reprojection. If None, the input dimension is kept.

  • dropout (float) – the dropout value to be used

  • word_dropout (float) – the word dropout value to be used; if 0.0, word dropout is not used

  • locked_dropout (float) – the locked dropout value to be used; if 0.0, locked dropout is not used

  • fine_tune (bool) – if True, allows the embeddings to be fine-tuned during training.
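
A minimal sketch using the default kernel configuration over GloVe token embeddings (values are illustrative):

    from flair.data import Sentence
    from flair.embeddings import WordEmbeddings, DocumentCNNEmbeddings

    glove = WordEmbeddings('glove')
    document_embeddings = DocumentCNNEmbeddings(
        [glove], kernels=((100, 3), (100, 4), (100, 5))
    )

    sentence = Sentence('The grass is green.')
    document_embeddings.embed(sentence)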

embeddings_name: str = 'DocumentCNNEmbeddings'#
property embedding_length: int#

Returns the length of the embedding vector.

_add_embeddings_internal(sentences)#

Add embeddings to all sentences in the given list of sentences.

If embeddings are already added, update only if embeddings are non-static.

classmethod from_params(params)#
Return type:

DocumentCNNEmbeddings

to_params()#
Return type:

Dict[str, Any]