flair.embeddings.document
- class flair.embeddings.document.TransformerDocumentEmbeddings(model='bert-base-uncased', layers='-1', layer_mean=False, is_token_embedding=False, **kwargs)
Bases: DocumentEmbeddings, TransformerEmbeddings
- onnx_cls
alias of TransformerOnnxDocumentEmbeddings
- __init__(model='bert-base-uncased', layers='-1', layer_mean=False, is_token_embedding=False, **kwargs)
Bidirectional transformer embeddings of words from various transformer architectures.
- Parameters:
model (str) – name of the transformer model (see https://huggingface.co/transformers/pretrained_models.html for options)
layers (str) – string indicating which layers to take for the embedding (-1 is the topmost layer)
cls_pooling – pooling strategy for combining token-level embeddings; options are 'cls', 'max' and 'mean'
layer_mean (bool) – if True, uses a scalar mix of layers as the embedding
fine_tune – if True, allows the transformer to be fine-tuned during training
is_token_embedding (bool) – if True, the embedding can also be used as a token embedding
**kwargs – arguments propagated to flair.embeddings.transformer.TransformerEmbeddings.__init__()
- classmethod create_from_state(**state)
- embeddings_name: str = 'TransformerDocumentEmbeddings'
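Example (a minimal usage sketch, not part of the generated reference; assumes the standard flair.data.Sentence API, and the example sentence is illustrative):

```python
from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings

# embed a full sentence with the default BERT model
embedding = TransformerDocumentEmbeddings('bert-base-uncased')

sentence = Sentence('The grass is green.')
embedding.embed(sentence)

# the document vector is now attached to the sentence
print(sentence.embedding.shape)
```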
- class flair.embeddings.document.DocumentPoolEmbeddings(embeddings, fine_tune_mode='none', pooling='mean')
Bases: DocumentEmbeddings
- __init__(embeddings, fine_tune_mode='none', pooling='mean')
The constructor takes a list of embeddings to be combined.
- Parameters:
embeddings (Union[TokenEmbeddings, List[TokenEmbeddings]]) – a list of token embeddings
fine_tune_mode (str) – if set to 'linear', a trainable layer is added; if set to 'nonlinear', a nonlinearity is added as well. Set this to make the pooling trainable.
pooling (str) – a string which can be any value from ['mean', 'max', 'min']
- property embedding_length: int
Returns the length of the embedding vector.
- embed(sentences)
Add embeddings to every sentence in the given list of sentences.
If embeddings are already added, updates only if embeddings are non-static.
- extra_repr()
Set the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- classmethod from_params(params)
- Return type:
- to_params()
- Return type:
Dict[str, Any]
- embeddings_name: str = 'DocumentPoolEmbeddings'
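Example (a minimal usage sketch, not part of the generated reference; 'glove' is one of Flair's pre-packaged WordEmbeddings):

```python
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, DocumentPoolEmbeddings

# mean-pool GloVe token embeddings into a single document vector
glove = WordEmbeddings('glove')
document_embeddings = DocumentPoolEmbeddings([glove], pooling='mean')

sentence = Sentence('The grass is green.')
document_embeddings.embed(sentence)

# with 'mean' pooling the document vector has the same length
# as the underlying token embedding
print(sentence.embedding.shape)
```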
- class flair.embeddings.document.DocumentTFIDFEmbeddings(train_dataset, vectorizer=None, **vectorizer_params)
Bases: DocumentEmbeddings
- __init__(train_dataset, vectorizer=None, **vectorizer_params)
The constructor for DocumentTFIDFEmbeddings.
- Parameters:
train_dataset (List[Sentence]) – the train dataset which will be used to construct a vectorizer
vectorizer (Optional[TfidfVectorizer]) – a precalculated vectorizer. If provided, requires train_dataset to be an empty list.
vectorizer_params – parameters given to scikit-learn's TfidfVectorizer constructor
- property embedding_length: int
Returns the length of the embedding vector.
- embed(sentences)
Add embeddings to every sentence in the given list of sentences.
- classmethod from_params(params)
- Return type:
- to_params()
- Return type:
Dict[str, Any]
- embeddings_name: str = 'DocumentTFIDFEmbeddings'
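Example (a minimal usage sketch, not part of the generated reference; the two-sentence corpus is a toy example, and min_df is one of the TfidfVectorizer parameters forwarded via **vectorizer_params):

```python
from flair.data import Sentence
from flair.embeddings import DocumentTFIDFEmbeddings

# fit the TF-IDF vectorizer on a toy training corpus
train = [Sentence('the grass is green'), Sentence('the sky is blue')]
embedding = DocumentTFIDFEmbeddings(train, min_df=1)

sentence = Sentence('the grass is blue')
embedding.embed(sentence)

# the embedding length equals the size of the fitted vocabulary
print(sentence.embedding.shape)
```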
- class flair.embeddings.document.DocumentRNNEmbeddings(embeddings, hidden_size=128, rnn_layers=1, reproject_words=True, reproject_words_dimension=None, bidirectional=False, dropout=0.5, word_dropout=0.0, locked_dropout=0.0, rnn_type='GRU', fine_tune=True)
Bases: DocumentEmbeddings
- __init__(embeddings, hidden_size=128, rnn_layers=1, reproject_words=True, reproject_words_dimension=None, bidirectional=False, dropout=0.5, word_dropout=0.0, locked_dropout=0.0, rnn_type='GRU', fine_tune=True)
Instantiates an RNN that works upon some token embeddings.
- Parameters:
embeddings (List[TokenEmbeddings]) – a list of token embeddings
hidden_size – the number of hidden states in the RNN
rnn_layers – the number of layers for the RNN
reproject_words (bool) – whether to reproject the token embeddings in a separate linear layer before putting them into the RNN
reproject_words_dimension (Optional[int]) – output dimension of the reprojection layer. If None, the same output dimension as before is kept.
bidirectional (bool) – whether to use a bidirectional RNN
dropout (float) – the dropout value to be used
word_dropout (float) – the word dropout value to be used; if 0.0, word dropout is not used
locked_dropout (float) – the locked dropout value to be used; if 0.0, locked dropout is not used
rnn_type (str) – 'GRU' or 'LSTM'
fine_tune (bool) – if True, allows fine-tuning of the embeddings
- property embedding_length: int
Returns the length of the embedding vector.
- _add_embeddings_internal(sentences)
Add embeddings to all sentences in the given list of sentences.
If embeddings are already added, update only if embeddings are non-static.
- to_params()
- classmethod from_params(params)
- Return type:
- embeddings_name: str = 'DocumentRNNEmbeddings'
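Example (a minimal usage sketch, not part of the generated reference; a unidirectional GRU over GloVe token embeddings):

```python
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings

# run a GRU over GloVe token embeddings; the final RNN state
# becomes the document embedding
glove = WordEmbeddings('glove')
document_embeddings = DocumentRNNEmbeddings([glove], hidden_size=256, rnn_type='GRU')

sentence = Sentence('The grass is green.')
document_embeddings.embed(sentence)

# for a unidirectional RNN the vector has hidden_size dimensions
print(sentence.embedding.shape)
```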
- class flair.embeddings.document.DocumentLMEmbeddings(flair_embeddings)
Bases: DocumentEmbeddings
- property embedding_length: int
Returns the length of the embedding vector.
- get_names()
Returns a list of embedding names.
In most cases, it is just a list with one item, namely the name of this embedding. But in some cases, the embedding is made up of different embeddings (StackedEmbeddings). Then the list contains the names of all embeddings in the stack.
- Return type:
List[str]
- to_params()
- Return type:
Dict[str, Any]
- classmethod from_params(params)
- Return type:
- embeddings_name: str = 'DocumentLMEmbeddings'
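Example (a minimal usage sketch, not part of the generated reference; assumes the pre-trained 'news-forward'/'news-backward' character language models shipped with FlairEmbeddings):

```python
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, DocumentLMEmbeddings

# combine a forward and a backward character language model
# into a single document embedding
document_embeddings = DocumentLMEmbeddings(
    [FlairEmbeddings('news-forward'), FlairEmbeddings('news-backward')]
)

sentence = Sentence('The grass is green.')
document_embeddings.embed(sentence)
print(sentence.embedding.shape)
```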
- class flair.embeddings.document.SentenceTransformerDocumentEmbeddings(model='bert-base-nli-mean-tokens', batch_size=1)
Bases: DocumentEmbeddings
- __init__(model='bert-base-nli-mean-tokens', batch_size=1)
Instantiates a document embedding using SentenceTransformer embeddings.
- Parameters:
model (str) – the name of a model from the SentenceTransformer class
batch_size (int) – the number of sentences to be processed in one batch
- property embedding_length: int
Returns the length of the embedding vector.
- classmethod from_params(params)
- Return type:
- to_params()
- Return type:
Dict[str, Any]
- embeddings_name: str = 'SentenceTransformerDocumentEmbeddings'
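Example (a minimal usage sketch, not part of the generated reference; requires the separate sentence-transformers package to be installed):

```python
from flair.data import Sentence
from flair.embeddings import SentenceTransformerDocumentEmbeddings

# wraps a model from the sentence-transformers library
# (pip install sentence-transformers)
embedding = SentenceTransformerDocumentEmbeddings('bert-base-nli-mean-tokens')

sentence = Sentence('The grass is green.')
embedding.embed(sentence)
print(sentence.embedding.shape)
```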
- class flair.embeddings.document.DocumentCNNEmbeddings(embeddings, kernels=((100, 3), (100, 4), (100, 5)), reproject_words=True, reproject_words_dimension=None, dropout=0.5, word_dropout=0.0, locked_dropout=0.0, fine_tune=True)
Bases: DocumentEmbeddings
- __init__(embeddings, kernels=((100, 3), (100, 4), (100, 5)), reproject_words=True, reproject_words_dimension=None, dropout=0.5, word_dropout=0.0, locked_dropout=0.0, fine_tune=True)
Instantiates a CNN that works upon some token embeddings.
- Parameters:
embeddings (List[TokenEmbeddings]) – a list of token embeddings
kernels – list of (number of kernels, kernel size) pairs
reproject_words (bool) – whether to reproject the token embeddings in a separate linear layer before putting them into the CNN
reproject_words_dimension (Optional[int]) – output dimension of the reprojection layer. If None, the same output dimension as before is kept.
dropout (float) – the dropout value to be used
word_dropout (float) – the word dropout value to be used; if 0.0, word dropout is not used
locked_dropout (float) – the locked dropout value to be used; if 0.0, locked dropout is not used
fine_tune (bool) – if True, allows fine-tuning of the embeddings
- embeddings_name: str = 'DocumentCNNEmbeddings'
- property embedding_length: int
Returns the length of the embedding vector.
- _add_embeddings_internal(sentences)
Add embeddings to all sentences in the given list of sentences.
If embeddings are already added, update only if embeddings are non-static.
- classmethod from_params(params)
- Return type:
- to_params()
- Return type:
Dict[str, Any]
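Example (a minimal usage sketch, not part of the generated reference; uses the default kernel configuration of 100 filters each for kernel sizes 3, 4 and 5):

```python
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, DocumentCNNEmbeddings

# convolve over GloVe token embeddings; the pooled feature maps
# of all kernels are concatenated into the document vector
glove = WordEmbeddings('glove')
document_embeddings = DocumentCNNEmbeddings(
    [glove], kernels=((100, 3), (100, 4), (100, 5))
)

sentence = Sentence('The grass is green.')
document_embeddings.embed(sentence)

# 3 kernel groups x 100 filters = a 300-dimensional vector
print(sentence.embedding.shape)
```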