flair.data.Sentence
- class flair.data.Sentence(text, use_tokenizer=True, language_code=None, start_position=0)
Bases: DataPoint
A Sentence is a central object in Flair that represents either a single sentence or a whole text.
Internally, it consists of a list of Token objects that represent each word in the text. Additionally, this object stores all metadata related to the text, such as labels and the language code.
- __init__(text, use_tokenizer=True, language_code=None, start_position=0)
Create a sentence object by passing either a text or a list of tokens.
- Parameters:
text (Union[str, list[str], list[Token]]) – Either pass the text as a string, or provide already tokenized text as either a list of strings or a list of Token objects.
use_tokenizer (Union[bool, Tokenizer]) – You can optionally specify a custom tokenizer to split the text into tokens. By default, flair.tokenization.SegtokTokenizer is used. If use_tokenizer is set to False, flair.tokenization.SpaceTokenizer is used instead. The tokenizer is ignored if text is already tokenized.
language_code (Optional[str]) – Language of the sentence. If not provided, langdetect is called when the language code is accessed for the first time.
start_position (int) – Start character offset of the sentence in the superordinate document.
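The three accepted input forms can be illustrated with a toy model. This is a sketch under stated assumptions, not Flair's code: normalize_tokens is a hypothetical helper, the regular expression is a crude stand-in for SegtokTokenizer (segtok's real rules are far more elaborate), whitespace splitting approximates SpaceTokenizer, and only lists of strings (not Token objects) are modeled.

```python
import re
from typing import List, Union

# Crude stand-in for a punctuation-aware tokenizer such as SegtokTokenizer:
# a run of word characters, or a single character that is neither a word
# character nor whitespace.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def normalize_tokens(text: Union[str, List[str]], use_tokenizer: bool = True) -> List[str]:
    """Hypothetical helper: turn the constructor's text argument into tokens."""
    if isinstance(text, list):
        # Pretokenized input: the tokenizer is ignored, tokens are kept as given.
        return list(text)
    if use_tokenizer:
        # Default behaviour in this toy: split punctuation off as separate tokens.
        return _TOKEN_RE.findall(text)
    # use_tokenizer=False: split on whitespace only, similar to SpaceTokenizer.
    return text.split()


print(normalize_tokens("I love Berlin."))                       # ['I', 'love', 'Berlin', '.']
print(normalize_tokens("I love Berlin.", use_tokenizer=False))  # ['I', 'love', 'Berlin.']
print(normalize_tokens(["I", "love", "Berlin", "."]))           # ['I', 'love', 'Berlin', '.']
```

With the real class, the corresponding calls are Sentence("I love Berlin."), Sentence("I love Berlin.", use_tokenizer=False) and Sentence(["I", "love", "Berlin", "."]); inspect the resulting tokens in your installed version rather than relying on the toy output.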
Methods
- __init__(text[, use_tokenizer, ...]) – Create a sentence object by passing either a text or a list of tokens.
- add_label(typename, value[, score]) – Adds a label to the DataPoint by internally creating a Label object.
- add_metadata(key, value)
- clear_embeddings([embedding_names])
- copy_context_from_sentence(sentence)
- get_each_embedding([embedding_names])
- get_embedding([names])
- get_label([label_type, zero_tag_value])
- get_labels([label_type]) – Returns all labels of this data point belonging to a specific annotation layer.
- get_language_code()
- get_metadata(key)
- get_relations([label_type])
- get_span(start, stop)
- get_spans([label_type])
- get_token(token_id)
- has_label(type)
- has_metadata(key)
- infer_space_after() – Infers whitespace_after values for tokenized text using heuristics.
- is_context_set() – Determines whether a context of preceding or following sentences has been set for this sentence.
- left_context(context_length[, ...])
- next_sentence() – Get the next sentence in the document.
- previous_sentence() – Get the previous sentence in the document.
- remove_labels(typename)
- right_context(context_length[, ...])
- set_context_for_sentences(sentences)
- set_embedding(name, vector)
- set_label(typename, value[, score])
- to(device[, pin_memory])
- to_dict([tag_type])
- to_original_text()
- to_plain_string()
- to_tagged_string([main_label])
- to_tokenized_string()
Attributes
- labels
- score
- tag
- property unlabeled_identifier
- get_relations(label_type=None)
- Return type: list[Relation]
- get_spans(label_type=None)
- Return type: list[Span]
- get_token(token_id)
- Return type: Optional[Token]
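Because the return type is Optional[Token], a lookup by an id that no token has yields None rather than raising an error. A minimal sketch of such a lookup, assuming token ids are 1-based positions like Flair's Token.idx (check this against your installed version); ToyToken and get_token here are hypothetical stand-ins, not Flair classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyToken:
    text: str
    idx: int  # assumed 1-based position of the token in its sentence


def get_token(tokens: List[ToyToken], token_id: int) -> Optional[ToyToken]:
    """Return the token whose idx equals token_id, or None if no token matches."""
    for token in tokens:
        if token.idx == token_id:
            return token
    return None


words = ["The", "grass", "is", "green", "."]
tokens = [ToyToken(text, idx) for idx, text in enumerate(words, start=1)]
print(get_token(tokens, 2).text)  # grass
print(get_token(tokens, 0))       # None, since ids start at 1
```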
- property embedding
- to(device, pin_memory=False)
- clear_embeddings(embedding_names=None)
- left_context(context_length, respect_document_boundaries=True)
- Return type: list[Token]
- right_context(context_length, respect_document_boundaries=True)
- Return type: list[Token]
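left_context and right_context gather tokens from neighboring sentences, which requires the sentences to be linked first. The sketch below is a simplified model under assumptions: ToySentence, link and the document_id field are hypothetical, and a document boundary is modeled as a change of document_id, which is not necessarily how Flair marks boundaries. With respect_document_boundaries=True, collection stops at the first sentence from another document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySentence:
    tokens: List[str]
    document_id: str  # hypothetical: identifies the document the sentence belongs to
    previous: Optional["ToySentence"] = field(default=None, repr=False, compare=False)
    following: Optional["ToySentence"] = field(default=None, repr=False, compare=False)


def link(sentences: List[ToySentence]) -> None:
    """Chain consecutive sentences so each one knows its neighbours."""
    for before, after in zip(sentences, sentences[1:]):
        before.following = after
        after.previous = before


def left_context(sentence: ToySentence, context_length: int,
                 respect_document_boundaries: bool = True) -> List[str]:
    """Return up to context_length tokens from the sentences preceding sentence."""
    context: List[str] = []
    current = sentence
    while len(context) < context_length:
        current = current.previous
        if current is None:
            break  # start of the corpus reached
        if respect_document_boundaries and current.document_id != sentence.document_id:
            break  # do not cross into the previous document
        context = current.tokens + context
    return context[-context_length:] if context_length > 0 else []


def right_context(sentence: ToySentence, context_length: int,
                  respect_document_boundaries: bool = True) -> List[str]:
    """Return up to context_length tokens from the sentences following sentence."""
    context: List[str] = []
    current = sentence
    while len(context) < context_length:
        current = current.following
        if current is None:
            break  # end of the corpus reached
        if respect_document_boundaries and current.document_id != sentence.document_id:
            break  # do not cross into the next document
        context = context + current.tokens
    return context[:context_length]


s1 = ToySentence(["Paris", "is", "big", "."], "doc-a")
s2 = ToySentence(["It", "has", "museums", "."], "doc-a")
s3 = ToySentence(["Berlin", "too", "."], "doc-b")
link([s1, s2, s3])
print(left_context(s2, 2))   # ['big', '.']
print(right_context(s2, 2))  # [] because s3 belongs to another document
print(right_context(s2, 2, respect_document_boundaries=False))  # ['Berlin', 'too']
```

Taking the last context_length tokens on the left and the first ones on the right keeps the context adjacent to the current sentence.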
- to_tagged_string(main_label=None)
- Return type: str
- property text: str
- to_tokenized_string()
- Return type: str
- to_plain_string()
- Return type: str
- infer_space_after()
Infers whitespace_after values for tokenized text using heuristics.
This is useful for some older NLP tasks (such as CoNLL-03 and CoNLL-2000) whose data is provided only in tokenized form, with no information about the original whitespace.
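A simplified sketch of this kind of heuristic, working on plain strings instead of Token objects. The rule set is an assumption, not Flair's exact list: no space before closing punctuation, the contraction n't, or tokens starting with an apostrophe; no space after an opening parenthesis; straight double quotes alternate between opening and closing. Joining the tokens according to the inferred flags approximates to_plain_string(), while to_tokenized_string() joins every token with a single space.

```python
from typing import List

NO_SPACE_BEFORE = {".", ",", ":", ";", ")", "!", "?", "n't"}  # assumed rule set
NO_SPACE_AFTER = {"("}


def infer_space_after(tokens: List[str]) -> List[bool]:
    """Return, for each token, whether a space should follow it."""
    space_after = [True] * len(tokens)
    quote_count = 0
    for i, token in enumerate(tokens):
        if token == '"':
            quote_count += 1
            if quote_count % 2 == 1:
                space_after[i] = False  # opening quote: attach to the next token
            elif i > 0:
                space_after[i - 1] = False  # closing quote: attach to the previous token
        if i > 0 and (token in NO_SPACE_BEFORE or token.startswith("'")):
            space_after[i - 1] = False
        if token in NO_SPACE_AFTER:
            space_after[i] = False
    if tokens:
        space_after[-1] = False  # no trailing space at the end of the sentence
    return space_after


def to_plain_string(tokens: List[str]) -> str:
    """Join tokens, inserting a space only where the heuristic allows one."""
    flags = infer_space_after(tokens)
    return "".join(tok + (" " if space else "") for tok, space in zip(tokens, flags))


tokens = ["He", "said", '"', "hi", '"', "(", "twice", ")", "."]
print(" ".join(tokens))         # He said " hi " ( twice ) .
print(to_plain_string(tokens))  # He said "hi" (twice).
```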
- to_original_text()
- Return type: str
- to_dict(tag_type=None)
- Return type: dict[str, Any]
- get_span(start, stop)
- Return type:
- property start_position: int
- property end_position: int
- get_language_code()
- Return type: str
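The constructor documents language_code as lazily detected: langdetect runs only when the code is first needed, and only if none was passed in. A minimal sketch of that lazy, cached pattern using functools.cached_property; LazyLanguageText and detect_language are hypothetical, and the stub detector (which only looks for the German article "der") merely stands in for a real detector such as langdetect.detect.

```python
from functools import cached_property
from typing import List, Optional

detector_calls: List[str] = []  # records every text passed to the detector


def detect_language(text: str) -> str:
    """Hypothetical stub standing in for a real detector such as langdetect.detect."""
    detector_calls.append(text)
    return "de" if " der " in f" {text} " else "en"


class LazyLanguageText:
    """Hypothetical example class, not part of Flair."""

    def __init__(self, text: str, language_code: Optional[str] = None) -> None:
        self.text = text
        self._given_code = language_code

    @cached_property
    def language_code(self) -> str:
        # Evaluated on first access only; the result is then cached on the instance.
        if self._given_code is not None:
            return self._given_code
        return detect_language(self.text)


german = LazyLanguageText("Das ist der Test")
print(len(detector_calls))   # 0: nothing detected yet
print(german.language_code)  # de
print(german.language_code)  # de (cached, detector not called again)
print(len(detector_calls))   # 1
```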
- next_sentence()
Get the next sentence in the document.
This only works if the context has been set, for instance by the data loader or elsewhere.
- Returns:
The next Sentence in the document if set, otherwise None.
- previous_sentence()
Get the previous sentence in the document.
This only works if the context has been set, for instance by the data loader or elsewhere.
- Returns:
The previous Sentence in the document if set, otherwise None.
- is_context_set()
Determines whether a context of preceding or following sentences has been set for this sentence.
The context is typically set by the data loader, but it can also be set elsewhere.
- Return type: bool
- Returns:
True if the context is set, otherwise False.
- copy_context_from_sentence(sentence)
- Return type: None
- classmethod set_context_for_sentences(sentences)
- Return type: None
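set_context_for_sentences links a list of sentences so that next_sentence(), previous_sentence() and is_context_set() have something to report, and copy_context_from_sentence reuses another sentence's links. The sketch below models that observable behavior with a hypothetical LinkedSentence class; it assumes neighbors follow list order and that the first and last sentences get None on their open side, and it is not Flair's source.

```python
from typing import List, Optional


class LinkedSentence:
    """Hypothetical stand-in for a sentence with optional document context."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._previous: Optional["LinkedSentence"] = None
        self._next: Optional["LinkedSentence"] = None
        self._context_set = False

    @classmethod
    def set_context_for_sentences(cls, sentences: List["LinkedSentence"]) -> None:
        # Link each sentence to its neighbours in list order.
        previous: Optional["LinkedSentence"] = None
        for sentence in sentences:
            sentence._previous = previous
            sentence._next = None
            sentence._context_set = True
            if previous is not None:
                previous._next = sentence
            previous = sentence

    def copy_context_from_sentence(self, sentence: "LinkedSentence") -> None:
        # Reuse another sentence's neighbours, e.g. for a copy of that sentence.
        self._previous = sentence._previous
        self._next = sentence._next
        self._context_set = sentence._context_set

    def next_sentence(self) -> Optional["LinkedSentence"]:
        return self._next

    def previous_sentence(self) -> Optional["LinkedSentence"]:
        return self._previous

    def is_context_set(self) -> bool:
        return self._context_set


a, b, c = (LinkedSentence(t) for t in ("First.", "Second.", "Third."))
print(b.is_context_set())          # False
LinkedSentence.set_context_for_sentences([a, b, c])
print(b.previous_sentence().text)  # First.
print(b.next_sentence().text)      # Third.
print(c.next_sentence())           # None
```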
- get_labels(label_type=None)
Returns all labels of this data point belonging to a specific annotation layer.
For instance, if a data point has been labeled with “sentiment” labels, you can call get_labels("sentiment") to return a list of all sentiment labels.
- Parameters:
label_type – The string identifier of the annotation layer, like “sentiment” or “ner”.
- Returns:
A list of Label objects belonging to this annotation layer for this data point.
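Labels are grouped by annotation layer, which is why get_labels("sentiment") returns only the sentiment labels. A minimal sketch of such a store, assuming that add_label appends with a default score of 1.0, set_label replaces the layer, and get_labels() without an argument returns every label; ToyLabel and LabelStore are hypothetical, not Flair classes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyLabel:
    typename: str
    value: str
    score: float = 1.0


class LabelStore:
    """Hypothetical label container keyed by annotation layer."""

    def __init__(self) -> None:
        self._layers: Dict[str, List[ToyLabel]] = defaultdict(list)

    def add_label(self, typename: str, value: str, score: float = 1.0) -> None:
        # Appends, so one layer can hold several labels (e.g. multi-label tasks).
        self._layers[typename].append(ToyLabel(typename, value, score))

    def set_label(self, typename: str, value: str, score: float = 1.0) -> None:
        # Replaces whatever the layer held before.
        self._layers[typename] = [ToyLabel(typename, value, score)]

    def has_label(self, typename: str) -> bool:
        return bool(self._layers.get(typename))

    def get_labels(self, label_type: Optional[str] = None) -> List[ToyLabel]:
        if label_type is None:
            # No layer given: return the labels of every layer.
            return [label for layer in self._layers.values() for label in layer]
        return list(self._layers.get(label_type, []))

    def remove_labels(self, typename: str) -> None:
        self._layers.pop(typename, None)


store = LabelStore()
store.add_label("sentiment", "POSITIVE", 0.9)
store.add_label("topic", "sports")
store.add_label("topic", "health")
print([label.value for label in store.get_labels("topic")])  # ['sports', 'health']
print(len(store.get_labels()))                                # 3
```

Keeping layers separate means removing one layer, such as "topic", leaves the others untouched.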
- remove_labels(typename)