flair.data.Sentence
- class flair.data.Sentence(text, use_tokenizer=True, language_code=None, start_position=0)
Bases: DataPoint
A Sentence is a central object in Flair that represents either a single sentence or a whole text.
Internally, it consists of a list of Token objects that represent each word in the text. Additionally, this object stores all metadata related to a text such as labels, language code, etc.
- __init__(text, use_tokenizer=True, language_code=None, start_position=0)
Create a sentence object by passing either a text or a list of tokens.
- Parameters:
text (Union[str, list[str], list[Token]]) – Either pass the text as a string, or provide already tokenized text as a list of strings or a list of Token objects.
use_tokenizer (Union[bool, Tokenizer]) – Optionally specify a custom tokenizer to split the text into tokens. By default, flair.tokenization.SegtokTokenizer is used. If use_tokenizer is set to False, flair.tokenization.SpaceTokenizer is used instead. The tokenizer is ignored if text is already tokenized.
language_code (Optional[str]) – Language of the sentence. If not provided, langdetect is called when language_code is accessed for the first time.
start_position (int) – Start character offset of the sentence in the superordinate document.
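The input options above can be pictured with a small stand-in written in plain Python. This is only a sketch and not Flair's implementation: `to_tokens` and `document_offsets` are hypothetical helper names, whitespace splitting is assumed to approximate what `use_tokenizer=False` does, and adding `start_position` to a token's offset is an assumed way of locating the token in the superordinate document.

```python
import re
from typing import Union


def to_tokens(text: Union[str, list[str]]) -> list[str]:
    """Hypothetical stand-in for the constructor's input handling."""
    if isinstance(text, str):
        # Roughly comparable to Sentence(text, use_tokenizer=False):
        # split on whitespace only.
        return text.split()
    # Pretokenized input is taken as-is; no tokenizer is applied.
    return list(text)


def document_offsets(text: str, start_position: int = 0) -> list[tuple[str, int]]:
    """Pair each whitespace-delimited token with an assumed document offset."""
    return [
        (match.group(), start_position + match.start())
        for match in re.finditer(r"\S+", text)
    ]


print(to_tokens("The grass is green ."))
print(to_tokens(["The", "grass"]))
print(document_offsets("is green", start_position=10))
```

With Flair installed, the corresponding calls would be `Sentence("The grass is green .", use_tokenizer=False)` and `Sentence(["The", "grass"])`, following the signature documented above.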
Methods
__init__(text[, use_tokenizer, ...]) – Create a sentence object by passing either a text or a list of tokens.
add_label(typename, value[, score]) – Adds a label to the DataPoint by internally creating a Label object.
add_metadata(key, value)
clear_embeddings([embedding_names])
copy_context_from_sentence(sentence)
get_each_embedding([embedding_names])
get_embedding([names])
get_label([label_type, zero_tag_value])
get_labels([label_type]) – Returns all labels of this datapoint belonging to a specific annotation layer.
get_metadata(key)
get_relations([label_type])
get_span(start, stop)
get_spans([label_type])
get_token(token_id)
has_label(type)
has_metadata(key)
infer_space_after() – Heuristics in case you wish to infer whitespace_after values for tokenized text.
is_context_set() – Determines if this sentence has a context of sentences before or after set.
left_context(context_length[, ...])
next_sentence() – Get the next sentence in the document.
previous_sentence() – Get the previous sentence in the document.
remove_labels(typename)
right_context(context_length[, ...])
set_context_for_sentences(sentences)
set_embedding(name, vector)
set_label(typename, value[, score])
to(device[, pin_memory])
to_dict([tag_type])
to_tagged_string([main_label])
Attributes
labels
score
tag
- property unlabeled_identifier
- get_relations(label_type=None)
- Return type:
list[Relation]
- get_spans(label_type=None)
- Return type:
list[Span]
- get_token(token_id)
- Return type:
Optional[Token]
- property embedding
- to(device, pin_memory=False)
- clear_embeddings(embedding_names=None)
- left_context(context_length, respect_document_boundaries=True)
- Return type:
list[Token]
- right_context(context_length, respect_document_boundaries=True)
- Return type:
list[Token]
- to_tagged_string(main_label=None)
- Return type:
str
- property text: str
- to_tokenized_string()
- Return type:
str
- to_plain_string()
- Return type:
str
- infer_space_after()
Heuristics in case you wish to infer whitespace_after values for tokenized text.
This is useful for some older NLP tasks (such as CoNLL-03 and CoNLL-2000) that provide only tokenized data, with no information about the original whitespace.
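The idea of inferring whitespace from tokens alone can be sketched as follows. The rules here are a simplified, hypothetical set chosen for illustration; Flair's own heuristics may differ, and the function returns a list of flags rather than setting whitespace_after on Token objects.

```python
def infer_space_after(tokens: list[str]) -> list[bool]:
    """Guess, per token, whether a space followed it in the original text.

    Simplified, hypothetical rules; not Flair's actual heuristics.
    """
    no_space_before = {".", ",", ":", ";", "!", "?", ")", "n't"}
    no_space_after = {"("}
    flags = []
    for i, token in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if following is None:
            # Nothing follows the last token.
            flags.append(False)
        elif token in no_space_after or following in no_space_before:
            # Opening brackets and closing punctuation attach directly.
            flags.append(False)
        else:
            flags.append(True)
    return flags


def detokenize(tokens: list[str]) -> str:
    """Rebuild a plain string from tokens using the inferred flags."""
    flags = infer_space_after(tokens)
    return "".join(tok + (" " if space else "") for tok, space in zip(tokens, flags))


print(detokenize(["He", "did", "n't", "stop", "(", "yet", ")", "."]))
# prints: He didn't stop (yet).
```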
- to_original_text()
- Return type:
str
- to_dict(tag_type=None)
- Return type:
dict[str,Any]
- get_span(start, stop)
- Return type:
- property start_position: int
- property end_position: int
- get_language_code()
- Return type:
str
- next_sentence()
Get the next sentence in the document.
This only works if context is set through a dataloader or elsewhere.
- Returns:
The next Sentence in the document if set, otherwise None.
- previous_sentence()
Get the previous sentence in the document.
This only works if context is set through a dataloader or elsewhere.
- Returns:
The previous Sentence in the document if set, otherwise None.
- is_context_set()
Determines if this sentence has a context of sentences before or after set.
Context can be set, for instance, in a dataloader or elsewhere.
- Return type:
bool
- Returns:
True if context is set, else False.
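The relationship between the three context methods above can be illustrated with a stand-in class. This is a minimal sketch under stated assumptions, not Flair's implementation: `Sent` and `link_sentences` are hypothetical names, and context is assumed to be a simple previous/next link between neighbouring sentences of one document.

```python
from typing import Optional


class Sent:
    """Hypothetical stand-in for a sentence with document context."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._previous: Optional["Sent"] = None
        self._next: Optional["Sent"] = None

    def previous_sentence(self) -> Optional["Sent"]:
        return self._previous

    def next_sentence(self) -> Optional["Sent"]:
        return self._next

    def is_context_set(self) -> bool:
        # Assumption: context counts as set once a neighbour is linked.
        return self._previous is not None or self._next is not None


def link_sentences(sentences: list[Sent]) -> None:
    """Link each sentence to its neighbours, in document order."""
    for left, right in zip(sentences, sentences[1:]):
        left._next = right
        right._previous = left


doc = [Sent("First."), Sent("Second."), Sent("Third.")]
link_sentences(doc)
print(doc[1].previous_sentence().text, "|", doc[1].next_sentence().text)
```

A sentence that was never linked returns None from both accessors, which mirrors the "otherwise None" behaviour described above.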
- copy_context_from_sentence(sentence)
- Return type:
None
- classmethod set_context_for_sentences(sentences)
- Return type:
None
- get_labels(label_type=None)
Returns all labels of this datapoint belonging to a specific annotation layer.
For instance, if a data point has been labeled with "sentiment" labels, you can call this function as get_labels("sentiment") to return a list of all sentiment labels.
- Parameters:
label_type – The string identifier of the annotation layer, like "sentiment" or "ner".
- Returns:
A list of Label objects belonging to this annotation layer for this data point.
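The notion of annotation layers can be sketched with a plain dictionary keyed by layer name. This is an illustrative stand-in, not Flair's code: `LabeledPoint` is a hypothetical class, labels are stored as (value, score) tuples instead of Label objects, and the default score of 1.0 and the behaviour of returning every label when no layer is given are assumptions.

```python
from collections import defaultdict
from typing import Optional


class LabeledPoint:
    """Hypothetical stand-in for a data point with annotation layers."""

    def __init__(self) -> None:
        # Maps a layer name such as "sentiment" or "ner" to its labels.
        self._layers: dict[str, list[tuple[str, float]]] = defaultdict(list)

    def add_label(self, typename: str, value: str, score: float = 1.0) -> None:
        self._layers[typename].append((value, score))

    def get_labels(self, label_type: Optional[str] = None) -> list[tuple[str, float]]:
        if label_type is None:
            # Assumption: no layer given means labels from all layers.
            return [label for layer in self._layers.values() for label in layer]
        return list(self._layers.get(label_type, []))

    def remove_labels(self, typename: str) -> None:
        self._layers.pop(typename, None)


point = LabeledPoint()
point.add_label("sentiment", "positive", 0.9)
point.add_label("topic", "sports")
print(point.get_labels("sentiment"))
```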
- remove_labels(typename)