flair.data.Sentence

class flair.data.Sentence(text, use_tokenizer=True, language_code=None, start_position=0)

Bases: DataPoint

A Sentence is a central object in Flair that represents either a single sentence or a whole text.

Internally, it consists of a list of Token objects that represent each word in the text. Additionally, this object stores all metadata related to a text such as labels, language code, etc.

__init__(text, use_tokenizer=True, language_code=None, start_position=0)

Create a sentence object by passing either a text or a list of tokens.

Parameters:

text (Union[str, list[str], list[Token]]) – Either pass the text as a string, or provide an already tokenized text as either a list of strings or a list of Token objects.
use_tokenizer (Union[bool, Tokenizer]) – Optionally pass a custom tokenizer to split the text into tokens. By default (True), flair.tokenization.SegtokTokenizer is used. If use_tokenizer is set to False, flair.tokenization.SpaceTokenizer is used instead. The tokenizer is ignored if text is already tokenized.
language_code (Optional[str]) – Language of the sentence. If not provided, langdetect will be called when the language_code is accessed for the first time.
start_position (int) – Start char offset of the sentence in the superordinate document.

Methods

`__init__`(text[, use_tokenizer, ...])	Create a sentence object by passing either a text or a list of tokens.
`add_label`(typename, value[, score])	Adds a label to the `DataPoint` by internally creating a `Label` object.
`add_metadata`(key, value)
`clear_embeddings`([embedding_names])
`copy_context_from_sentence`(sentence)
`get_each_embedding`([embedding_names])
`get_embedding`([names])
`get_label`([label_type, zero_tag_value])
`get_labels`([label_type])	Returns all labels of this datapoint belonging to a specific annotation layer.
`get_language_code`()
`get_metadata`(key)
`get_relations`([label_type])
`get_span`(start, stop)
`get_spans`([label_type])
`get_token`(token_id)
`has_label`(type)
`has_metadata`(key)
`infer_space_after`()	Heuristics in case you wish to infer whitespace_after values for tokenized text.
`is_context_set`()	Determines if this sentence has a context of sentences before or after set.
`left_context`(context_length[, ...])
`next_sentence`()	Get the next sentence in the document.
`previous_sentence`()	Get the previous sentence in the document.
`remove_labels`(typename)
`right_context`(context_length[, ...])
`set_context_for_sentences`(sentences)
`set_embedding`(name, vector)
`set_label`(typename, value[, score])
`to`(device[, pin_memory])
`to_dict`([tag_type])
`to_original_text`()
`to_plain_string`()
`to_tagged_string`([main_label])
`to_tokenized_string`()

Attributes

`embedding`
`end_position`
`labels`
`score`
`start_position`
`tag`
`text`
`unlabeled_identifier`

property unlabeled_identifier

get_relations(label_type=None)

Return type: list[Relation]

get_spans(label_type=None)

Return type: list[Span]

get_token(token_id)

Return type: Optional[Token]

property embedding

to(device, pin_memory=False)

clear_embeddings(embedding_names=None)

left_context(context_length, respect_document_boundaries=True)

Return type: list[Token]

right_context(context_length, respect_document_boundaries=True)

Return type: list[Token]

to_tagged_string(main_label=None)

Return type: str

property text: str

to_tokenized_string()

Return type: str

to_plain_string()

Return type: str

infer_space_after()

Heuristics in case you wish to infer whitespace_after values for tokenized text.

This is useful for some older NLP tasks (such as CoNLL-03 and CoNLL-2000) that provide only tokenized data, with no information about the original whitespace.

to_original_text()

Return type: str

to_dict(tag_type=None)

Return type: dict[str, Any]

get_span(start, stop)

Return type: Span

property start_position: int

property end_position: int

get_language_code()

Return type: str

next_sentence()

Get the next sentence in the document.

This only works if context is set through a dataloader or elsewhere.

Returns: The next Sentence in the document if set, otherwise None.

previous_sentence()

Get the previous sentence in the document.

This only works if context is set through a dataloader or elsewhere.

Returns: The previous Sentence in the document if set, otherwise None.

is_context_set()

Determines if this sentence has a context of sentences before or after set.

Context may be set, for instance, in a dataloader or elsewhere.

Return type: bool
Returns: True if context is set, else False.

copy_context_from_sentence(sentence)

Return type: None

classmethod set_context_for_sentences(sentences)

Return type: None

get_labels(label_type=None)

Returns all labels of this datapoint belonging to a specific annotation layer.

For instance, if a data point has been labeled with "sentiment" labels, you can call this function as get_labels("sentiment") to return a list of all sentiment labels.

Parameters: label_type – The string identifier of the annotation layer, like "sentiment" or "ner".
Returns: A list of Label objects belonging to this annotation layer for this data point.

remove_labels(typename)
