flair.data.Sentence

class flair.data.Sentence(text, use_tokenizer=True, language_code=None, start_position=0)

Bases: DataPoint

A Sentence is a central object in Flair that represents either a single sentence or a whole text.

Internally, it consists of a list of Token objects that represent each word in the text. Additionally, this object stores all metadata related to a text such as labels, language code, etc.

__init__(text, use_tokenizer=True, language_code=None, start_position=0)

Create a sentence object by passing either a text or a list of tokens.

Parameters:
  • text (Union[str, list[str], list[Token]]) – Either pass the text as a string, or provide an already tokenized text as either a list of strings or a list of Token objects.

  • use_tokenizer (Union[bool, Tokenizer]) – Optionally specify a custom tokenizer to split the text into tokens. The default (True) uses flair.tokenization.SegtokTokenizer. If set to False, flair.tokenization.SpaceTokenizer is used instead. The tokenizer is ignored if text is already tokenized.

  • language_code (Optional[str]) – Language of the sentence. If not provided, langdetect will be called when the language_code is accessed for the first time.

  • start_position (int) – Start char offset of the sentence in the superordinate document.

Methods

__init__(text[, use_tokenizer, ...])

Create a sentence object by passing either a text or a list of tokens.

add_label(typename, value[, score])

Adds a label to the DataPoint by internally creating a Label object.

add_metadata(key, value)

clear_embeddings([embedding_names])

copy_context_from_sentence(sentence)

get_each_embedding([embedding_names])

get_embedding([names])

get_label([label_type, zero_tag_value])

get_labels([label_type])

Returns all labels of this datapoint belonging to a specific annotation layer.

get_language_code()

get_metadata(key)

get_relations([label_type])

get_span(start, stop)

get_spans([label_type])

get_token(token_id)

has_label(type)

has_metadata(key)

infer_space_after()

Heuristics in case you wish to infer whitespace_after values for tokenized text.

is_context_set()

Determines if this sentence has a context of sentences before or after set.

left_context(context_length[, ...])

next_sentence()

Get the next sentence in the document.

previous_sentence()

Get the previous sentence in the document.

remove_labels(typename)

right_context(context_length[, ...])

set_context_for_sentences(sentences)

set_embedding(name, vector)

set_label(typename, value[, score])

to(device[, pin_memory])

to_dict([tag_type])

to_original_text()

to_plain_string()

to_tagged_string([main_label])

to_tokenized_string()

Attributes

embedding

end_position

labels

score

start_position

tag

text

unlabeled_identifier

property unlabeled_identifier

get_relations(label_type=None)
Return type:

list[Relation]

get_spans(label_type=None)
Return type:

list[Span]

get_token(token_id)
Return type:

Optional[Token]

property embedding

to(device, pin_memory=False)

clear_embeddings(embedding_names=None)
left_context(context_length, respect_document_boundaries=True)
Return type:

list[Token]

right_context(context_length, respect_document_boundaries=True)
Return type:

list[Token]

to_tagged_string(main_label=None)
Return type:

str

property text: str

to_tokenized_string()
Return type:

str

to_plain_string()
Return type:

str

infer_space_after()

Heuristics in case you wish to infer whitespace_after values for tokenized text.

This is useful for older NLP datasets (such as CoNLL-03 and CoNLL-2000) that provide only tokenized text, with no record of the original whitespace.

to_original_text()
Return type:

str

to_dict(tag_type=None)
Return type:

dict[str, Any]

get_span(start, stop)
Return type:

Span

property start_position: int

property end_position: int

get_language_code()
Return type:

str

next_sentence()

Get the next sentence in the document.

This only works if the context has been set, for instance by a dataloader.

Returns:

The next Sentence in the document if set, otherwise None.

previous_sentence()

Get the previous sentence in the document.

This only works if the context has been set, for instance by a dataloader.

Returns:

The previous Sentence in the document if set, otherwise None.

is_context_set()

Determines if this sentence has a context of sentences before or after set.

The context can be set by a dataloader or elsewhere.

Return type:

bool

Returns:

True if context is set, else False.

copy_context_from_sentence(sentence)
Return type:

None

classmethod set_context_for_sentences(sentences)
Return type:

None

get_labels(label_type=None)

Returns all labels of this datapoint belonging to a specific annotation layer.

For instance, if a data point has been labeled with "sentiment" labels, you can call get_labels("sentiment") to return a list of all sentiment labels.

Parameters:

label_type – The string identifier of the annotation layer, like "sentiment" or "ner". If None, all labels are returned.

Returns:

A list of Label objects belonging to this annotation layer for this data point.

remove_labels(typename)