flair.data.Token

class flair.data.Token(text, head_id=None, whitespace_after=1, start_position=0, sentence=None)

Bases: _PartOfSentence

Represents a single token (word, punctuation) within a Sentence.

form

The textual content of the token.

Type:

str

idx

The 1-based index within the sentence (-1 if not attached).

Type:

int

head_id

1-based index of the dependency head.

Type:

Optional[int]

whitespace_after

Number of spaces following this token.

Type:

int

start_position

Character offset where this token begins.

Type:

int

tags_proba_dist

Stores full probability distributions over tags.

Type:

dict[str, list[Label]]

__init__(text, head_id=None, whitespace_after=1, start_position=0, sentence=None)

Initializes a Token.

Parameters:
  • text (str) – The token text.

  • head_id (Optional[int], optional) – 1-based index of dependency head. Defaults to None.

  • whitespace_after (int, optional) – Spaces after token. Defaults to 1.

  • start_position (int, optional) – Character start offset. Defaults to 0.

  • sentence (Optional[Sentence], optional) – Parent sentence. Defaults to None.
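
The offset parameters can be read together: start_position marks where a token begins, and whitespace_after gives the number of spaces before the next token. The following is a minimal standalone sketch of that arithmetic, not flair's implementation. SketchToken and rebuild_text are hypothetical names, and the exclusive end offset is assumed to equal start_position plus the text length.

```python
from dataclasses import dataclass


@dataclass
class SketchToken:
    """Hypothetical stand-in mirroring three of Token's constructor arguments."""

    text: str
    whitespace_after: int = 1
    start_position: int = 0

    @property
    def end_position(self) -> int:
        # Assumption: the exclusive end offset is start plus text length.
        return self.start_position + len(self.text)


def rebuild_text(tokens: list[SketchToken]) -> str:
    """Joins token texts, padding each with its whitespace_after spaces."""
    joined = "".join(t.text + " " * t.whitespace_after for t in tokens)
    return joined.rstrip(" ")


tokens = [
    SketchToken("Hello", whitespace_after=0, start_position=0),
    SketchToken(",", whitespace_after=1, start_position=5),
    SketchToken("world", whitespace_after=0, start_position=7),
]
text = rebuild_text(tokens)
```

Under these assumptions, slicing the rebuilt text with each token's start and end offsets returns that token's own text.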

Methods

__init__(text[, head_id, whitespace_after, ...])

Initializes a Token.

add_label(typename, value[, score])

Adds a label, propagating it to the parent Sentence's layer.

add_metadata(key, value)

Adds a key-value pair to the data point's metadata.

add_tags_proba_dist(tag_type, tags)

Stores a list of Labels representing a probability distribution for a tag type.

clear_embeddings([embedding_names])

Removes stored embeddings to free memory.

get_each_embedding([embedding_names])

Retrieves a list of individual embedding tensors.

get_embedding([names])

Retrieves embeddings, concatenating if multiple names are given or if names is None.

get_head()

Returns the head Token in the dependency parse, if available.

get_label([label_type, zero_tag_value])

Retrieves the primary label for a given type, or a default 'O' label.

get_labels([typename])

Retrieves all labels for a specific annotation layer.

get_metadata(key)

Retrieves metadata associated with the given key.

get_tags_proba_dist(tag_type)

Retrieves the stored probability distribution for a given tag type.

has_label(typename)

Checks if the data point has at least one label for the given annotation type.

has_metadata(key)

Checks if the data point has metadata for the given key.

remove_labels(typename)

Removes labels of a type, also removing them from the parent Sentence layer.

set_embedding(name, vector)

Stores an embedding tensor under a given name.

set_label(typename, value[, score])

Sets a label (overwriting), propagating the change to the parent Sentence.

to(device[, pin_memory])

Moves all stored embedding tensors to the specified device.

to_dict([tag_type])

Attributes

embedding

Returns the concatenated embeddings stored for this token.

end_position

Character offset where the token ends (exclusive).

idx

The 1-based index within the sentence (-1 if not attached).

labels

Returns a list of all labels from all annotation layers.

score

Shortcut property for the score of the first label added.

start_position

Character offset where the token begins in the Sentence text.

tag

Shortcut property for the value of the first label added.

text

The text content of the token.

unlabeled_identifier

String identifier: 'Token[<idx>]: "<text>"'.

property idx: int

The 1-based index within the sentence (-1 if not attached).

property text: str

The text content of the token.

property unlabeled_identifier: str

String identifier: 'Token[<idx>]: "<text>"'.

add_tags_proba_dist(tag_type, tags)

Stores a list of Labels representing a probability distribution for a tag type.

Parameters:
  • tag_type (str) – The annotation layer name (e.g., “pos”).

  • tags (list[Label]) – List of Labels, each with a tag value and probability score.

Return type:

None

get_tags_proba_dist(tag_type)

Retrieves the stored probability distribution for a given tag type.

Parameters:

tag_type (str) – The annotation layer name.

Returns:

List of Labels representing the distribution, or an empty list if none is stored.

Return type:

list[Label]
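
The store-and-retrieve behavior described above can be modeled with a plain dictionary. This is a hedged sketch, not flair's code: SketchLabel and ProbaDistStore are hypothetical stand-ins, and replacing any previously stored list on a second call is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for flair's Label: a tag value plus a probability.
SketchLabel = namedtuple("SketchLabel", ["value", "score"])


class ProbaDistStore:
    """Hypothetical model of the tags_proba_dist mapping."""

    def __init__(self) -> None:
        self.tags_proba_dist: dict[str, list[SketchLabel]] = {}

    def add_tags_proba_dist(self, tag_type, tags):
        # Assumption: a new list replaces whatever was stored for this type.
        self.tags_proba_dist[tag_type] = tags

    def get_tags_proba_dist(self, tag_type):
        # Unknown tag types yield an empty list rather than raising.
        return self.tags_proba_dist.get(tag_type, [])


store = ProbaDistStore()
store.add_tags_proba_dist(
    "pos", [SketchLabel("NOUN", 0.9), SketchLabel("VERB", 0.1)]
)
# Pick the most probable tag from the stored distribution.
best = max(store.get_tags_proba_dist("pos"), key=lambda label: label.score)
missing = store.get_tags_proba_dist("ner")
```

Returning an empty list for an unknown tag type lets callers iterate over the result without a separate existence check.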

get_head()

Returns the head Token in the dependency parse, if available.

Return type:

Optional[Token]
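
Because head_id is 1-based, resolving a head against a 0-based token list needs an index shift. The sketch below illustrates only that lookup with plain strings. resolve_head is a hypothetical helper, and treating a missing or out-of-range id as "no head" is an assumption rather than documented flair behavior.

```python
from typing import Optional


def resolve_head(tokens: list[str], head_id: Optional[int]) -> Optional[str]:
    """Returns the token at 1-based position head_id, or None if unavailable."""
    if head_id is None or not 1 <= head_id <= len(tokens):
        return None
    # Convert the 1-based id to a 0-based list index.
    return tokens[head_id - 1]


words = ["She", "reads", "books"]
head_of_she = resolve_head(words, 2)
head_of_reads = resolve_head(words, None)
```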

property start_position: int

Character offset where the token begins in the Sentence text.

property end_position: int

Character offset where the token ends (exclusive).

property embedding: Tensor

Returns the concatenated embeddings stored for this token.

add_label(typename, value, score=1.0, **metadata)

Adds a label, propagating it to the parent Sentence’s layer.

set_label(typename, value, score=1.0, **metadata)

Sets a label (overwriting), propagating the change to the parent Sentence.
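
The difference between add_label and set_label is append versus overwrite. The following standalone sketch models only that distinction. LabelLayers is a hypothetical class, labels are reduced to (value, score) tuples, and the propagation to the parent Sentence is left out.

```python
from collections import defaultdict


class LabelLayers:
    """Hypothetical model of per-type label lists on a data point."""

    def __init__(self) -> None:
        self.annotation_layers = defaultdict(list)

    def add_label(self, typename, value, score=1.0):
        # Appends, so earlier labels of this type are kept.
        self.annotation_layers[typename].append((value, score))

    def set_label(self, typename, value, score=1.0):
        # Overwrites, so only this label remains for the type.
        self.annotation_layers[typename] = [(value, score)]


layers = LabelLayers()
layers.add_label("ner", "PER", 0.8)
layers.add_label("ner", "ORG", 0.2)
added = list(layers.annotation_layers["ner"])

layers.set_label("ner", "LOC")
replaced = list(layers.annotation_layers["ner"])
```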

to_dict(tag_type=None)

Return type:

dict[str, Any]