flair.data.Token

class flair.data.Token(text, head_id=None, whitespace_after=1, start_position=0, sentence=None)

Bases: _PartOfSentence

Represents a single token (word, punctuation) within a Sentence.

form

The textual content of the token.

Type: str

idx

The 1-based index within the sentence (-1 if not attached).

Type: int

head_id

1-based index of the dependency head.

Type: Optional[int]

whitespace_after

Number of spaces following this token.

Type: int

start_position

Character offset where this token begins.

Type: int

tags_proba_dist

Stores full probability distributions over tags, keyed by tag type.

Type: dict[str, list[Label]]

__init__(text, head_id=None, whitespace_after=1, start_position=0, sentence=None)

Initializes a Token.

Parameters:

text (str) – The token text.
head_id (Optional[int], optional) – 1-based index of dependency head. Defaults to None.
whitespace_after (int, optional) – Spaces after token. Defaults to 1.
start_position (int, optional) – Character start offset. Defaults to 0.
sentence (Optional[Sentence], optional) – Parent sentence. Defaults to None.
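
To illustrate how `whitespace_after` and `start_position` relate, the sketch below uses a small stand-in class. `SimpleToken`, `detokenize`, and `assign_offsets` are hypothetical names introduced here, not part of flair. The sketch assumes only what the parameter descriptions state: each token records how many spaces follow it and the character offset at which it begins.

```python
from dataclasses import dataclass


@dataclass
class SimpleToken:
    """Hypothetical stand-in mirroring three of Token's constructor arguments."""
    text: str
    whitespace_after: int = 1
    start_position: int = 0


def detokenize(tokens):
    """Rebuild the string by appending each token's trailing spaces to its text."""
    return "".join(t.text + " " * t.whitespace_after for t in tokens)


def assign_offsets(tokens):
    """Derive each start_position from the text and whitespace of earlier tokens."""
    offset = 0
    for t in tokens:
        t.start_position = offset
        offset += len(t.text) + t.whitespace_after
    return tokens


tokens = assign_offsets([
    SimpleToken("Hello", whitespace_after=0),  # no space before the comma
    SimpleToken(","),                          # default: one space follows
    SimpleToken("world", whitespace_after=0),
    SimpleToken("!", whitespace_after=0),
])
text = detokenize(tokens)
```

Storing the whitespace count on each token, rather than assuming a single space between tokens, is what lets the original surface string be recovered from the token sequence.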

Methods

`__init__`(text[, head_id, whitespace_after, ...])	Initializes a Token.
`add_label`(typename, value[, score])	Adds a label, propagating it to the parent Sentence's layer.
`add_metadata`(key, value)	Adds a key-value pair to the data point's metadata.
`add_tags_proba_dist`(tag_type, tags)	Stores a list of Labels representing a probability distribution for a tag type.
`clear_embeddings`([embedding_names])	Removes stored embeddings to free memory.
`get_each_embedding`([embedding_names])	Retrieves a list of individual embedding tensors.
`get_embedding`([names])	Retrieves embeddings, concatenating if multiple names are given or if names is None.
`get_head`()	Returns the head Token in the dependency parse, if available.
`get_label`([label_type, zero_tag_value])	Retrieves the primary label for a given type, or a default 'O' label.
`get_labels`([typename])	Retrieves all labels for a specific annotation layer.
`get_metadata`(key)	Retrieves metadata associated with the given key.
`get_tags_proba_dist`(tag_type)	Retrieves the stored probability distribution for a given tag type.
`has_label`(typename)	Checks if the data point has at least one label for the given annotation type.
`has_metadata`(key)	Checks if the data point has metadata for the given key.
`remove_labels`(typename)	Removes labels of a type, also removing them from the parent Sentence layer.
`set_embedding`(name, vector)	Stores an embedding tensor under a given name.
`set_label`(typename, value[, score])	Sets a label (overwriting), propagating the change to the parent Sentence.
`to`(device[, pin_memory])	Moves all stored embedding tensors to the specified device.
`to_dict`([tag_type])

Attributes

`embedding`	Returns the concatenated embeddings stored for this token.
`end_position`	Character offset where the token ends (exclusive).
`idx`	The 1-based index within the sentence (-1 if not attached).
`labels`	Returns a list of all labels from all annotation layers.
`score`	Shortcut property for the score of the first label added.
`start_position`	Character offset where the token begins in the Sentence text.
`tag`	Shortcut property for the value of the first label added.
`text`	The text content of the token.
`unlabeled_identifier`	String identifier: 'Token[<idx>]: "<text>"'.

property idx: int

The 1-based index within the sentence (-1 if not attached).

property text: str

The text content of the token.

property unlabeled_identifier: str

String identifier: 'Token[<idx>]: "<text>"'.

add_tags_proba_dist(tag_type, tags)

Stores a list of Labels representing a probability distribution for a tag type.

Parameters:

tag_type (str) – The annotation layer name (e.g., “pos”).
tags (list[Label]) – List of Labels, each with a tag value and probability score.

Return type:

None

get_tags_proba_dist(tag_type)

Retrieves the stored probability distribution for a given tag type.

Parameters:

tag_type (str) – The annotation layer name.

Returns:

List of Labels representing the distribution, or an empty list if none is stored.

Return type:

list[Label]
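
The add/get pair above can be pictured as a dictionary keyed by tag type. The following is a minimal sketch under that assumption. `SimpleLabel` and `ProbaDistStore` are hypothetical stand-ins, not flair classes, and the scores are invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a Label: a tag value plus a probability score.
SimpleLabel = namedtuple("SimpleLabel", ["value", "score"])


class ProbaDistStore:
    """Stand-in that mimics the documented behaviour of tags_proba_dist."""

    def __init__(self):
        self.tags_proba_dist = {}

    def add_tags_proba_dist(self, tag_type, tags):
        # Store the whole distribution under its annotation layer name.
        self.tags_proba_dist[tag_type] = tags

    def get_tags_proba_dist(self, tag_type):
        # Fall back to an empty list when nothing was stored for this type.
        return self.tags_proba_dist.get(tag_type, [])


store = ProbaDistStore()
store.add_tags_proba_dist(
    "pos",
    [SimpleLabel("NOUN", 0.7), SimpleLabel("VERB", 0.2), SimpleLabel("ADJ", 0.1)],
)

# Pick the most probable tag from the stored distribution.
best = max(store.get_tags_proba_dist("pos"), key=lambda label: label.score)
missing = store.get_tags_proba_dist("ner")
```

Because an unknown tag type yields an empty list rather than an error, callers can iterate over the result without checking for the key first.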

get_head()

Returns the head Token in the dependency parse, if available.

Return type: Optional[Token]
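
As a rough illustration of resolving a 1-based `head_id` to a token, the sketch below looks the head up in a plain list. `SimpleToken` and `resolve_head` are hypothetical names. How flair itself treats a missing or out-of-range head is not stated here, so returning `None` in those cases is an assumption of this sketch.

```python
from typing import List, Optional


class SimpleToken:
    """Hypothetical stand-in holding only the text and the head index."""

    def __init__(self, text: str, head_id: Optional[int] = None):
        self.text = text
        self.head_id = head_id


def resolve_head(tokens: List[SimpleToken], token: SimpleToken) -> Optional[SimpleToken]:
    """Map a 1-based head_id onto the 0-based list; None if there is no valid head."""
    if token.head_id is None or not 1 <= token.head_id <= len(tokens):
        return None
    return tokens[token.head_id - 1]


# "She" depends on "runs" (position 2); "runs" has no head set in this example.
tokens = [SimpleToken("She", head_id=2), SimpleToken("runs")]
head_of_she = resolve_head(tokens, tokens[0])
head_of_runs = resolve_head(tokens, tokens[1])
```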

property start_position: int

Character offset where the token begins in the Sentence text.

property end_position: int

Character offset where the token ends (exclusive).
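
Because the end offset is exclusive, a `(start, end)` pair can be used directly as a Python slice. The sketch below shows this on a plain string. It assumes the end offset equals the start offset plus the token length, and the variable names are illustrative only.

```python
sentence_text = "Flair tags text."

# Locate each word in order and record (word, start, exclusive end).
spans = []
offset = 0
for word in ["Flair", "tags", "text", "."]:
    start = sentence_text.index(word, offset)
    end = start + len(word)  # exclusive: one past the last character
    spans.append((word, start, end))
    offset = end

# Slicing with the recorded offsets recovers each word exactly.
recovered = [sentence_text[start:end] for _, start, end in spans]
```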

property embedding: Tensor

Returns the concatenated embeddings stored for this token.

add_label(typename, value, score=1.0, **metadata)

Adds a label, propagating it to the parent Sentence’s layer.

set_label(typename, value, score=1.0, **metadata)

Sets a label (overwriting), propagating the change to the parent Sentence.
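
The difference between adding and setting a label can be sketched with a per-type list. `LabelHolder` is a hypothetical stand-in, not flair's implementation: it stores labels as `(value, score)` tuples and leaves out the propagation to the parent Sentence described above.

```python
from collections import defaultdict


class LabelHolder:
    """Stand-in with one list of (value, score) pairs per annotation type."""

    def __init__(self):
        self.annotation_layers = defaultdict(list)

    def add_label(self, typename, value, score=1.0):
        # Append: existing labels of this type are kept.
        self.annotation_layers[typename].append((value, score))
        return self

    def set_label(self, typename, value, score=1.0):
        # Overwrite: any existing labels of this type are replaced.
        self.annotation_layers[typename] = [(value, score)]
        return self


holder = LabelHolder()
holder.add_label("ner", "PER").add_label("ner", "ORG", score=0.4)
after_add = list(holder.annotation_layers["ner"])

holder.set_label("ner", "LOC")
after_set = list(holder.annotation_layers["ner"])
```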

to_dict(tag_type=None)

Return type: dict[str, Any]
