flair.data.DataPoint

class flair.data.DataPoint

Bases: ABC

Abstract base class for all data points in Flair (e.g., Token, Sentence, Image).

Defines core functionality: holding embeddings, managing labels across annotation layers, and providing basic positional and textual information.

__init__(): Initializes a DataPoint with empty annotation, embedding, and metadata storage.

Methods

`__init__`()	Initializes a DataPoint with empty annotation/embedding/metadata storage.
`add_label`(typename, value[, score])	Adds a new label to a specific annotation layer.
`add_metadata`(key, value)	Adds a key-value pair to the data point's metadata.
`clear_embeddings`([embedding_names])	Removes stored embeddings to free memory.
`get_each_embedding`([embedding_names])	Retrieves a list of individual embedding tensors.
`get_embedding`([names])	Retrieves embeddings, concatenating if multiple names are given or if names is None.
`get_label`([label_type, zero_tag_value])	Retrieves the primary label for a given type, or a default 'O' label.
`get_labels`([typename])	Retrieves all labels for a specific annotation layer.
`get_metadata`(key)	Retrieves metadata associated with the given key.
`has_label`(typename)	Checks if the data point has at least one label for the given annotation type.
`has_metadata`(key)	Checks if the data point has metadata for the given key.
`remove_labels`(typename)	Removes all labels associated with a specific annotation layer.
`set_embedding`(name, vector)	Stores an embedding tensor under a given name.
`set_label`(typename, value[, score])	Sets the label(s) for an annotation layer, overwriting any existing ones.
`to`(device[, pin_memory])	Moves all stored embedding tensors to the specified device.

Attributes

`embedding`	Provides the primary embedding representation of the data point.
`end_position`	The ending character offset (exclusive) within the original text.
`labels`	Returns a list of all labels from all annotation layers.
`score`	Shortcut property for the score of the first label added.
`start_position`	The starting character offset within the original text.
`tag`	Shortcut property for the value of the first label added.
`text`	The textual representation of this data point.
`unlabeled_identifier`	A string identifier for the data point itself, without label info.

abstract property embedding: Tensor

Provides the primary embedding representation of the data point.

set_embedding(name, vector)

Stores an embedding tensor under a given name.

Parameters:

name (str) – The name to identify this embedding (e.g., “word”, “flair”).
vector (torch.Tensor) – The embedding tensor.

get_embedding(names=None)

Retrieves embeddings, concatenating if multiple names are given or if names is None.

Parameters:

names (Optional[list[str]], optional) – Specific embedding names to retrieve. If None, concatenates all stored embeddings sorted by name. Defaults to None.

Returns:

A single tensor representing the requested embedding(s). Returns an empty tensor if no relevant embeddings are found.

Return type:

torch.Tensor

get_each_embedding(embedding_names=None)

Retrieves a list of individual embedding tensors.

Parameters: embedding_names (Optional[list[str]], optional) – If provided, filters by these names. Otherwise, returns all stored embeddings. Defaults to None.
Returns: List of embedding tensors, sorted by name.
Return type: list[torch.Tensor]
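
The name-sorted retrieval and concatenation described for `get_each_embedding` and `get_embedding` can be sketched with a toy stand-in. This is not Flair's implementation: the `EmbeddingStore` class is hypothetical, and plain Python lists stand in for `torch.Tensor` objects so the sketch runs without any dependencies.

```python
class EmbeddingStore:
    """Toy stand-in for the embedding storage described above (not Flair code)."""

    def __init__(self):
        # name -> vector; plain lists play the role of tensors in this sketch
        self._embeddings = {}

    def set_embedding(self, name, vector):
        self._embeddings[name] = vector

    def get_each_embedding(self, embedding_names=None):
        # Iterate in name order so the result is deterministic, optionally filtering.
        return [
            self._embeddings[name]
            for name in sorted(self._embeddings)
            if embedding_names is None or name in embedding_names
        ]

    def get_embedding(self, names=None):
        # Concatenate the selected vectors; an empty list stands in for an empty tensor.
        combined = []
        for vector in self.get_each_embedding(names):
            combined.extend(vector)
        return combined


store = EmbeddingStore()
store.set_embedding("word", [0.1, 0.2])
store.set_embedding("flair", [0.9])

all_dims = store.get_embedding()            # "flair" sorts before "word"
word_only = store.get_embedding(["word"])   # only the requested name
missing = store.get_embedding(["unknown"])  # nothing stored under this name
```

Because retrieval is ordered by name rather than by insertion order, the concatenated result has a stable layout regardless of the order in which embeddings were stored.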

to(device, pin_memory=False)

Moves all stored embedding tensors to the specified device.

Parameters:

device (Union[str, torch.device]) – Target device (e.g., ‘cpu’, ‘cuda:0’).
pin_memory (bool, optional) – If True and moving to CUDA, attempts to pin memory. Defaults to False.

Return type:

None

clear_embeddings(embedding_names=None)

Removes stored embeddings to free memory.

Parameters: embedding_names (Optional[list[str]], optional) – Specific names to remove. If None, removes all embeddings. Defaults to None.
Return type: None
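
The difference between clearing selected names and clearing everything can be illustrated on a plain dictionary. The free function below is a hypothetical sketch, not Flair's code; in particular, silently ignoring names that were never stored is an assumption of this example.

```python
def clear_embeddings(embeddings, embedding_names=None):
    """Remove the named entries from a name -> vector dict, or all entries if None."""
    if embedding_names is None:
        embeddings.clear()
        return
    for name in embedding_names:
        # Assumption of this sketch: unknown names are skipped without error.
        embeddings.pop(name, None)


stored = {"word": [0.1, 0.2], "flair": [0.9]}

clear_embeddings(stored, ["flair"])
after_partial = dict(stored)   # snapshot after removing one name

clear_embeddings(stored)
after_full = dict(stored)      # snapshot after removing everything
```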

has_label(typename)

Checks if the data point has at least one label for the given annotation type.

Return type: bool

add_metadata(key, value)

Adds a key-value pair to the data point’s metadata.

Return type: None

get_metadata(key)

Retrieves metadata associated with the given key.

Parameters: key (str) – The metadata key.
Returns: The metadata value.
Return type: Any
Raises: KeyError – If the key is not found.

has_metadata(key)

Checks if the data point has metadata for the given key.

Return type: bool
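
Since `get_metadata` raises `KeyError` for an unknown key, callers typically check with `has_metadata` first or catch the exception. The following is a minimal sketch of that contract using a hypothetical `MetadataHolder` class backed by a dictionary; it mirrors the documented behavior but is not Flair's implementation.

```python
class MetadataHolder:
    """Toy stand-in for the metadata methods described above (not Flair code)."""

    def __init__(self):
        self._metadata = {}

    def add_metadata(self, key, value):
        self._metadata[key] = value

    def get_metadata(self, key):
        # A plain dict lookup raises KeyError when the key is missing.
        return self._metadata[key]

    def has_metadata(self, key):
        return key in self._metadata


point = MetadataHolder()
point.add_metadata("source", "newswire")

known = point.get_metadata("source")
has_author = point.has_metadata("author")

try:
    point.get_metadata("author")
    missing_raises = False
except KeyError:
    missing_raises = True
```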

add_label(typename, value, score=1.0, **metadata)

Adds a new label to a specific annotation layer.

Parameters:

typename (str) – Name of the annotation layer (e.g., “ner”, “sentiment”).
value (str) – String value of the label (e.g., “PERSON”, “POSITIVE”).
score (float, optional) – Confidence score (0.0-1.0). Defaults to 1.0.
**metadata – Additional keyword arguments stored as metadata on the Label.

Returns:

Returns self for chaining.

Return type:

DataPoint

set_label(typename, value, score=1.0, **metadata)

Sets the label(s) for an annotation layer, overwriting any existing ones.

Parameters:

typename (str) – The name of the annotation layer.
value (str) – The string value of the new label.
score (float, optional) – Confidence score (0.0-1.0). Defaults to 1.0.
**metadata – Additional keyword arguments for the new Label’s metadata.

Returns:

Returns self for chaining.

Return type:

DataPoint
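
The contrast between `add_label` (appends to a layer) and `set_label` (replaces the layer's labels) is easiest to see side by side. The sketch below uses hypothetical `ToyLabel` and `LabelHolder` classes that follow the documented semantics, including returning `self` for chaining; it omits the `**metadata` arguments and is not Flair's implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyLabel:
    """Hypothetical minimal label: a string value plus a confidence score."""
    value: str
    score: float = 1.0


class LabelHolder:
    """Toy stand-in for the label methods described above (not Flair code)."""

    def __init__(self):
        self.layers = {}  # annotation layer name -> list of ToyLabel

    def add_label(self, typename, value, score=1.0):
        # Append to the layer, creating it on first use.
        self.layers.setdefault(typename, []).append(ToyLabel(value, score))
        return self  # returned so calls can be chained

    def set_label(self, typename, value, score=1.0):
        # Overwrite whatever the layer held before.
        self.layers[typename] = [ToyLabel(value, score)]
        return self


point = LabelHolder()
point.add_label("topic", "sports").add_label("topic", "politics", 0.4)
topics_after_add = [label.value for label in point.layers["topic"]]

point.set_label("topic", "science", 0.8)
topics_after_set = [label.value for label in point.layers["topic"]]
```

Use the appending form for multi-label layers and the overwriting form when a layer should hold exactly one current label.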

remove_labels(typename)

Removes all labels associated with a specific annotation layer.

Parameters: typename (str) – The name of the annotation layer to clear.
Return type: None

get_label(label_type=None, zero_tag_value='O')

Retrieves the primary label for a given type, or a default ‘O’ label.

Parameters:

label_type (Optional[str], optional) – The annotation layer name. Defaults to None (uses first overall label).
zero_tag_value (str, optional) – Value for the default label if none found. Defaults to “O”.

Returns:

The primary label, or a default label with score 0.0.

Return type:

Label
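
The fallback rule for `get_label` can be sketched as a small function over a dictionary of layers. This is an illustrative model only: the free function and the use of `(value, score)` tuples in place of `Label` objects are assumptions of the example, and the zero-score default follows the description above.

```python
def get_label(layers, label_type=None, zero_tag_value="O"):
    """Return the first (value, score) pair for a layer, or a zero-score default."""
    if label_type is None:
        # No layer given: consider all labels across layers, in insertion order.
        candidates = [label for labels in layers.values() for label in labels]
    else:
        candidates = layers.get(label_type, [])
    return candidates[0] if candidates else (zero_tag_value, 0.0)


layers = {"ner": [("PERSON", 0.97)], "sentiment": [("POSITIVE", 0.8)]}

ner_label = get_label(layers, "ner")
first_overall = get_label(layers)
fallback = get_label(layers, "pos")
custom_fallback = get_label(layers, "pos", zero_tag_value="NONE")
```

Because a default is always returned, callers never need to handle a missing label separately; a score of 0.0 is what distinguishes the placeholder from a real annotation in this sketch.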

get_labels(typename=None)

Retrieves all labels for a specific annotation layer.

Parameters: typename (Optional[str], optional) – The layer name. If None, returns all labels from all layers. Defaults to None.
Returns: List of Label objects, or empty list if none found.
Return type: list[Label]
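
As a rough illustration of the two lookup modes of `get_labels`, the hypothetical function below treats annotation layers as a dictionary of lists and uses plain strings in place of `Label` objects. It is a sketch of the documented behavior, not Flair's code.

```python
def get_labels(layers, typename=None):
    """Return labels for one layer, or for every layer when typename is None."""
    if typename is None:
        # Flatten all layers into a single list.
        return [label for labels in layers.values() for label in labels]
    # Unknown layers yield an empty list rather than an error.
    return list(layers.get(typename, []))


layers = {"ner": ["PERSON", "ORG"], "sentiment": ["POSITIVE"]}

ner_only = get_labels(layers, "ner")
everything = get_labels(layers)
unknown_layer = get_labels(layers, "pos")
```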

property labels: list[Label]

Returns a list of all labels from all annotation layers.

abstract property unlabeled_identifier: str

A string identifier for the data point itself, without label info.

abstract property start_position: int

The starting character offset within the original text.

abstract property end_position: int

The ending character offset (exclusive) within the original text.

abstract property text: str

The textual representation of this data point.
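
Because these properties are abstract, a concrete subclass has to implement each of them before it can be instantiated. The sketch below shows that pattern with Python's `abc` module; `ToyDataPoint` and `ToySpan` are hypothetical classes that model only three of the abstract properties and do not inherit from Flair's `DataPoint`.

```python
from abc import ABC, abstractmethod


class ToyDataPoint(ABC):
    """Hypothetical base class with a subset of the abstract properties above."""

    @property
    @abstractmethod
    def text(self) -> str: ...

    @property
    @abstractmethod
    def start_position(self) -> int: ...

    @property
    @abstractmethod
    def end_position(self) -> int: ...


class ToySpan(ToyDataPoint):
    """Concrete data point covering source[start:end]; the end offset is exclusive."""

    def __init__(self, source: str, start: int, end: int):
        self._source = source
        self._start = start
        self._end = end

    @property
    def text(self) -> str:
        return self._source[self._start:self._end]

    @property
    def start_position(self) -> int:
        return self._start

    @property
    def end_position(self) -> int:
        return self._end


span = ToySpan("Berlin is a city", 0, 6)
span_text = span.text

# The abstract base cannot be instantiated directly.
try:
    ToyDataPoint()
    base_instantiable = True
except TypeError:
    base_instantiable = False
```

With an exclusive end offset, `end_position - start_position` equals the length of the covered text, which matches Python's slicing convention.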

property tag: str

Shortcut property for the value of the first label added.

property score: float

Shortcut property for the score of the first label added.
