flair.data.DataPoint

class flair.data.DataPoint

Bases: ABC

Abstract base class for all data points in Flair (e.g., Token, Sentence, Image).

Defines core functionalities like holding embeddings, managing labels across different annotation layers, and providing basic positional/textual info.

__init__()

Initializes a DataPoint with empty annotation/embedding/metadata storage.

Methods

__init__()

Initializes a DataPoint with empty annotation/embedding/metadata storage.

add_label(typename, value[, score])

Adds a new label to a specific annotation layer.

add_metadata(key, value)

Adds a key-value pair to the data point's metadata.

clear_embeddings([embedding_names])

Removes stored embeddings to free memory.

get_each_embedding([embedding_names])

Retrieves a list of individual embedding tensors.

get_embedding([names])

Retrieves embeddings, concatenating if multiple names are given or if names is None.

get_label([label_type, zero_tag_value])

Retrieves the primary label for a given type, or a default label (value 'O' unless zero_tag_value is overridden) if none is found.

get_labels([typename])

Retrieves all labels for a specific annotation layer.

get_metadata(key)

Retrieves metadata associated with the given key.

has_label(typename)

Checks if the data point has at least one label for the given annotation type.

has_metadata(key)

Checks if the data point has metadata for the given key.

remove_labels(typename)

Removes all labels associated with a specific annotation layer.

set_embedding(name, vector)

Stores an embedding tensor under a given name.

set_label(typename, value[, score])

Sets the label(s) for an annotation layer, overwriting any existing ones.

to(device[, pin_memory])

Moves all stored embedding tensors to the specified device.

Attributes

embedding

Provides the primary embedding representation of the data point.

end_position

The ending character offset (exclusive) within the original text.

labels

Returns a list of all labels from all annotation layers.

score

Shortcut property for the score of the first label added.

start_position

The starting character offset within the original text.

tag

Shortcut property for the value of the first label added.

text

The textual representation of this data point.

unlabeled_identifier

A string identifier for the data point itself, without label info.

abstract property embedding: Tensor

Provides the primary embedding representation of the data point.

set_embedding(name, vector)

Stores an embedding tensor under a given name.

Parameters:
  • name (str) – The name to identify this embedding (e.g., “word”, “flair”).

  • vector (torch.Tensor) – The embedding tensor.

get_embedding(names=None)

Retrieves embeddings, concatenating if multiple names are given or if names is None.

Parameters:

names (Optional[list[str]], optional) – Specific embedding names to retrieve. If None, concatenates all stored embeddings sorted by name. Defaults to None.

Returns:

A single tensor representing the requested embedding(s), or an empty tensor if no relevant embeddings are found.

Return type:

torch.Tensor
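
To make the concatenation order concrete, here is a minimal plain-Python sketch of the behavior described above. It is not Flair's implementation: the helper `concat_embeddings` is hypothetical, lists of floats stand in for `torch.Tensor`, and sorting by name in the filtered case is an assumption carried over from the description of get_each_embedding.

```python
from typing import Optional


def concat_embeddings(
    stored: dict[str, list[float]],
    names: Optional[list[str]] = None,
) -> list[float]:
    """Hypothetical stand-in for get_embedding, using lists instead of tensors."""
    # With names=None every stored embedding is selected;
    # otherwise only the requested names that actually exist.
    if names is None:
        selected = list(stored.keys())
    else:
        selected = [name for name in names if name in stored]

    result: list[float] = []
    # Concatenate in name-sorted order (assumed for the filtered case as well).
    for name in sorted(selected):
        result.extend(stored[name])
    # An empty list plays the role of the empty tensor when nothing matches.
    return result


stored = {"word": [0.1, 0.2], "flair": [0.9]}
all_values = concat_embeddings(stored)            # "flair" sorts before "word"
only_word = concat_embeddings(stored, ["word"])
missing = concat_embeddings(stored, ["bert"])     # no such embedding stored
```

Because the order depends on the names rather than on insertion order, the layout of the combined vector stays stable no matter which embedding was stored first.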

get_each_embedding(embedding_names=None)

Retrieves a list of individual embedding tensors.

Parameters:

embedding_names (Optional[list[str]], optional) – If provided, filters by these names. Otherwise, returns all stored embeddings. Defaults to None.

Returns:

List of embedding tensors, sorted by name.

Return type:

list[torch.Tensor]

to(device, pin_memory=False)

Moves all stored embedding tensors to the specified device.

Parameters:
  • device (Union[str, torch.device]) – Target device (e.g., ‘cpu’, ‘cuda:0’).

  • pin_memory (bool, optional) – If True and moving to CUDA, attempts to pin memory. Defaults to False.

Return type:

None

clear_embeddings(embedding_names=None)

Removes stored embeddings to free memory.

Parameters:

embedding_names (Optional[list[str]], optional) – Specific names to remove. If None, removes all embeddings. Defaults to None.

Return type:

None

has_label(typename)

Checks if the data point has at least one label for the given annotation type.

Return type:

bool

add_metadata(key, value)

Adds a key-value pair to the data point’s metadata.

Return type:

None

get_metadata(key)

Retrieves metadata associated with the given key.

Parameters:

key (str) – The metadata key.

Returns:

The metadata value.

Return type:

Any

Raises:

KeyError – If the key is not found.
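
Since a missing key raises KeyError, callers may want to guard lookups with has_metadata. The sketch below illustrates that pattern with a hypothetical `MiniMetadataHolder` class backed by a plain dict. It mirrors the documented contract only and is not Flair's code.

```python
from typing import Any


class MiniMetadataHolder:
    """Hypothetical stand-in that mirrors the documented metadata methods."""

    def __init__(self) -> None:
        self._metadata: dict[str, Any] = {}

    def add_metadata(self, key: str, value: Any) -> None:
        self._metadata[key] = value

    def get_metadata(self, key: str) -> Any:
        # Plain dict indexing raises KeyError for unknown keys.
        return self._metadata[key]

    def has_metadata(self, key: str) -> bool:
        return key in self._metadata


holder = MiniMetadataHolder()
holder.add_metadata("source", "newswire")

source = holder.get_metadata("source")

# Guarded lookup: fall back to a default instead of raising.
if holder.has_metadata("language"):
    language = holder.get_metadata("language")
else:
    language = "unknown"

# Unguarded lookup of a missing key raises KeyError.
try:
    holder.get_metadata("language")
    raised = False
except KeyError:
    raised = True
```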

has_metadata(key)

Checks if the data point has metadata for the given key.

Return type:

bool

add_label(typename, value, score=1.0, **metadata)

Adds a new label to a specific annotation layer.

Parameters:
  • typename (str) – Name of the annotation layer (e.g., “ner”, “sentiment”).

  • value (str) – String value of the label (e.g., “PERSON”, “POSITIVE”).

  • score (float, optional) – Confidence score (0.0-1.0). Defaults to 1.0.

  • **metadata – Additional keyword arguments stored as metadata on the Label.

Returns:

Returns self for chaining.

Return type:

DataPoint

set_label(typename, value, score=1.0, **metadata)

Sets the label(s) for an annotation layer, overwriting any existing ones.

Parameters:
  • typename (str) – The name of the annotation layer.

  • value (str) – The string value of the new label.

  • score (float, optional) – Confidence score (0.0-1.0). Defaults to 1.0.

  • **metadata – Additional keyword arguments for the new Label’s metadata.

Returns:

Returns self for chaining.

Return type:

DataPoint
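
The difference between add_label (append) and set_label (overwrite) can be sketched in plain Python. The class `MiniDataPoint` and its `layers` attribute are hypothetical, and labels are simplified to (value, score) tuples rather than Label objects, so treat this as an illustration of the documented semantics rather than Flair's implementation.

```python
from collections import defaultdict


class MiniDataPoint:
    """Hypothetical stand-in; labels are (value, score) tuples."""

    def __init__(self) -> None:
        # One list of labels per annotation layer name.
        self.layers: dict[str, list[tuple[str, float]]] = defaultdict(list)

    def add_label(self, typename: str, value: str, score: float = 1.0) -> "MiniDataPoint":
        # Appends to the layer, keeping labels that are already there.
        self.layers[typename].append((value, score))
        return self  # returning self allows chaining

    def set_label(self, typename: str, value: str, score: float = 1.0) -> "MiniDataPoint":
        # Replaces whatever the layer held with a single new label.
        self.layers[typename] = [(value, score)]
        return self


point = MiniDataPoint()
point.add_label("topic", "sports").add_label("topic", "politics", 0.7)
topic_after_add = list(point.layers["topic"])

point.set_label("topic", "science", 0.9)
topic_after_set = list(point.layers["topic"])
```

Under these assumptions, add_label suits multi-label layers, while set_label suits layers that should hold exactly one value.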

remove_labels(typename)

Removes all labels associated with a specific annotation layer.

Parameters:

typename (str) – The name of the annotation layer to clear.

Return type:

None

get_label(label_type=None, zero_tag_value='O')

Retrieves the primary label for a given type, or a default label (value ‘O’ unless zero_tag_value is overridden) if none is found.

Parameters:
  • label_type (Optional[str], optional) – The annotation layer name. Defaults to None (uses first overall label).

  • zero_tag_value (str, optional) – Value for the default label if none found. Defaults to “O”.

Returns:

The primary label, or a default label with score 0.0.

Return type:

Label
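
The fallback behavior can be illustrated with a small sketch. `MiniLabel` and `first_label` are hypothetical names, and treating the "primary" label as the first one in the layer is an assumption. The code follows the description above rather than Flair's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniLabel:
    """Hypothetical simplified label: just a value and a score."""
    value: str
    score: float = 1.0


def first_label(
    layers: dict[str, list[MiniLabel]],
    label_type: Optional[str] = None,
    zero_tag_value: str = "O",
) -> MiniLabel:
    if label_type is None:
        # No layer given: consider labels from all layers.
        candidates = [label for labels in layers.values() for label in labels]
    else:
        candidates = layers.get(label_type, [])
    if candidates:
        return candidates[0]
    # Nothing found: default label with the zero tag and score 0.0.
    return MiniLabel(zero_tag_value, 0.0)


layers = {"ner": [MiniLabel("PERSON", 0.98)]}
found = first_label(layers, "ner")
fallback = first_label(layers, "pos")
custom = first_label(layers, "pos", zero_tag_value="NONE")
```

Because a label is always returned, callers can read `.value` and `.score` without checking for None first.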

get_labels(typename=None)

Retrieves all labels for a specific annotation layer.

Parameters:

typename (Optional[str], optional) – The layer name. If None, returns all labels from all layers. Defaults to None.

Returns:

List of Label objects, or empty list if none found.

Return type:

list[Label]

property labels: list[Label]

Returns a list of all labels from all annotation layers.

abstract property unlabeled_identifier: str

A string identifier for the data point itself, without label info.

abstract property start_position: int

The starting character offset within the original text.

abstract property end_position: int

The ending character offset (exclusive) within the original text.
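
An exclusive end offset lines up with Python slicing, which the short sketch below shows. The example text and offsets are made up for illustration, and it assumes both offsets index into the same original string.

```python
text = "Berlin is nice"

# Offsets for the first word: start is inclusive, end is exclusive.
start_position, end_position = 0, 6

# An exclusive end offset can be passed directly to a slice.
span_text = text[start_position:end_position]

# The span length is simply the difference of the two offsets.
span_length = end_position - start_position
```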

abstract property text: str

The textual representation of this data point.

property tag: str

Shortcut property for the value of the first label added.

property score: float

Shortcut property for the score of the first label added.