flair.data.Dictionary

class flair.data.Dictionary(add_unk=True)

Bases: object

This class holds a dictionary that maps strings to unique integer IDs.

Used throughout Flair for representing words, tags, characters, etc. Handles unknown items (<unk>) and flags for multi-label or span tasks. Items are stored internally as bytes for efficiency.

__init__(add_unk=True)

Initializes a Dictionary.

Parameters:

add_unk (bool, optional) – If True, adds a special ‘<unk>’ item whose ID (0) is returned when an unknown item is looked up. Defaults to True.

Methods

__init__([add_unk])

Initializes a Dictionary.

add_item(item)

Adds a string item to the dictionary.

get_idx_for_item(item)

Retrieves the integer ID for a given string item.

get_idx_for_items(items)

Retrieves the integer IDs for a list of string items.

get_item_for_index(idx)

Retrieves the string item corresponding to a given integer ID.

get_items()

Returns a list of all items in the dictionary in order of their IDs.

has_item(item)

Checks if a given string item exists in the dictionary.

is_span_prediction_problem()

Checks if the dictionary likely represents BIOES/BIO span labels.

load(name)

Loads a pre-built character dictionary or a dictionary from a file path.

load_from_file(filename)

Loads a Dictionary previously saved using the .save() method.

remove_item(item)

Removes an item from the dictionary.

save(savefile)

Saves the dictionary mapping to a file using pickle.

set_start_stop_tags()

Adds special <START> and <STOP> tags to the dictionary (often used for CRFs).

start_stop_tags_are_set()

Checks if <START> and <STOP> tags have been added.

remove_item(item)

Removes an item from the dictionary.

Note: This operation might be slow for large dictionaries as it involves list removal. It currently doesn’t re-index subsequent items.

Parameters:

item (str) – The string item to remove.
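
The note above can be illustrated with plain Python containers. This is a sketch of one plausible reading, assuming the dictionary keeps a list of items plus a separate item-to-ID map; the names `idx2item` and `item2idx` are assumptions made for this example, not a statement about Flair's internals.

```python
# Hypothetical stand-in structures, assumed for illustration only.
idx2item = [b"<unk>", b"A", b"B", b"C"]
item2idx = {item: idx for idx, item in enumerate(idx2item)}

# Remove "A": list removal is O(n), and the map entry for "A" is dropped.
idx2item.remove(b"A")
del item2idx[b"A"]

# "B" and "C" moved one slot to the left in the list,
# but the IDs stored in the map were not updated.
stale_id = item2idx[b"C"]        # ID recorded when "C" was added
position = idx2item.index(b"C")  # current position of "C" in the list
```

Under these assumptions, the stored ID and the list position of later items disagree after a removal, so removing items from a dictionary that is already in use by a model is best avoided.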

add_item(item)

Adds a string item to the dictionary.

If the item exists, returns its ID. Otherwise, adds it and returns the new ID.

Parameters:

item (str) – The string item to add.

Returns:

The integer ID of the item.

Return type:

int
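
The add-or-return behavior described above can be sketched in plain Python. This is a minimal illustration rather than Flair's implementation: the containers `item2idx` and `idx2item`, the helper `add_or_get`, and the UTF-8 encoding of items are all assumptions made for the example.

```python
# Hypothetical stand-in structures, assumed for illustration only.
item2idx = {}  # maps encoded item -> integer ID
idx2item = []  # maps integer ID -> encoded item


def add_or_get(item: str) -> int:
    """Add the item if it is new and return its integer ID either way."""
    key = item.encode("utf-8")  # assumption: items are kept as UTF-8 bytes
    if key not in item2idx:
        idx2item.append(key)
        item2idx[key] = len(idx2item) - 1
    return item2idx[key]


first_id = add_or_get("<unk>")
noun_id = add_or_get("NOUN")
repeat_id = add_or_get("NOUN")  # existing item: same ID, nothing added twice
```

Because a repeated call returns the existing ID, calling the method on every token of a corpus is a safe way to build up a vocabulary.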

get_idx_for_item(item)

Retrieves the integer ID for a given string item.

Parameters:

item (str) – The string item.

Returns:

The integer ID. Returns 0 if item is not found and add_unk is True.

Return type:

int

Raises:

IndexError – If the item is not found and add_unk is False.
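
The lookup rules above (known item, unknown item with add_unk, unknown item without add_unk) can be sketched as follows. The helper `lookup_id` and the byte-keyed maps are hypothetical stand-ins for this example, not Flair code.

```python
def lookup_id(item2idx, item, add_unk=True):
    """Return the ID of item; fall back to 0 if add_unk, otherwise raise."""
    key = item.encode("utf-8")  # assumption: keys are UTF-8 bytes
    if key in item2idx:
        return item2idx[key]
    if add_unk:
        return 0  # ID of '<unk>', assumed to be the first item added
    raise IndexError(f"'{item}' is not in the dictionary")


with_unk = {b"<unk>": 0, b"NOUN": 1}
without_unk = {b"NOUN": 0}

known = lookup_id(with_unk, "NOUN")
unknown = lookup_id(with_unk, "VERB")

try:
    lookup_id(without_unk, "VERB", add_unk=False)
    raised = False
except IndexError:
    raised = True
```

A dictionary built without the unknown item therefore needs to contain every item that will ever be looked up, which is typical for closed label sets.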

get_idx_for_items(items)

Retrieves the integer IDs for a list of string items. This version does not use a cache.

Return type:

list[int]

get_items()

Returns a list of all items in the dictionary in order of their IDs.

Return type:

list[str]

get_item_for_index(idx)

Retrieves the string item corresponding to a given integer ID.

Parameters:

idx (int) – The integer ID.

Returns:

The string item.

Return type:

str

Raises:

IndexError – If the index is out of bounds.

has_item(item)

Checks if a given string item exists in the dictionary.

Return type:

bool

set_start_stop_tags()

Adds special <START> and <STOP> tags to the dictionary (often used for CRFs).

Return type:

None

is_span_prediction_problem()

Checks if the dictionary likely represents BIOES/BIO span labels.

Returns True if the span_labels flag is set or if any item starts with ‘B-’, ‘I-’, or ‘S-’.

Returns:

True if likely span labels, False otherwise.

Return type:

bool
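
The prefix heuristic described above can be sketched as a standalone function. The name `looks_like_span_labels` and its parameters are hypothetical, introduced only for this example.

```python
def looks_like_span_labels(items, span_labels=False):
    """Return True if the flag is set or any label has a B-, I-, or S- prefix."""
    if span_labels:
        return True
    # str.startswith accepts a tuple of prefixes to test in one call.
    return any(item.startswith(("B-", "I-", "S-")) for item in items)


ner_tags = looks_like_span_labels(["O", "B-PER", "I-PER"])
pos_tags = looks_like_span_labels(["NOUN", "VERB"])
flagged = looks_like_span_labels(["PER", "LOC"], span_labels=True)
```

Since the check is a heuristic, a label set whose names happen to start with one of these prefixes would also be treated as span labels.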

start_stop_tags_are_set()

Checks if <START> and <STOP> tags have been added.

Return type:

bool
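
How the two tag-related methods relate can be sketched with a plain list of labels. The helpers `add_start_stop` and `has_start_stop` are hypothetical names for this example and operate on a list rather than on a Dictionary.

```python
START_TAG = "<START>"
STOP_TAG = "<STOP>"


def add_start_stop(items):
    """Append the two special tags if they are not present yet."""
    for tag in (START_TAG, STOP_TAG):
        if tag not in items:
            items.append(tag)


def has_start_stop(items):
    """Return True only if both special tags are present."""
    return START_TAG in items and STOP_TAG in items


tags = ["O", "B-PER", "I-PER"]
before = has_start_stop(tags)
add_start_stop(tags)
after = has_start_stop(tags)
```

In a CRF, these two extra labels usually mark the transitions into the first token and out of the last one, which is why they are added to the label set itself.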

save(savefile)

Saves the dictionary mapping to a file using pickle.

Parameters:

savefile (PathLike) – The path to the output file.

classmethod load_from_file(filename)

Loads a Dictionary previously saved using the .save() method.

Parameters:

filename (Union[str, Path]) – Path to the saved dictionary file.

Returns:

The loaded Dictionary object.

Return type:

Dictionary
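
The save and load pair amounts to a pickle round trip. The sketch below shows the idea with a plain mapping; the payload layout (`idx2item` and `item2idx` keys) and the file name are assumptions for illustration, and the file actually written by save() may contain different or additional fields.

```python
import pickle
import tempfile
from pathlib import Path

# Assumed payload layout; the real file written by save() may differ.
mappings = {
    "idx2item": [b"<unk>", b"NOUN", b"VERB"],
    "item2idx": {b"<unk>": 0, b"NOUN": 1, b"VERB": 2},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    savefile = Path(tmp_dir) / "dictionary.pkl"

    # Save: serialize the mapping with pickle.
    with savefile.open("wb") as f:
        pickle.dump(mappings, f)

    # Load: read the same structure back.
    with savefile.open("rb") as f:
        restored = pickle.load(f)
```

Because unpickling can execute arbitrary code, dictionary files should only be loaded from trusted sources.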

classmethod load(name)

Loads a pre-built character dictionary or a dictionary from a file path.

Parameters:

name (str) – The name of the pre-built dictionary (e.g., ‘chars’) or a path to a dictionary file.

Returns:

The loaded Dictionary object.

Return type:

Dictionary

Raises:

ValueError – If the name is not recognized or the path is invalid.