flair.data.Dictionary
- class flair.data.Dictionary(add_unk=True)
Bases: object
This class holds a dictionary that maps strings to unique integer IDs.
Used throughout Flair to represent words, tags, characters, and similar items. It handles unknown items through a special <unk> entry and carries flags for multi-label and span tasks. Items are stored internally as bytes for efficiency.
- __init__(add_unk=True)
Initializes a Dictionary.
- Parameters:
add_unk (bool, optional) – If True, adds a special ‘<unk>’ item. Defaults to True.
Methods
__init__([add_unk]) – Initializes a Dictionary.
add_item(item) – Adds a string item to the dictionary.
get_idx_for_item(item) – Retrieves the integer ID for a given string item.
get_idx_for_items(items) – Retrieves the integer IDs for a list of string items.
get_item_for_index(idx) – Retrieves the string item corresponding to a given integer ID.
get_items() – Returns a list of all items in the dictionary in order of their IDs.
has_item(item) – Checks if a given string item exists in the dictionary.
is_span_prediction_problem() – Checks if the dictionary likely represents BIOES/BIO span labels.
load(name) – Loads a pre-built character dictionary or a dictionary from a file path.
load_from_file(filename) – Loads a Dictionary previously saved using the .save() method.
remove_item(item) – Removes an item from the dictionary.
save(savefile) – Saves the dictionary mapping to a file using pickle.
set_start_stop_tags() – Adds special <START> and <STOP> tags to the dictionary (often used for CRFs).
start_stop_tags_are_set() – Checks if <START> and <STOP> tags have been added.
- remove_item(item)
Removes an item from the dictionary.
Note: Removal can be slow for large dictionaries because it removes an element from a list. Subsequent items are currently not re-indexed.
- Parameters:
item (str) – The string item to remove.
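To illustrate what the missing re-indexing can mean in practice, here is a minimal sketch. It assumes the dictionary keeps a list of items (list position = ID) alongside an item-to-ID map. The `ToyDictionary` class is hypothetical and is not the Flair implementation; it only models the behaviour the note describes.

```python
class ToyDictionary:
    """Hypothetical stand-in: a list of items plus an item -> ID map."""

    def __init__(self):
        self.idx2item = []   # position in the list is the ID
        self.item2idx = {}   # reverse lookup

    def add_item(self, item):
        if item not in self.item2idx:
            self.item2idx[item] = len(self.idx2item)
            self.idx2item.append(item)
        return self.item2idx[item]

    def remove_item(self, item):
        # list.remove is O(n); stored IDs of later items are left untouched
        if item in self.item2idx:
            self.idx2item.remove(item)
            del self.item2idx[item]


d = ToyDictionary()
for tag in ["PER", "LOC", "ORG"]:
    d.add_item(tag)

d.remove_item("LOC")

stored_id = d.item2idx["ORG"]             # ID assigned before the removal
list_position = d.idx2item.index("ORG")   # position after the list shifted
print(stored_id, list_position)
```

If the real class behaves this way, the two views can disagree after a removal, so rebuilding the dictionary may be the safer route once IDs are already in use elsewhere.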
- add_item(item)
Adds a string item to the dictionary.
If the item exists, returns its ID. Otherwise, adds it and returns the new ID.
- Parameters:
item (str) – The string item to add.
- Returns:
The integer ID of the item.
- Return type:
int
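The add-or-return behaviour can be sketched in a few lines. This is an illustration only: the free function and the two containers (`item2idx`, `idx2item`) are hypothetical names, and the sketch starts from an empty mapping, which corresponds to a dictionary created without the <unk> entry.

```python
def add_item(item2idx: dict, idx2item: list, item: str) -> int:
    """Return the existing ID for `item`, or assign the next free one."""
    key = item.encode("utf-8")  # items are described as stored as bytes
    if key not in item2idx:
        item2idx[key] = len(idx2item)
        idx2item.append(key)
    return item2idx[key]


item2idx, idx2item = {}, []
first = add_item(item2idx, idx2item, "cat")
second = add_item(item2idx, idx2item, "dog")
again = add_item(item2idx, idx2item, "cat")  # already present, same ID back
print(first, second, again)  # 0 1 0
```

Because adding an existing item is a no-op that returns the known ID, callers can add every token of a corpus without checking for duplicates first.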
- get_idx_for_item(item)
Retrieves the integer ID for a given string item.
- Parameters:
item (str) – The string item.
- Returns:
The integer ID. Returns 0 if item is not found and add_unk is True.
- Return type:
int
- Raises:
IndexError – If the item is not found and add_unk is False.
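The two outcomes for a missing item can be sketched as follows. The function below is a hypothetical stand-in that works on a plain `dict`, not the Flair method itself; it assumes, as stated above, that <unk> has ID 0 when it is present.

```python
def get_idx_for_item(item2idx: dict, item: str, add_unk: bool = True) -> int:
    """Look up an ID, falling back to the <unk> ID or raising IndexError."""
    if item in item2idx:
        return item2idx[item]
    if add_unk:
        return 0  # ID of "<unk>", per the description above
    raise IndexError(f"'{item}' is not in the dictionary")


vocab = {"<unk>": 0, "cat": 1, "dog": 2}
known = get_idx_for_item(vocab, "dog")
unknown = get_idx_for_item(vocab, "fish")  # falls back to <unk>

strict_vocab = {"cat": 0, "dog": 1}        # no <unk> entry
try:
    get_idx_for_item(strict_vocab, "fish", add_unk=False)
    raised = False
except IndexError:
    raised = True

print(known, unknown, raised)
```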
- get_idx_for_items(items)
Retrieves the integer IDs for a list of string items. This version does not use a cache.
- Return type:
list[int]
- get_items()
Returns a list of all items in the dictionary in order of their IDs.
- Return type:
list[str]
- get_item_for_index(idx)
Retrieves the string item corresponding to a given integer ID.
- Parameters:
idx (int) – The integer ID.
- Returns:
The string item.
- Return type:
str
- Raises:
IndexError – If the index is out of bounds.
- has_item(item)
Checks if a given string item exists in the dictionary.
- Return type:
bool
- set_start_stop_tags()
Adds special <START> and <STOP> tags to the dictionary (often used for CRFs).
- Return type:
None
- is_span_prediction_problem()
Checks if the dictionary likely represents BIOES/BIO span labels.
Returns True if the span_labels flag is set or any item starts with ‘B-’, ‘I-’, or ‘S-’.
- Returns:
True if likely span labels, False otherwise.
- Return type:
bool
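The heuristic described above amounts to a flag check followed by a prefix test. A minimal sketch, with `looks_like_span_labels` as a hypothetical helper name rather than a Flair function:

```python
def looks_like_span_labels(items, span_labels=False):
    """True if the flag is set or any item carries a B-, I-, or S- prefix."""
    if span_labels:
        return True
    # str.startswith accepts a tuple and matches any of the prefixes
    return any(item.startswith(("B-", "I-", "S-")) for item in items)


ner_tags = ["O", "B-PER", "I-PER", "B-LOC"]
sentiment_labels = ["positive", "negative"]

print(looks_like_span_labels(ner_tags))
print(looks_like_span_labels(sentiment_labels))
print(looks_like_span_labels(sentiment_labels, span_labels=True))
```

Since the check only looks at prefixes, a plain classification label that happens to start with one of them (for example "S-class") would also count, which is why the result is described as "likely" span labels.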
- start_stop_tags_are_set()
Checks if <START> and <STOP> tags have been added.
- Return type:
bool
- save(savefile)
Saves the dictionary mapping to a file using pickle.
- Parameters:
savefile (PathLike) – The path to the output file.
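A pickle round trip of a string-to-ID mapping can be sketched as below. The layout of the pickled object (`idx2item` and `item2idx` keys) and the file name are assumptions made for illustration; the on-disk format that save() and load_from_file() actually use is not specified here.

```python
import pickle
import tempfile
from pathlib import Path

# Assumed layout for illustration: both directions of the mapping, items as bytes
mappings = {
    "idx2item": [b"<unk>", b"cat", b"dog"],
    "item2idx": {b"<unk>": 0, b"cat": 1, b"dog": 2},
}

with tempfile.TemporaryDirectory() as tmp:
    savefile = Path(tmp) / "dictionary.pkl"  # hypothetical file name

    # Write the mapping in binary mode, as pickle requires
    with savefile.open("wb") as f:
        pickle.dump(mappings, f)

    # Read it back; the restored object equals the original
    with savefile.open("rb") as f:
        restored = pickle.load(f)

print(restored == mappings)
```

Unpickling can execute arbitrary code, so pickle-based files should only be loaded from trusted sources.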
- classmethod load_from_file(filename)
Loads a Dictionary previously saved using the .save() method.
- Parameters:
filename (Union[str, Path]) – Path to the saved dictionary file.
- Returns:
The loaded Dictionary object.
- Return type:
Dictionary
- classmethod load(name)
Loads a pre-built character dictionary or a dictionary from a file path.
- Parameters:
name (str) – The name of the pre-built dictionary (e.g., ‘chars’) or a path to a dictionary file.
- Returns:
The loaded Dictionary object.
- Return type:
Dictionary
- Raises:
ValueError – If the name is not recognized or the path is invalid.