flair.datasets.text_text.DataPairDataset

class flair.datasets.text_text.DataPairDataset(path_to_data, columns=[0, 1, 2], max_tokens_per_doc=-1, max_chars_per_doc=-1, use_tokenizer=True, in_memory=True, label_type=None, skip_first_line=False, separator='\t', encoding='utf-8', label=True)

Bases: FlairDataset

__init__(path_to_data, columns=[0, 1, 2], max_tokens_per_doc=-1, max_chars_per_doc=-1, use_tokenizer=True, in_memory=True, label_type=None, skip_first_line=False, separator='\t', encoding='utf-8', label=True)

Creates a Dataset for pairs of sentences/paragraphs.

The file must be in a column format, where each line has one column for the first sentence/paragraph, one for the second sentence/paragraph, and one for the label, separated by a delimiter such as '\t' (as in the GLUE RTE dataset, https://gluebenchmark.com/tasks). For each data pair, a flair.data.DataPair object is created.

Parameters:

path_to_data (Union[str, Path]) – path to the data file
columns (list[int]) – list of integers giving the column indices to read. The first entry is the column of the first sentence, the second entry the column of the second sentence, and the third entry the column of the label. Default: [0, 1, 2]
max_tokens_per_doc – If set, shortens sentences to this maximum number of tokens
max_chars_per_doc – If set, shortens sentences to this maximum number of characters
use_tokenizer – Whether to use the built-in tokenizer
in_memory (bool) – If True, the data is kept as a list of flair.data.DataPair objects; otherwise it is kept as lists of plain strings, which need less space
label_type (Optional[str]) – Name of the label of the data pairs
skip_first_line (bool) – If True, first line of data file will be ignored
separator (str) – Separator between columns in the data file
encoding (str) – Encoding of the data file
label (bool) – If False, the dataset expects unlabeled data
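
As an illustration of the file format and parameters described above, the following is a minimal sketch rather than a reference usage. The sample sentences, the file name pairs.tsv, the helper load_pairs and the label type name "entailment" are all assumptions made for this example. It writes a small tab-separated file with a header line and then, if flair is installed, loads it with the header skipped.

```python
import tempfile
from pathlib import Path

# Hypothetical sample data: a header line followed by two labeled pairs.
rows = [
    ("sentence1", "sentence2", "label"),
    ("A man is playing a guitar.", "A person plays an instrument.", "entailment"),
    ("The cat sleeps on the sofa.", "The dog is barking loudly.", "not_entailment"),
]

# Write the rows as tab-separated columns: first text, second text, label.
data_file = Path(tempfile.mkdtemp()) / "pairs.tsv"
data_file.write_text(
    "\n".join("\t".join(row) for row in rows) + "\n", encoding="utf-8"
)


def load_pairs(path):
    """Build a DataPairDataset from a tab-separated file with a header line."""
    # Imported here so the file-writing part above also runs without flair.
    from flair.datasets.text_text import DataPairDataset

    return DataPairDataset(
        path,
        columns=[0, 1, 2],        # first text, second text, label
        label_type="entailment",  # label name chosen for this example
        skip_first_line=True,     # ignore the header line
        separator="\t",
        encoding="utf-8",
    )


try:
    dataset = load_pairs(data_file)
    print(len(dataset))  # number of data pairs read from the file
    print(dataset[0])    # first data pair
except ImportError:
    print("flair is not installed; only the sample file was written")
```

If the columns in a file are ordered differently, for example with the label first, the same call should work with columns reordered accordingly (e.g. [1, 2, 0]), since each entry names the column to read for that role.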

Methods

`__init__`(path_to_data[, columns, ...])	Creates a Dataset for pairs of sentences/paragraphs.
`is_in_memory`()

is_in_memory()

Return type: bool
