flair.datasets.text_text.GLUE_WNLI

class flair.datasets.text_text.GLUE_WNLI(label_type='entailment', base_path=None, max_tokens_per_doc=-1, max_chars_per_doc=-1, use_tokenizer=True, in_memory=True, sample_missing_splits=True)

Bases: DataPairCorpus

__init__(label_type='entailment', base_path=None, max_tokens_per_doc=-1, max_chars_per_doc=-1, use_tokenizer=True, in_memory=True, sample_missing_splits=True)

Creates a Winograd Schema Challenge corpus formatted as a Natural Language Inference task (WNLI).

The task is to predict whether the sentence with the pronoun substituted is entailed by the original sentence. In addition to the Corpus, an eval_dataset holds the test file of the GLUE data. This file contains unlabeled test data for evaluating models on the GLUE WNLI task.
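As a minimal sketch of how the constructor above might be used, the helpers below build the corpus and count the data points in each split. The names `load_wnli` and `describe_splits` are hypothetical and not part of Flair. The sketch assumes Flair is installed and that the GLUE data can be fetched or is already available under `base_path`.

```python
def load_wnli(label_type="entailment", base_path=None, in_memory=True):
    """Build the GLUE WNLI corpus (hypothetical helper, not part of Flair)."""
    # Imported lazily so that defining this helper does not require Flair.
    from flair.datasets.text_text import GLUE_WNLI

    # Only keyword arguments listed in the signature above are passed on.
    return GLUE_WNLI(label_type=label_type, base_path=base_path, in_memory=in_memory)


def describe_splits(corpus):
    """Return the number of data points in the train, dev and test splits."""
    sizes = {}
    for name in ("train", "dev", "test"):
        split = getattr(corpus, name)
        # Assumption: a missing split is represented as None.
        sizes[name] = len(split) if split is not None else 0
    return sizes
```

Calling `describe_splits(load_wnli())` should then give a quick overview of the corpus. The unlabeled GLUE test file is kept separately in `eval_dataset` and is not counted here.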

Methods

`__init__`([label_type, base_path, ...])	Creates a Winograd Schema Challenge corpus formatted as a Natural Language Inference task (WNLI).
`add_label_noise`(label_type, labels[, ...])	Generates uniform label noise distribution in the chosen dataset split.
`downsample`([percentage, downsample_train, ...])	Randomly downsample the corpus to the given percentage (by removing data points).
`filter_empty_sentences`()	A method that filters all sentences consisting of 0 tokens.
`filter_long_sentences`(max_charlength)	A method that filters all sentences for which the plain text is longer than a specified number of characters.
`get_all_sentences`()	Returns all sentences (spanning all three splits) in the `Corpus`.
`get_label_distribution`()	Counts occurrences of each label in the corpus and returns them as a dictionary object.
`make_label_dictionary`(label_type[, ...])	Creates a dictionary of all labels assigned to the sentences in the corpus.
`make_tag_dictionary`(tag_type)	Create a tag dictionary of a given label type.
`make_vocab_dictionary`([max_tokens, min_freq])	Creates a `Dictionary` of all tokens contained in the corpus.
`obtain_statistics`([label_type, pretty_print])	Print statistics about the corpus, including the length of the sentences and the labels in the corpus.
`tsv_from_eval_dataset`(folder_path)
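
Two of the methods listed above are often combined when preparing a quick experiment. The sketch below is one possible arrangement, not a prescribed workflow. `prepare_label_dictionary` is a hypothetical helper name, and it assumes that `downsample` modifies the corpus it is called on.

```python
def prepare_label_dictionary(corpus, label_type="entailment", percentage=0.1):
    """Downsample a corpus, then build its label dictionary (hypothetical helper)."""
    # Keep roughly `percentage` of the data points to speed up a trial run.
    # Assumption: this changes the corpus itself rather than returning a copy.
    corpus.downsample(percentage)
    # Collect the labels of the requested type from what remains.
    return corpus.make_label_dictionary(label_type=label_type)
```

The helper accepts any object that offers these two methods, so it can be tried on a small stand-in object before running it on the full corpus.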

Attributes

`dev`	The dev split as a `torch.utils.data.Dataset` object.
`test`	The test split as a `torch.utils.data.Dataset` object.
`train`	The training split as a `torch.utils.data.Dataset` object.

tsv_from_eval_dataset(folder_path)
