flair.datasets.document_classification

AGNEWS

The AG's News Topic Classification Corpus, classifying news into 4 coarse-grained topics.

AMAZON_REVIEWS

A very large corpus of Amazon reviews with positivity ratings.

COMMUNICATIVE_FUNCTIONS

The Communicative Functions Classification Corpus.

CSVClassificationCorpus

Classification corpus instantiated from CSV data files.

CSVClassificationDataset

Dataset for text classification from CSV column formatted data.
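As a rough illustration of the column-formatted data the two CSV classes read, the sketch below writes a small tab-separated file with the standard library and reads it back through a column map. The file name, the sample rows, and the `write_tsv` and `read_rows` helpers are hypothetical. The `{0: "text", 1: "label"}` map is modeled on the kind of index-to-role mapping described for `CSVClassificationCorpus`, but this code does not call Flair and is not its parser.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data: one header row, then (text, label) rows.
ROWS = [
    ("text", "label"),
    ("Stocks rallied after the earnings report.", "Business"),
    ("The striker scored twice in the final.", "Sports"),
]

# Assumed column map: column index -> role ("text" or a "label..." column).
COLUMN_NAME_MAP = {0: "text", 1: "label"}


def write_tsv(path: Path) -> None:
    """Write the sample rows as a tab-separated file."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t").writerows(ROWS)


def read_rows(path: Path, column_name_map: dict, skip_header: bool = True) -> list:
    """Return (text, labels) pairs selected through the column map."""
    text_cols = [i for i, role in column_name_map.items() if role == "text"]
    label_cols = [i for i, role in column_name_map.items() if role.startswith("label")]
    pairs = []
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if skip_header:
            next(reader, None)  # drop the header row
        for row in reader:
            text = " ".join(row[i] for i in text_cols)
            labels = [row[i] for i in label_cols]
            pairs.append((text, labels))
    return pairs


with tempfile.TemporaryDirectory() as folder:
    train_file = Path(folder) / "train.tsv"
    write_tsv(train_file)
    examples = read_rows(train_file, COLUMN_NAME_MAP)

for text, labels in examples:
    print(labels, text)
```

Keeping the column roles in a separate map means the same reader works when the text and label columns sit at different positions in another file.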

ClassificationCorpus

A classification corpus from FastText-formatted text files.

ClassificationDataset

Dataset for classification instantiated from a single FastText-formatted file.
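For readers unfamiliar with the FastText format that both classes expect, here is a minimal sketch of it: each line starts with one or more `__label__<name>` tokens, followed by the text. The sample lines, the file name, and the `parse_line` helper are hypothetical and use only the standard library; this shows the file layout and is not Flair's own loading code.

```python
import tempfile
from pathlib import Path

LABEL_PREFIX = "__label__"  # FastText convention for marking labels

# Hypothetical sample lines: labels first, then the text.
LINES = [
    "__label__POSITIVE A moving and well-acted film.",
    "__label__NEGATIVE Two hours I will never get back.",
    "__label__POSITIVE __label__FUNNY Sharp writing and great timing.",
]


def parse_line(line: str) -> tuple:
    """Split one FastText-formatted line into (labels, text)."""
    tokens = line.split()
    labels = []
    index = 0
    # Leading tokens that carry the prefix are labels; the rest is text.
    while index < len(tokens) and tokens[index].startswith(LABEL_PREFIX):
        labels.append(tokens[index][len(LABEL_PREFIX):])
        index += 1
    return labels, " ".join(tokens[index:])


with tempfile.TemporaryDirectory() as folder:
    train_file = Path(folder) / "train.txt"
    train_file.write_text("\n".join(LINES) + "\n", encoding="utf-8")
    content = train_file.read_text(encoding="utf-8")
    parsed = [parse_line(line) for line in content.splitlines()]

for labels, text in parsed:
    print(labels, "->", text)
```

A folder holding train, dev, and test files written this way is the kind of input a FastText-formatted corpus is built from. The exact constructor arguments are best checked against the class documentation.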

GERMEVAL_2018_OFFENSIVE_LANGUAGE

GermEval 2018 corpus for identification of offensive language.

GLUE_COLA

Corpus of Linguistic Acceptability from the GLUE benchmark.

GLUE_SST2

Stanford Sentiment Treebank (SST-2) binary sentiment corpus from the GLUE benchmark.

GO_EMOTIONS

GoEmotions dataset containing 58k Reddit comments labeled with 27 emotion categories.

IMDB

Corpus of IMDB movie reviews labeled by sentiment (POSITIVE, NEGATIVE).

NEWSGROUPS

20 newsgroups corpus, classifying news items into one of 20 categories.

SENTEVAL_CR

The customer reviews dataset of SentEval, classified into NEGATIVE or POSITIVE sentiment.

SENTEVAL_MPQA

The opinion-polarity dataset of SentEval, classified into NEGATIVE or POSITIVE polarity.

SENTEVAL_MR

The movie reviews dataset of SentEval, classified into NEGATIVE or POSITIVE sentiment.

SENTEVAL_SST_BINARY

The Stanford sentiment treebank dataset of SentEval, classified into NEGATIVE or POSITIVE sentiment.

SENTEVAL_SST_GRANULAR

The Stanford sentiment treebank dataset of SentEval, classified into 5 sentiment classes.

SENTEVAL_SUBJ

The subjectivity dataset of SentEval, classified as SUBJECTIVE or OBJECTIVE.

SENTIMENT_140

Twitter sentiment corpus.

STACKOVERFLOW

Stack Overflow corpus, classifying questions into one of 20 labels.

TREC_50

The TREC Question Classification Corpus, classifying questions into 50 fine-grained answer types.

TREC_6

The TREC Question Classification Corpus, classifying questions into 6 coarse-grained answer types.

WASSA_ANGER

WASSA-2017 anger emotion-intensity corpus.

WASSA_FEAR

WASSA-2017 fear emotion-intensity corpus.

WASSA_JOY

WASSA-2017 joy emotion-intensity corpus.

WASSA_SADNESS

WASSA-2017 sadness emotion-intensity corpus.

YAHOO_ANSWERS

The YAHOO Question Classification Corpus, classifying questions into 10 coarse-grained answer types.

_download_wassa_if_not_there