flair.datasets.treebanks

class flair.datasets.treebanks.UniversalDependenciesCorpus(data_folder, train_file=None, test_file=None, dev_file=None, in_memory=True, split_multiwords=True)

Bases: Corpus

__init__(data_folder, train_file=None, test_file=None, dev_file=None, in_memory=True, split_multiwords=True)

Instantiates a Corpus from CoNLL-U column-formatted task data such as the UD corpora.

Parameters:
  • data_folder (Union[str, Path]) – base folder with the task data

  • train_file – the name of the train file

  • test_file – the name of the test file

  • dev_file – the name of the dev file; if None, dev data is sampled from the train data

  • in_memory (bool) – If set to True, keeps the full dataset in memory; otherwise reads from disk

  • split_multiwords (bool) – If set to True (default), multiword tokens are split; otherwise they are kept as single tokens

Returns:

a Corpus with annotated train, dev and test data
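A minimal usage sketch for the constructor above, assuming flair is installed and a local folder holds CoNLL-U files. The helper name `load_local_ud_corpus`, the folder path, and the file names are hypothetical placeholders; only the constructor arguments come from the signature documented here. The import sits inside the function so that the helper can be defined even where flair is not importable.

```python
from pathlib import Path


def load_local_ud_corpus(data_folder, split_multiwords=True):
    """Build a Corpus from CoNLL-U files in data_folder (file names are assumed)."""
    # Imported lazily so that defining this helper does not require flair.
    from flair.datasets.treebanks import UniversalDependenciesCorpus

    return UniversalDependenciesCorpus(
        data_folder=Path(data_folder),
        train_file="train.conllu",  # assumed file name
        test_file="test.conllu",    # assumed file name
        dev_file=None,              # None: dev data is sampled from train
        in_memory=True,
        split_multiwords=split_multiwords,
    )


# Example call, left commented out because it needs flair and real data:
# corpus = load_local_ud_corpus("resources/my_treebank")  # hypothetical path
# print(len(corpus.train), len(corpus.dev), len(corpus.test))
```

Passing `dev_file=None` relies on the documented behaviour that dev data is then sampled from the train split.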

class flair.datasets.treebanks.UniversalDependenciesDataset(path_to_conll_file, in_memory=True, split_multiwords=True)

Bases: FlairDataset

__init__(path_to_conll_file, in_memory=True, split_multiwords=True)

Instantiates a column dataset in CoNLL-U format.

Parameters:
  • path_to_conll_file (Union[str, Path]) – Path to the CoNLL-U formatted file

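  • in_memory (bool) – If set to True, keeps the full dataset in memory; otherwise reads from disk

  • split_multiwords (bool) – If set to True (default), multiword tokens are split; otherwise they are kept as single tokens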
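To make the `split_multiwords` option in the signature concrete, the following standard-library sketch reads one CoNLL-U sentence in which range lines such as `1-2` carry the unsplit surface form. It is an illustration of the idea only, not flair's parser; the function name `read_tokens` and the sample sentence are assumptions chosen for the example.

```python
def read_tokens(conllu_sentence, split_multiwords=True):
    """Return surface forms from one CoNLL-U sentence block."""
    tokens = []
    skip_until = 0  # highest word ID covered by the last kept multiword range
    for line in conllu_sentence.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines carry no tokens
        fields = line.split("\t")
        token_id, form = fields[0], fields[1]
        if "." in token_id:
            continue  # empty nodes (e.g. "8.1") are ignored in this sketch
        if "-" in token_id:
            # A range line such as "1-2" holds the unsplit surface form.
            if not split_multiwords:
                tokens.append(form)
                skip_until = int(token_id.split("-")[1])
            continue
        if int(token_id) <= skip_until:
            continue  # already represented by the multiword token
        tokens.append(form)
    return tokens


SAMPLE = "\n".join([
    "# text = vámonos al mar",
    "1-2\tvámonos\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tvamos\tir\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tnos\tnosotros\tPRON\t_\t_\t1\tobj\t_\t_",
    "3-4\tal\t_\t_\t_\t_\t_\t_\t_\t_",
    "3\ta\ta\tADP\t_\t_\t5\tcase\t_\t_",
    "4\tel\tel\tDET\t_\t_\t5\tdet\t_\t_",
    "5\tmar\tmar\tNOUN\t_\t_\t1\tobl\t_\t_",
])

print(read_tokens(SAMPLE, split_multiwords=True))   # ['vamos', 'nos', 'a', 'el', 'mar']
print(read_tokens(SAMPLE, split_multiwords=False))  # ['vámonos', 'al', 'mar']
```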

is_in_memory()
Return type:

bool
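The trade-off behind `in_memory` can be sketched with a toy dataset that either stores every sentence up front or records only byte offsets and re-reads a sentence from disk on access. This is a hedged illustration, not flair's implementation; the class `LazySentenceFile` and the file `toy.conllu` are hypothetical.

```python
import os
import tempfile


class LazySentenceFile:
    """Toy dataset over a blank-line-separated file (illustrative only)."""

    def __init__(self, path, in_memory=True):
        self.path = path
        self.in_memory = in_memory
        self.sentences = []  # filled only when in_memory is True
        self.offsets = []    # byte offset of each sentence otherwise
        with open(path, "rb") as handle:
            while True:
                start = handle.tell()
                block = self._read_block(handle)
                if block is None:
                    break
                if in_memory:
                    self.sentences.append(block)
                else:
                    self.offsets.append(start)

    @staticmethod
    def _read_block(handle):
        """Read lines up to the next blank line; return None at end of file."""
        lines = []
        while True:
            raw = handle.readline()
            if not raw:  # end of file
                break
            text = raw.decode("utf-8").rstrip("\n")
            if not text.strip():
                if lines:
                    break   # a blank line closes the current block
                continue    # skip blank lines before a block
            lines.append(text)
        return lines or None

    def is_in_memory(self):
        return self.in_memory

    def __len__(self):
        return len(self.sentences) if self.in_memory else len(self.offsets)

    def __getitem__(self, index):
        if self.in_memory:
            return self.sentences[index]
        # Disk read: seek to the stored offset and parse one block again.
        with open(self.path, "rb") as handle:
            handle.seek(self.offsets[index])
            return self._read_block(handle)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy.conllu")
    with open(path, "w", encoding="utf-8") as out:
        out.write("1\tHello\n2\tworld\n\n1\tBye\n\n")
    eager = LazySentenceFile(path, in_memory=True)
    lazy = LazySentenceFile(path, in_memory=False)
    same = [eager[i] for i in range(len(eager))] == [lazy[i] for i in range(len(lazy))]
    print(len(eager), len(lazy), same)  # 2 2 True
```

Keeping only offsets lowers memory use at the cost of one file seek per access, which is the usual reason for choosing disk reads on very large treebanks.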

class flair.datasets.treebanks.UD_ENGLISH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus
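The language-specific classes on this page share one constructor shape, so they can be selected by name. The sketch below assumes flair is installed; the helper `load_ud_corpus` is hypothetical and only forwards the keyword arguments listed in the signatures here (`base_path`, `in_memory`, `split_multiwords`, `revision`). The import is deferred so that the name check works without flair.

```python
import importlib


def load_ud_corpus(name="UD_ENGLISH", **kwargs):
    """Instantiate one of the UD_* corpus classes by name (hypothetical helper)."""
    if not name.startswith("UD_"):
        raise ValueError(f"expected a UD_* class name, got {name!r}")
    # Imported lazily: flair is only needed once a corpus is actually requested.
    treebanks = importlib.import_module("flair.datasets.treebanks")
    corpus_class = getattr(treebanks, name)
    return corpus_class(**kwargs)


# Example calls, left commented out because they need flair and the treebank data:
# corpus = load_ud_corpus("UD_ENGLISH")
# hdt = load_ud_corpus("UD_GERMAN_HDT", in_memory=False)
# print(len(corpus.train), len(corpus.dev), len(corpus.test))
```

Defaults differ between classes: in the signatures on this page, UD_GERMAN_HDT and UD_CZECH default to `in_memory=False`, and UD_GERMAN_HDT and UD_BAVARIAN_MAIBAAM default to `revision='dev'`.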

class flair.datasets.treebanks.UD_GALICIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_ANCIENT_GREEK(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_KAZAKH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_OLD_CHURCH_SLAVONIC(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_ARMENIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_ESTONIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_GERMAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_GERMAN_HDT(base_path=None, in_memory=False, split_multiwords=True, revision='dev')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_DUTCH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_FAROESE(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

This corpus provides the Faroese UD treebank (FarPaHC).

The data is obtained from the following link: UniversalDependencies/UD_Faroese-FarPaHC/{revision}

Faroese is a small West Scandinavian language with 60,000-100,000 speakers, related to Icelandic and Old Norse.

class flair.datasets.treebanks.UD_FRENCH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_ITALIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_LATIN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_SPANISH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_PORTUGUESE(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_ROMANIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_CATALAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_POLISH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_CZECH(base_path=None, in_memory=False, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_SLOVAK(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_SWEDISH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_DANISH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_NORWEGIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_FINNISH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_SLOVENIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_CROATIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_SERBIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_BULGARIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_ARABIC(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_HEBREW(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_TURKISH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_UKRAINIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_PERSIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_RUSSIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_HINDI(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_INDONESIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_JAPANESE(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_CHINESE(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_KOREAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_BASQUE(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_CHINESE_KYOTO(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_GREEK(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_NAIJA(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_LIVVI(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_BURYAT(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_NORTH_SAMI(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_MARATHI(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_MALTESE(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_AFRIKAANS(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_GOTHIC(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_OLD_FRENCH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_WOLOF(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_BELARUSIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_COPTIC(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_IRISH(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_LATVIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_LITHUANIAN(base_path=None, in_memory=True, split_multiwords=True, revision='master')

Bases: UniversalDependenciesCorpus

class flair.datasets.treebanks.UD_BAVARIAN_MAIBAAM(base_path=None, in_memory=True, split_multiwords=True, revision='dev')

Bases: UniversalDependenciesCorpus