flair.datasets.treebanks
- class flair.datasets.treebanks.UniversalDependenciesCorpus(data_folder, train_file=None, test_file=None, dev_file=None, in_memory=True, split_multiwords=True)
Bases: Corpus
- __init__(data_folder, train_file=None, test_file=None, dev_file=None, in_memory=True, split_multiwords=True)
Instantiates a Corpus from CoNLL-U column-formatted task data such as the UD corpora.
- Parameters:
data_folder (Union[str, Path]) – base folder with the task data
train_file – the name of the train file
test_file – the name of the test file
dev_file – the name of the dev file; if None, dev data is sampled from train
in_memory (bool) – If set to True, keeps the full dataset in memory; otherwise reads from disk
split_multiwords (bool) – If set to True (default), multiword tokens are split into their component words; otherwise they are kept as single tokens
- Returns:
a Corpus with annotated train, dev and test data
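The `dev_file=None` behavior can be pictured with a small standalone sketch. This is not flair's implementation. The function name `sample_dev_from_train`, the 10% default fraction and the fixed seed are all assumptions chosen for illustration. The function shuffles sentence indices, moves a fraction of them into a dev split and keeps the rest for training, so the two splits never overlap.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_dev_from_train(
    train: Sequence[T], dev_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Split `train` into (remaining_train, dev) by random sampling.

    Hypothetical helper: the 10% default and the fixed seed are
    illustrative assumptions, not values taken from flair.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be strictly between 0 and 1")
    indices = list(range(len(train)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_dev = int(len(train) * dev_fraction)
    dev_indices = set(indices[:n_dev])
    # Keep the original sentence order inside each split.
    dev = [train[i] for i in sorted(dev_indices)]
    remaining = [item for i, item in enumerate(train) if i not in dev_indices]
    return remaining, dev
```

With 20 input sentences and the default fraction, this yields 18 training sentences and 2 dev sentences. Removing the sampled sentences from the training split avoids evaluating on data the model was trained on.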
- class flair.datasets.treebanks.UniversalDependenciesDataset(path_to_conll_file, in_memory=True, split_multiwords=True)
Bases: FlairDataset
- __init__(path_to_conll_file, in_memory=True, split_multiwords=True)
Instantiates a column dataset in CoNLL-U format.
- Parameters:
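path_to_conll_file (Union[str, Path]) – Path to the CoNLL-U formatted file
in_memory (bool) – If set to True, keeps the full dataset in memory; otherwise reads from disk
split_multiwords (bool) – If set to True (default), multiword tokens are split into their component words; otherwise they are kept as single tokens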
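A minimal, dependency-free reader helps make the CoNLL-U layout and the `split_multiwords` switch concrete. It is a sketch, not flair's parser. The name `parse_conllu` is hypothetical, and the reader returns only word forms. It also assumes that `split_multiwords=False` means "keep the multiword surface form and skip its component words", which is a reading of the parameter description rather than something the docs confirm.

The reader follows the CoNLL-U conventions:

- each word line has ten tab-separated columns;
- lines starting with `#` are comments;
- blank lines separate sentences;
- range IDs such as `2-3` mark multiword tokens;
- decimal IDs such as `8.1` mark empty nodes.

```python
from typing import List


def parse_conllu(text: str, split_multiwords: bool = True) -> List[List[str]]:
    """Return each sentence as a list of token forms (hypothetical sketch).

    split_multiwords=True keeps the syntactic words (e.g. "a", "el");
    split_multiwords=False keeps the surface multiword token (e.g. "al").
    """
    sentences: List[List[str]] = []
    tokens: List[str] = []
    skip_until = 0  # last word ID already covered by a kept multiword token
    for line in text.splitlines():
        if not line.strip():  # blank line closes the current sentence
            if tokens:
                sentences.append(tokens)
            tokens, skip_until = [], 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        fields = line.split("\t")
        if len(fields) != 10:
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        token_id, form = fields[0], fields[1]
        if "." in token_id:
            continue  # empty nodes (e.g. "8.1") have no surface token
        if "-" in token_id:  # multiword token range such as "2-3"
            if not split_multiwords:
                tokens.append(form)
                skip_until = int(token_id.split("-")[1])
            continue
        if int(token_id) <= skip_until:
            continue  # word already represented by its multiword token
        tokens.append(form)
    if tokens:  # file may not end with a blank line
        sentences.append(tokens)
    return sentences
```

Consider a Spanish multiword line `2-3 al`, followed by word lines for `a` and `el`. With the default setting the reader returns `a` and `el`. With `split_multiwords=False` it returns the single token `al`.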
- is_in_memory()
- Return type:
bool
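`is_in_memory()` reports which of the two storage modes a dataset uses. The disk-reading mode (`in_memory=False`, the default for `UD_GERMAN_HDT` and `UD_CZECH`) can be sketched as an index of byte offsets. The file is scanned once, the start of each blank-line-separated sentence block is recorded, and a block is read back only when requested. The class `LazySentenceFile` is hypothetical and returns raw text; flair's actual on-disk strategy may differ.

```python
from pathlib import Path
from typing import List, Union


class LazySentenceFile:
    """Hypothetical disk-backed view of a CoNLL-U file.

    Only byte offsets are kept in memory; sentence text is read on demand.
    """

    def __init__(self, path: Union[str, Path]) -> None:
        self.path = Path(path)
        self.offsets: List[int] = []
        with open(self.path, "rb") as f:
            in_block = False
            offset = f.tell()
            # Read line by line so f.tell() gives the offset of the next line.
            for line in iter(f.readline, b""):
                if line.strip():
                    if not in_block:  # first line of a new sentence block
                        self.offsets.append(offset)
                        in_block = True
                else:
                    in_block = False
                offset = f.tell()

    def __len__(self) -> int:
        return len(self.offsets)

    def is_in_memory(self) -> bool:
        return False  # sentences stay on disk; only offsets are cached

    def __getitem__(self, index: int) -> str:
        lines: List[str] = []
        with open(self.path, "rb") as f:
            f.seek(self.offsets[index])
            for line in iter(f.readline, b""):
                if not line.strip():
                    break  # a blank line ends the sentence block
                lines.append(line.decode("utf-8"))
        return "".join(lines)
```

The trade-off is the usual one for large treebanks. Memory use stays proportional to the number of sentences rather than their size, but every access pays for a file open and a seek.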
- class flair.datasets.treebanks.UD_ENGLISH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_GALICIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_ANCIENT_GREEK(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_KAZAKH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_OLD_CHURCH_SLAVONIC(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_ARMENIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_ESTONIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_GERMAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_GERMAN_HDT(base_path=None, in_memory=False, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_DUTCH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_FAROESE(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
This corpus contains the Faroese treebank dataset, obtained from UniversalDependencies/UD_Faroese-FarPaHC.
Faroese is a small West Scandinavian language with 60,000 to 100,000 speakers, related to Icelandic and Old Norse.
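A repository name like the one above can be turned into download URLs for its split files. The sketch below makes several assumptions:

- the usual UD file naming scheme, `<lang>_<treebank>-ud-<split>.conllu`;
- the `master` branch;
- GitHub's raw-content host.

The helpers `ud_raw_url` and `ud_split_urls` and the `fo_farpahc` prefix are illustrative. flair is not guaranteed to use these URLs.

```python
from typing import Dict

# Assumed raw-content root for the UniversalDependencies GitHub organization.
UD_RAW_ROOT = "https://raw.githubusercontent.com/UniversalDependencies"


def ud_raw_url(repo: str, file_prefix: str, split: str, branch: str = "master") -> str:
    """Build a raw GitHub URL for one split of a UD treebank (hypothetical helper)."""
    if split not in {"train", "dev", "test"}:
        raise ValueError(f"unknown split: {split!r}")
    return f"{UD_RAW_ROOT}/{repo}/{branch}/{file_prefix}-ud-{split}.conllu"


def ud_split_urls(repo: str, file_prefix: str) -> Dict[str, str]:
    """Map each standard split name to its assumed download URL."""
    return {s: ud_raw_url(repo, file_prefix, s) for s in ("train", "dev", "test")}
```

For example, `ud_raw_url("UD_Faroese-FarPaHC", "fo_farpahc", "train")` builds `https://raw.githubusercontent.com/UniversalDependencies/UD_Faroese-FarPaHC/master/fo_farpahc-ud-train.conllu`. Not every UD treebank ships all three splits, so a real downloader should check which files exist.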
- class flair.datasets.treebanks.UD_FRENCH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_ITALIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_LATIN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_SPANISH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_PORTUGUESE(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_ROMANIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_CATALAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_POLISH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_CZECH(base_path=None, in_memory=False, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_SLOVAK(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_SWEDISH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_DANISH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_NORWEGIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_FINNISH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_SLOVENIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_CROATIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_SERBIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_BULGARIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_ARABIC(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_HEBREW(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_TURKISH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_UKRAINIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_PERSIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_RUSSIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_HINDI(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_INDONESIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_JAPANESE(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_CHINESE(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_KOREAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_BASQUE(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_CHINESE_KYOTO(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_GREEK(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_NAIJA(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_LIVVI(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_BURYAT(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_NORTH_SAMI(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_MARATHI(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_MALTESE(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_AFRIKAANS(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_GOTHIC(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_OLD_FRENCH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_WOLOF(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_BELARUSIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_COPTIC(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_IRISH(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_LATVIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus
- class flair.datasets.treebanks.UD_LITHUANIAN(base_path=None, in_memory=True, split_multiwords=True)
Bases: UniversalDependenciesCorpus