flair.datasets.entity_linking

class flair.datasets.entity_linking.ZELDA(base_path=None, in_memory=False, column_format={0: 'text', 2: 'nel'}, **corpusargs)

Bases: MultiFileColumnCorpus
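
The `column_format` default maps column indices to annotation names. The snippet below is a standalone sketch of what such a mapping expresses, not flair's own parser: the helper `read_columns` and the sample line are hypothetical, and whitespace-separated columns are assumed.

```python
def read_columns(line, column_format):
    """Pick the fields named in column_format out of one whitespace-separated line."""
    fields = line.split()
    # Keep only the columns listed in the mapping; unmapped columns are ignored.
    return {name: fields[index] for index, name in column_format.items()}


# Hypothetical three-column line; column 1 is not mapped, so it is skipped.
row = read_columns("Berlin B-LOC B-Berlin", {0: "text", 2: "nel"})
print(row)  # {'text': 'Berlin', 'nel': 'B-Berlin'}
```

The same reading applies to the `columns={0: 'text', 3: 'sense'}` defaults of the WSD corpora further down: column 0 holds the token and column 3 the sense label.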

class flair.datasets.entity_linking.NEL_ENGLISH_AQUAINT(base_path=None, in_memory=True, agreement_threshold=0.5, sentence_splitter=<flair.splitter.SegtokSentenceSplitter object>, **corpusargs)

Bases: ColumnCorpus

class flair.datasets.entity_linking.NEL_GERMAN_HIPE(base_path=None, in_memory=True, wiki_language='dewiki', **corpusargs)

Bases: ColumnCorpus

class flair.datasets.entity_linking.NEL_ENGLISH_AIDA(base_path=None, in_memory=True, use_ids_and_check_existence=False, **corpusargs)

Bases: ColumnCorpus

class flair.datasets.entity_linking.NEL_ENGLISH_IITB(base_path=None, in_memory=True, ignore_disagreements=False, sentence_splitter=<flair.splitter.SegtokSentenceSplitter object>, **corpusargs)

Bases: ColumnCorpus

class flair.datasets.entity_linking.NEL_ENGLISH_TWEEKI(base_path=None, in_memory=True, **corpusargs)

Bases: ColumnCorpus

class flair.datasets.entity_linking.NEL_ENGLISH_REDDIT(base_path=None, in_memory=True, **corpusargs)

Bases: ColumnCorpus

flair.datasets.entity_linking.from_ufsac_to_tsv(xml_file, conll_file, datasetname, encoding='utf8', cut_multisense=True)

Converts a file in UFSAC format into a tab-separated column format and writes the result to a new file.

Parameters:
  • xml_file (Union[str, Path]) – Path to the xml file.

  • conll_file (Union[str, Path]) – Path for the new conll file.

  • datasetname (str) – Name of the dataset from UFSAC. Needed because the datasets handle multi-word spans differently.

  • encoding (str, optional) – Encoding used in open function. The default is “utf8”.

  • cut_multisense (bool, optional) – Determines whether the wn30_key tag should be cut if it contains multiple possible senses. If True, only the first listed sense is used. Otherwise, the whole list of senses is treated as one new sense. The default is True.
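
The effect of `cut_multisense` can be sketched as follows. This is an illustration only, not the library's implementation: the helper `select_sense`, the `;` separator between senses, and the sample keys are assumptions made for the example.

```python
def select_sense(wn30_key, cut_multisense=True, separator=";"):
    """Return the sense label kept for one wn30_key value."""
    if cut_multisense:
        # Keep only the first listed sense.
        return wn30_key.split(separator)[0]
    # Keep the full list, which then acts as a single combined label.
    return wn30_key


# Illustrative value listing two senses (separator assumed to be ';').
multi = "bank%1:14:00::;bank%1:17:01::"
print(select_sense(multi))                        # bank%1:14:00::
print(select_sense(multi, cut_multisense=False))  # bank%1:14:00::;bank%1:17:01::
```

With `cut_multisense=False`, every distinct combination of senses becomes its own label, which can enlarge the label set considerably.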

flair.datasets.entity_linking.determine_tsv_file(filename, data_folder, cut_multisense=True)

Checks if the converted .tsv file already exists and if not, creates it.

Parameters:
  • filename (str) – The name of the file.

  • data_folder (Path) – The name of the folder in which the CoNLL file should reside.

  • cut_multisense (bool) – Determines whether the wn30_key tag should be cut if it contains multiple possible senses. If True only the first listed sense will be used. Otherwise, the whole list of senses will be detected as one new sense. The default is True.

Return type: str

Returns: The name of the file.
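
The check-then-create behavior described for this function follows a common caching pattern. The sketch below shows that pattern in isolation under stated assumptions: `ensure_tsv`, the `convert` callback, and the `.tsv` naming scheme are hypothetical and do not reproduce the library's code.

```python
import tempfile
from pathlib import Path


def ensure_tsv(filename, data_folder, convert):
    """Return the .tsv file name, running convert only when the file is missing."""
    tsv_name = filename + ".tsv"
    tsv_path = Path(data_folder) / tsv_name
    if not tsv_path.exists():
        # Conversion is the expensive step, so run it at most once per file.
        convert(tsv_path)
    return tsv_name


calls = []


def fake_convert(path):
    # Stand-in for a real converter: records the call and writes a one-line file.
    calls.append(path)
    path.write_text("word\tsense\n", encoding="utf8")


with tempfile.TemporaryDirectory() as folder:
    first = ensure_tsv("semcor", folder, fake_convert)
    second = ensure_tsv("semcor", folder, fake_convert)

print(first, second, len(calls))  # semcor.tsv semcor.tsv 1
```

The second call finds the file written by the first and skips the conversion, which is why the converter runs only once.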

class flair.datasets.entity_linking.WSD_UFSAC(filenames=['masc', 'semcor'], base_path=None, in_memory=True, cut_multisense=True, columns={0: 'text', 3: 'sense'}, banned_sentences=None, sample_missing_splits_in_multicorpus=True, sample_missing_splits_in_each_corpus=True, use_raganato_ALL_as_test_data=False, name='multicorpus')

Bases: MultiCorpus

class flair.datasets.entity_linking.WSD_RAGANATO_ALL(base_path=None, in_memory=True, columns={0: 'text', 3: 'sense'}, label_name_map=None, banned_sentences=None, sample_missing_splits=True, cut_multisense=True)

Bases: ColumnCorpus

class flair.datasets.entity_linking.WSD_SEMCOR(base_path=None, in_memory=True, columns={0: 'text', 3: 'sense'}, label_name_map=None, banned_sentences=None, sample_missing_splits=True, cut_multisense=True, use_raganato_ALL_as_test_data=False)

Bases: ColumnCorpus

class flair.datasets.entity_linking.WSD_WORDNET_GLOSS_TAGGED(base_path=None, in_memory=True, columns={0: 'text', 3: 'sense'}, label_name_map=None, banned_sentences=None, sample_missing_splits=True, use_raganato_ALL_as_test_data=False)

Bases: ColumnCorpus

class flair.datasets.entity_linking.WSD_MASC(base_path=None, in_memory=True, columns={0: 'text', 3: 'sense'}, label_name_map=None, banned_sentences=None, sample_missing_splits=True, cut_multisense=True, use_raganato_ALL_as_test_data=False)

Bases: ColumnCorpus

class flair.datasets.entity_linking.WSD_OMSTI(base_path=None, in_memory=True, columns={0: 'text', 3: 'sense'}, label_name_map=None, banned_sentences=None, sample_missing_splits=True, cut_multisense=True, use_raganato_ALL_as_test_data=False)

Bases: ColumnCorpus

class flair.datasets.entity_linking.WSD_TRAINOMATIC(base_path=None, in_memory=True, columns={0: 'text', 3: 'sense'}, label_name_map=None, banned_sentences=None, sample_missing_splits=True, use_raganato_ALL_as_test_data=False)

Bases: ColumnCorpus