flair.datasets.entity_linking.from_ufsac_to_tsv
- flair.datasets.entity_linking.from_ufsac_to_tsv(xml_file, conll_file, datasetname, encoding='utf8', cut_multisense=True)
Converts a file in the UFSAC format into a tab-separated column format and writes the result to a new file.
- Parameters:
xml_file (Union[str, Path]) – Path to the xml file.
conll_file (Union[str, Path]) – Path for the new conll file.
datasetname (str) – Name of the UFSAC dataset. Needed because the datasets handle multi-word spans differently.
encoding (str, optional) – Encoding passed to the open function. The default is “utf8”.
cut_multisense (bool, optional) – Determines whether the wn30_key tag is cut when it contains multiple possible senses. If True, only the first listed sense is used. Otherwise, the whole list of senses is treated as one new sense. The default is True.
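The effect of cut_multisense can be illustrated with a minimal sketch. This is not flair's implementation: the helper select_sense is hypothetical, the sense keys are illustrative values only, and the ";" separator between senses in a wn30_key value is an assumption.

```python
def select_sense(wn30_key: str, cut_multisense: bool = True, separator: str = ";") -> str:
    """Return the label kept for one annotated token (illustrative helper, not part of flair)."""
    if cut_multisense:
        # Keep only the first listed sense.
        return wn30_key.split(separator)[0]
    # Keep the whole list of senses as one combined label.
    return wn30_key


# Illustrative wn30_key value listing two senses (assumed ";"-separated).
key = "bank%1:14:00::;bank%1:17:01::"

first_only = select_sense(key, cut_multisense=True)
combined = select_sense(key, cut_multisense=False)

print(first_only)
print(combined)
```

With cut_multisense=True the label set stays limited to individual sense keys, while cut_multisense=False adds a distinct label for every combination of senses that occurs in the data.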