flair.tokenization.SciSpacyTokenizer
- class flair.tokenization.SciSpacyTokenizer
Bases: Tokenizer
Tokenizer that uses the en_core_sci_sm Spacy model and some special heuristics.
Implementation of Tokenizer which uses the en_core_sci_sm Spacy model, extended by special heuristics that treat characters such as “(”, “)”, and “-” as additional token separators. These heuristics are what distinguish this implementation from SpacyTokenizer.
Note: if you want to use the “normal” SciSpacy tokenization, just use SpacyTokenizer.
- __init__()
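The effect of treating “(”, “)”, and “-” as additional token separators can be sketched in plain Python. This is only an illustration under stated assumptions, not Flair's implementation: whitespace splitting stands in for the en_core_sci_sm model, the helper name `split_on_extra_separators` is hypothetical, and keeping each separator as its own token is an assumption about the output.

```python
import re

# Assumption: these three characters stand in for the "additional token
# separators" named in the description; Flair's real heuristics may differ.
EXTRA_SEPARATORS = "()-"

# The capturing group makes re.split return the separators as well.
_SPLIT_PATTERN = re.compile("([" + re.escape(EXTRA_SEPARATORS) + "])")


def split_on_extra_separators(tokens):
    """Re-split already tokenized text on the extra separator characters."""
    result = []
    for token in tokens:
        # Empty strings appear when a token starts or ends with a
        # separator, so they are filtered out.
        result.extend(piece for piece in _SPLIT_PATTERN.split(token) if piece)
    return result


# Whitespace splitting is a stand-in for the base Spacy tokenization.
base_tokens = "IL-2 (interleukin-2) binds".split()
tokens = split_on_extra_separators(base_tokens)
print(tokens)
```

With Flair and the en_core_sci_sm model installed, the documented entry points are `SciSpacyTokenizer()` and `tokenize(text)`, which returns a `list[str]`.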
Methods
__init__()
from_dict(config): Instantiate the tokenizer from a configuration dictionary.
to_dict(): Serialize the tokenizer's configuration to a dictionary.
tokenize(text)
Attributes
name
- tokenize(text)
- Return type: list[str]
- property name: str
- to_dict()
Serialize the tokenizer’s configuration to a dictionary.
- Return type: dict
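The `to_dict()` and `from_dict(config)` pair suggests a serialize-and-restore round trip. The sketch below shows that pattern with a hypothetical stand-in class, `ToyTokenizer`, which is not part of Flair. Its `separators` parameter is invented for illustration; the documented `__init__()` of SciSpacyTokenizer takes no arguments, and the contents of its real configuration dictionary are not shown here.

```python
class ToyTokenizer:
    """Hypothetical stand-in used to illustrate the round trip; not a Flair class."""

    def __init__(self, separators="()-"):
        # Assumption: a single illustrative setting to serialize.
        self.separators = separators

    def tokenize(self, text):
        # Pad each separator with spaces, then split on whitespace.
        for sep in self.separators:
            text = text.replace(sep, f" {sep} ")
        return text.split()

    def to_dict(self):
        # Serialize the configuration to a plain dictionary.
        return {"separators": self.separators}

    @classmethod
    def from_dict(cls, config):
        # Rebuild an equivalent tokenizer from the configuration dictionary.
        return cls(**config)


original = ToyTokenizer()
config = original.to_dict()
restored = ToyTokenizer.from_dict(config)

# The restored instance should tokenize exactly like the original.
print(restored.tokenize("a(b)-c") == original.tokenize("a(b)-c"))
```

With the real class, the same round trip would presumably read `SciSpacyTokenizer.from_dict(tokenizer.to_dict())`, given the signatures documented here.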
- classmethod from_dict(config)
Instantiate the tokenizer from a configuration dictionary.
- Return type: