flair.tokenization.SciSpacyTokenizer
- class flair.tokenization.SciSpacyTokenizer
Bases: Tokenizer
Tokenizer that uses the en_core_sci_sm spaCy model and some special heuristics.
Implementation of Tokenizer which uses the en_core_sci_sm spaCy model, extended by special heuristics that treat characters such as “(”, “)” and “-” as additional token separators. These heuristics distinguish this implementation from SpacyTokenizer.
Note: if you want to use the “normal” SciSpacy tokenization, just use SpacyTokenizer.
- __init__()
Methods
__init__()
tokenize(text)
Attributes
name
- tokenize(text)
- Return type:
list[str]
- property name: str
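The effect of the extra token separators described above can be illustrated with a minimal, dependency-free sketch. This is not the class's actual implementation, which builds on the en_core_sci_sm spaCy model. The function name `split_on_extra_separators`, the separator pattern, the whitespace-based stand-in for the base tokenizer, and the choice to keep each separator as its own token are all assumptions made for illustration.

```python
import re

# Hypothetical separator set for illustration: the characters the
# description names as additional token separators.
EXTRA_SEPARATORS = re.compile(r"([()\-])")


def split_on_extra_separators(tokens: list[str]) -> list[str]:
    """Split each token further at "(", ")" and "-".

    Each separator is kept as its own token (an assumption of this sketch).
    """
    result: list[str] = []
    for token in tokens:
        # re.split with a capturing group keeps the separators in the
        # output; empty strings at the edges are dropped.
        result.extend(piece for piece in EXTRA_SEPARATORS.split(token) if piece)
    return result


# Whitespace splitting stands in for the base tokenizer here.
base_tokens = "IL-2 (interleukin-2) binds".split()
print(split_on_extra_separators(base_tokens))
```

With flair, scispacy and the en_core_sci_sm model installed, the class itself would be used as `SciSpacyTokenizer().tokenize(text)`, which returns a `list[str]` according to the signature documented above. Its exact output may differ from this sketch.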