flair.tokenization.SciSpacyTokenizer

class flair.tokenization.SciSpacyTokenizer

Bases: Tokenizer

Tokenizer that uses the en_core_sci_sm spaCy model and some special heuristics.

Implementation of Tokenizer that uses the en_core_sci_sm spaCy model, extended by special heuristics that treat characters such as “(”, “)”, and “-” as additional token separators. These heuristics are what distinguish this implementation from SpacyTokenizer.

Note: if you want to use the “normal” SciSpacy tokenization, just use SpacyTokenizer.
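
The separator heuristic described above can be illustrated with a minimal, library-independent sketch. It is not the class's actual implementation: the helper name split_on_extra_separators, the separator set EXTRA_SEPARATORS, and the whitespace split standing in for the spaCy model are all assumptions made for illustration. It also assumes that the separator characters are kept as tokens of their own.

```python
import re

# Assumed separator set: the description names "(", ")" and "-".
# The real class may use a different or larger set.
EXTRA_SEPARATORS = "()-"

# A capturing group makes re.split keep the separators in its output.
_SPLIT_PATTERN = re.compile("([" + re.escape(EXTRA_SEPARATORS) + "])")


def split_on_extra_separators(tokens: list[str]) -> list[str]:
    """Split each token at the extra separators, keeping them as tokens."""
    result: list[str] = []
    for token in tokens:
        pieces = _SPLIT_PATTERN.split(token)
        # re.split yields empty strings around leading/trailing separators.
        result.extend(piece for piece in pieces if piece)
    return result


# A plain whitespace split stands in for the base (spaCy) tokenization here.
base_tokens = "IL-2 (interleukin-2) binds".split()
tokens = split_on_extra_separators(base_tokens)
print(tokens)
```

In this sketch, a token such as "IL-2" that the base tokenization leaves whole is broken into "IL", "-", and "2", which is the kind of finer-grained split the description attributes to the extra separators.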

__init__()

Methods

__init__()

tokenize(text)

Attributes

name

tokenize(text)
Return type:

list[str]

property name: str