flair.tokenization.SciSpacyTokenizer

class flair.tokenization.SciSpacyTokenizer

Bases: Tokenizer

Tokenizer that uses the en_core_sci_sm Spacy model and some special heuristics.

Implementation of Tokenizer that uses the en_core_sci_sm Spacy model, extended by special heuristics that treat characters such as “(”, “)”, and “-” as additional token separators. These extra separators are what distinguish this implementation from SpacyTokenizer.

Note: if you want the “normal” SciSpacy tokenization, use SpacyTokenizer instead.
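The effect of treating extra characters as token separators can be sketched in plain Python. This is a simplified illustration, not the class's actual implementation: the helper name `split_on_extra_separators`, the whitespace pre-tokenization, and the assumption that separators are kept as their own tokens are all hypothetical, and only the three characters named above are handled.

```python
import re

# Assumption: only the three example characters named in the description.
EXTRA_SEPARATORS = "()-"

# A capturing group makes re.split keep the separators in its output.
_SPLIT_PATTERN = re.compile("([" + re.escape(EXTRA_SEPARATORS) + "])")


def split_on_extra_separators(tokens):
    """Split each coarse token further at the extra separator characters."""
    result = []
    for token in tokens:
        pieces = _SPLIT_PATTERN.split(token)
        # re.split yields empty strings at the edges; drop them.
        result.extend(piece for piece in pieces if piece)
    return result


# Stand-in for a coarser tokenization (here: simple whitespace splitting).
coarse_tokens = "The IL-2 receptor (CD25) binds interleukin-2 .".split()
fine_tokens = split_on_extra_separators(coarse_tokens)
print(fine_tokens)
```

In this sketch, "IL-2" becomes three tokens and "(CD25)" is separated from its brackets, which is the kind of finer-grained split that biomedical entity names often need.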

__init__()

Methods

__init__()

from_dict(config)

Instantiate the tokenizer from a configuration dictionary.

to_dict()

Serialize the tokenizer's configuration to a dictionary.

tokenize(text)

Attributes

name

tokenize(text)

Return type:

list[str]

property name: str

to_dict()

Serialize the tokenizer’s configuration to a dictionary.

Return type:

dict

classmethod from_dict(config)

Instantiate the tokenizer from a configuration dictionary.

Return type:

SciSpacyTokenizer
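The to_dict() / from_dict(config) pair describes a serialization round trip. The pattern can be sketched with a stand-in class; `ToyTokenizer` and its `separators` parameter are hypothetical and say nothing about which keys the real SciSpacyTokenizer configuration contains.

```python
class ToyTokenizer:
    """Hypothetical stand-in used only to show the round-trip pattern."""

    def __init__(self, separators="()-"):
        # Assumption: a single illustrative setting, not a real flair option.
        self.separators = separators

    def to_dict(self):
        # Serialize everything needed to rebuild an equivalent instance.
        return {"separators": self.separators}

    @classmethod
    def from_dict(cls, config):
        # Rebuild an instance from a dictionary produced by to_dict().
        return cls(**config)


original = ToyTokenizer()
config = original.to_dict()
restored = ToyTokenizer.from_dict(config)
print(restored.to_dict() == config)
```

The useful property is that serializing the restored object yields the same dictionary again, so a configuration can be stored and reloaded without loss.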