flair.tokenization
- class flair.tokenization.Tokenizer
  Bases: ABC
  An abstract class representing a Tokenizer.
  Tokenizers represent algorithms and models that split plain text into individual tokens / words. All subclasses should overwrite tokenize(), which splits the given plain text into tokens. Moreover, subclasses may overwrite name(), returning a unique identifier representing the tokenizer's configuration.
  - abstract tokenize(text)
    - Return type: list[str]
  - property name: str
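The interface above can be sketched without flair installed. The following is a minimal stand-in (not flair's actual implementation) that mirrors the documented contract: an abstract tokenize() that subclasses must overwrite, and a name property with a default identifier. CommaTokenizer is a purely illustrative subclass.

```python
from abc import ABC, abstractmethod


class Tokenizer(ABC):
    """Minimal stand-in for the documented Tokenizer interface."""

    @abstractmethod
    def tokenize(self, text: str) -> list[str]:
        """Split the given plain text into tokens."""

    @property
    def name(self) -> str:
        # Default identifier: the subclass name.
        return self.__class__.__name__


class CommaTokenizer(Tokenizer):
    """Illustrative subclass: splits text on commas."""

    def tokenize(self, text: str) -> list[str]:
        return [part.strip() for part in text.split(",") if part.strip()]


print(CommaTokenizer().tokenize("one, two, three"))  # ['one', 'two', 'three']
print(CommaTokenizer().name)  # CommaTokenizer
```

A real subclass would typically also overwrite name to encode its configuration (for example, the model it loads), so that two differently configured tokenizers are distinguishable.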
- class flair.tokenization.SpacyTokenizer(model)
  Bases: Tokenizer
  Tokenizer using spaCy under the hood.
  - Parameters:
    model – a Spacy V2 model, or the name of the model to load.
  - tokenize(text)
    - Return type: list[str]
  - property name: str
- class flair.tokenization.SegtokTokenizer
  Bases: Tokenizer
  Tokenizer using segtok, a third-party rule-based tokenization library for Indo-European languages.
  For further details see: fnl/segtok
  - tokenize(text)
    - Return type: list[str]
  - static run_tokenize(text)
    - Return type: list[str]
- class flair.tokenization.SpaceTokenizer
  Bases: Tokenizer
  Tokenizer that splits text on the space character only.
  - tokenize(text)
    - Return type: list[str]
  - static run_tokenize(text)
    - Return type: list[str]
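Since this tokenizer splits on the space character only, its behavior can be sketched in plain Python. The function below is an assumption about the behavior (splitting on " " and dropping the empty strings that consecutive spaces produce), not flair's actual code; consult the source for edge cases.

```python
def run_tokenize(text: str) -> list[str]:
    # Space-only tokenization sketch: split on the space character
    # and drop empty strings produced by consecutive spaces.
    return [token for token in text.split(" ") if token]


print(run_tokenize("Berlin  is a  city ."))  # ['Berlin', 'is', 'a', 'city', '.']
```

Note that punctuation is not separated from adjacent words: "city." without a preceding space would stay a single token, which is the trade-off of a whitespace-only strategy.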
- class flair.tokenization.JapaneseTokenizer(tokenizer, sudachi_mode='A')
  Bases: Tokenizer
  Tokenizer using konoha, a third-party library which supports multiple popular Japanese tokenizers such as MeCab, Janome and SudachiPy.
  - tokenize(text)
    - Return type: list[str]
  - property name: str
- class flair.tokenization.TokenizerWrapper(tokenizer_func)
  Bases: Tokenizer
  Helper class that wraps a plain tokenizer function in the class-based Tokenizer interface.
  - tokenize(text)
    - Return type: list[str]
  - property name: str
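The wrapper pattern can be sketched as follows. This is a stand-in illustrating the adapter idea (a plain function str -> list[str] exposed through tokenize() and name), not flair's actual implementation; the exact name format is an assumption.

```python
from typing import Callable


class TokenizerWrapper:
    """Stand-in: adapt a plain tokenizer function to the
    class-based tokenizer interface."""

    def __init__(self, tokenizer_func: Callable[[str], list[str]]):
        self.tokenizer_func = tokenizer_func

    def tokenize(self, text: str) -> list[str]:
        # Delegate tokenization to the wrapped function.
        return self.tokenizer_func(text)

    @property
    def name(self) -> str:
        # Include the wrapped function's name so differently
        # configured wrappers are distinguishable.
        return f"{self.__class__.__name__}({self.tokenizer_func.__name__})"


whitespace_tokenizer = TokenizerWrapper(str.split)
print(whitespace_tokenizer.tokenize("Hello brave new world"))  # ['Hello', 'brave', 'new', 'world']
print(whitespace_tokenizer.name)  # TokenizerWrapper(split)
```

This lets any callable that maps a string to a list of strings be used wherever the library expects a Tokenizer instance.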
- class flair.tokenization.SciSpacyTokenizer
  Bases: Tokenizer
  Tokenizer that uses the en_core_sci_sm Spacy model and some special heuristics.
  Implementation of Tokenizer which uses the en_core_sci_sm Spacy model, extended by special heuristics that treat characters such as “(”, “)” and “-” as additional token separators. The latter distinguishes this implementation from SpacyTokenizer. Note: if you want the “normal” SciSpacy tokenization, just use SpacyTokenizer.
  - tokenize(text)
    - Return type: list[str]
  - property name: str