flair.splitter.SentenceSplitter
- class flair.splitter.SentenceSplitter
Bases: ABC

An abstract class representing a SentenceSplitter.

Sentence splitters represent algorithms and models that split plain text into sentences and individual tokens (words). All subclasses should override split(), which splits the given plain text into a list of flair.data.Sentence objects. The individual sentences are in turn subdivided into tokens. In most cases, tokenization can be controlled by passing a custom implementation of flair.tokenization.Tokenizer.

Moreover, subclasses may override name(), returning a unique identifier representing the sentence splitter’s configuration.

The most common class in Flair that implements this base class is SegtokSentenceSplitter.

- __init__()
Methods
__init__()
split(text[, link_sentences]) – Takes as input a text as a plain string and outputs a list of flair.data.Sentence objects.

Attributes
name – A string identifier of the sentence splitter.
tokenizer – The flair.tokenization.Tokenizer class used to tokenize sentences after they are split.

- split(text, link_sentences=True)
Takes as input a text as a plain string and outputs a list of flair.data.Sentence objects.

If link_sentences is set (the default), the flair.data.Sentence objects will include pointers to the preceding and following sentences in the original text. This way, the original sequence information is always preserved.

- Parameters:
text (str) – The plain text to split.
link_sentences (bool) – If set to True, flair.data.Sentence objects will include pointers to the preceding and following sentences in the original text.
- Return type:
list[Sentence]
- Returns:
A list of flair.data.Sentence objects that each represent one sentence in the given text.
- property name: str
A string identifier of the sentence splitter.
- property tokenizer: Tokenizer
The flair.tokenization.Tokenizer class used to tokenize sentences after they are split.
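
The contract described above (an abstract split(), a name identifier, a tokenizer, and optional linking of neighbouring sentences) can be illustrated with a small dependency-free sketch. It deliberately does not import Flair: ToySentence, ToySentenceSplitter, and RegexSentenceSplitter are hypothetical stand-ins invented for this example, and the regex splitting rule is an assumption, not how SegtokSentenceSplitter works. In real code one would subclass flair.splitter.SentenceSplitter and return flair.data.Sentence objects instead.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Stand-in for a tokenizer: any callable mapping a string to a list of tokens.
# (Hypothetical; Flair uses flair.tokenization.Tokenizer objects instead.)
Tokenizer = Callable[[str], List[str]]


@dataclass(eq=False)
class ToySentence:
    """Hypothetical stand-in for flair.data.Sentence."""
    text: str
    tokens: List[str]
    previous: Optional["ToySentence"] = field(default=None, repr=False)
    next: Optional["ToySentence"] = field(default=None, repr=False)


class ToySentenceSplitter(ABC):
    """Mirrors the documented interface: split(), name, tokenizer."""

    def __init__(self, tokenizer: Tokenizer = str.split) -> None:
        # Default tokenizer is plain whitespace splitting (an assumption).
        self._tokenizer = tokenizer

    @abstractmethod
    def split(self, text: str, link_sentences: bool = True) -> List[ToySentence]:
        """Split plain text into a list of sentence objects."""

    @property
    def name(self) -> str:
        # Subclasses may override this to encode their configuration.
        return self.__class__.__name__

    @property
    def tokenizer(self) -> Tokenizer:
        return self._tokenizer


class RegexSentenceSplitter(ToySentenceSplitter):
    """Toy subclass: breaks after '.', '!' or '?' followed by whitespace."""

    def split(self, text: str, link_sentences: bool = True) -> List[ToySentence]:
        parts = [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]
        sentences = [ToySentence(p, self.tokenizer(p)) for p in parts]
        if link_sentences:
            # Give each sentence pointers to its neighbours in the original text.
            for prev, nxt in zip(sentences, sentences[1:]):
                prev.next = nxt
                nxt.previous = prev
        return sentences


if __name__ == "__main__":
    splitter = RegexSentenceSplitter()
    for sentence in splitter.split("Flair is a library. It splits text! Does it work?"):
        print(sentence.text, sentence.tokens)
```

Passing link_sentences=False in this sketch leaves the previous and next pointers unset, which corresponds to the behaviour the parameter description above attributes to the real method.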