flair.splitter.SentenceSplitter
- class flair.splitter.SentenceSplitter
Bases: ABC
An abstract class representing a SentenceSplitter.

Sentence splitters represent algorithms and models that split plain text into sentences and individual tokens / words. All subclasses should override split(), which splits the given plain text into a list of flair.data.Sentence objects. The individual sentences are in turn subdivided into tokens; in most cases, this can be controlled by passing a custom implementation of flair.tokenization.Tokenizer.

Moreover, subclasses may override the name property, which returns a unique identifier representing the sentence splitter's configuration.

The most common class in Flair that implements this base class is SegtokSentenceSplitter.
- __init__()
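To make this contract concrete, here is a minimal, Flair-free sketch of the same pattern built on Python's abc module. SimpleSentence, SentenceSplitterSketch, and NaiveSplitter are hypothetical stand-ins, not Flair classes, and the period-based splitting rule and class-name default for name are assumptions made only for illustration. In Flair itself you would subclass flair.splitter.SentenceSplitter or use SegtokSentenceSplitter.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SimpleSentence:
    """Hypothetical stand-in for flair.data.Sentence: raw text plus its tokens."""
    text: str
    tokens: list[str] = field(default_factory=list)


class SentenceSplitterSketch(ABC):
    """Hypothetical mirror of the SentenceSplitter contract (not the Flair class)."""

    @abstractmethod
    def split(self, text: str, link_sentences: bool = True) -> list[SimpleSentence]:
        """Split plain text into sentences; every concrete subclass must override this."""

    @property
    def name(self) -> str:
        # Assumed default identifier; a subclass may override it to describe its configuration.
        return self.__class__.__name__


class NaiveSplitter(SentenceSplitterSketch):
    """Toy splitter: a sentence ends at every '.', tokens come from whitespace."""

    def split(self, text: str, link_sentences: bool = True) -> list[SimpleSentence]:
        # link_sentences is accepted for signature compatibility but ignored here.
        sentences = []
        for chunk in text.split("."):
            chunk = chunk.strip()
            if chunk:
                sentence_text = chunk + "."
                sentences.append(SimpleSentence(sentence_text, sentence_text.split()))
        return sentences


splitter = NaiveSplitter()
result = splitter.split("Hello world. Flair is great.")
print([s.text for s in result])  # ['Hello world.', 'Flair is great.']
print(splitter.name)  # NaiveSplitter
```

Because split is abstract, calling SentenceSplitterSketch() directly raises TypeError; only subclasses that implement split can be instantiated.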
Methods
__init__()
split(text[, link_sentences]): Takes a plain-text string as input and outputs a list of flair.data.Sentence objects.
Attributes
name: A string identifier of the sentence splitter.
tokenizer: The flair.tokenization.Tokenizer instance used to tokenize sentences after they are split.
- split(text, link_sentences=True)
Takes a plain-text string as input and outputs a list of flair.data.Sentence objects. If link_sentences is set (it is by default), the flair.data.Sentence objects will include pointers to the preceding and following sentences in the original text. This way, the original sequence information is always preserved.
- Parameters:
text (str) – The plain text to split.
link_sentences (bool) – If set to True, flair.data.Sentence objects will include pointers to the preceding and following sentences in the original text.
- Return type:
list[Sentence]
- Returns:
A list of flair.data.Sentence objects that each represent one sentence in the given text.
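The effect of link_sentences can be illustrated with a small stand-alone sketch. LinkedSentence and split_and_link are hypothetical names, and the attributes previous and next are assumptions of this sketch rather than Flair's actual Sentence attributes; the splitting rule (break at every period) is likewise only a toy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LinkedSentence:
    """Hypothetical stand-in for flair.data.Sentence with neighbour pointers."""
    text: str
    # repr/compare are disabled so the two-way links cannot cause infinite recursion.
    previous: Optional["LinkedSentence"] = field(default=None, repr=False, compare=False)
    next: Optional["LinkedSentence"] = field(default=None, repr=False, compare=False)


def split_and_link(text: str, link_sentences: bool = True) -> list[LinkedSentence]:
    """Toy period-based splitter that optionally links neighbouring sentences."""
    sentences = [LinkedSentence(chunk.strip() + ".") for chunk in text.split(".") if chunk.strip()]
    if link_sentences:
        # Point each sentence at its neighbours so the original order is recoverable.
        for before, after in zip(sentences, sentences[1:]):
            before.next = after
            after.previous = before
    return sentences


linked = split_and_link("One. Two. Three.")
print(linked[1].previous.text, linked[1].next.text)  # One. Three.

# Walking the chain from the first sentence restores the original sequence.
node, order = linked[0], []
while node is not None:
    order.append(node.text)
    node = node.next
print(order)  # ['One.', 'Two.', 'Three.']

unlinked = split_and_link("One. Two.", link_sentences=False)
print(unlinked[0].next)  # None
```

With link_sentences=False each sentence stands alone, so any context that depends on neighbouring sentences is no longer reachable from a single sentence object.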
- property name: str
A string identifier of the sentence splitter.
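One way a subclass might make name reflect its configuration is sketched below. RegexSplitter is a hypothetical class, and the identifier format (class name followed by its settings) is purely an assumption; the documentation above only requires that the string identify the splitter.

```python
import re


class RegexSplitter:
    """Hypothetical splitter whose name encodes its configuration."""

    def __init__(self, boundary: str = r"(?<=[.!?])\s+", lowercase: bool = False) -> None:
        self.boundary = boundary
        self.lowercase = lowercase

    def split(self, text: str) -> list[str]:
        # Split after sentence-final punctuation followed by whitespace.
        parts = [p for p in re.split(self.boundary, text.strip()) if p]
        return [p.lower() for p in parts] if self.lowercase else parts

    @property
    def name(self) -> str:
        # Splitters with different settings get different identifiers.
        return f"{self.__class__.__name__}(boundary={self.boundary!r}, lowercase={self.lowercase})"


default = RegexSplitter()
lowered = RegexSplitter(lowercase=True)
print(lowered.split("Hi there! Bye."))  # ['hi there!', 'bye.']
print(default.name != lowered.name)  # True
```

Encoding the settings in the identifier means two differently configured splitters are never confused, for example when results are cached or logged per splitter.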
- property tokenizer: Tokenizer
The flair.tokenization.Tokenizer instance used to tokenize sentences after they are split.
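The split between sentence boundaries and tokenization can be sketched as a splitter that receives a tokenizer and exposes it through a tokenizer property. TokenizerSketch is loosely modelled on flair.tokenization.Tokenizer, but it and every other name here (WhitespaceTokenizer, CharTokenizer, LineSplitter, and the tokenize method signature) are hypothetical stand-ins, and the one-sentence-per-line rule is an assumption for illustration.

```python
from abc import ABC, abstractmethod


class TokenizerSketch(ABC):
    """Hypothetical stand-in for flair.tokenization.Tokenizer."""

    @abstractmethod
    def tokenize(self, text: str) -> list[str]:
        """Turn one sentence string into a list of token strings."""


class WhitespaceTokenizer(TokenizerSketch):
    """Toy tokenizer: tokens are whitespace-separated words."""

    def tokenize(self, text: str) -> list[str]:
        return text.split()


class CharTokenizer(TokenizerSketch):
    """Toy tokenizer: every non-space character becomes a token."""

    def tokenize(self, text: str) -> list[str]:
        return [c for c in text if not c.isspace()]


class LineSplitter:
    """Hypothetical splitter: one sentence per non-empty line, tokens from an injected tokenizer."""

    def __init__(self, tokenizer: TokenizerSketch) -> None:
        self._tokenizer = tokenizer

    @property
    def tokenizer(self) -> TokenizerSketch:
        # The tokenizer instance applied to each sentence after splitting.
        return self._tokenizer

    def split(self, text: str) -> list[list[str]]:
        # Sentence boundaries are fixed (newlines); only the tokenization varies.
        return [self.tokenizer.tokenize(line) for line in text.splitlines() if line.strip()]


text = "I like Flair\nIt works"
words = LineSplitter(WhitespaceTokenizer())
chars = LineSplitter(CharTokenizer())
print(words.split(text))  # [['I', 'like', 'Flair'], ['It', 'works']]
```

Swapping the tokenizer changes how each sentence is subdivided without touching the sentence-splitting logic, which mirrors the role the documentation assigns to flair.tokenization.Tokenizer.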