flair.splitter.SentenceSplitter

class flair.splitter.SentenceSplitter

Bases: ABC

An abstract class representing a SentenceSplitter.

Sentence splitters represent algorithms and models that split plain text into sentences and individual tokens / words. All subclasses should override split(), which splits the given plain text into a list of flair.data.Sentence objects. The individual sentences are in turn subdivided into tokens. In most cases, tokenization can be controlled by passing a custom implementation of flair.tokenization.Tokenizer.

Moreover, subclasses may override the name property to return a unique identifier representing the sentence splitter’s configuration.

The most common class in Flair that implements this base class is SegtokSentenceSplitter.
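The contract described above can be illustrated with a minimal, standard-library-only sketch. It does not import Flair: ToySentence, ToySplitter, and NewlineSplitter are hypothetical stand-ins invented for illustration, and the real flair.data.Sentence and SentenceSplitter classes carry more state and behavior than shown here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ToySentence:
    """Hypothetical stand-in for flair.data.Sentence: raw text plus its tokens."""

    text: str
    tokens: list[str] = field(default_factory=list)


class ToySplitter(ABC):
    """Hypothetical stand-in for the abstract base class (not Flair's code)."""

    @abstractmethod
    def split(self, text: str) -> list[ToySentence]:
        """Split plain text into a list of sentence objects."""

    @property
    def name(self) -> str:
        # Default identifier; subclasses may override it.
        return self.__class__.__name__


class NewlineSplitter(ToySplitter):
    """Treats every non-empty line as one sentence, tokenized on whitespace."""

    def split(self, text: str) -> list[ToySentence]:
        return [
            ToySentence(text=line.strip(), tokens=line.split())
            for line in text.splitlines()
            if line.strip()
        ]

    @property
    def name(self) -> str:
        return "NewlineSplitter"


sentences = NewlineSplitter().split("Hello world .\n\nSecond line here .")
for sentence in sentences:
    print(sentence.text, sentence.tokens)
```

Because split() is declared abstract, instantiating ToySplitter directly raises a TypeError; only subclasses that implement split() can be created.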

__init__()

Methods

__init__()

split(text[, link_sentences])

Takes as input a text as a plain string and outputs a list of flair.data.Sentence objects.

Attributes

name

A string identifier of the sentence splitter.

tokenizer

The flair.tokenization.Tokenizer class used to tokenize sentences after they are split.

split(text, link_sentences=True)

Takes as input a text as a plain string and outputs a list of flair.data.Sentence objects.

If link_sentences is set (it is by default), the flair.data.Sentence objects will include pointers to the preceding and following sentences in the original text. This way, the original sequence information is always preserved.

Parameters:
  • text (str) – The plain text to split.

  • link_sentences (bool) – If set to True, flair.data.Sentence objects will include pointers to the preceding and following sentences in the original text.

Return type:

list[Sentence]

Returns:

A list of flair.data.Sentence objects that each represent one sentence in the given text.
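The effect of link_sentences can be sketched without Flair as follows. This is an illustration of the idea only: LinkedSentence, its previous and next attributes, and split_lines are hypothetical names, and the way Flair's Sentence objects actually store and expose these pointers may differ.

```python
from __future__ import annotations

from typing import Optional


class LinkedSentence:
    """Hypothetical sentence object holding pointers to its neighbours."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.previous: Optional[LinkedSentence] = None
        self.next: Optional[LinkedSentence] = None


def split_lines(text: str, link_sentences: bool = True) -> list[LinkedSentence]:
    """Split on non-empty lines and optionally link adjacent sentences."""
    sentences = [LinkedSentence(line.strip()) for line in text.splitlines() if line.strip()]
    if link_sentences:
        # Pair each sentence with its successor and wire both directions.
        for current, following in zip(sentences, sentences[1:]):
            current.next = following
            following.previous = current
    return sentences


linked = split_lines("First .\nSecond .\nThird .")
print(linked[1].previous.text, "|", linked[1].next.text)

unlinked = split_lines("First .\nSecond .", link_sentences=False)
print(unlinked[0].next)
```

With linking enabled, any sentence can be used to walk back and forth through the original text; with it disabled, each sentence stands alone.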

property name: str

A string identifier of the sentence splitter.

property tokenizer: Tokenizer

The flair.tokenization.Tokenizer class used to tokenize sentences after they are split.