flair.splitter

class flair.splitter.SentenceSplitter

Bases: ABC

An abstract class representing a SentenceSplitter.

Sentence splitters represent algorithms and models that split plain text into sentences and individual tokens / words. All subclasses should provide the logic behind split(), which splits the given plain text into a sequence of sentences (Sentence). The individual sentences are in turn subdivided into tokens / words. In most cases, this can be controlled by passing a custom implementation of Tokenizer.

Moreover, subclasses may override the name property to return a unique identifier representing the sentence splitter’s configuration.
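The contract described above can be sketched without flair installed. The following is a minimal stand-in, not flair's implementation: SketchSentence, SketchSplitter, PeriodSplitter, and the _split_text hook are hypothetical names, the tokenizer is assumed to be a plain callable, and the effect of link_sentences (each sentence receives references to its neighbours) is an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SketchSentence:
    """Hypothetical stand-in for a sentence: text, tokens, and neighbour links."""
    text: str
    tokens: List[str]
    # Neighbour links are excluded from repr/eq to avoid following cycles.
    previous: Optional["SketchSentence"] = field(default=None, repr=False, compare=False)
    next: Optional["SketchSentence"] = field(default=None, repr=False, compare=False)


class SketchSplitter(ABC):
    """Hypothetical abstract splitter mirroring the documented interface."""

    def __init__(self, tokenizer: Callable[[str], List[str]] = str.split):
        self._tokenizer = tokenizer

    @abstractmethod
    def _split_text(self, text: str) -> List[str]:
        """Return the raw sentence strings; concrete subclasses supply this."""

    def split(self, text: str, link_sentences: bool = True) -> List[SketchSentence]:
        sentences = [SketchSentence(s, self._tokenizer(s)) for s in self._split_text(text)]
        if link_sentences:
            # Assumed meaning of linking: connect each sentence to its neighbours.
            for left, right in zip(sentences, sentences[1:]):
                left.next, right.previous = right, left
        return sentences

    @property
    def name(self) -> str:
        # Default identifier; subclasses may override to encode configuration.
        return self.__class__.__name__

    @property
    def tokenizer(self) -> Callable[[str], List[str]]:
        return self._tokenizer


class PeriodSplitter(SketchSplitter):
    """Naive example subclass: ends a sentence at every period."""

    def _split_text(self, text: str) -> List[str]:
        return [part.strip() + "." for part in text.split(".") if part.strip()]


splitter = PeriodSplitter()
result = splitter.split("Flair is a library. It splits text.")
```

Here result holds two linked SketchSentence objects, each with whitespace-separated tokens. Passing a different callable as tokenizer changes only the token subdivision, which is the separation of concerns the class description refers to.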

split(text, link_sentences=True)
Return type: List[Sentence]

property name: str
property tokenizer: Tokenizer

class flair.splitter.SegtokSentenceSplitter(tokenizer=<flair.tokenization.SegtokTokenizer object>)

Bases: SentenceSplitter

Sentence Splitter using SegTok.

Implementation of SentenceSplitter using the SegTok library.

For further details see the fnl/segtok repository on GitHub.

property name: str
property tokenizer: Tokenizer

class flair.splitter.SpacySentenceSplitter(model, tokenizer=None)

Bases: SentenceSplitter

Sentence Splitter using Spacy.

Implementation of SentenceSplitter, using models from Spacy.

Parameters:
  • model (Union[Any, str]) – Spacy V2 model or the name of the model to load.

  • tokenizer (Optional[Tokenizer]) – Custom tokenizer to use (default SpacyTokenizer)
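A parameter typed Union[Any, str], such as model above, is commonly handled by loading the value when it is a string and using it directly otherwise. The sketch below illustrates that pattern only; resolve_model, FAKE_MODELS, and the load callable are hypothetical stand-ins, and no real model loader is called.

```python
from typing import Any, Callable, Dict, Union

# Hypothetical registry standing in for a real model loader.
FAKE_MODELS: Dict[str, Any] = {"demo_model": object()}


def resolve_model(
    model: Union[Any, str],
    load: Callable[[str], Any] = FAKE_MODELS.__getitem__,
) -> Any:
    """Load `model` by name if it is a string; otherwise return it unchanged."""
    if isinstance(model, str):
        return load(model)
    return model


loaded = resolve_model("demo_model")     # looked up by name
passed_through = resolve_model(loaded)   # already an object, returned as-is
```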

property tokenizer: Tokenizer
property name: str

class flair.splitter.SciSpacySentenceSplitter

Bases: SpacySentenceSplitter

Sentence splitter using the spacy model en_core_sci_sm.

Convenience class to instantiate SpacySentenceSplitter with Spacy model en_core_sci_sm for sentence splitting and SciSpacyTokenizer as tokenizer.

class flair.splitter.TagSentenceSplitter(tag, tokenizer=<flair.tokenization.SegtokTokenizer object>)

Bases: SentenceSplitter

SentenceSplitter which assumes that there is a tag within the text that is used to mark sentence boundaries.

Implementation of SentenceSplitter which assumes that there is a special tag within the text that is used to mark sentence boundaries.
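The idea of splitting at a boundary tag can be illustrated in plain Python. This is a sketch under stated assumptions, not the library's code: split_on_tag is a hypothetical helper, it returns (start_offset, text) pairs rather than Sentence objects, and skipping empty segments is a choice made here for illustration.

```python
from typing import List, Tuple


def split_on_tag(text: str, tag: str) -> List[Tuple[int, str]]:
    """Split `text` at every occurrence of `tag`.

    Returns (start_offset, segment) pairs. Offsets index into the original
    text; empty or whitespace-only segments are skipped.
    """
    if not tag:
        raise ValueError("tag must be a non-empty string")
    pieces: List[Tuple[int, str]] = []
    offset = 0
    for segment in text.split(tag):
        if segment.strip():
            pieces.append((offset, segment))
        # Advance past this segment and the tag that followed it.
        offset += len(segment) + len(tag)
    return pieces


marked = split_on_tag("First sentence.[SEP]Second one.[SEP]", "[SEP]")
by_line = split_on_tag("line one\n\nline two", "\n")
```

Tracking the start offset lets each segment be mapped back to its position in the source text. Using "\n" as the tag gives newline-based splitting as a special case.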

property tokenizer: Tokenizer
property name: str

class flair.splitter.NewlineSentenceSplitter(tokenizer=<flair.tokenization.SegtokTokenizer object>)

Bases: TagSentenceSplitter

Sentence Splitter using newline as boundary marker.

Convenience class to instantiate TagSentenceSplitter with newline ("\n") as sentence boundary marker.

property name: str

class flair.splitter.NoSentenceSplitter(tokenizer=<flair.tokenization.SegtokTokenizer object>)

Bases: SentenceSplitter

Sentence Splitter which treats the full text as a single Sentence.

Implementation of SentenceSplitter which treats the complete text as one sentence.

property tokenizer: Tokenizer
property name: str