flair.splitter
- class flair.splitter.SentenceSplitter
Bases: ABC
An abstract class representing a SentenceSplitter.
Sentence splitters represent algorithms and models that split plain text into sentences and individual tokens / words. All subclasses should override split(), which splits the given plain text into a sequence of sentences (Sentence). The individual sentences are in turn subdivided into tokens / words. In most cases, this can be controlled by passing a custom implementation of Tokenizer.
Moreover, subclasses may override name, returning a unique identifier representing the sentence splitter’s configuration.
- split(text, link_sentences=True)
- Return type: List[Sentence]
- property name: str
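The contract described above can be sketched in plain Python without depending on Flair itself. The names below (`ToySentenceSplitter`, `PeriodSplitter`) are hypothetical stand-ins, not Flair classes. The sketch returns plain strings where Flair returns Sentence objects, and it leaves out `link_sentences` and tokenization entirely.

```python
from abc import ABC, abstractmethod
from typing import List


class ToySentenceSplitter(ABC):
    """Hypothetical stand-in for the abstract splitter contract (not a Flair class)."""

    @abstractmethod
    def split(self, text: str) -> List[str]:
        """Split plain text into a list of sentence strings."""

    @property
    def name(self) -> str:
        # Default identifier; subclasses may override it to encode their configuration.
        return self.__class__.__name__


class PeriodSplitter(ToySentenceSplitter):
    """Toy subclass: treats every '.' as a sentence boundary (illustration only)."""

    def split(self, text: str) -> List[str]:
        # Re-attach the period and drop empty fragments.
        return [part.strip() + "." for part in text.split(".") if part.strip()]

    @property
    def name(self) -> str:
        return "toy_period_splitter"


splitter = PeriodSplitter()
sentences = splitter.split("Flair splits text. Each piece is a sentence.")
```

The point of the pattern is that calling code depends only on `split()` and `name`, so one splitter can be swapped for another without changing the caller.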
- class flair.splitter.SegtokSentenceSplitter(tokenizer=<flair.tokenization.SegtokTokenizer object>)
Bases: SentenceSplitter
Sentence Splitter using SegTok.
Implementation of SentenceSplitter using the SegTok library.
For further details see: fnl/segtok
- property name: str
- class flair.splitter.SpacySentenceSplitter(model, tokenizer=None)
Bases: SentenceSplitter
Sentence Splitter using Spacy.
Implementation of SentenceSplitter using models from Spacy.
- Parameters:
model (Union[Any, str]) – Spacy V2 model or the name of the model to load.
tokenizer (Optional[Tokenizer]) – Custom tokenizer to use (default: SpacyTokenizer)
- property name: str
- class flair.splitter.SciSpacySentenceSplitter
Bases: SpacySentenceSplitter
Sentence splitter using the Spacy model en_core_sci_sm.
Convenience class to instantiate SpacySentenceSplitter with the Spacy model en_core_sci_sm for sentence splitting and SciSpacyTokenizer as tokenizer.
- class flair.splitter.TagSentenceSplitter(tag, tokenizer=<flair.tokenization.SegtokTokenizer object>)
Bases: SentenceSplitter
SentenceSplitter which assumes that there is a tag within the text that is used to mark sentence boundaries.
Implementation of SentenceSplitter which assumes that there is a special tag within the text that is used to mark sentence boundaries.
- property name: str
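As a rough illustration of tag-based splitting, the following sketch splits text on a boundary tag and records where each sentence starts in the original text. It is plain Python under stated assumptions, not Flair's implementation: the function name `split_on_tag` and the `[SEP]` tag are hypothetical, and no tokenization is performed.

```python
from typing import List, Tuple


def split_on_tag(text: str, tag: str) -> List[Tuple[str, int]]:
    """Return (sentence, start_offset) pairs for text split on `tag`.

    Hypothetical helper for illustration; not part of Flair.
    """
    results = []
    cursor = 0  # index in `text` where the current segment begins
    for segment in text.split(tag):
        stripped = segment.strip()
        if stripped:
            # Offset of the first non-whitespace character of this segment.
            leading = len(segment) - len(segment.lstrip())
            results.append((stripped, cursor + leading))
        # Advance past this segment and the tag that followed it.
        cursor += len(segment) + len(tag)
    return results


pairs = split_on_tag("First sentence.[SEP] Second one.[SEP]", "[SEP]")
lines = split_on_tag("line one\nline two", "\n")
```

Empty segments, such as the one after a trailing tag, are dropped. Passing `"\n"` as the tag gives newline-based splitting, as in the last line.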
- class flair.splitter.NewlineSentenceSplitter(tokenizer=<flair.tokenization.SegtokTokenizer object>)
Bases: TagSentenceSplitter
Sentence Splitter using newline as boundary marker.
Convenience class to instantiate TagSentenceSplitter with newline ("\n") as sentence boundary marker.
- property name: str
- class flair.splitter.NoSentenceSplitter(tokenizer=<flair.tokenization.SegtokTokenizer object>)
Bases: SentenceSplitter
Sentence Splitter which treats the full text as a single Sentence.
Implementation of SentenceSplitter which treats the complete text as one sentence.
- property name: str