flair.splitter
- class flair.splitter.SentenceSplitter
Bases: ABC
An abstract class representing a SentenceSplitter.
Sentence splitters represent algorithms and models that split plain text into sentences and individual tokens / words. All subclasses should override split(), which splits the given plain text into a sequence of sentences (Sentence). The individual sentences are in turn subdivided into tokens / words. In most cases, this can be controlled by passing a custom implementation of Tokenizer.
Moreover, subclasses may override name, returning a unique identifier representing the sentence splitter's configuration.
- split(text, link_sentences=True)
- Return type: List[Sentence]
- property name: str
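The override contract described above can be illustrated without Flair installed. The following is a minimal stand-alone sketch of the pattern, not Flair's implementation: `SimpleSentence`, `SketchSplitter`, and `PeriodSplitter` are hypothetical names, whitespace tokenization stands in for a Tokenizer, and the `link_sentences` parameter is left out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleSentence:
    """Hypothetical stand-in for flair's Sentence."""
    text: str
    tokens: List[str]


class SketchSplitter(ABC):
    """Hypothetical mirror of the SentenceSplitter contract."""

    def __init__(self, tokenizer: Callable[[str], List[str]] = str.split):
        # The tokenizer controls how each sentence is divided into tokens.
        self.tokenizer = tokenizer

    @abstractmethod
    def split(self, text: str) -> List[SimpleSentence]:
        """Split plain text into a sequence of sentences."""

    @property
    def name(self) -> str:
        # Default identifier; subclasses may override it.
        return self.__class__.__name__


class PeriodSplitter(SketchSplitter):
    """Naive subclass: treats every '.' as a sentence boundary."""

    def split(self, text: str) -> List[SimpleSentence]:
        parts = [p.strip() for p in text.split(".") if p.strip()]
        return [SimpleSentence(p, self.tokenizer(p)) for p in parts]

    @property
    def name(self) -> str:
        # Identifier that also encodes this splitter's configuration.
        return "PeriodSplitter(sep='.')"


splitter = PeriodSplitter()
sentences = splitter.split("Berlin is a city. Flair is a library.")
```

Because `split` is abstract here, instantiating `SketchSplitter` directly raises a `TypeError`; only subclasses that provide it can be used.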
- class flair.splitter.SegtokSentenceSplitter(tokenizer=<flair.tokenization.SegtokTokenizer object>)
Bases: SentenceSplitter
Sentence Splitter using SegTok.
Implementation of SentenceSplitter using the SegTok library.
For further details see: fnl/segtok
- property name: str
- class flair.splitter.SpacySentenceSplitter(model, tokenizer=None)
Bases: SentenceSplitter
Sentence Splitter using Spacy.
Implementation of SentenceSplitter, using models from Spacy.
- Parameters:
  - model (Union[Any, str]) – Spacy V2 model or the name of the model to load.
  - tokenizer (Optional[Tokenizer]) – Custom tokenizer to use (default SpacyTokenizer)
- property name: str
- class flair.splitter.SciSpacySentenceSplitter
Bases: SpacySentenceSplitter
Sentence splitter using the Spacy model en_core_sci_sm.
Convenience class to instantiate SpacySentenceSplitter with the Spacy model en_core_sci_sm for sentence splitting and SciSpacyTokenizer as tokenizer.
- class flair.splitter.TagSentenceSplitter(tag, tokenizer=<flair.tokenization.SegtokTokenizer object>)
Bases: SentenceSplitter
SentenceSplitter which assumes that there is a tag within the text that is used to mark sentence boundaries.
Implementation of SentenceSplitter which assumes that there is a special tag within the text that is used to mark sentence boundaries.
- property name: str
- class flair.splitter.NewlineSentenceSplitter(tokenizer=<flair.tokenization.SegtokTokenizer object>)
Bases: TagSentenceSplitter
Sentence Splitter using newline as boundary marker.
Convenience class to instantiate TagSentenceSplitter with newline ("\n") as sentence boundary marker.
- property name: str
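The tag-based strategy, with newline as a special case, can be sketched in a few lines. This is a stand-alone illustration of the idea, not Flair's implementation: the function name `split_on_tag` and the `[SEP]` tag are hypothetical, and it returns plain strings with character offsets rather than Sentence objects.

```python
from typing import List, Tuple


def split_on_tag(text: str, tag: str) -> List[Tuple[str, int]]:
    """Split text at each occurrence of tag.

    Returns (sentence, start_offset) pairs; empty segments are skipped.
    """
    if not tag:
        raise ValueError("tag must be a non-empty string")
    results = []
    offset = 0  # position of the current segment in the original text
    for segment in text.split(tag):
        stripped = segment.strip()
        if stripped:
            # Account for leading whitespace removed by strip().
            leading = len(segment) - len(segment.lstrip())
            results.append((stripped, offset + leading))
        # Advance past this segment and the tag that followed it.
        offset += len(segment) + len(tag)
    return results


# An explicit boundary tag inside the text.
tagged = split_on_tag("First sentence.[SEP]Second sentence.", "[SEP]")

# Newline as the boundary marker.
by_line = split_on_tag("Line one\nLine two\n", "\n")
```

Keeping the start offset of each sentence makes it possible to map results back to positions in the original text.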
- class flair.splitter.NoSentenceSplitter(tokenizer=<flair.tokenization.SegtokTokenizer object>)
Bases: SentenceSplitter
Sentence Splitter which treats the full text as a single Sentence.
Implementation of SentenceSplitter which treats the complete text as one sentence.
- property name: str