Biomedical text analysis
HunFlair2 is a state-of-the-art named entity tagger and linker for biomedical texts, implemented in Flair (starting with version 0.14.0).
It performs two types of analysis:
The NER model identifies genes/proteins, chemicals, diseases, species and cell lines in text, and outperforms other biomedical NER tools on unseen corpora.
The linking model goes one step beyond detecting entity names: it finds their unique identifiers in biomedical knowledge bases, thus normalizing the names.
This section provides examples for each of these models.
Example 1: Biomedical Named Entity Recognition
Let's run named entity recognition (NER) over an example sentence. All you need to do is make a Sentence, load a pre-trained model and use it to predict tags for the sentence:
from flair.data import Sentence
from flair.nn import Classifier
# make a sentence
sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome")
# load biomedical NER tagger
tagger = Classifier.load("biomed")
# tag sentence
tagger.predict(sentence)
Done! The Sentence now has entity annotations. Let's print the entities found by the tagger:
for entity in sentence.get_labels():
    print(entity)
This should print:
Span[0:2]: "Behavioral abnormalities" → Disease (1.0)
Span[4:5]: "Fmr1" → Gene (1.0)
Span[6:7]: "Mouse" → Species (1.0)
Span[9:12]: "Fragile X Syndrome" → Disease (1.0)
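If you want to work with the predictions rather than just print them, a small helper can collect the mentions per entity type. The following is a minimal sketch, not part of Flair: `group_mentions_by_type` and `min_score` are hypothetical names, and the code assumes each label exposes `value`, `score` and `data_point.text`. Stand-in objects are used so the snippet runs without downloading a model.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_mentions_by_type(labels, min_score=0.0):
    """Group mention texts by entity type, skipping labels below min_score.

    Hypothetical helper: assumes each label has `value`, `score`
    and `data_point.text` attributes.
    """
    grouped = defaultdict(list)
    for label in labels:
        if label.score >= min_score:
            grouped[label.value].append(label.data_point.text)
    return dict(grouped)


def make_label(text, value, score):
    """Build a stand-in object that mimics the assumed label attributes."""
    return SimpleNamespace(
        value=value, score=score, data_point=SimpleNamespace(text=text)
    )


# Stand-ins mirroring the output shown above; with Flair you would
# pass sentence.get_labels() instead.
example_labels = [
    make_label("Behavioral abnormalities", "Disease", 1.0),
    make_label("Fmr1", "Gene", 1.0),
    make_label("Mouse", "Species", 1.0),
    make_label("Fragile X Syndrome", "Disease", 1.0),
]

for entity_type, mentions in group_mentions_by_type(example_labels).items():
    print(entity_type, mentions)
```

Raising `min_score` is one simple way to keep only the predictions the tagger is most confident about.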
Example 2: Biomedical Entity Linking
To integrate and aggregate information across multiple documents, entities need to be linked (normalized) to standardized ontologies or knowledge bases. Let's perform entity normalization using a specialized model for each entity type:
from flair.data import Sentence
from flair.models import EntityMentionLinker
from flair.nn import Classifier
# make a sentence
sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome")
# load biomedical NER tagger + predict entities
tagger = Classifier.load("biomed")
tagger.predict(sentence)
# load gene linker and perform normalization
gene_linker = EntityMentionLinker.load("gene-linker")
gene_linker.predict(sentence)
# load disease linker and perform normalization
disease_linker = EntityMentionLinker.load("disease-linker")
disease_linker.predict(sentence)
# load species linker and perform normalization
species_linker = EntityMentionLinker.load("species-linker")
species_linker.predict(sentence)
The ontologies and knowledge bases are pre-processed the first time normalization is executed, which might take some time. All further calls reuse this pre-processing and run much faster.
Done! The Sentence now has entity normalizations. Let's print the entity identifiers found by the linkers:
for entity in sentence.get_labels("link"):
    print(entity)
This should print:
Span[0:2]: "Behavioral abnormalities" → MESH:D001523/name=Mental Disorders (197.9467010498047)
Span[4:5]: "Fmr1" → 108684022/name=FRAXA (219.9510040283203)
Span[6:7]: "Mouse" → 10090/name=Mus musculus (213.6201934814453)
Span[9:12]: "Fragile X Syndrome" → MESH:D005600/name=Fragile X Syndrome (193.7115020751953)
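In the output above, each link value combines a knowledge base identifier and a canonical name, separated by `/name=`. Downstream code often needs the two parts separately. The sketch below splits such a string. `split_link_value` is a hypothetical helper, and the `identifier/name=...` layout is assumed from the printed output only, so it may not hold for every linker or Flair version.

```python
def split_link_value(value):
    """Split 'ID/name=Canonical Name' into (identifier, name).

    Hypothetical helper: the layout is inferred from the printed
    output above. Returns None as the name if no '/name=' is present.
    """
    identifier, separator, name = value.partition("/name=")
    return identifier, (name if separator else None)


# Values copied from the example output above
for value in [
    "MESH:D005600/name=Fragile X Syndrome",
    "10090/name=Mus musculus",
]:
    identifier, name = split_link_value(value)
    print(identifier, "|", name)
```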
Note that for best performance of the linking models, you need to additionally install pyab3p. This library is not installed by default with Flair, and does not work on all operating systems. Without it, the linking models will still work, just not quite as well as when the library is installed.
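Since the library is optional, it can be useful to check whether it is available before running the linkers. The following is a minimal sketch using only the standard library. `has_module` is a hypothetical helper, and the import name `pyab3p` is assumed to match the package name.

```python
import importlib.util


def has_module(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package is importable under the name "pyab3p"
if has_module("pyab3p"):
    print("pyab3p found: the linking models can use it.")
else:
    print("pyab3p not found: linking still works, but may be less accurate.")
```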