Basics

This tutorial explains the two basic concepts used in Flair: the Sentence and the Label.

You should be familiar with these two concepts in order to get the most out of Flair.

What is a Sentence

If you want to tag a sentence, you need to first make a Sentence object for it.

For example, say you want to tag the text “The grass is green.”.

Let’s start by making a Sentence object for this sentence.

# The Sentence object holds a sentence that we may want to embed or tag
from flair.data import Sentence

# Make a sentence object by passing a string
sentence = Sentence('The grass is green.')

# Print the object to see what's in there
print(sentence)

This should print:

Sentence[5]: "The grass is green."

The print-out tells us that the sentence consists of 5 tokens.

Note

A token is an atomic unit of the text, often a word or punctuation. The printout is therefore telling us that the sentence “The grass is green.” consists of 5 such atomic units.

Iterating over the tokens in a Sentence

So what are the 5 tokens in this example sentence?

You can iterate over all tokens in a sentence like this:

for token in sentence:
    print(token)

This should print:

Token[0]: "The"
Token[1]: "grass"
Token[2]: "is"
Token[3]: "green"
Token[4]: "."

This printout is telling us that the 5 tokens in the text are the words “The”, “grass”, “is”, “green”, with a separate token for the full stop at the end. The tokens therefore correspond to the words and the punctuation of the text.
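To build intuition for why this text splits into 5 tokens, here is a rough sketch that uses a plain regular expression. It is not how Flair or segtok tokenize; `toy_tokenize` is a hypothetical helper that only treats runs of word characters as words and every other non-space character as its own token.

```python
import re


def toy_tokenize(text):
    """Split text into word tokens and single-character punctuation tokens.

    Hypothetical helper for illustration only; real tokenizers handle
    many more cases (abbreviations, URLs, numbers, and so on).
    """
    # \w+ matches a run of word characters; [^\w\s] matches one
    # character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = toy_tokenize("The grass is green.")
print(len(tokens))
for index, token in enumerate(tokens):
    print(index, token)
```

The full stop is separated from “green” because it is not a word character, which is why the count is 5 rather than 4.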

Directly accessing a token

You can access a token of a sentence either by its token id, which starts at 1, or by its index in the sentence, which starts at 0:

# using the token id (starts at 1)
print(sentence.get_token(4))
# using the index (starts at 0)
print(sentence[3])

Both calls should print:

Token[3]: "green"

This print-out includes the token index (3) and the lexical value of the token (“green”).
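The two access styles differ only by an offset of one. The sketch below illustrates this with a plain Python list, assuming token ids start at 1, as the example above suggests (id 4 and index 3 return the same token). The helpers `get_by_token_id` and `get_by_index` are hypothetical and not part of Flair.

```python
tokens = ["The", "grass", "is", "green", "."]


def get_by_token_id(tokens, token_id):
    """Look up a token by an id that is assumed to start at 1."""
    return tokens[token_id - 1]


def get_by_index(tokens, index):
    """Look up a token by its ordinary list index, which starts at 0."""
    return tokens[index]


# Both lookups refer to the same position in the list.
print(get_by_token_id(tokens, 4))
print(get_by_index(tokens, 3))
```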

Tokenization

When you create a Sentence as above, the text is automatically tokenized (segmented into words) using the segtok library.

Note

You can also use a different tokenizer if you like. To learn more about this, check out our tokenization tutorial.

What is a Label

All Flair models predict labels. For instance, our sentiment analysis models will predict labels for a sentence. Our NER models will predict labels for tokens in a sentence.

Example 1: Labeling a token in a sentence

To illustrate how labels work, let’s use the same example sentence as above: “The grass is green.”.

Let us label all “color words” in this sentence. Since the sentence contains only one color word (namely “green”), we only need to add a label to one of the tokens.

We access token 3 in the sentence, and set a label for it:

# Make a sentence object by passing a string
sentence = Sentence('The grass is green.')

# add an NER tag to token 3 in the sentence
sentence[3].add_label('ner', 'color')

# print the sentence (now with this annotation)
print(sentence)

This should print:

Sentence[5]: "The grass is green." → ["green"/color]

The output indicates that the word “green” in this sentence is labeled as a “color”. You can also iterate through each token and print it to see if it has labels:

for token in sentence:
    print(token)

This should print:

Token[0]: "The"
Token[1]: "grass"
Token[2]: "is"
Token[3]: "green" → color (1.0)
Token[4]: "."

This shows that there are 5 tokens in the sentence, one of which has a label.

Note

The add_label method used here has two mandatory parameters: the label type (here 'ner') and the label value (here 'color').

Example 2: Labeling a whole sentence

Sometimes you want to label an entire sentence instead of only a token. Do this by calling add_label for the whole sentence.

For example, say we want to add a sentiment label to the sentence “The grass is green.”:

sentence = Sentence('The grass is green.')

# add a label to a sentence
sentence.add_label('sentiment', 'POSITIVE')

print(sentence)

This should print:

Sentence[5]: "The grass is green." → POSITIVE (1.0)

This indicates that the sentence is now labeled as having a positive sentiment.

Multiple labels

Importantly, in Flair you can add as many labels to a sentence as you like.

Let’s bring the two examples above together: we will label the sentence “The grass is green.” with an overall positive sentiment, and also add a “color” tag to the token “green”:

sentence = Sentence('The grass is green.')

# add a sentiment label to the sentence
sentence.add_label('sentiment', 'POSITIVE')

# add an NER tag to token 3 in the sentence
sentence[3].add_label('ner', 'color')

# print the sentence with all annotations
print(sentence)

This will print:

Sentence[5]: "The grass is green." → POSITIVE (1.0) → ["green"/color]

This indicates that the sentence is now labeled with two different types of information.

Accessing labels

You can iterate through all labels of a sentence using the get_labels() method:

# iterate over all labels and print
for label in sentence.get_labels():
    print(label)

This will get each label and print it. For instance, let’s re-use the previous example in which we add two different labels to the same sentence:

sentence = Sentence('The grass is green.')

# add a sentiment label to the sentence
sentence.add_label('sentiment', 'POSITIVE')

# add an NER tag to token 3 in the sentence
sentence[3].add_label('ner', 'color')

# iterate over all labels and print
for label in sentence.get_labels():
    print(label)

This will now print the following two lines:

Sentence[5]: "The grass is green." → POSITIVE (1.0)
Token[3]: "green" → color (1.0)

This printout tells us that there are two labels: The first is for the whole sentence, tagged as POSITIVE. The second is only for the token “green”, tagged as “color”.

Note

If you only want to iterate over labels of a specific type, pass the label type as a parameter to get_labels(). For instance, to iterate over NER labels only, do:

# iterate over all NER labels only
for label in sentence.get_labels('ner'):
    print(label)
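One way to picture what happens here is a sentence that gathers labels from itself and from its tokens, optionally filtered by label type. The sketch below is a toy model under that assumption, not Flair's implementation: `ToyToken` and `ToySentence` are hypothetical classes, and labels are reduced to plain (text, type, value) tuples.

```python
class ToyToken:
    """Hypothetical token that stores its labels grouped by label type."""

    def __init__(self, text):
        self.text = text
        self.labels = {}  # label type -> list of label values

    def add_label(self, typename, value):
        self.labels.setdefault(typename, []).append(value)


class ToySentence:
    """Hypothetical sentence that holds tokens and its own labels."""

    def __init__(self, words):
        self.tokens = [ToyToken(word) for word in words]
        self.labels = {}  # label type -> list of label values

    def add_label(self, typename, value):
        self.labels.setdefault(typename, []).append(value)

    def get_labels(self, typename=None):
        """Collect (text, type, value) tuples from the sentence and its tokens."""
        sentence_text = " ".join(token.text for token in self.tokens)
        holders = [(sentence_text, self.labels)]
        holders += [(token.text, token.labels) for token in self.tokens]

        found = []
        for text, labels in holders:
            for label_type, values in labels.items():
                # With no typename given, every label type is kept.
                if typename is None or label_type == typename:
                    found.extend((text, label_type, value) for value in values)
        return found


sentence = ToySentence(["The", "grass", "is", "green", "."])
sentence.add_label("sentiment", "POSITIVE")
sentence.tokens[3].add_label("ner", "color")

print(sentence.get_labels())       # sentence-level and token-level labels
print(sentence.get_labels("ner"))  # only labels of type "ner"
```

Grouping labels by type in a dictionary is a design choice of this sketch; it makes the type filter a simple key comparison.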

Information for each label

Each label is an instance of the Label class. In addition to its value, a label has a score indicating confidence and a pointer back to the data point it is attached to.

This means that you can print the value, the confidence and the labeled text of each label:

sentence = Sentence('The grass is green.')

# add an NER tag to token 3 in the sentence
sentence[3].add_label('ner', 'color')

# iterate over all labels and print
for label in sentence.get_labels():

    # Print the text, the label value and the label score
    print(f'"{label.data_point.text}" is classified as "{label.value}" with score {label.score}')

This should print:

"green" is classified as "color" with score 1.0

Our color tag has a score of 1.0 since we manually added it. If a tag is predicted by our sequence labeler, the score value will indicate classifier confidence.
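The three pieces of information a label carries (value, score, and a pointer back to its data point) can be sketched with two small classes. This is an illustrative toy, not Flair's Label class: `ToyDataPoint`, `ToyLabel`, and `describe` are hypothetical names, and the score of 0.31 is a made-up number standing in for a model prediction.

```python
class ToyDataPoint:
    """Hypothetical stand-in for anything that can be labeled."""

    def __init__(self, text):
        self.text = text


class ToyLabel:
    """Hypothetical label holding a value, a score, and a back-pointer."""

    def __init__(self, data_point, value, score=1.0):
        self.data_point = data_point  # pointer back to the labeled object
        self.value = value
        self.score = score            # defaults to 1.0, as for manual labels


def describe(label):
    return (
        f'"{label.data_point.text}" is classified as '
        f'"{label.value}" with score {label.score}'
    )


manual = ToyLabel(ToyDataPoint("green"), "color")
# Made-up score, standing in for a low-confidence prediction.
predicted = ToyLabel(ToyDataPoint("grass"), "color", score=0.31)

for label in (manual, predicted):
    print(describe(label))

# Keep only labels whose score reaches an arbitrary threshold.
confident = [label for label in (manual, predicted) if label.score >= 0.5]
print([label.data_point.text for label in confident])
```

Because each label points back to its data point, code that receives only a list of labels can still report which text each label belongs to.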