# Basics

This tutorial explains the basic concepts used in Flair:

- what is a [`Sentence`](#flair.data.Sentence)
- what is a [`Label`](#flair.data.Label)

You should be familiar with these two concepts in order to get the most out of Flair.

## What is a Sentence

If you want to tag a sentence, you need to first make a [`Sentence`](#flair.data.Sentence) object for it.

For example, say you want to tag the text "_The grass is green._".

Let's start by making a [`Sentence`](#flair.data.Sentence) object for this sentence.

```python
# The sentence object holds a sentence that we may want to embed or tag
from flair.data import Sentence

# Make a sentence object by passing a string
sentence = Sentence('The grass is green.')

# Print the object to see what's in there
print(sentence)
```

This should print:

```console
Sentence[5]: "The grass is green."
```

The print-out tells us that the sentence consists of 5 tokens.

```{note}
A token is an atomic unit of the text, often a word or punctuation. The printout is therefore telling us that the sentence "_The grass is green._" consists of 5 such atomic units.
```

### Iterating over the tokens in a Sentence

So what are the 5 tokens in this example sentence? You can iterate over all tokens in a sentence like this:

```python
for token in sentence:
    print(token)
```

This should print:

```console
Token[0]: "The"
Token[1]: "grass"
Token[2]: "is"
Token[3]: "green"
Token[4]: "."
```

This printout is telling us that the 5 tokens in the text are the words "_The_", "_grass_", "_is_", "_green_", with a separate token for the full stop at the end. The tokens therefore correspond to the words and the punctuation of the text.

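
To build intuition for why the count is 5, here is a rough sketch of word-and-punctuation splitting that uses only Python's `re` module. It is an illustrative assumption, not the tokenizer Flair uses, and the helper name `rough_tokenize` is hypothetical. Real tokenizers handle many more cases, such as abbreviations, URLs and decimal numbers.

```python
import re


def rough_tokenize(text):
    """Split text into word runs and single punctuation marks (toy example only)."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation character
    return re.findall(r"\w+|[^\w\s]", text)


tokens = rough_tokenize("The grass is green.")

# Mimic the token listing shown above, using 0-based positions
for index, token in enumerate(tokens):
    print(f'Token[{index}]: "{token}"')
```

For this simple sentence the sketch yields the same five units as the listing above: four words and the final full stop.
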
### Directly accessing a token

You can access the tokens of a sentence via their token id or with their index:

```python
# using the token id
print(sentence.get_token(4))
# using the index itself
print(sentence[3])
```

which should print in both cases

```console
Token[3]: "green"
```

This print-out includes the token index (3) and the lexical value of the token ("green"). Token ids start at 1 while indices start at 0, which is why id 4 and index 3 refer to the same token.

### Tokenization

When you create a [`Sentence`](#flair.data.Sentence) as above, the text is automatically tokenized (segmented into words) using the [segtok](https://pypi.org/project/segtok/) library.

```{note}
You can also use a different tokenizer if you like. To learn more about this, check out our tokenization tutorial.
```

## What is a Label

All Flair models predict labels. For instance, our sentiment analysis models will predict labels for a sentence. Our NER models will predict labels for tokens in a sentence.

### Example 1: Labeling a token in a sentence

To illustrate how labels work, let's use the same example sentence as above: "_The grass is green._".

Let us label all "color words" in this sentence. Since the sentence contains only one color word (namely "green"), we only need to add a label to one of the tokens.

We access token 3 in the sentence, and set a label for it:

```python
# Make a sentence object by passing a string
sentence = Sentence('The grass is green.')

# add an NER tag to token 3 in the sentence
sentence[3].add_label('ner', 'color')

# print the sentence (now with this annotation)
print(sentence)
```

This should print:

```console
Sentence[5]: "The grass is green." → ["green"/color]
```

The output indicates that the word "green" in this sentence is labeled as a "color". You can also iterate through each token and print it to see if it has labels:

```python
for token in sentence:
    print(token)
```

This should print:

```console
Token[0]: "The"
Token[1]: "grass"
Token[2]: "is"
Token[3]: "green" → color (1.0)
Token[4]: "."
```

This shows that there are 5 tokens in the sentence, one of which has a label.

```{note}
The [`add_label`](#flair.data.DataPoint.add_label) method used here has two mandatory parameters: the label type (here `'ner'`) and the label value (here `'color'`).
```

### Example 2: Labeling a whole sentence

Sometimes you want to label an entire sentence instead of only a token. Do this by calling [`add_label`](#flair.data.DataPoint.add_label) for the whole sentence.

For example, say we want to add a sentiment label to the sentence "_The grass is green._":

```python
sentence = Sentence('The grass is green.')

# add a label to a sentence
sentence.add_label('sentiment', 'POSITIVE')

print(sentence)
```

This should print:

```console
Sentence[5]: "The grass is green." → POSITIVE (1.0)
```

This indicates that the sentence is now labeled as having a positive sentiment.

### Multiple labels

Importantly, in Flair you can add as many labels to a sentence as you like.

Let's bring the two examples above together: We will label the sentence "_The grass is green._" with an overall positive sentiment, and also add a "color" tag to the token "green":

```python
sentence = Sentence('The grass is green.')

# add a sentiment label to the sentence
sentence.add_label('sentiment', 'POSITIVE')

# add an NER tag to token 3 in the sentence
sentence[3].add_label('ner', 'color')

# print the sentence with all annotations
print(sentence)
```

This will print:

```console
Sentence[5]: "The grass is green." → POSITIVE (1.0) → ["green"/color]
```

This indicates that the sentence is now labeled with two different types of information.

### Accessing labels

You can iterate through all labels of a sentence using the [`get_labels()`](#flair.data.Sentence.get_labels) method:

```python
# iterate over all labels and print
for label in sentence.get_labels():
    print(label)
```

This will get each label and print it.
For instance, let's re-use the previous example in which we add two different labels to the same sentence:

```python
sentence = Sentence('The grass is green.')

# add a sentiment label to the sentence
sentence.add_label('sentiment', 'POSITIVE')

# add an NER tag to token 3 in the sentence
sentence[3].add_label('ner', 'color')

# iterate over all labels and print
for label in sentence.get_labels():
    print(label)
```

This will now print the following two lines:

```console
Sentence[5]: "The grass is green." → POSITIVE (1.0)
Token[3]: "green" → color (1.0)
```

This printout tells us that there are two labels: The first is for the whole sentence, tagged as POSITIVE. The second is only for the token "green", tagged as "color".

````{note}
If you only want to iterate over labels of a specific type, add the label name as parameter to [`get_labels()`](#flair.data.Sentence.get_labels). For instance, to only iterate over all NER labels, do:

```python
# iterate over all NER labels only
for label in sentence.get_labels('ner'):
    print(label)
```
````

### Information for each label

Each label is an instance of the `Label` class. In addition to its value, it has a score indicating confidence. It also has a pointer back to the data point to which it attaches.

This means that you can print the value, the confidence and the labeled text of each label:

```python
sentence = Sentence('The grass is green.')

# add an NER tag to token 3 in the sentence
sentence[3].add_label('ner', 'color')

# iterate over all labels and print
for label in sentence.get_labels():
    # Print the text, the label value and the label score
    print(f'"{label.data_point.text}" is classified as "{label.value}" with score {label.score}')
```

This should print:

```console
"green" is classified as "color" with score 1.0
```

Our color tag has a score of 1.0 since we manually added it. If a tag is predicted by our sequence labeler, the score value will indicate classifier confidence.
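
Because predicted labels carry a confidence score, one possible follow-up is to keep only labels above some threshold. The sketch below is hypothetical: the helper name `confident_labels` and the threshold of 0.5 are assumptions, not part of Flair. It relies only on the `value`, `score` and `data_point.text` attributes described above, and it uses stand-in objects so that it runs without Flair installed. With Flair, you would pass the result of `sentence.get_labels()` instead.

```python
from types import SimpleNamespace


def confident_labels(labels, threshold=0.5):
    """Keep only labels whose score is at least `threshold` (hypothetical helper)."""
    return [label for label in labels if label.score >= threshold]


# Stand-in objects that mimic the three attributes used in this tutorial;
# these are NOT Flair objects, and the scores are made up for illustration
labels = [
    SimpleNamespace(value="color", score=1.0, data_point=SimpleNamespace(text="green")),
    SimpleNamespace(value="color", score=0.3, data_point=SimpleNamespace(text="grass")),
]

kept = confident_labels(labels, threshold=0.5)

for label in kept:
    print(f'"{label.data_point.text}" is classified as "{label.value}" with score {label.score}')
```

Only the stand-in label for "green" passes the threshold here; the right cut-off for real predictions depends on the model and the task.
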