# Embeddings

This tutorial shows you how to use Flair to produce **embeddings** for words and documents. Embeddings are vector representations that are useful for a variety of reasons. All Flair models are trained on top of embeddings, so if you want to train your own models, you should understand how embeddings work.

## Example 1: Embedding Words with Transformers

Let's use a standard BERT model (bert-base-uncased) to embed the sentence "the grass is green".

Simply instantiate [`TransformerWordEmbeddings`](#flair.embeddings.token.TransformerWordEmbeddings) and call [`embed()`](#flair.embeddings.base.Embeddings.embed) over an example sentence:

```python
from flair.embeddings import TransformerWordEmbeddings
from flair.data import Sentence

# init embedding
embedding = TransformerWordEmbeddings('bert-base-uncased')

# create a sentence
sentence = Sentence('The grass is green .')

# embed words in sentence
embedding.embed(sentence)
```

This will cause **each word in the sentence** to be embedded. You can iterate through the words and get each embedding like this:

```python
# now check out the embedded tokens.
for token in sentence:
    print(token)
    print(token.embedding)
```

This will print each token as a long PyTorch vector:

```console
Token[0]: "The"
tensor([-0.0323, -0.3904, -1.1946,  0.1296,  0.5806, ..], device='cuda:0')
Token[1]: "grass"
tensor([-0.3973,  0.2652, -0.1337,  0.4473,  1.1641, ..], device='cuda:0')
Token[2]: "is"
tensor([ 0.1374, -0.3688, -0.8292, -0.4068,  0.7717, ..], device='cuda:0')
Token[3]: "green"
tensor([-0.7722, -0.1152,  0.3661,  0.3570,  0.6573, ..], device='cuda:0')
Token[4]: "."
tensor([ 0.1441, -0.1772, -0.5911,  0.2236, -0.0497, ..], device='cuda:0')
```

*(Output truncated for readability; the actual vectors are much longer.)*

Transformer word embeddings are the most important concept in Flair. Check out more info in [this](project:transformer-embeddings.md) dedicated chapter.

## Example 2: Embedding Documents with Transformers

Sometimes you want an **embedding for a whole document**, not just individual words. In this case, use one of the DocumentEmbeddings classes in Flair.

Let's again use a standard BERT model to get an embedding for the entire sentence "the grass is green":

```python
from flair.embeddings import TransformerDocumentEmbeddings
from flair.data import Sentence

# init embedding
embedding = TransformerDocumentEmbeddings('bert-base-uncased')

# create a sentence
sentence = Sentence('The grass is green .')

# embed the sentence
embedding.embed(sentence)
```

Now, the whole sentence is embedded. Print the embedding like this:

```python
# now check out the embedded sentence
print(sentence.embedding)
```
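Because the whole sentence is now a single vector, you can compare documents directly, for example with cosine similarity. The snippet below is a minimal sketch of this idea; the second example sentence is made up purely for illustration:

```python
import torch

from flair.embeddings import TransformerDocumentEmbeddings
from flair.data import Sentence

# init embedding (same model as above)
embedding = TransformerDocumentEmbeddings('bert-base-uncased')

# embed two example sentences (the second is a made-up example)
sentence_1 = Sentence('The grass is green .')
sentence_2 = Sentence('The sky is blue .')
embedding.embed(sentence_1)
embedding.embed(sentence_2)

# compare the two document vectors with cosine similarity
similarity = torch.nn.functional.cosine_similarity(
    sentence_1.embedding, sentence_2.embedding, dim=0
)
print(similarity.item())
```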
[`TransformerDocumentEmbeddings`](#flair.embeddings.document.TransformerDocumentEmbeddings) are the most important document embeddings in Flair. Check out more info in [this](project:transformer-embeddings.md) dedicated chapter.

## How to Stack Embeddings

Flair allows you to combine embeddings into "embedding stacks". When not fine-tuning, combinations of embeddings often give the best results!

Use the [`StackedEmbeddings`](#flair.embeddings.token.StackedEmbeddings) class and instantiate it by passing a list of embeddings that you wish to combine. For instance, let's combine classic GloVe [`WordEmbeddings`](#flair.embeddings.token.WordEmbeddings) with forward and backward [`FlairEmbeddings`](#flair.embeddings.token.FlairEmbeddings).

First, instantiate the embeddings you wish to combine:

```python
from flair.embeddings import WordEmbeddings, FlairEmbeddings

# init standard GloVe embedding
glove_embedding = WordEmbeddings('glove')

# init Flair forward and backward embeddings
flair_embedding_forward = FlairEmbeddings('news-forward')
flair_embedding_backward = FlairEmbeddings('news-backward')
```

Now instantiate the [`StackedEmbeddings`](#flair.embeddings.token.StackedEmbeddings) class and pass it a list containing these embeddings:

```python
from flair.embeddings import StackedEmbeddings

# create a StackedEmbedding object that combines glove and forward/backward flair embeddings
stacked_embeddings = StackedEmbeddings([
    glove_embedding,
    flair_embedding_forward,
    flair_embedding_backward,
])
```

That's it! Now just use this embedding like any other embedding, i.e. call the [`embed()`](#flair.embeddings.base.Embeddings.embed) method over your sentences:

```python
sentence = Sentence('The grass is green .')

# just embed a sentence using the StackedEmbeddings as you would with any single embedding
stacked_embeddings.embed(sentence)

# now check out the embedded tokens
for token in sentence:
    print(token)
    print(token.embedding)
```

Words are now embedded using a concatenation of three different embeddings. The resulting embedding vector is still a single PyTorch vector.
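You can verify that the stack really is a concatenation by comparing the per-token vector size with the sum of the individual embedding lengths. A quick sanity check, assuming the `stacked_embeddings` and the embedded `sentence` from the snippets above are still in scope (the exact sizes depend on the chosen models, so they are printed rather than hard-coded):

```python
# each component embedding reports its own vector length ...
print(glove_embedding.embedding_length)
print(flair_embedding_forward.embedding_length)
print(flair_embedding_backward.embedding_length)

# ... and the stacked embedding length is the sum of its parts
print(stacked_embeddings.embedding_length)

# every token vector in the embedded sentence has exactly this length
for token in sentence:
    assert len(token.embedding) == stacked_embeddings.embedding_length
```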