The padding method can affect the performance of an LLM. There are two padding methods: left padding and right padding. Which one should we use for an LLM? In this tutorial, we will discuss this topic.
LLM training process – right padding
When we are training an LLM, we should use right padding. One reason is that many models (such as GPT-2) use absolute position embeddings, and their tokenizers do not reserve token id 0 as a padding symbol. For example:
https://github.com/huggingface/transformers/issues/664
How do I add padding in GPT2?
I get something like this when I add zeros in front to pad the sequences, but then I found out that 0 is actually not “[PAD]” but “!”.
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1639, 481]
https://huggingface.co/docs/transformers/model_doc/gpt2
- GPT-2 is a model with absolute position embeddings so it’s usually advised to pad the inputs on the right rather than the left.
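For example, here is a minimal sketch of right padding with the GPT-2 tokenizer. GPT-2 has no pad token by default, so reusing the EOS token as padding (as done below) is one common workaround; the example sentences are made up for illustration.

from transformers import GPT2Tokenizer

# GPT-2 has no pad token by default, so reuse the EOS token as padding.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # pad on the right for training

batch = tokenizer(
    ["Hello world", "A longer example sentence for the same batch"],
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"])       # pad ids appear at the end of the shorter sequence
print(batch["attention_mask"])  # 0 marks the padded positions

With right padding, the real tokens keep positions 0, 1, 2, ... and the padding only appears after them, which matches how absolute position embeddings expect the input.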
Actually, we can use either left or right padding during LLM training as long as we build the correct attention mask (and, for models with absolute position embeddings, the correct position ids); see the sketch below. Otherwise, you may get poor performance. However, we recommend using right padding when training.
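Here is a sketch of what "making the correct mask" can look like if a batch is left padded; the tensors are made up for illustration. The position ids are rebuilt from the attention mask so that real tokens still get positions 0, 1, 2, ...

import torch

# Left-padded batch: the first sequence has two pad positions at the front.
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])

# Recompute position ids from the mask so real tokens start at position 0.
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 0)  # dummy id for pad positions

# These would then be passed to the model together with the input ids, e.g.
# outputs = model(input_ids, attention_mask=attention_mask, position_ids=position_ids)

If you skip this step and feed a left-padded batch with default position ids, the real tokens are shifted to the wrong positions, which is one way to get the poor performance mentioned above.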
LLM inference process – left padding
Different from the LLM training process, we must use left padding during LLM batch inference. The reason is that generation appends new tokens after the last token of each sequence; if a batch is right padded, the shorter sequences would continue generating from pad tokens. Here is a more detailed explanation:
Understand padding_side with Examples in LLM – LLM Tutorial
However, if you do not use batch inference, there is no padding at all, so you do not need to use left padding.
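For batch inference, a minimal sketch might look like this. GPT-2 is only assumed here as the example model, and the prompts are made up for illustration.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # pad on the left so generation starts from real tokens

model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["The capital of France is", "Deep learning is"]
inputs = tokenizer(prompts, padding=True, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

Because the padding sits on the left, the last token of every sequence in the batch is a real token, so generation continues from the actual prompt rather than from padding.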