Fix: A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer

When running inference with an LLM, we may get this warning: A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer. In this tutorial, we will show how to fix it.


How to fix?

First, set padding_side='left' when initializing the tokenizer. This is especially important for batch inference, where prompts of different lengths are padded to a common length.

For example:
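A minimal sketch of this step, assuming the Hugging Face transformers library is installed and that you supply your own model name. The names load_left_padded_tokenizer and pad_batch are illustrative and not part of the library; pad_batch is a pure-Python helper that only shows what each padding side does to a batch.

```python
from typing import List


def load_left_padded_tokenizer(model_name: str):
    """Load a tokenizer that pads on the left (requires `transformers`)."""
    # Imported lazily so the rest of this sketch runs without the library.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
    # Some decoder-only models ship without a pad token. Reusing the EOS
    # token is a common workaround and an assumption of this sketch.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


def pad_batch(sequences: List[List[int]], pad_id: int,
              padding_side: str = "left") -> List[List[int]]:
    """Pad token-id lists to the same length on the chosen side."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    width = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        filler = [pad_id] * (width - len(seq))
        padded.append(filler + seq if padding_side == "left" else seq + filler)
    return padded


if __name__ == "__main__":
    batch = [[11, 12, 13], [21]]
    print(pad_batch(batch, pad_id=0, padding_side="left"))   # [[11, 12, 13], [0, 0, 21]]
    print(pad_batch(batch, pad_id=0, padding_side="right"))  # [[11, 12, 13], [21, 0, 0]]
```

With left padding, the real tokens sit at the end of every row, so a decoder-only model continues from the prompt rather than from padding. In a real script you would call load_left_padded_tokenizer with your model name and then tokenize the prompts with padding=True.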

Related tutorial: LLM Train and Inference vs Right Padding and Left Padding

However, the warning may still appear.

If the tokenizer's padding side is already set to left, which is correct for inference, the warning can safely be hidden.
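Before hiding anything, it may be worth confirming that the batch really is left-padded. The helper below is a hypothetical sanity check, not a library function: it lists the rows whose last token equals the pad id, which is what right padding looks like. A row can also end in the pad id when the pad token is reused from EOS and the prompt itself ends with EOS, so treat a non-empty result as a hint rather than proof.

```python
from typing import List, Sequence


def rows_ending_in_pad(input_ids: Sequence[Sequence[int]], pad_id: int) -> List[int]:
    """Return the indices of rows whose final token is the pad id."""
    return [
        index
        for index, row in enumerate(input_ids)
        if len(row) > 0 and row[-1] == pad_id
    ]


if __name__ == "__main__":
    left_padded = [[0, 0, 5], [7, 8, 9]]
    right_padded = [[5, 0, 0], [7, 8, 9]]
    print(rows_ending_in_pad(left_padded, pad_id=0))   # []
    print(rows_ending_in_pad(right_padded, pad_id=0))  # [0]
```

With a tokenized batch of tensors, the nested list could come from calling .tolist() on the input_ids tensor, and pad_id from tokenizer.pad_token_id.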

To hide the warning, add the code below to your script.

from transformers.utils import logging
# Show only ERROR and above; note this hides all other transformers warnings too.
logging.get_logger("transformers").setLevel(logging.ERROR)

After this, the warning no longer appears.
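If silencing every transformers warning for the whole script feels too broad, the change can be limited to the code that generates text. The sketch below uses only the Python standard library and assumes that the library's loggers are ordinary logging loggers under the "transformers" name used above; quiet_logger is a hypothetical helper, not part of transformers.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name: str = "transformers", level: int = logging.ERROR):
    """Temporarily raise a logger's level, then restore the previous one."""
    logger = logging.getLogger(name)
    previous_level = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the original level even if the wrapped code raises.
        logger.setLevel(previous_level)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    log = logging.getLogger("transformers")
    with quiet_logger("transformers"):
        log.warning("hidden while inside the block")
    log.warning("visible again after the block")
```

In a script, the call to model.generate would go inside the with block, so warnings elsewhere in the program stay visible.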