When fine-tuning an LLM, we may encounter this error: use_cache=True is incompatible with gradient checkpointing. In this tutorial, we will show you how to fix it.
What is this error?
The error looks like this:
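When training with Hugging Face Transformers, the message is typically reported along these lines (the exact wording may vary by version):

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...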
How to fix this error?
First, we should set use_cache=False when loading the LLM.
For example:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    device_map=device_map,
    load_in_4bit=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    use_cache=False,  # disable the KV cache so it does not conflict with gradient checkpointing
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
    ),
)
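If the model has already been loaded, you can also turn the cache off on its config before enabling gradient checkpointing. Here is a minimal sketch, assuming your model object is named model:

model.config.use_cache = False         # disable the KV cache used for generation
model.gradient_checkpointing_enable()  # trade extra compute for lower memory during training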
Second, if you have installed flash-attn, you can uninstall it.
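For example, assuming flash-attn was installed with pip, you can remove it like this:

pip uninstall flash-attn

Then restart your training script so the change takes effect.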
After these changes, you will find the error is fixed.