In this tutorial, we will discuss the following question: if we have saved a model object using torch.save(), can we change the model structure before loading the saved model?
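Before answering, it helps to recall how torch.save() stores a whole model object. Below is a minimal sketch; the `Net` class and the file name `net.pt` are placeholders for illustration.

```python
import torch
import torch.nn as nn

# A minimal example model, used only for illustration.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

model = Net()

# Saving the whole model object pickles the class together with its weights.
torch.save(model, "net.pt")

# torch.load() unpickles the object, so the Net class definition must still be
# importable and structurally compatible with what was saved.
# (Recent PyTorch versions default to weights_only=True, so loading a full
# pickled model needs weights_only=False.)
loaded = torch.load("net.pt", weights_only=False)
print(loaded)
```

Because the saved file is tied to the pickled class, changing the model structure between saving and loading is exactly the situation this tutorial examines.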
Most LLMs are decoder-only architectures, which means they are not trained to continue generating from pad tokens. Padding the inputs on the right may therefore cause wrong outputs during batch inference.
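A common way to avoid this is to pad on the left, so the real tokens sit right next to the generated continuation. Here is a minimal sketch using Hugging Face transformers; gpt2 is only an example of a decoder-only model, and the prompts are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any decoder-only model; gpt2 is used here for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Decoder-only models continue from the last token in the sequence,
# so pad on the left to keep the prompt adjacent to the generated text.
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

prompts = ["Hello, my name is", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

With right padding, the pad tokens would sit between the prompt and the newly generated tokens, which is the situation the model was never trained on.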