You can think of LLM training as building three layers of ability in sequence: first continuing text, then following instructions, and finally giving answers that are more helpful.
Pretraining: learning the language world
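A common pretraining objective is next-token prediction: given the tokens so far, predict the one that comes next. The task looks simple, but with enough data and model capacity it pushes the model to pick up grammar, factual knowledge, reasoning patterns, and domain-specific conventions, because all of these help predict what comes next.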
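To make the objective concrete, here is a minimal sketch of the next-token loss in plain Python. It assumes the model has already produced a list of raw scores (logits) over the vocabulary for each position. The function names and the toy four-token vocabulary are illustrative, not taken from any particular library.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token t+1 from position t.

    logits_per_position[t] holds one score per vocabulary entry for the
    token that follows position t (hypothetical model output).
    """
    targets = token_ids[1:]  # the "label" is simply the text shifted by one
    losses = []
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy usage: a 4-token vocabulary and the sequence [1, 2, 3].
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]    # model has no idea
confident = [[0.0, 0.0, 5.0, 0.0], [0.0, 0.0, 0.0, 5.0]]  # model favors the right token
print(next_token_loss(uniform, [1, 2, 3]))
print(next_token_loss(confident, [1, 2, 3]))
```

The second loss is lower than the first because the model puts more probability on the token that actually follows. Pretraining amounts to lowering this number across a very large corpus.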
Instruction tuning: from continuation to assistant
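A model that has only been pretrained behaves more like a text completer: given a question, it may simply continue the text rather than answer it. Instruction tuning fine-tunes it on examples of instructions paired with desired responses (Q&A, summarization, classification, code, reasoning) so that it learns to carry out the task.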
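A rough sketch of what one instruction-tuning example might look like once it is prepared for training. The prompt template, the `<eos>` marker, and the whitespace "tokenizer" are all simplifying assumptions for illustration; real pipelines use the model's own tokenizer and chat template. Computing the loss only on response tokens is a common choice, though not the only one.

```python
def build_sft_example(instruction, response):
    """Format one (instruction, response) pair and mark which tokens are trained on.

    Returns the token list and a parallel loss mask:
    0 = prompt token (ignored by the loss), 1 = response token (trained on).
    """
    # Hypothetical template; any consistent format would serve the same purpose.
    prompt = f"### Instruction:\n{instruction}\n### Response:\n"
    prompt_tokens = prompt.split()                   # toy whitespace tokenization
    response_tokens = response.split() + ["<eos>"]   # teach the model where to stop
    tokens = prompt_tokens + response_tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask

tokens, loss_mask = build_sft_example(
    "Summarize: cats sleep a lot.",
    "Cats sleep often.",
)
for token, flag in zip(tokens, loss_mask):
    print(flag, token)
```

The training objective is still next-token prediction. What changes is the data: the model sees the instruction as context and is only graded on producing the response.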
Preference alignment: useful answers, not just plausible ones
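The same question can have many plausible answers. Preference alignment teaches the model which of them are clearer, safer, and more useful, typically by training on comparisons between candidate responses. RLHF (reinforcement learning from human feedback) and DPO (direct preference optimization) are two representative methods.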
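As an illustration of how a preference pair becomes a training signal, here is a sketch of the per-pair DPO loss as it is commonly written. It assumes you already have the summed log-probability of each response under the model being trained (the policy) and under a frozen reference model. The numbers in the usage lines are made up, and `beta=0.1` is just an example setting.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    loss = -log(sigmoid(beta * ((policy_chosen - ref_chosen)
                                - (policy_rejected - ref_rejected))))
    """
    # How much more the policy likes each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Policy identical to the reference: no preference has been learned yet.
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
# Policy has shifted probability toward the chosen response: lower loss.
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

The loss falls as the policy raises the chosen response and lowers the rejected one relative to the reference. RLHF reaches a similar goal by a different route: it first trains a separate reward model on the comparisons and then optimizes the policy against it with reinforcement learning.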
One-sentence takeaway
LLM training is not a single magic step. It is a sequence from language modeling, to task following, to alignment with human preferences.