Prompt hierarchy: what `system` actually does (KP-042 explained)
A public sample of a knowledge-point deep dive. Why system prompts are not just 'instructions that come first', and why getting this wrong wastes most of your context window.
KP-042 is one of the most-mis-mastered concepts in our bank. On the surface, a system prompt looks like a user message that simply happens to come first. It is not. The model treats it differently, weights it differently, and resists overriding it differently. Understanding precisely how is the difference between a prompt that holds under pressure and one that gets argued out of its job by a determined user.
The short version: the system prompt sets a behavioural prior that persists for the whole conversation, while user and assistant turns are interpreted within that prior. When a user message contradicts the system prompt, the model is biased to honour the system prompt unless the contradiction is extreme. That bias is a feature, not a bug, and it is what makes guardrails practical at all.
Where this matters in practice is context budgeting. Putting policy into the user message means re-paying for it on every turn and re-defending it against every adversarial input. Putting it in the system prompt means paying for it once and inheriting the model's natural weighting. KP-042 is one concept we will not retire below 0.92, because nearly every Domain 2 scenario depends on it.