Prompt templates and variables: build once, reuse forever
A prompt template is the difference between a tool you use once and a tool you operate reliably. Write a bare instruction from scratch and you rewrite it every time, with no guarantee the next version behaves like the last. Build a template with named variable slots, a fixed output contract, and a self-critique loop baked in, and you get a durable asset that a team can share, version-pin, and improve without guesswork. This guide teaches the Variable-Slot Checklist (the centerpiece named asset), the four-layer anatomy, and the RAILS framework. It is part of our complete prompt engineering guide.
- The core move: separate what changes run-to-run (variable slots) from what should never change (hardcoded structure, rules, guardrails). One template, infinite inputs.
- The highest-leverage technique: append a self-scoring rubric and a "revise if below threshold" clause. This is the self-critique loop, and most practitioners never use it.
- The named asset: the Variable-Slot Checklist tells you exactly which inputs to parameterize and which to hardcode. That decision is the whole discipline in one table.
- The framework: RAILS covers every structural component a reusable prompt needs: Role, Architecture, Instructions, Loop, Safety.
What exactly is a prompt template?
A prompt template is a reusable instruction skeleton in which the parts that change across runs are marked as named variable slots and the parts that define the purpose, format, and quality bar are fixed. When you swap in concrete values for the slots, you get a fully specified, runnable prompt without rewriting the structure from scratch.
The closest analogy is a SQL query with bind parameters: the query structure captures your intent; the bound values supply the data. The same principle explains why OpenAI's prompt engineering documentation and Anthropic's prompt engineering overview both emphasize separating instruction from input: it is the first structural step that makes a prompt composable and testable.
A bare instruction looks like this:
"Write a blog post introduction about remote work for software developers."
A template version looks like this:
Write a blog post introduction about {{topic}} for {{audience}}.
Voice: {{voice}}
Word count: {{word_count}}
Output format: {{output_format}}
The template runs on a request about productivity tools for designers just as easily as one about remote work for developers, without rewriting a single structural decision. That is the payoff: one well-built template replaces dozens of one-off rewrites.
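To make the substitution step concrete, here is a minimal Python sketch of a renderer for the template above. The `render` function, the `SLOT_PATTERN` regex, and the one-field-per-line layout are assumptions for illustration, not part of any particular library; the only behavior it commits to is filling every `{{slot}}` and failing loudly when a value is missing.

```python
import re

# Matches {{slot_name}} placeholders (hypothetical convention for this sketch).
SLOT_PATTERN = re.compile(r"\{\{(\w+)\}\}")

TEMPLATE = (
    "Write a blog post introduction about {{topic}} for {{audience}}.\n"
    "Voice: {{voice}}\n"
    "Word count: {{word_count}}\n"
    "Output format: {{output_format}}"
)


def render(template: str, values: dict[str, str]) -> str:
    """Fill every {{slot}} in the template; raise if any slot has no value."""
    missing = sorted(set(SLOT_PATTERN.findall(template)) - set(values))
    if missing:
        raise KeyError(f"missing slot values: {missing}")
    return SLOT_PATTERN.sub(lambda m: str(values[m.group(1)]), template)


if __name__ == "__main__":
    print(render(TEMPLATE, {
        "topic": "productivity tools",
        "audience": "designers",
        "voice": "plain and direct",
        "word_count": "120",
        "output_format": "markdown",
    }))
```

Raising on a missing value, rather than leaving the placeholder in the text, keeps a half-filled prompt from ever reaching the model.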
What is the RAILS framework for prompt templates?
RAILS is a five-component structure that every reusable prompt should satisfy. Each letter names one obligation. Satisfy all five and you have a prompt that is specific, structured, self-correcting, and honest. Miss one and you get a prompt that degrades unpredictably as inputs vary.
What are the four layers of a reusable prompt?
The RAILS framework covers the rules. The four-layer anatomy covers the structure: where those rules live inside the prompt object and why each layer exists. Think of a reusable prompt less as a string and more as a function with a typed signature.
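One way to picture the "function with a typed signature" idea is the Python sketch below. The layer order (instruction, context, format, guardrail) follows the layer names and numbers used elsewhere in this guide; the class name, constants, and wording of each layer are hypothetical and exist only to show the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryInputs:
    """The typed signature: only these values may change from run to run."""
    topic: str
    audience: str
    source_text: str


# Fixed layers. The wording is hypothetical, for illustration only.
ROLE_AND_INSTRUCTION = (
    "You are a technical editor. Summarize the material about {topic} for {audience}."
)
OUTPUT_FORMAT = "Return exactly two sections: 'Summary' and 'Open questions'."
GUARDRAILS = "Do not invent statistics. Do not use filler phrases."


def build_prompt(inputs: SummaryInputs) -> str:
    """Assemble the four layers in a fixed order; callers cannot reorder or drop one."""
    layers = [
        ROLE_AND_INSTRUCTION.format(topic=inputs.topic, audience=inputs.audience),  # layer 1
        f"Context:\n{inputs.source_text}",  # layer 2
        OUTPUT_FORMAT,  # layer 3
        GUARDRAILS,  # layer 4
    ]
    return "\n\n".join(layers)
```

The caller sees three typed inputs and nothing else, which is the practical meaning of treating the prompt as a function rather than a string.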
The Variable-Slot Checklist: what to parameterize versus hardcode
This is the centerpiece named asset. The Variable-Slot Checklist answers one question that practitioners get wrong constantly: when should an input be a variable slot, and when should it be hardcoded into the template? Get this wrong and your template is either too brittle (any new input breaks it) or too generic (the variable inputs are doing the structural work the template should do).
The decision rule is this: if an input changes across legitimate uses of the template, it is a slot. If changing it would break the template's purpose or violate its quality contract, it is hardcoded. Here is the full checklist.
| Input type | Decision | Rationale | Example slot or hardcode |
|---|---|---|---|
| The topic, subject, or problem being analyzed | PARAMETERIZE | This is the entire point of reuse. Different runs address different topics. | {{topic}} |
| The target audience or reader persona | PARAMETERIZE | The same template for writing explainers should work for a technical and a non-technical audience without rewriting the structure. | {{audience}} |
| Voice, tone, or brand persona | PARAMETERIZE | A shared template across a team will need to serve different brand voices. Put the voice description in a slot so it can be swapped without touching the rules. | {{voice}} |
| Background material or reference text | PARAMETERIZE | Source documents, prior drafts, product specs, or data tables are inputs, not structural decisions. | {{context}} or {{source_text}} |
| Output schema or format variant | PARAMETERIZE (with caution) | Only parameterize format if you genuinely need multiple output shapes from one template. If the format is always the same, hardcode it: a fixed contract is easier to test. | {{output_format}} |
| The role persona and named competence | HARDCODE | The role is a structural decision about what kind of expertise the template activates. If the role changes, the template's purpose changes. Make a new template. | Fixed in Layer 1 |
| The ban-list of forbidden patterns | HARDCODE | Forbidden patterns like "no em-dashes, no invented statistics, no passive-voice openings" are quality commitments, not inputs. They must not be overridable by a slot value. | Fixed in Layer 4 |
| The self-critique rubric and revision threshold | HARDCODE | The rubric is the quality gate. If it were a slot, users could accidentally omit it and drop back to one-shot generation without knowing. | Fixed at end of Layer 4 |
| Anti-fabrication and evidence-type rules | HARDCODE | These are safety-critical. A template that sometimes asks for evidence and sometimes does not is worse than one that never does. | Fixed in Layer 4 |
| The output section headings or JSON keys | HARDCODE (usually) | A stable output contract is what makes the template testable. If keys vary, downstream code that reads the output breaks. Only parameterize when you have a documented set of valid schemas. | Fixed in Layer 3 |
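The checklist can also be enforced mechanically. The Python sketch below assumes slot values arrive as a dictionary; the slot names come from the table, while the key names for the hardcoded parts (`role`, `ban_list`, and so on) and the `validate_slot_values` function are hypothetical labels chosen for this example.

```python
# Slots the checklist marks PARAMETERIZE.
ALLOWED_SLOTS = frozenset(
    {"topic", "audience", "voice", "context", "source_text", "output_format"}
)

# Parts the checklist marks HARDCODE (hypothetical key names for illustration).
HARDCODED_PARTS = frozenset(
    {"role", "ban_list", "rubric", "threshold", "evidence_rules", "output_keys"}
)


def validate_slot_values(values: dict[str, str]) -> None:
    """Reject any attempt to override a hardcoded part or pass an unknown slot."""
    overrides = sorted(set(values) & HARDCODED_PARTS)
    if overrides:
        raise ValueError(f"hardcoded parts cannot be supplied as slot values: {overrides}")
    unknown = sorted(set(values) - ALLOWED_SLOTS)
    if unknown:
        raise ValueError(f"unknown slots: {unknown}")
```

Running this check before rendering means a caller cannot quietly swap out the ban-list or the rubric by passing an extra key.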
Why does the self-critique loop change everything?
The self-critique loop is the single most underused high-leverage move in prompt engineering. The concept is simple: at the end of your prompt, append a rubric that names the criteria the output will be scored on, then add an explicit instruction to revise and re-score if the score falls below a stated threshold.
The mechanism builds on findings on chain-of-thought reasoning (Wei et al., 2022, NeurIPS) and self-consistency (Wang et al., 2023, ICLR), which showed that models improve on complex tasks when they reason step by step and when the most consistent answer across several sampled reasoning paths is selected. The self-critique loop is a lightweight adaptation of that idea: instead of generating multiple outputs and selecting the best, the model generates one output, evaluates it against your stated criteria, and revises in place if any score is below threshold. No external scoring pipeline is required.
Most practitioners write a prompt, read the output, and iterate manually. That manual iteration is real work, and it happens entirely outside the model. The self-critique loop moves that iteration inside the generation itself, so it costs no extra manual effort and applies the same criteria on every run.
Here is a rubric block you can drop into any RAILS-structured template. Adapt the criteria names and the threshold to your domain.
- Slop density: are there filler phrases ("it's worth noting", "in today's world", passive-voice openers)? Target: 0 per 500 words. Score 0-10.
- Example density: does each claim carry a concrete worked example or specific number? Target: at least one per major claim. Score 0-10.
- Argument clarity: can each paragraph's main claim be extracted as a single sentence without ambiguity? Score 0-10.
- Ban-list compliance: check every item on the ban-list explicitly. Score 0 (any violation) or 10 (clean).
Threshold: 8 or above on every criterion. If any score falls below 8, revise the output to address that specific failure, then re-score. Deliver only the final revised output and the final scores. Do not explain the revision process.
Notice what the rubric does not do: it does not ask the model to explain itself, philosophize about quality, or produce a long self-analysis. It asks for a score against named, measurable criteria and a targeted fix if any score is low. The instruction "deliver only the final revised output and the final scores" strips the scaffolding so the user receives a clean output, not a meta-commentary.
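The rubric runs inside the model, but the decision rule it states is simple enough to write down as code, which helps if you want to audit the scores a run reports. This Python sketch is an assumption-laden illustration: the criterion keys are snake_case versions of the four rubric items above, and `failing_criteria` is a hypothetical helper, not part of any tool.

```python
THRESHOLD = 8  # "8 or above on every criterion"

# Snake_case names for the four rubric items (naming is an assumption).
CRITERIA = (
    "slop_density",
    "example_density",
    "argument_clarity",
    "ban_list_compliance",
)


def failing_criteria(scores: dict[str, int], threshold: int = THRESHOLD) -> list[str]:
    """Return criteria scoring below the threshold; a missing score counts as failing."""
    return [name for name in CRITERIA if scores.get(name, 0) < threshold]


def needs_revision(scores: dict[str, int]) -> bool:
    """True when at least one criterion falls below the threshold."""
    return bool(failing_criteria(scores))
```

Returning the names of the failing criteria, not just a boolean, mirrors the rubric's instruction to revise "that specific failure" rather than regenerate everything.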
Before and after: what parameterization actually looks like
The most reliable way to internalize the Variable-Slot Checklist is to see a real prompt before and after the transformation. The pair below covers a common use case: a prompt that writes a competitor comparison summary for a B2B product team.
The "before" version is a one-off instruction. It produces one specific output and is useless the moment the product or competitor changes. The "after" version is a RAILS-structured template with variable slots, a fixed output contract, a ban-list, and a self-critique loop. It can run on any product, any competitor, any audience, and any voice without structural rewriting.
Before (sample output):
Our tool and Asana are both solid choices for project management, but they differ in a few key ways. When it comes to pricing, our platform offers a competitive rate that makes it accessible for small teams, while Asana's free plan has some limitations that may require upgrading sooner than expected...
[Output continues with vague claims, no numbers, no structure, brand voice undefined. Cannot be reused for a different product or competitor without a full rewrite.]
After (sample output):
One-sentence verdict: Our tool costs 40 percent less per seat than Asana Business while matching its core task-tracking features for teams under 50.
Pricing comparison: Our product: $8 per seat per month (annual). Asana Business: $13.49 per seat per month (annual) [verified against asana.com/pricing, June 2026].
Three concrete advantages: (1) Unlimited automations on the $8 tier vs Asana's 250-automation cap on Business. (2) Native time-tracking without a third-party integration. (3) Guest seats included at no charge vs Asana's per-guest fee on Business.
Honest limitation: Asana's reporting dashboards are more configurable out of the box; our advanced reporting requires a third-party integration or the Enterprise tier.
Scores: Specificity 9 / Ban-list 10 / Verdict clarity 9. All above threshold. Delivering final output.
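Because the scores line has a stable shape, downstream code can read it. The Python sketch below parses a line in the "Scores: Name N / Name N" form shown above; the `parse_scores` function and its regexes are assumptions written for this one format, and any text after the last number is ignored.

```python
import re


def parse_scores(line: str) -> dict[str, int]:
    """Parse 'Scores: Name 9 / Other 10 / ...' into {'Name': 9, 'Other': 10}."""
    match = re.match(r"\s*Scores:\s*(.+)", line)
    if match is None:
        raise ValueError("not a scores line")
    scores: dict[str, int] = {}
    for part in match.group(1).split("/"):
        # Each part looks like 'Verdict clarity 9'; text after the number is ignored.
        item = re.match(r"\s*(.+?)\s+(\d+)\b", part)
        if item:
            scores[item.group(1)] = int(item.group(2))
    return scores


if __name__ == "__main__":
    line = (
        "Scores: Specificity 9 / Ban-list 10 / Verdict clarity 9. "
        "All above threshold. Delivering final output."
    )
    print(parse_scores(line))
```

A parser like this only works because the score format is part of the fixed output contract; if the format were a slot, the parser would have nothing stable to rely on.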
When should you promote a prompt to a reusable template?
The promotion trigger is the third run. The first time you write a prompt for a task, you do not know yet whether the task is recurring. The second time, you suspect it is. By the third time you are rewriting the same basic instruction with slightly different inputs, you have evidence of a recurring use case, and every future manual rewrite is pure waste.
Promotion means four things. First, identify the inputs that changed across your three runs and convert them to named variable slots using the Variable-Slot Checklist. Second, move the structural decisions (role, output contract, rules) into the fixed layers. Third, add the self-critique rubric. Fourth, version-pin the template so you know when it was last changed and can trace output drift back to a specific edit.
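The first promotion step, finding which inputs changed across your three runs, can be sketched in a few lines of Python. This assumes you kept a record of each run's inputs as a dictionary; the `slot_candidates` function and the sample field names are hypothetical.

```python
def slot_candidates(runs: list[dict[str, str]]) -> tuple[list[str], list[str]]:
    """Split input names into (varied, constant) across the recorded runs."""
    names = sorted(set().union(*runs))
    # An input that took more than one value across runs is a slot candidate.
    varied = [n for n in names if len({run.get(n) for run in runs}) > 1]
    constant = [n for n in names if n not in varied]
    return varied, constant


if __name__ == "__main__":
    runs = [
        {"topic": "remote work", "audience": "developers", "role": "senior editor"},
        {"topic": "productivity tools", "audience": "designers", "role": "senior editor"},
        {"topic": "onboarding", "audience": "developers", "role": "senior editor"},
    ]
    varied, constant = slot_candidates(runs)
    print("slot candidates:", varied)
    print("hardcode candidates:", constant)
```

The output is only a starting point: an input that happened to stay constant over three runs may still belong in a slot, so check each name against the Variable-Slot Checklist before fixing it.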
The version-pin is not optional. Templates that drift silently produce outputs that appear to be prompt failures when they are actually template failures. A simple comment line at the top of the template text ("v1.2, updated 2026-06-10, added ban on 'robust'") is enough.
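If the version pin follows the comment format quoted above, it can be checked automatically before a template is used. The Python sketch below assumes that exact "vMAJOR.MINOR, updated YYYY-MM-DD, note" shape on the first line; the `parse_version_pin` function is a hypothetical helper.

```python
import re
from datetime import date

# Expected shape: "v1.2, updated 2026-06-10, added ban on 'robust'"
HEADER = re.compile(r"v(\d+)\.(\d+), updated (\d{4}-\d{2}-\d{2}), (.+)")


def parse_version_pin(first_line: str) -> tuple[tuple[int, int], date, str]:
    """Return ((major, minor), updated_on, change_note) or raise if the pin is absent."""
    match = HEADER.fullmatch(first_line.strip())
    if match is None:
        raise ValueError("template is missing a version pin")
    major, minor, updated, note = match.groups()
    return (int(major), int(minor)), date.fromisoformat(updated), note
```

Refusing to load an unpinned template is one way to make the "not optional" rule hold in practice.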
The abstraction ladder runs further than this guide covers: a version-pinned, testable, parameterized template is what most practitioners call a "prompt." Once you add a typed input/output schema, a test suite, and explicit invariants that must hold across all runs, you have something more durable. That promotion from informal template to formally specified unit is exactly what engineers do when they factor out a function from a script, and it is what tools in the prompt-OS space are built to automate.
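As a rough illustration of "explicit invariants that must hold across all runs", the Python sketch below checks two properties of every rendered prompt. The template wording, the `INVARIANTS` table, and the `violated_invariants` helper are all hypothetical; a real test suite would carry the invariants that matter for your own template.

```python
import re

GUARDRAILS = "Do not invent statistics."  # hypothetical fixed guardrail text


def build_prompt(topic: str, audience: str) -> str:
    """A deliberately tiny template: one instruction line plus the fixed guardrail."""
    return f"Explain {topic} to {audience}.\n{GUARDRAILS}"


# Each invariant is a named predicate over the rendered prompt text.
INVARIANTS = {
    "guardrails present": lambda prompt: GUARDRAILS in prompt,
    "no unfilled slots": lambda prompt: re.search(r"\{\{\w+\}\}", prompt) is None,
}


def violated_invariants(prompt: str) -> list[str]:
    """Return the names of invariants the rendered prompt breaks."""
    return [name for name, holds in INVARIANTS.items() if not holds(prompt)]
```

Running `violated_invariants` over a batch of representative inputs gives a small regression suite: an edit to the template that drops the guardrail or leaves a slot unfilled shows up as a named failure.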
If you would rather describe the template in plain English and have it parameterized for you, that is exactly what BrainBoot's Compiler does (disclosure: BrainBoot is ours). Either way, the variable-slot discipline above is the part that matters.
Frequently asked questions
What is a prompt template?
A prompt template is a reusable prompt structure that uses named variable slots (such as {{topic}} or {{audience}}) rather than hardcoded values. You write the structure once, slot in different values for each run, and get consistent, predictable outputs. The template captures your reasoning about role, output format, constraints, and self-review; the variable slots capture what changes run to run.
What should I parameterize in a prompt versus hardcode?
What is the RAILS framework?
What is the self-critique loop and why does it matter?
When should you promote a prompt to a reusable template?
Bottom line
A bare instruction is a one-time tool. A prompt template is a durable asset. The gap between them is the Variable-Slot Checklist decision (what changes across runs goes in a slot; what defines the template's quality contract gets hardcoded), the four-layer anatomy (instruction, context, format, guardrail), and the RAILS framework that makes every layer explicit. None of it is complicated. The bottleneck is habit: most practitioners keep rewriting from scratch because they have not yet felt the compounding payoff of a single well-built template running reliably across hundreds of inputs. Run any prompt three times and you have earned the promotion.
For the broader architecture of where templates fit inside multi-step workflows and scheduled agents, see our complete prompt engineering guide, of which this article is one part. For a hands-on course track covering these same techniques with graded exercises, the EduBracket team's best AI courses roundup lists what is currently open for enrollment.
- OpenAI prompt engineering guide (platform.openai.com).
- Anthropic prompt engineering overview (docs.anthropic.com).
- Wei, J., Wang, X., Schuurmans, D., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. NeurIPS 2022.
- Wang, X., Wei, J., Schuurmans, D., et al. (2023). Self-consistency improves chain of thought reasoning in language models. ICLR 2023. arXiv:2203.11171.
- Anthropic model overview (docs.anthropic.com).
- OpenAI model documentation (platform.openai.com).