security · 8 min read
Safe Expression Evaluation in AI Workflow Engines: Eliminating Code Injection
Safe expression evaluation in AI workflow engines using custom recursive-descent parsers instead of eval(), eliminating code injection vulnerabilities in workflow condition logic.
Published 2026-03-21 · AI Syndicate
Scope note: SyndicateClaw is self-hosted and currently targeted at single-domain environments. Multi-tenant guarantees are not part of the current release scope.
Workflow engines need to evaluate conditions: decision nodes route execution based on runtime values, approval gates inspect context to determine routing, and policy rules evaluate expressions to grant or deny permissions. Every evaluation point is a potential attack surface if the expression evaluator is not carefully designed.
The most common mistake is using Python's eval() or similar constructs for expression evaluation. eval() executes arbitrary code. If an attacker can influence the expression string, they can execute arbitrary code on the system. For workflow engines processing untrusted input, this is an unacceptable risk.
SyndicateClaw implements safe expression evaluation through a custom recursive-descent parser that evaluates a restricted grammar. The grammar is expressive enough for workflow conditions but limited enough to prevent code execution.
The eval() Problem
Python's eval() function evaluates a string as Python code with the full privileges of the host process. Through it, an attacker can reach arbitrary operations: file access, network requests, system commands, data exfiltration.
The attack vector in workflow engines is expression injection. Workflow definitions include conditions—typically as strings that the workflow engine evaluates to determine routing. If these condition strings are derived from untrusted sources, an attacker might inject malicious code.
Consider a workflow with a condition: "transaction.amount > 1000". This is a straightforward comparison. Now imagine the condition is constructed from user input: f"transaction.amount > {user_amount_threshold}". If user_amount_threshold contains malicious code instead of a number, the expression becomes a code injection vector.
eval() provides no protection against this: any syntactically valid Python string it receives is executed as code, with no way to restrict what that code does.
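The injection scenario above can be made concrete. This is a hypothetical demonstration (the variable names are illustrative, not from SyndicateClaw): an attacker supplies a "threshold" string instead of a number, and eval() happily runs the injected call.

```python
# Hypothetical demonstration: why eval() on a built-up condition is unsafe.
transaction = {"amount": 500}

# An attacker supplies this "threshold" instead of a number:
user_threshold = "0 and __import__('os').getcwd()"

condition = f"{transaction['amount']} > {user_threshold}"
result = eval(condition)  # the injected __import__ call actually executes

print(result)  # a filesystem path, not a boolean
```

The comparison still "works", but arbitrary code ran along the way. Swap getcwd() for a shell command or a network call and the same injection point becomes full compromise.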
The Safe Expression Grammar
SyndicateClaw's expression evaluator, implemented as _ConditionParser, supports a restricted grammar:
Literals: strings, numbers, booleans, null
Identifiers: state.field references that access workflow state
Comparison operators: ==, !=, <, >, <=, >=
Boolean operators: and, or, not
Grouping: parentheses for precedence control
This grammar is expressive enough for typical workflow conditions. You can express "the transaction amount exceeds the threshold" or "the user has the required role and the resource is not flagged" or "the model output confidence is below the minimum acceptable value."
What you cannot express: function calls, attribute access beyond the defined state structure, imports, comprehensions, assignments, or any other Python construct outside the defined grammar.
The grammar is implemented as a recursive-descent parser. The parser tokenizes the input string, builds an abstract syntax tree (AST) from the tokens according to the grammar rules, and evaluates the AST. The evaluation is a straightforward tree walk with no code generation, no dynamic compilation, and no invocation of arbitrary code.
State Field References
Workflow conditions often need to reference runtime state: the current transaction amount, the authenticated user's role, the model output value. The grammar supports state.field references for this purpose.
A state field reference accesses a named field from the workflow state. The parser validates that the field name exists and is accessible before evaluation. Arbitrary attribute access is not permitted—only fields defined in the workflow state schema can be referenced.
This prevents attacks that attempt to access internal state, system properties, or environment variables. The allowed fields are explicitly defined, not dynamically discovered.
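The schema-validation step can be sketched in a few lines. This is an illustrative sketch, not the actual implementation; the field names and the `resolve_field` helper are assumptions.

```python
# Hypothetical sketch: field references validated against an explicit schema.
ALLOWED_FIELDS = {"amount", "role", "confidence"}  # illustrative schema

def resolve_field(state: dict, name: str):
    """Return a state field's value, rejecting anything outside the schema."""
    if name not in ALLOWED_FIELDS:
        raise ValueError(f"field not in workflow state schema: {name}")
    return state[name]
```

Because the allowed set is a closed list, references like `__class__` or an environment-variable lookup fail before evaluation ever starts, regardless of what the state object happens to contain.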
Parser Implementation
The _ConditionParser is implemented as a recursive-descent parser with the following components:
Tokenizer: breaks the input string into tokens (literals, operators, identifiers, punctuation). Malformed input is rejected at this stage.
AST Builder: constructs an abstract syntax tree from the token stream according to the grammar. Grammar violations are rejected here.
Evaluator: walks the AST and produces a boolean result. The evaluator has no access to external state beyond the workflow state and the literal values in the AST.
Each component is isolated. The tokenizer does not execute. The AST builder does not evaluate. Only the evaluator executes, and it operates on a validated, restricted structure.
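The three components can be sketched end to end. The following is a minimal illustration in the spirit of the design described above; all names (tokenize, Parser, evaluate) are assumptions, not SyndicateClaw's actual _ConditionParser API.

```python
import operator
import re

# Tokenizer: malformed input is rejected here.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<str>'[^']*')"
    r"|(?P<id>[A-Za-z_][\w.]*)|(?P<op>==|!=|<=|>=|<|>|\(|\)))"
)

def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"bad token at position {pos}")
        tokens.append(m.group(m.lastgroup))
        pos = m.end()
    return tokens

class Parser:
    """AST builder: grammar violations are rejected here."""
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0
    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None
    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"unexpected token: {tok!r}")
        self.i += 1
        return tok
    def parse(self):
        node = self.or_expr()
        if self.peek() is not None:
            raise ValueError("trailing tokens")
        return node
    def or_expr(self):
        node = self.and_expr()
        while self.peek() == "or":
            self.eat()
            node = ("or", node, self.and_expr())
        return node
    def and_expr(self):
        node = self.not_expr()
        while self.peek() == "and":
            self.eat()
            node = ("and", node, self.not_expr())
        return node
    def not_expr(self):
        if self.peek() == "not":
            self.eat()
            return ("not", self.not_expr())
        return self.comparison()
    def comparison(self):
        left = self.primary()
        if self.peek() in ("==", "!=", "<", ">", "<=", ">="):
            op = self.eat()
            return (op, left, self.primary())
        return left
    def primary(self):
        tok = self.eat()
        if tok == "(":
            node = self.or_expr()
            self.eat(")")
            return node
        if tok in ("true", "false"):
            return ("lit", tok == "true")
        if tok == "null":
            return ("lit", None)
        if tok.startswith("'"):
            return ("lit", tok[1:-1])
        if tok[0].isdigit():
            return ("lit", float(tok) if "." in tok else int(tok))
        if tok.startswith("state."):
            return ("field", tok[len("state."):])
        raise ValueError(f"unexpected token: {tok!r}")

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def evaluate(node, state):
    """Evaluator: a plain tree walk, no code generation or compilation."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "field":
        if node[1] not in state:  # only schema-defined fields are reachable
            raise ValueError(f"unknown state field: {node[1]}")
        return state[node[1]]
    if kind == "not":
        return not evaluate(node[1], state)
    if kind == "and":
        return evaluate(node[1], state) and evaluate(node[2], state)
    if kind == "or":
        return evaluate(node[1], state) or evaluate(node[2], state)
    return OPS[kind](evaluate(node[1], state), evaluate(node[2], state))

tree = Parser(tokenize("state.amount > 1000 and state.role == 'admin'")).parse()
print(evaluate(tree, {"amount": 2500, "role": "admin"}))  # → True
```

Feeding this sketch `__import__('os')` fails at parse time: the identifier is not a literal, keyword, or state.field reference, and there is no function-call production in the grammar to reach even if it were.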
What Attack Classes Are Eliminated
Safe expression evaluation eliminates several attack classes:
Code execution. Without eval(), there is no mechanism to execute arbitrary Python code. The expression grammar does not include function calls, imports, or any construct that could execute system operations.
OS command injection. The grammar does not support subprocess execution, shell commands, or file system access. Those capabilities are unavailable through the expression language.
Data exfiltration. The grammar does not support network requests, file operations, or any mechanism for exporting data. Even if an attacker could inject malicious content, there is no mechanism to transmit data.
Attribute traversal. The grammar does not support arbitrary attribute access. Access is limited to explicitly defined state fields. Internal objects, system properties, and environment variables are inaccessible.
Denial of service. The grammar is designed to prevent pathological evaluation cases. Deeply nested expressions are limited by the parser's recursion depth. Memory consumption is bounded by the input size.
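The nesting-depth bound can be enforced with a cheap pre-check before parsing. This is an illustrative sketch with an assumed limit; the actual parser may instead bound depth inside its recursion.

```python
# Hypothetical pre-check: cap nesting depth to bound parser recursion.
MAX_DEPTH = 32  # illustrative limit, not the actual value

def check_nesting(expr: str) -> None:
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            if depth > MAX_DEPTH:
                raise ValueError("expression nested too deeply")
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
```

A single linear scan rejects pathological inputs before any recursive work begins, so an attacker cannot trade a short malicious string for deep stack usage.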
Competitive Differentiation
Most workflow engines use eval() or equivalent constructs for condition evaluation. The security risk is often accepted because the attack surface seems theoretical. In practice, any workflow engine processing untrusted input faces this risk.
Safe expression evaluation is a meaningful differentiator for enterprise security teams. When evaluating AI agent platforms, security architects ask: "How are workflow conditions evaluated?" The answer "we use a custom parser with a restricted grammar" is substantially more reassuring than "we use eval()."
For organizations subject to security audits, penetration testing, or compliance reviews, safe expression evaluation provides a control that can be demonstrated, not a risk that must be accepted.
Integration with Policy Engine
Safe expression evaluation is not limited to workflow decision nodes. The policy engine uses the same _ConditionParser for policy rule conditions. Policy rules specify conditions under which an action is permitted or denied.
Using the same evaluator for both workflow conditions and policy conditions provides consistency. The security properties that apply to workflow routing also apply to access control decisions.
Policy rule conditions can reference the actor, the resource, the action, and the environment. These references are validated against the policy context schema. The grammar remains restricted; the available context is defined.
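A policy rule under this model might look like the following. The keys and context field names here are illustrative assumptions, not the actual policy schema; the point is that the condition string uses the same restricted grammar as workflow conditions.

```python
# Illustrative policy rule; keys and context fields are hypothetical.
policy_rule = {
    "action": "tool.execute",
    "effect": "deny",
    # Same restricted grammar as workflow conditions:
    "condition": "actor.role != 'admin' and resource.flagged == true",
}
```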
Defense in Depth
Safe expression evaluation is one layer in SyndicateClaw's defense-in-depth architecture. Even with a safe parser, other controls protect the system:
Policy evaluation gates tool execution, preventing unauthorized actions even if a condition evaluation could be manipulated.
Sandbox enforcement limits what tools can do, even if they are invoked.
State redaction prevents sensitive data from appearing in outputs or logs.
Audit logging captures all significant events, providing evidence if something goes wrong.
Safe expression evaluation eliminates an entire attack class. Defense in depth ensures that even if other controls fail, the blast radius is limited.
Frequently asked questions
Why is eval() dangerous in workflow condition evaluation?
eval() executes arbitrary Python code. If expression strings are derived from untrusted input, an attacker can inject code that runs with the engine's privileges, leading to remote code execution, data exfiltration, or system compromise.
How does safe expression evaluation work?
Safe expression evaluation uses a custom recursive-descent parser that tokenizes input, builds an AST from a restricted grammar, and evaluates only that structure. No code execution occurs; the grammar supports comparisons and boolean logic only.
What grammar does the expression parser support?
The supported grammar includes literals (strings, numbers, booleans, null), state.field references, comparison operators (==, !=, <, >, <=, >=), boolean operators (and, or, not), and parentheses for grouping.
What attack classes does safe expression evaluation eliminate?
Safe expression evaluation eliminates code execution, OS command injection, data exfiltration, arbitrary attribute traversal, and denial of service attacks that exploit expression evaluation in workflow engines.
Can safe expression evaluation prevent all injection attacks?
Safe expression evaluation eliminates injection attacks at the expression evaluation layer, but defense in depth requires complementary controls: policy gating, sandbox enforcement, state redaction, and audit logging to limit blast radius if other controls fail.
Key takeaway: SyndicateClaw implements safe expression evaluation through a custom recursive-descent parser that eliminates code injection vulnerabilities, supporting comparison operators, boolean logic, and state field references without eval().