Security Considerations#

Scope: This document covers security considerations related to Agent Spec, a portable and platform-agnostic configuration language for agents, tools, and workflows. It applies to anyone authoring or consuming Agent Spec specifications, regardless of deployment type or execution engine used.

Why it matters: Agent Spec defines agents, tools, and workflows that are executed in environments integrating components like LLMs, Python, APIs, and external tools. While Agent Spec itself is purely declarative, its specifications influence runtime behavior. Risks can arise from configurations that include sensitive information, lack proper isolation, or rely on untrusted components. This document helps reduce misconfiguration risks and supports the secure execution of the resulting agent.

Considerations regarding generated code and templates#

Some adapters can translate Agent Spec representations into runtime-specific source code. Generated source should only be produced from specifications that passed validation, reviewed before deployment when the specification is untrusted, and executed with the same isolation expected for application code.

Treat values inserted into prompts, URLs, headers, request bodies, or other executable/runtime-sensitive fields as untrusted data.

Considerations regarding prompt templates#

Prompt templates can combine trusted instructions with runtime values. When a placeholder in a system prompt or other instruction-bearing prompt is filled from end-user input, tool output, retrieved documents, MCP responses, or other untrusted sources, the model may interpret the substituted text as instructions rather than data.

Use placeholders in system prompts only for trusted configuration values. Avoid inserting user-provided, tool-derived, RAG, or MCP output directly into system prompts. When untrusted content must be included in a prompt, keep it separate from system instructions where possible, delimit it clearly as data, validate and normalize it, and apply length limits or filtering in the runtime. These controls reduce prompt-injection risk, but they do not eliminate it.

Considerations regarding tools#

Agent Spec lets you define tools that interact with the agent’s environment. The representation is only aware of the tool’s description, e.g. tool ID, attributes, inputs, outputs, and tool description (i.e. explaining what the tools does in natural language) and does not store any executable code.

Key Principles for Tool Security:#

Mandatory Input Validation: Always validate the tool inputs in your tool’s code. For all tools, use inputs to define/enforce schemas (types and descriptions) as a primary defense. Note that inputs do not support constraints like ranges or max lengths, so you must implement additional validation in your tool’s code. You should also validate the semantic correctness of inputs, especially when they may originate from untrusted sources like LLM completions or end-user input. For remote tools, which constructs HTTP requests, validation is critical for all parameters that can be templated (e.g., url, method, json_body, data, params, headers, cookies). While inputs can define the schema for arguments passed to the remote tool, the core security lies in controlling how these arguments are used to form the request and in features like url_allow_list.
Output Scrutiny: For all tools, you should define the expected type of a tool’s output using outputs in the representation. Currently, the representation does not support output constraints like maximum length or detailed validation logic. The output of well-contained tools which you fully control can often be trusted, but tool outputs should be treated with caution when they:
- are not under your direct control,
- use an LLM to generate output,
- are based on end-user input, or
- pull data from untrusted or dynamic sources (e.g. a remote API or a retrieval tool searching data in a database, where the contents may change over time).
In these cases, an attacker could inject harmful content via tools and downstream components should treat tool outputs as untrusted input and perform appropriate validation and sanitization before use.
Tool Reference Integrity: In Agent Spec, tools are referenced by their unique IDs, which are resolved by the execution environment. When the representation is exported and later imported in a different environment (or in the same environment under different conditions), the same tool ID may resolve to a different implementation than originally intended. This can lead to unexpected or insecure behavior. Always verify that referenced tool IDs map to the correct and trusted tool definitions in the target environment, especially when moving the representation between systems or teams.

Considerations regarding network communication#

The RemoteTool, MCPTool and the ApiNode define HTTP-based interactions with external services.

Important

Never embed API keys, passwords, or other secrets directly in Agent Spec.

Always use secure injection mechanisms and follow the principle of least privilege for credential access.

Following security considerations is important when working with remote/mcp tools or API nodes:

Templated Request Arguments: Since most request fields support templating, they can be influenced by dynamic inputs, such as LLM completions or user-provided values. As a result, these fields must be handled with extra caution in the specification to avoid misuse at runtime. Maliciously crafted inputs could lead to information leakage (e.g., exposing sensitive data in URLs or headers) or enable attacks like SSRF (Server-Side Request Forgery) or automated DDoS. Developers must validate all templated inputs, including URLs, headers, and request bodies before executing the request. Prefer a fixed developer-controlled base URL and template only path, query, or body values whenever possible.
URL Allow Lists (``url_allow_list``): RemoteTool and ApiNode support an optional url_allow_list field that can be used by runtimes or adapters to restrict outbound requests to a predefined set of approved destinations. This field is especially recommended when placeholders can affect the destination part of the URL (scheme, host, or port), but it is not compulsory: adapter/runtime documentation should explain how this rule is applied. The intended matching behavior is exact matching on scheme and authority (host and port), together with prefix matching on the path; query parameters, URL params, and fragments are not used for matching. In practice these patterns are plain URL prefixes, for example https://api.example.com or https://api.example.com/orders/.
Secure Connections: Use HTTPS for all remote calls.
HTTP Method: When possible, use fixed and explicit HTTP methods for the http_method parameter in the RemoteTool or ApiNode (e.g., "GET", "POST") rather than dynamic templates.
Timeouts: Remote/MCP tools or API nodes may block indefinitely if the external service is unresponsive. Timeout behavior should be enforced by the execution environment or tool implementation to avoid resource exhaustion. RemoteTool and LLM configurations can use RetryPolicy.request_timeout to express a per-request timeout; runtimes and adapters should pass it through to provider SDKs or HTTP clients when supported.
Retries: RemoteTool can use RetryPolicy.max_attempts to retry transient HTTP client errors and recoverable HTTP responses. Keep retry counts bounded and pair them with request timeouts to avoid extending failures indefinitely.

While Agent Spec does not manage infrastructure, authors should still be aware of how their configurations interact with secure deployment environments. We recommend to make sure that the deployment enforces strict network communication rules to make sure that only the desired and required network services are allowed (communication with external public IPs). Consider the following security considerations:

Model Integrity: For third-party models; Download via HTTPS, pin SHA-256 digests and verify integrity before loading into the execution engine.
Egress Restrictions: Remote calls defined in representation (e.g., via RemoteTool or ApiNode) must align with outbound allow-lists and network policies.
Subnet Isolation: Isolate Agent Spec components to limit blast radius:
- LLM hosts (OCI Gen AI, vLLM) in dedicated subnets.
- Remote tools and ApiNodes targets in isolated subnets.
- Control-flow logic (e.g., Agents, Flows, BranchingSteps) separate from data processing (e.g., database mutations, data analytics)
TLS: Use mTLS for service-to-service communications such as connecting to MCP servers, Monitoring/Telemetry services, File Storage Services, Databases, and so on.
Strict Egress Rules: Default-deny outbound traffic. Allow only Allow-Listed domains/IPs for Agent Spec components (such as when using remote tools or API Nodes).
Centralized Allow-Lists: Use a central allow-list to restrict access to LLM endpoints, logging/telemetry sinks, and external APIs called by components like remote tools or API Nodes. When using RemoteTool or ApiNode, align component-level url_allow_list values with these infrastructure-level policies. Set up alerts for policy violations.

Considerations regarding API keys and secrets management#

Important

Never embed API keys, passwords, or other secrets directly in Agent Spec.

Agent Spec is a declarative language for agents, tools, and flows. It may be stored, versioned, or inspected; it is not a secure medium for sensitive data. Hardcoding secrets in the representation increases the risk of accidental exposure and must be avoided.

Instead of embedding secrets in the representation, ensure that secrets are injected securely at runtime via:

Integration with a secrets manager (e.g., OCI Vault)
Use instance principal when working on OCI Computes
Use Kubernetes/Docker secrets if the deployments supports it

Furthermore, follow these general best practices around credential management

Rotate credentials regularly.
Use organization-specific API keys with usage limits.
Use dedicated credentials per environment or agent, rather than sharing keys across flows or services.
Follow the principle of least privilege for credential access.
Avoid exposing secrets to LLMs, logs, or error messages.
Monitor credential usage for anomalies (e.g., unexpected volume or access patterns).
Restrict network access and egress for components that hold credentials, especially tools calling external services.

Considerations regarding OAuth tokens and session-bound credentials#

OAuth-based integrations introduce bearer credentials (access tokens and, optionally, refresh tokens). Anyone who obtains these tokens can typically act as the user (within the granted scopes) until expiry or revocation. The following guidance applies to runtimes and to developers deploying agents/tools that use OAuthConfig.

1. Treat tokens and resumption handles as sensitive secrets#

Access tokens, refresh tokens, and authorization codes MUST be treated as secrets:
- Do not log them.
- Do not include them in agent prompts, tool inputs/outputs, traces, telemetry, or error messages.
- Do not store them in exported agent/tool configuration artifacts.
Conversation IDs / “previous response IDs” / resumption handles can implicitly become bearer credentials if your runtime caches tokens by conversation. If a conversation can be resumed by anyone who has the identifier (e.g., link sharing, URL leakage, browser history, referrers), then those tokens are effectively shared too.

Implication: For anything beyond local development, do not make “possession of a conversation ID” sufficient to regain an authenticated session.

2. Don’t tie token access to publicly accessible conversation context#

If your runtime caches OAuth tokens in memory (or any persistence layer) keyed by conversation/session identifiers, then you MUST ensure the ability to resume that conversation is protected by a real authentication/authorization layer.

Recommended controls: - Authenticate the user before allowing conversation continuation. - Authorize access: ensure only the same user (or a deliberately permitted group) can resume the session that holds tokens. - Use strong, unguessable identifiers (cryptographically random), with expiration/TTL, and consider one-time-use or rotation on resume. - Avoid embedding resumption identifiers in places that leak (URLs shared externally, referer headers, client-side logs). Prefer secure headers or server-managed sessions.

3. Minimize token scope, lifetime, and blast radius#

Request the minimum scopes required (least privilege).
Prefer short-lived access tokens. Only request offline access / refresh tokens when needed.
Use resource indicators / audience restriction when supported (e.g., RFC 8707) to reduce token reuse against unintended APIs.
Consider isolating credentials by environment (dev/test/prod) and by agent/service to limit impact of compromise.

4. Secure OAuth flow mechanics (authorization code + PKCE)#

Runtimes implementing OAuthConfig SHOULD follow OAuth 2.x / OIDC best practices:

Use PKCE (S256 preferred) for authorization code flows; refuse flows that cannot meet policy when pkce.required=true.
Validate ``state`` to prevent CSRF and callback injection.
Strict redirect URI validation: only allow pre-registered redirect URIs; prefer https:// in production; avoid wildcard redirects.
If discovery is used (issuer), validate issuer/metadata carefully and prefer HTTPS with proper certificate validation.

Considerations regarding Resource-exhaustion vectors#

Agent Spec allows for flexible flow and agent logic, however, certain configurations could result in infinite loops, repeated LLM calls, or unbounded parallelism which can lead to resource exhaustion.

The following example shows how to avoid resource exhaustion from repeated tool executions in an infinite loop. The idea is to add a branching node that exits the loop once a counter hits the specified number of iterations.

Python Flow example

from typing import cast

from pyagentspec.flows.nodes import StartNode, EndNode, ToolNode, BranchingNode
from pyagentspec.tools import ClientTool, ServerTool
from pyagentspec.flows.flow import Flow
from pyagentspec.flows.edges import ControlFlowEdge, DataFlowEdge
from pyagentspec.property import IntegerProperty, StringProperty
from pyagentspec.serialization import AgentSpecDeserializer, AgentSpecSerializer

# We run the tool_to_execute tool 5 times. After that, we exit

# Start and end nodes of the flow
start_node = StartNode(name="start")
end_node = EndNode(name="end")

# We are using a helper tool to count the number tool executions
# It will increment the value of counter by 1 in each iteration
tool_counter = ServerTool(
    name="counter",
    inputs=[IntegerProperty(title="counter", default=0)],
    outputs=[IntegerProperty(title="counter")],
)
tool_counter_node = ToolNode(name="counter", tool=tool_counter)

# The tool node for the tool that we're interested in executing in a loop
# (for the purpose of this example, the tool does nothing)
tool_to_execute = ServerTool(
    name="do nothing",
    inputs=[],
    outputs=[],
)
tool_to_execute_node = ToolNode(name="counter", tool=tool_to_execute)

# The branching node will be able to exit the loop after limit_reached iterations
# It will compare the counter value - counting the number iterations in the loop - to
# the constant limit_reached value
branching_node = BranchingNode(
    name="branching",
    mapping={"5": "limit_reached"},
    inputs=[StringProperty(title="counter")],
)

# The control flow graph looks like this
# start_node -> tool_node -> tool_counter_node -> branching_node -> end_node
#                  ^                                  |
#                  L _________________________________|
control_flow_edges = [
    ControlFlowEdge(
        name="cf1",
        from_node=start_node,
        to_node=tool_to_execute_node,
    ),
    ControlFlowEdge(
        name="cf2",
        from_node=tool_to_execute_node,
        to_node=tool_counter_node,
    ),
    ControlFlowEdge(
        name="cf3",
        from_node=tool_counter_node,
        to_node=branching_node,
    ),
    ControlFlowEdge(
        name="cf4",
        from_node=branching_node,
        from_branch="default",
        to_node=tool_to_execute_node,
    ),
    ControlFlowEdge(
        name="cf5",
        from_node=branching_node,
        from_branch="limit_reached",
        to_node=end_node,
    ),
]

# df1 creates a self-loop on the counter node s.t. the value can be incremented in each iteration
# df2 feeds the counter into the branching step s.t. it can decide whether to continue the
# loop or move to the end_node
data_flow_edges = [
    DataFlowEdge(
        name="df1",
        source_node=tool_counter_node,
        source_output="counter",
        destination_node=tool_counter_node,
        destination_input="counter",
    ),
    DataFlowEdge(
        name="df2",
        source_node=tool_counter_node,
        source_output="counter",
        destination_node=branching_node,
        destination_input="counter",
    ),
]

flow_with_loop = Flow(
    name="flow_with_loop",
    start_node=start_node,
    nodes=[start_node, end_node, tool_counter_node, tool_to_execute_node, branching_node],
    data_flow_connections=data_flow_edges,
    control_flow_connections=control_flow_edges,
)

serializer = AgentSpecSerializer()
serialized_flow = serializer.to_json(flow_with_loop)

Gaps in coverage (what is not provided)#

The following controls are currently not configurable in the Agent Spec. Some of these concerns, particularly those related to infrastructure and deployment, are the responsibility of the execution engine. However, it’s important to be aware of these gaps when designing or deploying agents.

Memory limits – no built-in interrupt for resident-set or GPU VRAM.
CPU / thread quotas – Python loops can still hog a core until the Step returns.
Soft timeouts and token limits – the spec does not support soft timeouts or soft token limits. These controls - available in some execution engines - can stop long-running agents or limit LLM token usage, but only between steps.
Hard timeouts inside a Tool – a Tool that calls a REST API may block indefinitely.
Concurrent-request ceilings – there is no built-in limit on how many tool calls or sub-flows can run in parallel. Without external throttling, excessive fan-out may exhaust system resources.
LLM generation cancellation – without a provider-supported request timeout or cancellation mechanism, once a prompt is sent, the agent will block until the generation is done.
Conversation isolation – the Agent Spec does not support isolation between sub-flows or sub-agents; conversations are shared and may be forwarded externally if an external endpoint is involved.

Considerations regarding assistant and flow serialization#

Deserializing data from untrusted sources is inherently risky. Always treat Agent Spec configuration files from external or unverified origins as potentially malicious.

Verify Data Integrity and Authenticity: Before attempting to import an Agent Spec configuration file to instantiate an agent or flow, ensure:

The data comes from a trusted source and has not been tampered with. Implement mechanisms such as digital signatures to verify the origin and integrity of the serialized data.
Its contents align with your expectations; do not assume the representation defines what you think it does. Review the structure, tool references, and key behaviors to ensure the deserialized agent behaves as intended.

Important

Use secure built-in serialization and deserialization utilities whenever possible. For example, to deserialize YAML files, use yaml.safe_load() from the PyYAML package (do not use yaml.load()).

When working with Agent Spec SDKs like PyAgentSpec, always use the provided serialization and deserialization utilities to avoid unsafe behavior.

from pyagentspec.serialization import AgentSpecDeserializer

# Assume verify_signature_and_integrity is a custom function you've implemented
# to check the signature and integrity of the raw_yaml_string against a trusted key.
# This step is conceptual and depends on your specific trust model.

raw_yaml_string = get_potentially_untrusted_yaml_data()
is_trusted_source = verify_signature_and_integrity(raw_yaml_string)

if is_trusted_source:
  # from_yaml will use safe_load under the hood
  assistant_or_flow = AgentSpecDeserializer.from_yaml(raw_yaml_string)

When deploying serialized representation to an execution engine, use the engine’s secure loading mechanism for the import.

Considerations regarding runtime-sensitive components#

Some Agent Spec components can influence actions performed by adapters or runtimes when a configuration is loaded or run. For example, MCP stdio transports can start a local MCP server subprocess on the machine that loads and runs the resulting agent.

Only enable runtime-sensitive components from trusted configurations. PyAgentSpec loaders block StdioTransport and its subclasses by default. If a trusted configuration intentionally uses stdio transports, pass blocked_components=[] to the loader, or provide a custom blocked_components value that does not include the stdio transport component class. For stricter environments, use allowed_components to explicitly list the component types that may be loaded.

Component policy entries can be provided as Component classes or component type names. Type names that resolve to known Component classes use the same hierarchy matching as class entries; unresolved type names match only the exact serialized component type.

Considerations regarding exception handling in Flows#

The CatchExceptionNode can be used to catch exceptions that may occur in the execution of a subflow. This node exposes a caught_exception_info output that may contain sensitive information if populated naïvely from raw exceptions. Implementations and runtimes should avoid leaking security-sensitive data such as:

Authentication/authorization errors or secrets (tokens, API keys, credentials)
Internal hostnames, IP addresses, or infrastructure details
File paths or environment variables with sensitive values
Full stack traces or code internals
TLS/SSL or configuration details that could aid exploitation

Note

On Exception boundary / what is caught:

Executors must catch only recoverable exceptions and must not catch fatal/runtime-termination or cancellation/control-flow throwables (e.g., KeyboardInterrupt, SystemExit in Python).

Best practices when using the CatchExceptionNode:

Keep caught_exception_info minimal and user-safe; prefer a brief, generic message rather than raw exception/traceback data.
Sanitize messages to remove secrets and internal identifiers.
Consider not catching specific sensitive exception classes at all (e.g., authz, severe misconfiguration, TLS/SSL failures) so they propagate to secure handlers.
Remember that caught_exception_info is optional and defaults to null when no exception is raised.

Considerations regarding Parallelization#

Agent Spec supports the configuration of flows that contain parallel nodes, enabling concurrent execution of sub-flows. While parallelization offers potential performance or latency benefits, it introduces its own set of security and operational risks that must be carefully managed.

Execution Semantics and Safety Guidelines#

Agent Spec outlines the expected behavior for parallel execution, but it does not specify implementation details, since those are determined by each runtime.

For runtime developers: Always provide clear documentation on how parallelism is implemented and describe secure usage patterns.
For users: Always review the runtime’s documentation and follow its recommendations to use parallelism safely.

A list of typical potential issues when dealing with parallelism follows.

Non-Deterministic, Nonatomic Execution Order: In Agent Spec, parallel nodes are explicitly declared with the understanding that the order and the atomicity of the execution of their sub-flows is not guaranteed. The system may schedule, start, or complete individual flows in any order, and the timing may vary across runs. Do not assume any deterministic ordering or synchronization between parallel tasks.
Concurrent State Mutation Hazards: Parallel execution can lead to race conditions and state inconsistencies if multiple branches attempt to mutate shared resources or stateful entities (such as conversation history, external databases, or files) simultaneously.
- Avoid write operations on shared or stateful objects within parallel branches where synchronization is not explicitly managed by the execution environment.
- For operations requiring state changes (e.g., sending messages, updating databases, modifying files), consider serializing those steps after parallel execution has completed, or use database transactions/locking mechanisms where supported.
Interrupt Handling Limitations: Execution interrupts, such as tool confirmation steps, requesting additional input, or other interactive requirements, are not guaranteed to be coordinated or honored in a well-defined order within parallel branches.
- Avoid placing user input prompts, message outputs, or tool confirmation steps inside parallel blocks, as this can result in unpredictable, inconsistent, or undesirable user experiences.
- Design flows such that interactive or interrupt-driven logic occurs before or after parallel execution blocks.
Error Propagation and Fault Isolation: Failure in one parallel branch might not automatically propagate to other branches. Carefully plan how errors are handled and consider using explicit error-handling logic to aggregate, report, or respond to errors from each concurrent path.
- Ensure that sensitive operations are performed atomically or idempotently to prevent security vulnerabilities due to partial updates or inconsistent state.
Resource-Exhaustion Risks: Creating excessive parallelism (e.g., spawning large numbers of concurrent tool calls or sub-flows) may result in resource exhaustion—such as CPU, memory, API rate limits, or network bandwidth starvation.
- Limit the number of parallel branches to a reasonable level for your deployment environment.
- Monitor usage and apply throttling or limits as appropriate at the infrastructure or execution engine layer.

Summary Recommendations#

Design parallel blocks to perform independent, stateless tasks wherever possible.
Avoid interactive steps (like user input requests, client tools, confirmations) and uncontrolled state modifications in parallel branches.
Always validate that downstream infrastructure can handle the degree of parallelism configured.
Perform security and correctness reviews of all flows containing parallel nodes to identify unintended side effects or race conditions.

Other Security Concerns#

For any other security concerns, please submit a GitHub issue.