How to Do Structured LLM Generation in Flows

Prerequisites

This guide assumes familiarity with Flows.

WayFlow lets you use LLMs to generate text and structured outputs. This guide will show you how to:

use the PromptExecutionStep to generate text using an LLM
use the PromptExecutionStep to generate structured outputs
use the AgentExecutionStep to generate structured outputs using an agent

Basic implementation

In this how-to guide, you will learn how to perform structured LLM generation with Flows.

WayFlow supports several LLM API providers. Choose one of the configurations below (OCI Generative AI, vLLM, or Ollama):

from wayflowcore.models import OCIGenAIModel

if __name__ == "__main__":

    llm = OCIGenAIModel(
        model_id="provider.model-id",
        service_endpoint="https://url-to-service-endpoint.com",
        compartment_id="compartment-id",
        auth_type="API_KEY",
    )

from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="model-id",
    host_port="VLLM_HOST_PORT",
)

from wayflowcore.models import OllamaModel

llm = OllamaModel(
    model_id="model-id",
)

Assuming you want to summarize this article:

article = """Sea turtles are ancient reptiles that have been around for over 100 million years. They play crucial roles in marine ecosystems, such as maintaining healthy seagrass beds and coral reefs. Unfortunately, they are under threat due to poaching, habitat loss, and pollution. Conservation efforts worldwide aim to protect nesting sites and reduce bycatch in fishing gear."""

WayFlow offers the PromptExecutionStep for this type of query. Use the code below to generate a 10-word summary:

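from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import PromptExecutionStep, StartStep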

start_step = StartStep(input_descriptors=[StringProperty("article")])
summarize_step = PromptExecutionStep(
    llm=llm,
    prompt_template="""Summarize this article in 10 words:\n {{article}}""",
    output_mapping={PromptExecutionStep.OUTPUT: "summary"},
)
summarize_step_name = "summarize_step"
flow = Flow(
    begin_step_name="start_step",
    steps={
        "start_step": start_step,
        summarize_step_name: summarize_step,
    },
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=summarize_step),
        ControlFlowEdge(source_step=summarize_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(start_step, "article", summarize_step, "article"),
    ],
)

Note

In the prompt, {{article}} is Jinja2 syntax for a placeholder variable, which becomes an input of the step. If you use {{var_name}}, the variable named var_name will be of type StringProperty. If you use any other Jinja2-compatible construct (for loops, filters, and so on), the variable will be of type AnyProperty.
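
To make the link between placeholders and step inputs concrete, here is a minimal sketch that lists the plain {{var_name}} placeholders found in a prompt template. It uses a regular expression from the standard library rather than Jinja2's parser, so it recognizes only plain variable placeholders. The helper name find_simple_placeholders is hypothetical and is not part of WayFlow.

```python
import re

# Matches plain placeholders such as {{article}} or {{ article }} only.
# Assumption: this is an illustration, not how WayFlow parses templates.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def find_simple_placeholders(template: str) -> list[str]:
    """Return the distinct plain variable names, in order of first appearance."""
    names: list[str] = []
    for name in _PLACEHOLDER.findall(template):
        if name not in names:
            names.append(name)
    return names


prompt_template = "Summarize this article in 10 words:\n {{article}}"
print(find_simple_placeholders(prompt_template))  # ['article']
```

Expressions such as {{ my_var[0] }} or {% for %} blocks are intentionally ignored by this sketch.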

Now execute the flow:

conversation = flow.start_conversation(inputs={"article": article})
status = conversation.execute()
print(status.output_values["summary"])
# Sea turtles face threats from poaching, habitat loss, and pollution globally.

As expected, your flow has generated the article summary!

Structured generation with Flows

In many cases, generating raw text within a flow is not very useful, as it is difficult to leverage in later steps. Instead, you might want to generate attributes that follow a particular schema. The PromptExecutionStep class enables this through the output_descriptors parameter.

from wayflowcore.property import ListProperty, StringProperty
from wayflowcore.steps import PromptExecutionStep, StartStep

animal_output = StringProperty(
    name="animal_name",
    description="name of the animal",
    default_value="",
)
danger_level_output = StringProperty(
    name="danger_level",
    description='level of danger of the animal. Can be "HIGH", "MEDIUM" or "LOW"',
    default_value="",
)
threats_output = ListProperty(
    name="threats",
    description="list of threats for the animal",
    item_type=StringProperty("threat"),
    default_value=[],
)


start_step = StartStep(input_descriptors=[StringProperty("article")])
summarize_step = PromptExecutionStep(
    llm=llm,
    prompt_template="""Extract from the following article the name of the animal, its danger level and the threats it's subject to. The article:\n\n {{article}}""",
    output_descriptors=[animal_output, danger_level_output, threats_output],
)
summarize_step_name = "summarize_step"
flow = Flow(
    begin_step_name="start_step",
    steps={
        "start_step": start_step,
        summarize_step_name: summarize_step,
    },
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=summarize_step),
        ControlFlowEdge(source_step=summarize_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(start_step, "article", summarize_step, "article"),
    ],
)

conversation = flow.start_conversation(inputs={"article": article})
status = conversation.execute()
print(status.output_values)
# {'threats': ['poaching', 'habitat loss', 'pollution'], 'danger_level': 'HIGH', 'animal_name': 'Sea turtles'}
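
Because the outputs now follow a known schema, later code can consume them directly. The sketch below is plain Python and uses the example output shown above as a hard-coded dictionary standing in for status.output_values. The helper name check_animal_outputs is hypothetical. It checks that danger_level is one of the three values named in the descriptor's description, since a StringProperty by itself accepts any string.

```python
ALLOWED_DANGER_LEVELS = {"HIGH", "MEDIUM", "LOW"}


def check_animal_outputs(outputs: dict) -> dict:
    """Validate and normalize the three outputs of the extraction step."""
    danger_level = str(outputs.get("danger_level", "")).strip().upper()
    if danger_level not in ALLOWED_DANGER_LEVELS:
        raise ValueError(f"Unexpected danger level: {danger_level!r}")
    threats = outputs.get("threats", [])
    if not isinstance(threats, list) or not all(isinstance(t, str) for t in threats):
        raise ValueError("threats must be a list of strings")
    return {
        "animal_name": str(outputs.get("animal_name", "")),
        "danger_level": danger_level,
        "threats": threats,
    }


# Example output copied from the run above (stand-in for status.output_values).
example_outputs = {
    "threats": ["poaching", "habitat loss", "pollution"],
    "danger_level": "HIGH",
    "animal_name": "Sea turtles",
}
print(check_animal_outputs(example_outputs)["danger_level"])  # HIGH
```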

Complex JSON objects

Sometimes, you might need to generate an object that follows a specific JSON Schema. You can do that by using an output descriptor of type ObjectProperty, or directly converting your JSON Schema into a descriptor:

from wayflowcore.property import Property, StringProperty
from wayflowcore.steps import PromptExecutionStep, StartStep

animal_json_schema = {
    "title": "animal_object",
    "description": "information about the animal",
    "type": "object",
    "properties": {
        "animal_name": {
            "type": "string",
            "description": "name of the animal",
            "default": "",
        },
        "danger_level": {
            "type": "string",
            "description": 'level of danger of the animal. Can be "HIGH", "MEDIUM" or "LOW"',
            "default": "",
        },
        "threats": {
            "type": "array",
            "description": "list of threats for the animal",
            "items": {"type": "string"},
            "default": [],
        },
    },
}
animal_descriptor = Property.from_json_schema(animal_json_schema)

start_step = StartStep(input_descriptors=[StringProperty("article")])
summarize_step = PromptExecutionStep(
    llm=llm,
    prompt_template="""Extract from the following article the name of the animal, its danger level and the threats it's subject to. The article:\n\n {{article}}""",
    output_descriptors=[animal_descriptor],
)
summarize_step_name = "summarize_step"
flow = Flow(
    begin_step_name="start_step",
    steps={
        "start_step": start_step,
        summarize_step_name: summarize_step,
    },
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=summarize_step),
        ControlFlowEdge(source_step=summarize_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(start_step, "article", summarize_step, "article"),
    ],
)

conversation = flow.start_conversation(inputs={"article": article})
status = conversation.execute()
print(status.output_values)
# {'animal_object': {'animal_name': 'Sea turtles', 'danger_level': 'MEDIUM', 'threats': ['Poaching', 'Habitat loss', 'Pollution']}}
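
The sketch below shows one way to sanity-check a generated object against such a schema in plain Python. It is deliberately minimal: the helper matches_schema is hypothetical, handles only the "object", "string" and "array" types used in this guide, and is not a substitute for a full JSON Schema validator.

```python
def matches_schema(value, schema: dict) -> bool:
    """Check `value` against the small JSON Schema subset used in this guide."""
    schema_type = schema.get("type")
    if schema_type == "string":
        return isinstance(value, str)
    if schema_type == "array":
        item_schema = schema.get("items", {})
        return isinstance(value, list) and all(
            matches_schema(item, item_schema) for item in value
        )
    if schema_type == "object":
        if not isinstance(value, dict):
            return False
        properties = schema.get("properties", {})
        # Only properties that are present are checked; absent ones are allowed.
        return all(
            matches_schema(value[name], sub_schema)
            for name, sub_schema in properties.items()
            if name in value
        )
    # Types outside the subset are accepted without checking.
    return True


# A trimmed copy of the schema above (titles, descriptions and defaults omitted).
schema_subset = {
    "type": "object",
    "properties": {
        "animal_name": {"type": "string"},
        "danger_level": {"type": "string"},
        "threats": {"type": "array", "items": {"type": "string"}},
    },
}
generated = {
    "animal_name": "Sea turtles",
    "danger_level": "MEDIUM",
    "threats": ["Poaching", "Habitat loss", "Pollution"],
}
print(matches_schema(generated, schema_subset))  # True
print(matches_schema({"threats": "Poaching"}, schema_subset))  # False
```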

Structured generation with Agents

In certain scenarios, you might need to invoke additional tools within your flow. In that case, use an agent: give the agent its instructions, then pass the output descriptors to the AgentExecutionStep so that the step returns structured outputs.

from wayflowcore.agent import Agent, CallerInputMode
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.steps import AgentExecutionStep, StartStep

start_step = StartStep(input_descriptors=[])
agent = Agent(
    llm=llm,
    custom_instruction="""Extract from the article given by the user the name of the animal, its danger level and the threats it's subject to.""",
    initial_message=None,
)

summarize_agent_step = AgentExecutionStep(
    agent=agent,
    output_descriptors=[animal_output, danger_level_output, threats_output],
    caller_input_mode=CallerInputMode.NEVER,
)
summarize_step_name = "summarize_step"
flow = Flow(
    begin_step_name="start_step",
    steps={
        "start_step": start_step,
        summarize_step_name: summarize_agent_step,
    },
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=summarize_agent_step),
        ControlFlowEdge(source_step=summarize_agent_step, destination_step=None),
    ],
    data_flow_edges=[],
)

conversation = flow.start_conversation()
conversation.append_user_message("Here is the article: " + article)
status = conversation.execute()
print(status.output_values)
# {'animal_name': 'Sea turtles', 'danger_level': 'HIGH', 'threats': ['poaching', 'habitat loss', 'pollution']}

How to write secure prompts with Jinja templating

Jinja2 is a fast and flexible templating engine for Python, enabling dynamic generation of text-based formats by combining templates with data.

However, enabling all Jinja templating capabilities poses security risks. For this reason, WayFlow relies on a stricter implementation of Jinja’s SandboxedEnvironment. Every callable is considered unsafe, and every attribute and item access is prevented, except for:

The attributes index0, index, first, last, length of the jinja2.runtime.LoopContext;
The entries of a python dictionary (only native type is accepted);
The items of a python list (only native type is accepted).

You should never write a template that includes a function call, or access to any internal attribute or element of an arbitrary variable: that is considered unsafe, and it will raise a SecurityException.

Moreover, WayFlow performs additional checks on the inputs provided for rendering. In particular, only elements and sub-elements that are of basic python types (str, int, float, bool, list, dict, tuple, set, NoneType) are accepted. In any other case, a SecurityException is raised.
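
The input check described above can be pictured with the following sketch, which walks a value recursively and accepts it only if every element is of one of the listed basic types. This is an illustration under stated assumptions, not WayFlow's implementation: the helper name uses_only_basic_types is hypothetical, exact type matching is assumed (subclasses are rejected), and dictionary keys are not inspected.

```python
SCALAR_TYPES = (str, int, float, bool, type(None))
CONTAINER_TYPES = (list, tuple, set)


def uses_only_basic_types(value) -> bool:
    """Return True if `value` and all its sub-elements are basic Python types."""
    value_type = type(value)  # exact type, so subclasses are rejected
    if value_type in SCALAR_TYPES:
        return True
    if value_type is dict:
        # Assumption: only the dictionary values are checked in this sketch.
        return all(uses_only_basic_types(v) for v in value.values())
    if value_type in CONTAINER_TYPES:
        return all(uses_only_basic_types(item) for item in value)
    return False


print(uses_only_basic_types({"k1": ["simple string", 1, None]}))  # True
print(uses_only_basic_types({"k1": object()}))  # False
```

Running such a check on your own inputs before rendering can help locate the element that would otherwise trigger a SecurityException.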

What you can write

Here’s a set of common patterns that are accepted by WayFlow’s restricted Jinja templating.

Templates that access variables of base python types:

my_var: str = "simple string"
template = "{{ my_var }}"
# Expected outcome: "simple string"

Templates that access elements of a list of base python types:

my_var: list[str] = ["simple string"]
template = "{{ my_var[0] }}"
# Expected outcome: "simple string"

Templates that access dictionary entries of base python types:

my_var: dict[str, str] = {"k1": "simple string"}
template = "{{ my_var['k1'] }}"
# Expected outcome: "simple string"

my_var: dict[str, str] = {"k1": "simple string"}
template = "{{ my_var.k1 }}"
# Expected outcome: "simple string"

Built-in Jinja filters, such as length or format:

my_var: list[str] = ["simple string"]
template = "{{ my_var | length }}"
# Expected outcome: "1"

Simple expressions:

template = "{{ 7*7 }}"
# Expected outcome: "49"

For loops, optionally accessing the LoopContext:

my_var: list[int] = [1, 2, 3]
template = "{% for e in my_var %}{{e}}{{ ', ' if not loop.last }}{% endfor %}"
# Expected outcome: "1, 2, 3"

If conditions:

my_var: int = 4
template = "{% if my_var % 2 == 0 %}even{% else %}odd{% endif %}"
# Expected outcome: "even"

Our general recommendation is to avoid complex logic in templates and to pre-process the data you want to render instead. For example, to comply with the restrictions above, convert complex objects recursively into dictionaries whose entries are of basic Python types (see the list of accepted types above).
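
A minimal sketch of such a pre-processing step is shown below, using only the standard library. The Animal dataclass and the to_basic_types helper are hypothetical names introduced for illustration. Falling back to str() for unknown objects and for dictionary keys other than str or int is an assumption of this sketch, not a WayFlow rule.

```python
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class Animal:  # hypothetical complex object
    name: str
    threats: list[str]


def to_basic_types(value):
    """Recursively convert `value` into dicts, lists and scalar values."""
    if is_dataclass(value) and not isinstance(value, type):
        return to_basic_types(asdict(value))
    if isinstance(value, dict):
        # Keep str and int keys; turn any other key into its string form.
        return {
            (k if isinstance(k, (str, int)) else str(k)): to_basic_types(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple, set)):
        return [to_basic_types(item) for item in value]
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    return str(value)  # fallback for any other object


inputs = {"animal": to_basic_types(Animal("Sea turtles", ["poaching"]))}
print(inputs)  # {'animal': {'name': 'Sea turtles', 'threats': ['poaching']}}
```

With inputs prepared this way, a template such as "{{ animal.name }}" only accesses dictionary entries of basic types.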

What you cannot write

Here’s a set of common patterns that are NOT accepted by WayFlow’s restricted Jinja templating.

Templates that access arbitrary objects:

my_var: MyComplexObject = MyComplexObject()
template = "{{ my_var }}"
# Expected outcome: SecurityException

Templates that access attributes of arbitrary objects:

my_var: MyComplexObject = MyComplexObject(attribute="my string")
template = "{{ my_var.attribute }}"
# Expected outcome: SecurityException

Templates that access internals of any type and object:

my_var: dict = {"k1": "my string"}
template = "{{ my_var.__init__ }}"
# Expected outcome: SecurityException

Templates that access non-existing keys of a dictionary:

my_var: dict = {"k1": "my string"}
template = "{{ my_var['non-existing-key'] }}"
# Expected outcome: SecurityException

Templates that access keys of a dictionary of type different from int or str:

my_var: dict = {("complex", "key"): "my string"}
template = "{{ my_var[('complex', 'key')] }}"
# Expected outcome: SecurityException

Templates that access callables:

my_var: Callable = lambda x: f"my value {x}"
template = "{{ my_var(2) }}"
# Expected outcome: SecurityException

my_var: list = [1, 2, 3]
template = "{{ len(my_var) }}"
# Expected outcome: SecurityException

my_var: MyComplexObject = MyComplexObject()
template = "{{ my_var.to_string() }}"
# Expected outcome: SecurityException

For more information, please check our Security considerations page.

Recap

In this guide, you learned how to incorporate LLMs into flows to:

generate raw text with the PromptExecutionStep class
produce structured outputs with the PromptExecutionStep class
produce structured outputs with an agent and the AgentExecutionStep class

Below is the complete code from this guide.

article = """Sea turtles are ancient reptiles that have been around for over 100 million years. They play crucial roles in marine ecosystems, such as maintaining healthy seagrass beds and coral reefs. Unfortunately, they are under threat due to poaching, habitat loss, and pollution. Conservation efforts worldwide aim to protect nesting sites and reduce bycatch in fishing gear."""

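from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.models import LlmModelFactory
from wayflowcore.property import StringProperty
from wayflowcore.steps import PromptExecutionStep, StartStep

# `model_config` is assumed to hold your LLM configuration; alternatively,
# create `llm` with one of the model classes shown at the start of this guide.
llm = LlmModelFactory.from_config(model_config)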

start_step = StartStep(input_descriptors=[StringProperty("article")])
summarize_step = PromptExecutionStep(
    llm=llm,
    prompt_template="""Summarize this article in 10 words:\n {{article}}""",
    output_mapping={PromptExecutionStep.OUTPUT: "summary"},
)
summarize_step_name = "summarize_step"
flow = Flow(
    begin_step_name="start_step",
    steps={
        "start_step": start_step,
        summarize_step_name: summarize_step,
    },
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=summarize_step),
        ControlFlowEdge(source_step=summarize_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(start_step, "article", summarize_step, "article"),
    ],
)

conversation = flow.start_conversation(inputs={"article": article})
status = conversation.execute()
print(status.output_values["summary"])
# Sea turtles face threats from poaching, habitat loss, and pollution globally.

from wayflowcore.property import ListProperty, StringProperty
from wayflowcore.steps import PromptExecutionStep, StartStep

animal_output = StringProperty(
    name="animal_name",
    description="name of the animal",
    default_value="",
)
danger_level_output = StringProperty(
    name="danger_level",
    description='level of danger of the animal. Can be "HIGH", "MEDIUM" or "LOW"',
    default_value="",
)
threats_output = ListProperty(
    name="threats",
    description="list of threats for the animal",
    item_type=StringProperty("threat"),
    default_value=[],
)


start_step = StartStep(input_descriptors=[StringProperty("article")])
summarize_step = PromptExecutionStep(
    llm=llm,
    prompt_template="""Extract from the following article the name of the animal, its danger level and the threats it's subject to. The article:\n\n {{article}}""",
    output_descriptors=[animal_output, danger_level_output, threats_output],
)
summarize_step_name = "summarize_step"
flow = Flow(
    begin_step_name="start_step",
    steps={
        "start_step": start_step,
        summarize_step_name: summarize_step,
    },
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=summarize_step),
        ControlFlowEdge(source_step=summarize_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(start_step, "article", summarize_step, "article"),
    ],
)

conversation = flow.start_conversation(inputs={"article": article})
status = conversation.execute()
print(status.output_values)
# {'threats': ['poaching', 'habitat loss', 'pollution'], 'danger_level': 'HIGH', 'animal_name': 'Sea turtles'}

from wayflowcore.agent import Agent, CallerInputMode
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.steps import AgentExecutionStep, StartStep

start_step = StartStep(input_descriptors=[])
agent = Agent(
    llm=llm,
    custom_instruction="""Extract from the article given by the user the name of the animal, its danger level and the threats it's subject to.""",
    initial_message=None,
)

summarize_agent_step = AgentExecutionStep(
    agent=agent,
    output_descriptors=[animal_output, danger_level_output, threats_output],
    caller_input_mode=CallerInputMode.NEVER,
)
summarize_step_name = "summarize_step"
flow = Flow(
    begin_step_name="start_step",
    steps={
        "start_step": start_step,
        summarize_step_name: summarize_agent_step,
    },
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=summarize_agent_step),
        ControlFlowEdge(source_step=summarize_agent_step, destination_step=None),
    ],
    data_flow_edges=[],
)

conversation = flow.start_conversation()
conversation.append_user_message("Here is the article: " + article)
status = conversation.execute()
print(status.output_values)
# {'animal_name': 'Sea turtles', 'danger_level': 'HIGH', 'threats': ['poaching', 'habitat loss', 'pollution']}

Next steps

Having learned how to perform structured generation in WayFlow, you may now proceed to:

Config Generation to change LLM generation parameters.
Catching Exceptions to ensure robustness of the generated outputs.