How to Specify the Generation Configuration when Using LLMs#

Download Python Script

Python script/notebook for this guide.

Generation Configuration how-to script

Generation parameters, such as temperature, top-p, the maximum number of output tokens, and per-token log-probabilities, are important for achieving the desired performance with Large Language Models (LLMs). In WayFlow, these parameters can be configured with the LlmGenerationConfig class.

This guide will show you how to:

  • Configure the generation parameters for an agent.

  • Configure the generation parameters for a flow.

  • Request token log probabilities.

  • Apply the generation configuration from a dictionary.

  • Save a custom generation configuration.

Note

For a deeper understanding of the impact of each generation parameter, refer to the resources at the bottom of this page.

Basic implementation#

Configure the generation parameters for an agent#

Customizing the generation configuration for an agent requires the use of the following wayflowcore components.

from wayflowcore.agent import Agent
from wayflowcore.models.llmgenerationconfig import LlmGenerationConfig

The generation configuration can be specified when initializing the LLM using the LlmGenerationConfig class. This ensures that all the outputs generated by the agent will have the same generation configuration.

The LlmGenerationConfig class accepts the following arguments:

  • max_tokens: sets the maximum number of tokens to generate, not counting the tokens in the prompt;

  • temperature: scales the sampling distribution; lower values make the output more deterministic and higher values make it more varied;

  • top_p: restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches top_p (nucleus sampling);

  • stop: defines a list of stop words that tell the LLM to stop generating;

  • frequency_penalty: penalizes tokens according to how often they have already appeared in the output, which reduces repetition;

  • top_logprobs: requests token-level log probabilities, including alternate candidates when the provider supports them.
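To make the effect of temperature and top_p more concrete, the following standalone sketch applies both to a small set of made-up logits. It does not use WayFlow and is not how any particular provider implements sampling; the function names and the logit values are assumptions chosen for illustration.

```python
import math


def apply_temperature(logits, temperature):
    """Return softmax probabilities of the logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

sharp = apply_temperature(logits, 0.5)  # low temperature: mass concentrates on token 0
flat = apply_temperature(logits, 2.0)   # high temperature: mass spreads out
nucleus = top_p_filter(apply_temperature(logits, 1.0), 0.8)

print(sharp, flat, nucleus)
```

With these logits, a top_p of 0.8 keeps only the two most likely tokens, because their combined probability already exceeds 0.8.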

Additionally, LlmGenerationConfig accepts a dictionary of arbitrary parameters, called extra_args, that is sent as part of the LLM generation call. This lets you specify provider-specific parameters that are not common to all providers.

Note

The extra parameters should never include sensitive information.

generation_config = LlmGenerationConfig(
    max_tokens=512,
    temperature=1.0,
    top_p=1.0,
    stop=["exit", "end"],
    frequency_penalty=0,
    extra_args={"seed": 1},
)
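As a rough illustration of what a stop list such as ["exit", "end"] does, the sketch below cuts a string at the earliest stop word. It is an assumption-laden stand-in: truncate_at_stop is a hypothetical helper, stopping actually happens on the provider side during generation, and whether the stop word itself is returned depends on the provider.

```python
def truncate_at_stop(text, stop):
    """Cut the text at the earliest occurrence of any stop word.

    The stop word itself is dropped here; real providers may differ.
    """
    cut = len(text)
    for word in stop:
        index = text.find(word)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Hypothetical model output containing both stop words.
raw_output = "Bern is the capital. end of answer. exit"
result = truncate_at_stop(raw_output, ["exit", "end"])
print(repr(result))
```

Because "end" appears before "exit" in the text, generation stops at "end" even though it is listed second.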

WayFlow supports several LLM API providers, and you can pass the generation_config to any of them. The example below uses OCIGenAIModel:

from wayflowcore.models import OCIGenAIModel, OCIClientConfigWithApiKey

llm = OCIGenAIModel(
    model_id="provider.model-id",
    compartment_id="compartment-id",
    client_config=OCIClientConfigWithApiKey(
        service_endpoint="https://url-to-service-endpoint.com",
    ),
    # Apply the generation configuration defined above to every call made with this LLM.
    generation_config=generation_config,
)

Important

API keys should not be stored anywhere in the code. Use environment variables and/or tools such as python-dotenv.
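One way to follow this advice is to read connection settings from the environment and fail early when they are missing. The sketch below uses only the standard library; the variable names OCI_GENAI_ENDPOINT and OCI_COMPARTMENT_ID and the helper read_required_env are hypothetical, not names that WayFlow defines.

```python
import os


def read_required_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo only: provide placeholder values so the sketch runs as-is.
# In a real setup, export these in your shell or load them with python-dotenv.
os.environ.setdefault("OCI_GENAI_ENDPOINT", "https://url-to-service-endpoint.com")
os.environ.setdefault("OCI_COMPARTMENT_ID", "compartment-id")

service_endpoint = read_required_env("OCI_GENAI_ENDPOINT")
compartment_id = read_required_env("OCI_COMPARTMENT_ID")
print(service_endpoint, compartment_id)
```

The resulting values can then be passed to the model constructor in place of hard-coded strings.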

Now, you can build an agent using the LLM as follows:

agent = Agent(llm=llm)
conversation = agent.start_conversation()
conversation.append_user_message("What is the capital of Switzerland?")
conversation.execute()
print(conversation.get_last_message())

Configure the generation parameters for a flow#

Customizing the generation configuration for a flow requires the use of the following wayflowcore components.

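from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.models.llmgenerationconfig import LlmGenerationConfig
from wayflowcore.property import StringProperty
from wayflowcore.steps import PromptExecutionStep, StartStep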

Refer to the previous section to learn how to configure the generation parameters when initializing an LLM using the LlmGenerationConfig class.

You can then create a one-step flow using the PromptExecutionStep step.

start_step = StartStep(name="start_step", input_descriptors=[StringProperty("user_question")])
prompt_step = PromptExecutionStep(
    name="PromptExecution",
    prompt_template="{{user_question}}",
    llm=llm,
    generation_config=LlmGenerationConfig(temperature=0.8),
)
flow = Flow(
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=prompt_step),
        ControlFlowEdge(source_step=prompt_step, destination_step=None),
    ],
    data_flow_edges=[DataFlowEdge(start_step, "user_question", prompt_step, "user_question")],
)
conversation = flow.start_conversation(
    inputs={"user_question": "What is the capital of Switzerland?"}
)
conversation.execute()

Important

The generation_config parameter passed to the PromptExecutionStep overrides the LLM’s original generation configuration.

Advanced usage#

The LlmGenerationConfig class is a serializable object. It can be instantiated from a dictionary or saved to one, as you will see below.

Request token log probabilities#

Use top_logprobs when you want the model to return token-level probabilities for generated text. WayFlow stores those values on TextContent.logprobs for direct LLM calls, and the PromptExecutionStep also exposes them as an additional logprobs output.

Note

top_logprobs is only available for raw text generation. It is not supported with structured generation in PromptExecutionStep, and support depends on the selected provider and model.

For direct LlmModel calls, configure top_logprobs on the prompt and inspect the TextContent chunk:

from wayflowcore.messagelist import Message, TextContent
from wayflowcore.models import Prompt

prompt = Prompt(
    messages=[Message(content="Say 'Bern' and nothing else.")],
    generation_config=LlmGenerationConfig(top_logprobs=2, max_tokens=16),
)
completion = llm.generate(prompt)
text_chunk = next(chunk for chunk in completion.message.contents if isinstance(chunk, TextContent))

print(text_chunk.content)
print(text_chunk.logprobs)
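Log probabilities are natural logarithms of token probabilities, so they are easier to read once converted. The sketch below works on a plain list of floats with made-up values; the exact structure of TextContent.logprobs is not described in this guide, so extracting the per-token values from it is left out on purpose.

```python
import math

# Hypothetical log probabilities of three generated tokens.
token_logprobs = [-0.05, -0.20, -1.60]

# exp() turns each log probability back into a probability in (0, 1].
token_probs = [math.exp(lp) for lp in token_logprobs]

# Log probabilities add up, so the sequence probability is exp of their sum.
sequence_logprob = sum(token_logprobs)
sequence_prob = math.exp(sequence_logprob)

# Perplexity: exp of the average negative log probability per token.
perplexity = math.exp(-sequence_logprob / len(token_logprobs))

# Index of the token the model was least confident about.
least_confident = min(range(len(token_logprobs)), key=lambda i: token_logprobs[i])

print(token_probs, sequence_prob, perplexity, least_confident)
```

A low-probability token in an otherwise confident sequence is often a good place to look when debugging unexpected outputs.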

For flows, you can request logprobs directly on PromptExecutionStep. When enabled, the step appends a logprobs output alongside the normal text output:

from wayflowcore.executors.executionstatus import FinishedStatus

logprob_start_step = StartStep(
    name="logprob_start_step",
    input_descriptors=[StringProperty("user_question")],
)
logprob_step = PromptExecutionStep(
    name="PromptExecutionWithLogprobs",
    prompt_template="{{user_question}}",
    llm=llm,
    top_logprobs=2,
)
logprob_flow = Flow(
    begin_step=logprob_start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=logprob_start_step, destination_step=logprob_step),
        ControlFlowEdge(source_step=logprob_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(logprob_start_step, "user_question", logprob_step, "user_question")
    ],
)
conversation = logprob_flow.start_conversation(
    inputs={"user_question": "What is the capital of Switzerland?"}
)
status = conversation.execute()
if isinstance(status, FinishedStatus):
    print(status.output_values[PromptExecutionStep.OUTPUT])
    print(status.output_values[PromptExecutionStep.LOGPROBS])

Apply the generation configuration from a dictionary#

If you have a generation configuration in a dictionary (for example, from a JSON or YAML file), you can instantiate the LlmGenerationConfig class as follows:

config_dict = {
    "max_tokens": 512,
    "temperature": 0.9,
}

config = LlmGenerationConfig.from_dict(config_dict)
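When the dictionary comes from a file, it can help to check the keys before building the configuration, since a misspelled name is easy to miss. The sketch below parses JSON text and rejects names that are not among the parameters used in this guide. KNOWN_KEYS and load_generation_settings are hypothetical, and whether from_dict itself rejects unknown keys is not stated here.

```python
import json

# Parameter names used in this guide's examples (an assumption, not an official schema).
KNOWN_KEYS = {
    "max_tokens",
    "temperature",
    "top_p",
    "stop",
    "frequency_penalty",
    "top_logprobs",
    "extra_args",
}


def load_generation_settings(raw_json):
    """Parse JSON text and reject parameter names this guide does not use."""
    settings = json.loads(raw_json)
    unknown = set(settings) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"Unknown generation parameters: {sorted(unknown)}")
    return settings


config_dict = load_generation_settings('{"max_tokens": 512, "temperature": 0.9}')
print(config_dict)
# The validated dictionary can then be passed to LlmGenerationConfig.from_dict(config_dict).
```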

Save a custom generation configuration#

If you would like to share a specific generation configuration, you can create an LlmGenerationConfig instance and convert it to a dictionary.


config = LlmGenerationConfig(max_tokens=1024, temperature=0.8, top_p=0.6)
config_dict = config.to_dict()
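To share the configuration as a file, the dictionary can be written to disk and read back. The sketch below uses a hand-written dictionary as a stand-in for the result of to_dict() and assumes its values are JSON-serializable, which this guide does not confirm; the file name is also just an example.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the dictionary returned by config.to_dict().
config_dict = {"max_tokens": 1024, "temperature": 0.8, "top_p": 0.6}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "generation_config.json"

    # Save the configuration as human-readable JSON.
    path.write_text(json.dumps(config_dict, indent=2), encoding="utf-8")

    # Read it back, as a colleague receiving the file would.
    restored = json.loads(path.read_text(encoding="utf-8"))

print(restored)
```

The restored dictionary can be turned back into a configuration object with LlmGenerationConfig.from_dict.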

Agent Spec Exporting/Loading#

You can export the assistant configuration to its Agent Spec configuration using the AgentSpecExporter. The following example exports the serialization of the flow defined above.

from wayflowcore.agentspec import AgentSpecExporter

serialized_assistant = AgentSpecExporter().to_yaml(flow)

The Agent Spec representation looks like the following (shown here as JSON):

{
  "component_type": "Flow",
  "id": "fc3d10f4-5ee2-40d8-a580-0db6c44b0b39",
  "name": "flow_0e4b989a",
  "description": "",
  "metadata": {
    "__metadata_info__": {}
  },
  "inputs": [
    {
      "type": "string",
      "title": "user_question"
    }
  ],
  "outputs": [
    {
      "description": "the generated text",
      "type": "string",
      "title": "output"
    }
  ],
  "start_node": {
    "$component_ref": "d8870848-f3c1-4a88-a0f3-b6ca20c61bab"
  },
  "nodes": [
    {
      "$component_ref": "d8870848-f3c1-4a88-a0f3-b6ca20c61bab"
    },
    {
      "$component_ref": "25917cac-52d4-4816-8c62-c18d8b70ee33"
    },
    {
      "$component_ref": "158c838b-f8be-41ef-8b66-64348c8d379c"
    }
  ],
  "control_flow_connections": [
    {
      "component_type": "ControlFlowEdge",
      "id": "7b8c0c0b-fcd1-4bf3-96be-dcf726ab1526",
      "name": "start_step_to_PromptExecution_control_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "from_node": {
        "$component_ref": "d8870848-f3c1-4a88-a0f3-b6ca20c61bab"
      },
      "from_branch": null,
      "to_node": {
        "$component_ref": "25917cac-52d4-4816-8c62-c18d8b70ee33"
      }
    },
    {
      "component_type": "ControlFlowEdge",
      "id": "6b2c2840-126c-43fe-a8e1-f3cb08e8ae88",
      "name": "PromptExecution_to_None End node_control_flow_edge",
      "description": null,
      "metadata": {},
      "from_node": {
        "$component_ref": "25917cac-52d4-4816-8c62-c18d8b70ee33"
      },
      "from_branch": null,
      "to_node": {
        "$component_ref": "158c838b-f8be-41ef-8b66-64348c8d379c"
      }
    }
  ],
  "data_flow_connections": [
    {
      "component_type": "DataFlowEdge",
      "id": "86ba0435-be9b-46b0-97ae-64e145045e19",
      "name": "start_step_user_question_to_PromptExecution_user_question_data_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "source_node": {
        "$component_ref": "d8870848-f3c1-4a88-a0f3-b6ca20c61bab"
      },
      "source_output": "user_question",
      "destination_node": {
        "$component_ref": "25917cac-52d4-4816-8c62-c18d8b70ee33"
      },
      "destination_input": "user_question"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "8638e234-9a23-45c5-89d4-296fc5a8c5ac",
      "name": "PromptExecution_output_to_None End node_output_data_flow_edge",
      "description": null,
      "metadata": {},
      "source_node": {
        "$component_ref": "25917cac-52d4-4816-8c62-c18d8b70ee33"
      },
      "source_output": "output",
      "destination_node": {
        "$component_ref": "158c838b-f8be-41ef-8b66-64348c8d379c"
      },
      "destination_input": "output"
    }
  ],
  "$referenced_components": {
    "25917cac-52d4-4816-8c62-c18d8b70ee33": {
      "component_type": "LlmNode",
      "id": "25917cac-52d4-4816-8c62-c18d8b70ee33",
      "name": "PromptExecution",
      "description": "",
      "metadata": {
        "__metadata_info__": {}
      },
      "inputs": [
        {
          "description": "\"user_question\" input variable for the template",
          "type": "string",
          "title": "user_question"
        }
      ],
      "outputs": [
        {
          "description": "the generated text",
          "type": "string",
          "title": "output"
        }
      ],
      "branches": [
        "next"
      ],
      "llm_config": {
        "component_type": "VllmConfig",
        "id": "93d098ef-9643-4d38-a012-8903bacbb784",
        "name": "LLAMA_MODEL_ID",
        "description": null,
        "metadata": {
          "__metadata_info__": {}
        },
        "default_generation_parameters": null,
        "url": "LLAMA_API_URL",
        "model_id": "LLAMA_MODEL_ID"
      },
      "prompt_template": "{{user_question}}"
    },
    "d8870848-f3c1-4a88-a0f3-b6ca20c61bab": {
      "component_type": "StartNode",
      "id": "d8870848-f3c1-4a88-a0f3-b6ca20c61bab",
      "name": "start_step",
      "description": "",
      "metadata": {
        "__metadata_info__": {}
      },
      "inputs": [
        {
          "type": "string",
          "title": "user_question"
        }
      ],
      "outputs": [
        {
          "type": "string",
          "title": "user_question"
        }
      ],
      "branches": [
        "next"
      ]
    },
    "158c838b-f8be-41ef-8b66-64348c8d379c": {
      "component_type": "EndNode",
      "id": "158c838b-f8be-41ef-8b66-64348c8d379c",
      "name": "None End node",
      "description": "End node representing all transitions to None in the WayFlow flow",
      "metadata": {},
      "inputs": [
        {
          "description": "the generated text",
          "type": "string",
          "title": "output"
        }
      ],
      "outputs": [
        {
          "description": "the generated text",
          "type": "string",
          "title": "output"
        }
      ],
      "branches": [],
      "branch_name": "next"
    }
  },
  "agentspec_version": "25.4.1"
}

You can then load the configuration back to an assistant using the AgentSpecLoader.

from wayflowcore.agentspec import AgentSpecLoader

assistant = AgentSpecLoader().load_yaml(serialized_assistant)

Next steps#

Having learned how to specify the generation configuration, you may now proceed to:

Some additional resources we recommend:

Full code#

Click on the card at the top of this page to download the full code for this guide or copy the code below.

# Copyright © 2025 Oracle and/or its affiliates.
#
# This software is under the Apache License 2.0
# (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0) or Universal Permissive License
# (UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl), at your option.

# %%[markdown]
# Code Example - How to Specify the Generation Configuration when Using LLMs
# --------------------------------------------------------------------------

# How to use:
# Create a new Python virtual environment and install the latest WayFlow version.
# ```bash
# python -m venv venv-wayflowcore
# source venv-wayflowcore/bin/activate
# pip install --upgrade pip
# pip install "wayflowcore==26.1.2"
# ```

# You can now run the script
# 1. As a Python file:
# ```bash
# python example_generationconfig.py
# ```
# 2. As a Notebook (in VSCode):
# When viewing the file,
#  - press the keys Ctrl + Enter to run the selected cell
#  - or Shift + Enter to run the selected cell and move to the cell below


# %%[markdown]
## Imports

# %%
from wayflowcore.agent import Agent
from wayflowcore.models.llmgenerationconfig import LlmGenerationConfig


# %%[markdown]
## Define the llm generation configuration

# %%
generation_config = LlmGenerationConfig(
    max_tokens=512,
    temperature=1.0,
    top_p=1.0,
    stop=["exit", "end"],
    frequency_penalty=0,
    extra_args={"seed": 1},
)


# %%[markdown]
## Define the vLLM

# %%
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="LLAMA_MODEL_ID",
    host_port="LLAMA_API_URL",
    generation_config=generation_config,
)
# NOTE: host_port should be a string with the IP address/domain name and the port. An example string: "192.168.1.1:8000"
# NOTE: model_id usually indicates the HuggingFace model id,
# e.g. meta-llama/Llama-3.1-8B-Instruct from https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct

# %%[markdown]
## Build the agent and run it

# %%
agent = Agent(llm=llm)
conversation = agent.start_conversation()
conversation.append_user_message("What is the capital of Switzerland?")
conversation.execute()
print(conversation.get_last_message())


# %%[markdown]
## Request logprobs from a direct llm call

# %%
from wayflowcore.messagelist import Message, TextContent
from wayflowcore.models import Prompt

prompt = Prompt(
    messages=[Message(content="Say 'Bern' and nothing else.")],
    generation_config=LlmGenerationConfig(top_logprobs=2, max_tokens=16),
)
completion = llm.generate(prompt)
text_chunk = next(chunk for chunk in completion.message.contents if isinstance(chunk, TextContent))

print(text_chunk.content)
print(text_chunk.logprobs)


from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge


# %%[markdown]
## Import what is needed to build a flow

# %%
from wayflowcore.flow import Flow
from wayflowcore.models.llmgenerationconfig import LlmGenerationConfig
from wayflowcore.property import StringProperty
from wayflowcore.steps import PromptExecutionStep, StartStep


# %%[markdown]
## Build the flow using custom generation parameters

# %%
start_step = StartStep(name="start_step", input_descriptors=[StringProperty("user_question")])
prompt_step = PromptExecutionStep(
    name="PromptExecution",
    prompt_template="{{user_question}}",
    llm=llm,
    generation_config=LlmGenerationConfig(temperature=0.8),
)
flow = Flow(
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=prompt_step),
        ControlFlowEdge(source_step=prompt_step, destination_step=None),
    ],
    data_flow_edges=[DataFlowEdge(start_step, "user_question", prompt_step, "user_question")],
)
conversation = flow.start_conversation(
    inputs={"user_question": "What is the capital of Switzerland?"}
)
conversation.execute()


# %%[markdown]
## Request logprobs from a flow step

# %%
from wayflowcore.executors.executionstatus import FinishedStatus

logprob_start_step = StartStep(
    name="logprob_start_step",
    input_descriptors=[StringProperty("user_question")],
)
logprob_step = PromptExecutionStep(
    name="PromptExecutionWithLogprobs",
    prompt_template="{{user_question}}",
    llm=llm,
    top_logprobs=2,
)
logprob_flow = Flow(
    begin_step=logprob_start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=logprob_start_step, destination_step=logprob_step),
        ControlFlowEdge(source_step=logprob_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(logprob_start_step, "user_question", logprob_step, "user_question")
    ],
)
conversation = logprob_flow.start_conversation(
    inputs={"user_question": "What is the capital of Switzerland?"}
)
status = conversation.execute()
if isinstance(status, FinishedStatus):
    print(status.output_values[PromptExecutionStep.OUTPUT])
    print(status.output_values[PromptExecutionStep.LOGPROBS])


# %%[markdown]
## Export config to Agent Spec

# %%
from wayflowcore.agentspec import AgentSpecExporter

serialized_assistant = AgentSpecExporter().to_yaml(flow)


# %%[markdown]
## Load Agent Spec config

# %%
from wayflowcore.agentspec import AgentSpecLoader

assistant = AgentSpecLoader().load_yaml(serialized_assistant)


# %%[markdown]
## Build the generation configuration from dictionary

# %%
config_dict = {
    "max_tokens": 512,
    "temperature": 0.9,
}

config = LlmGenerationConfig.from_dict(config_dict)


# %%[markdown]
## Export a generation configuration to dictionary

# %%

config = LlmGenerationConfig(max_tokens=1024, temperature=0.8, top_p=0.6)
config_dict = config.to_dict()