LLMs#

This page presents all APIs and classes related to LLMs.


Visit the Agent Spec API Documentation to learn more about LLM Components.

Agent Spec - LLMs API Reference

Tip

Click the link above to visit the Agent Spec Documentation.

LlmModel#

class wayflowcore.models.llmmodel.LlmModel(model_id, generation_config, chat_template=None, agent_template=None, supports_structured_generation=None, supports_tool_calling=None, __metadata_info__=None, id=None, name=None, description=None)#

Base class for LLM models.

Parameters:
  • model_id (str) – ID of the model.

  • generation_config (LlmGenerationConfig | None) – Parameters for LLM generation.

  • chat_template (PromptTemplate | None) – Default template for chat completion.

  • agent_template (PromptTemplate | None) – Default template for agents using this model.

  • supports_structured_generation (bool | None) – Whether the model supports structured generation. When set to None, the model is prompted with a response format to check whether it can use structured generation.

  • supports_tool_calling (bool | None) – Whether the model supports tool calling. When set to None, the model is prompted with a tool to check whether it can use tool calling.

  • id (str | None) – ID of the component.

  • name (str | None) – Name of the component.

  • description (str | None) – Description of the component.

  • __metadata_info__ (Dict[str, Any] | None) –

abstract property config: Dict[str, Any]#

Get the configuration dictionary for this model

property default_agent_template: PromptTemplate#
property default_chat_template: PromptTemplate#
generate(prompt, _conversation=None)#

Generates a new message based on a prompt using an LLM.

Parameters:
  • prompt (str | Prompt) – Prompt that contains the messages and other arguments to send to the LLM

  • _conversation (Conversation | None) –

Return type:

LlmCompletion

Examples

>>> from wayflowcore.messagelist import Message
>>> from wayflowcore.models import Prompt
>>> prompt = Prompt(messages=[Message('What is the capital of Switzerland?')])
>>> completion = llm.generate(prompt)
>>> # LlmCompletion(message=Message(content='The capital of Switzerland is Bern'))
async generate_async(prompt, _conversation=None)#
Parameters:
  • prompt (str | Prompt) –

  • _conversation (Conversation | None) –

Return type:

LlmCompletion
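
generate_async is the asynchronous counterpart of generate. A minimal usage sketch, assuming an already configured llm instance (for example an OpenAICompatibleModel):

>>> import asyncio
>>> from wayflowcore.messagelist import Message
>>> from wayflowcore.models import Prompt
>>> async def ask_capital(llm):
...     prompt = Prompt(messages=[Message('What is the capital of Switzerland?')])
...     completion = await llm.generate_async(prompt)
...     return completion.message
>>> # asyncio.run(ask_capital(llm))  # requires a configured llm instance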

get_total_token_consumption(conversation_id)#

Calculate and return the total token consumption for a given conversation.

This method computes the aggregate token usage for the specified conversation by summing the individual token usages recorded for it.

Parameters:

conversation_id (str) – The unique identifier for the conversation whose token consumption is to be calculated.

Return type:

A TokenUsage object that gathers all token usage information.

See also

TokenUsage

An object to gather all token usage information.
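
A usage sketch, assuming an existing llm instance and a conversation identifier obtained from a previous interaction (placeholder value shown):

>>> usage = llm.get_total_token_consumption("<CONVERSATION_ID>")
>>> print(usage.input_tokens, usage.output_tokens, usage.total_tokens)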

stream_generate(prompt, _conversation=None)#

Returns an iterator of message chunks

Parameters:
  • prompt (str | Prompt) –

  • _conversation (Conversation | None) –

Return type:

Iterable[Tuple[StreamChunkType, Message | None]]

async stream_generate_async(prompt, _conversation=None)#

Returns an async iterator of message chunks

Parameters:
  • prompt (str | Prompt) – Prompt that contains the messages and other arguments to send to the LLM

  • _conversation (Conversation | None) –

Return type:

AsyncIterable[Tuple[StreamChunkType, Message | None]]

Examples

>>> import asyncio
>>> from wayflowcore.messagelist import Message, MessageType
>>> from wayflowcore.models import Prompt
>>> message = Message(content="What is the capital of Switzerland?", message_type=MessageType.USER)
>>> llm_stream = llm.stream_generate(
...     prompt=Prompt(messages=[message])
... )
>>> for chunk_type, chunk in llm_stream:
...     print(chunk)   
>>> # Bern
>>> #  is the
>>> # capital
>>> #  of
>>> #  Switzerland
>>> # Message(content='Bern is the capital of Switzerland', message_type=MessageType.AGENT)
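
The asynchronous variant follows the same pattern with async for. A minimal sketch, assuming an existing llm instance and that stream_generate_async can be iterated directly as an async generator:

>>> import asyncio
>>> from wayflowcore.messagelist import Message, MessageType
>>> from wayflowcore.models import Prompt
>>> async def stream_answer(llm):
...     message = Message(content="What is the capital of Switzerland?", message_type=MessageType.USER)
...     async for chunk_type, chunk in llm.stream_generate_async(prompt=Prompt(messages=[message])):
...         print(chunk)
>>> # asyncio.run(stream_answer(llm))  # requires a configured llm instance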

LlmModelFactory#

class wayflowcore.models.llmmodelfactory.LlmModelFactory#

Factory class that creates LlmModel instances from configuration dictionaries.

Supports vLLM, Ollama, OpenAI and OCIGenAI models.

static from_config(model_config)#
Parameters:

model_config (Dict[str, Any]) –

Return type:

LlmModel
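
from_config builds an LlmModel from a configuration dictionary whose model_type key selects the backend. A brief sketch with a placeholder model name, mirroring the per-model examples further below:

>>> from wayflowcore.models import LlmModelFactory
>>> llm = LlmModelFactory.from_config(
...     {"model_type": "ollama", "model_id": "<MODEL_NAME>"}
... )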

Token Usage#

Class used to gather all token usage information.

class wayflowcore.tokenusage.TokenUsage(input_tokens=0, output_tokens=0, cached_tokens=0, total_tokens=0, exact_count=False)#

Gathers all token usage information.

Parameters:
  • input_tokens (int) – Number of tokens used as input/context.

  • cached_tokens (int) – Number of tokens in prompt that were cached.

  • output_tokens (int) – Number of tokens generated by the model.

  • exact_count (bool) – Whether these numbers are exact or were estimated using the 1 token ≈ 3/4 word rule of thumb.

  • total_tokens (int) –

cached_tokens: int = 0#
exact_count: bool = False#
input_tokens: int = 0#
output_tokens: int = 0#
total_tokens: int = 0#
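
A minimal construction sketch with illustrative values:

>>> from wayflowcore.tokenusage import TokenUsage
>>> usage = TokenUsage(input_tokens=120, output_tokens=45, total_tokens=165, exact_count=True)
>>> usage.total_tokens
165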

LLM Generation Config#

Parameters for LLM generation (such as max_tokens, temperature, and top_p).

class wayflowcore.models.llmgenerationconfig.LlmGenerationConfig(max_tokens=None, temperature=None, top_p=None, stop=None, frequency_penalty=None, extra_args=<factory>, *, id=<factory>, __metadata_info__=<factory>)#

Parameters for LLM generation

Parameters:
  • max_tokens (int | None) – Maximum number of tokens to generate as output.

  • temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

  • top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

  • stop (List[str] | None) – List of stop words that tell the LLM to stop generating when it encounters one of them. This helps reduce hallucinations when using templates like ReAct. Some reasoning models (o3, o4-mini…) might not support it.

  • frequency_penalty (float | None) – Float between -2.0 and 2.0 that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.

  • extra_args (Dict[str, Any]) –

    dictionary of extra arguments that can be used by specific model providers

    Note

    The extra parameters should never include sensitive information.

  • id (str) –

  • __metadata_info__ (Dict[str, Any]) –

extra_args: Dict[str, Any]#
frequency_penalty: float | None = None#
static from_dict(config)#
Parameters:

config (Dict[str, Any]) –

Return type:

LlmGenerationConfig

max_tokens: int | None = None#
merge_config(overriding_config)#
Parameters:

overriding_config (LlmGenerationConfig | None) –

Return type:

LlmGenerationConfig

stop: List[str] | None = None#
temperature: float | None = None#
to_dict()#
Return type:

Dict[str, Any]

top_p: float | None = None#
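
A usage sketch showing construction, merging, and dictionary round-tripping; the assumption here is that values set on the overriding config take precedence in merge_config:

>>> from wayflowcore.models.llmgenerationconfig import LlmGenerationConfig
>>> base = LlmGenerationConfig(max_tokens=512, temperature=0.2)
>>> override = LlmGenerationConfig(temperature=0.8)
>>> merged = base.merge_config(override)  # assumed: overriding values win where set
>>> restored = LlmGenerationConfig.from_dict(merged.to_dict())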

All models#

OpenAI Compatible Models#

class wayflowcore.models.openaicompatiblemodel.OpenAICompatibleModel(model_id, base_url, proxy=None, api_key=None, generation_config=None, supports_structured_generation=True, supports_tool_calling=True, __metadata_info__=None, id=None, name=None, description=None)#

Model for querying remote LLM endpoints that expose an OpenAI-compatible chat API.

Parameters:
  • model_id (str) – Name of the model to use

  • base_url (str) – Base URL of the remote server where the model is hosted. If the URL ends with /completions it is used as-is; otherwise the path v1/chat/completions is appended to the base URL.

  • proxy (str | None) – Proxy to use to connect to the remote LLM endpoint

  • api_key (str | None) – API key to use for the request if needed. It will be formatted in the OpenAI format (as “Bearer API_KEY” in the request header)

  • generation_config (LlmGenerationConfig | None) – default parameters for text generation with this model

  • supports_structured_generation (bool | None) – Whether the model supports structured generation. When set to None, the model is prompted with a response format to check whether it can use structured generation.

  • supports_tool_calling (bool | None) – Whether the model supports tool calling. When set to None, the model is prompted with a tool to check whether it can use tool calling.

  • id (str | None) – ID of the component.

  • name (str | None) – Name of the component.

  • description (str | None) – Description of the component.

  • __metadata_info__ (Dict[str, Any] | None) –

Examples

>>> from wayflowcore.models import OpenAICompatibleModel
>>> llm = OpenAICompatibleModel(
...     model_id="<MODEL_NAME>",
...     base_url="<ENDPOINT_URL>",
...     api_key="<API_KEY_FOR_REMOTE_ENDPOINT>",
... )
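
The resulting model can then be used like any other LlmModel; for instance, a sketch assuming the endpoint above is reachable:

>>> from wayflowcore.messagelist import Message
>>> from wayflowcore.models import Prompt
>>> completion = llm.generate(Prompt(messages=[Message("Hello!")]))
>>> # completion.message contains the generated Message
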
property config: Dict[str, Any]#

Get the configuration dictionary for this model

OpenAI Models#

class wayflowcore.models.openaimodel.OpenAIModel(model_id='gpt-4o-mini', api_key=None, generation_config=None, proxy=None, __metadata_info__=None, id=None, name=None, description=None)#

Model powered by OpenAI.

Parameters:
  • model_id (str) – Name of the model to use

  • api_key (str | None) – API key for the OpenAI endpoint. Overrides existing OPENAI_API_KEY environment variable.

  • generation_config (LlmGenerationConfig | None) – default parameters for text generation with this model

  • proxy (str | None) – proxy to access the remote model under VPN

  • id (str | None) – ID of the component.

  • name (str | None) – Name of the component.

  • description (str | None) – Description of the component.

  • __metadata_info__ (Dict[str, Any] | None) –

Important

When running under Oracle VPN, the connection to the OCIGenAI service requires running the model without any proxy. Therefore, make sure that neither the http_proxy nor the HTTP_PROXY environment variable is set, or unset them with unset http_proxy HTTP_PROXY. Please also ensure that OPENAI_API_KEY is set beforehand to access this model. A list of available OpenAI models can be found at the following link: OpenAI Models.

Examples

>>> from wayflowcore.models import LlmModelFactory
>>> OPENAI_CONFIG = {
...     "model_type": "openai",
...     "model_id": "gpt-4o-mini",
... }
>>> llm = LlmModelFactory.from_config(OPENAI_CONFIG)  

Notes

When running with Oracle VPN, you need to specify an https proxy, either globally or at the model level:

>>> OPENAI_CONFIG = {
...    "model_type": "openai",
...    "model_id": "gpt-4o-mini",
...    "proxy": "<PROXY_ADDRESS>",
... }  
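
Equivalently, the proxy can be passed directly to the constructor; a sketch with a placeholder proxy address:

>>> from wayflowcore.models.openaimodel import OpenAIModel
>>> llm = OpenAIModel(
...     model_id="gpt-4o-mini",
...     proxy="<PROXY_ADDRESS>",
... )
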
property config: Dict[str, Any]#

Get the configuration dictionary for this model

Ollama Models#

class wayflowcore.models.ollamamodel.OllamaModel(model_id, host_port='localhost:11434', proxy=None, generation_config=None, supports_structured_generation=True, supports_tool_calling=True, __metadata_info__=None, id=None, name=None, description=None)#

Model powered by a locally hosted Ollama server.

Parameters:
  • model_id (str) – Name of the model to use. A list of model names can be found at https://ollama.com/search

  • host_port (str) – Hostname and port of the Ollama server where the model is hosted. By default, Ollama binds to port 11434.

  • proxy (str | None) – Proxy to use to connect to the remote LLM endpoint

  • generation_config (LlmGenerationConfig | None) – default parameters for text generation with this model

  • supports_structured_generation (bool | None) – Whether the model supports structured generation. When set to None, the model is prompted with a response format to check whether it can use structured generation.

  • supports_tool_calling (bool | None) – Whether the model supports tool calling. When set to None, the model is prompted with a tool to check whether it can use tool calling.

  • id (str | None) – ID of the component.

  • name (str | None) – Name of the component.

  • description (str | None) – Description of the component.

  • __metadata_info__ (Dict[str, Any] | None) –

Examples

>>> from wayflowcore.models import LlmModelFactory
>>> OLLAMA_CONFIG = {
...     "model_type": "ollama",
...     "model_id": "<MODEL_NAME>",
... }
>>> llm = LlmModelFactory.from_config(OLLAMA_CONFIG)
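
The model can also be instantiated directly; a sketch using the default local host and port:

>>> from wayflowcore.models.ollamamodel import OllamaModel
>>> llm = OllamaModel(
...     model_id="<MODEL_NAME>",
...     host_port="localhost:11434",
... )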

Notes

As of November 2024, Ollama does not support tool calling with token streaming. To enable this functionality, we prepend and append specific ReAct prompts and format tools with the ReAct prompting template when:

  • the model should use tools

  • the list of messages contains tool requests or tool results

Be aware of this when generating with tools or tool calls. To disable this behaviour, set use_tools to False and make sure the prompt does not contain tool_call and tool_result messages. See https://arxiv.org/abs/2210.03629 to learn more about the ReAct prompting technique.

property config: Dict[str, Any]#

Get the configuration dictionary for this model

property default_agent_template: PromptTemplate#
property default_chat_template: PromptTemplate#

VLLM Models#

class wayflowcore.models.vllmmodel.VllmModel(model_id, host_port, proxy=None, generation_config=None, supports_structured_generation=True, supports_tool_calling=True, __metadata_info__=None, id=None, name=None, description=None)#

Model powered by an LLM hosted on a vLLM server.

Parameters:
  • model_id (str) – Name of the model to use

  • host_port (str) – Hostname and port of the vllm server where the model is hosted

  • proxy (str | None) – Proxy to use to connect to the remote LLM endpoint

  • generation_config (LlmGenerationConfig | None) – default parameters for text generation with this model

  • supports_structured_generation (bool | None) – Whether the model supports structured generation. When set to None, the model is prompted with a response format to check whether it can use structured generation.

  • supports_tool_calling (bool | None) – Whether the model supports tool calling. When set to None, the model is prompted with a tool to check whether it can use tool calling.

  • id (str | None) – ID of the component.

  • name (str | None) – Name of the component.

  • description (str | None) – Description of the component.

  • __metadata_info__ (Dict[str, Any] | None) –

Examples

>>> from wayflowcore.models import LlmModelFactory
>>> VLLM_CONFIG = {
...     "model_type": "vllm",
...     "host_port": "<HOSTNAME>",
...     "model_id": "<MODEL_NAME>",
... }
>>> llm = LlmModelFactory.from_config(VLLM_CONFIG)
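
Direct instantiation is also possible; a sketch with placeholder host and model name:

>>> from wayflowcore.models.vllmmodel import VllmModel
>>> llm = VllmModel(
...     model_id="<MODEL_NAME>",
...     host_port="<HOSTNAME>",
... )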

Notes

Usually, models served with vLLM do not support tool calling natively. To enable it, we prepend and append specific ReAct prompts and format tools with the ReAct prompting template when:

  • the model should use tools

  • the list of messages contains tool requests or tool results

Be aware of this when generating with tools or tool calls. To disable this behaviour, set use_tools to False and make sure the prompt does not contain tool_call and tool_result messages. See https://arxiv.org/abs/2210.03629 to learn more about the ReAct prompting technique.

Notes

When running under Oracle VPN, the connection to the OCIGenAI service requires running the model without any proxy. Therefore, make sure that neither the http_proxy nor the HTTP_PROXY environment variable is set, or unset them with unset http_proxy HTTP_PROXY.

property config: Dict[str, Any]#

Get the configuration dictionary for this model

property default_agent_template: PromptTemplate#
property default_chat_template: PromptTemplate#

OCI GenAI Models#

class wayflowcore.models.ocigenaimodel.OCIGenAIModel(*, model_id, compartment_id=None, client_config=None, serving_mode=None, provider=None, generation_config=None, id=None, name=None, description=None, __metadata_info__=None, service_endpoint=None, auth_type=None, auth_profile='DEFAULT')#

Model powered by OCIGenAI.

Parameters:
  • model_id (str) – Name of the model to use.

  • compartment_id (str | None) – The compartment OCID. Can be also configured in the OCI_GENAI_COMPARTMENT env variable.

  • client_config (OCIClientConfig | None) – OCI client config to authenticate the OCI service.

  • serving_mode (ServingMode | None) – OCI serving mode for the model. Either ServingMode.ON_DEMAND or ServingMode.DEDICATED. When set to None, it will be auto-detected based on the model_id.

  • provider (ModelProvider | None) – Name of the provider of the underlying model, used to adapt the request. Must be specified when using ServingMode.DEDICATED; it is auto-detected from the model_id when using ServingMode.ON_DEMAND.

  • generation_config (LlmGenerationConfig | None) – default parameters for text generation with this model

  • id (str | None) – ID of the component.

  • name (str | None) – Name of the component.

  • description (str | None) – Description of the component.

  • __metadata_info__ (Dict[str, Any] | None) –

  • service_endpoint (str | None) –

  • auth_type (str | None) –

  • auth_profile (str | None) –

Examples

>>> from wayflowcore.models.ocigenaimodel import OCIGenAIModel
>>> from wayflowcore.models.ociclientconfig import (
...     OCIClientConfigWithInstancePrincipal,
...     OCIClientConfigWithApiKey,
... )
>>> ## Example 1. Instance Principal
>>> client_config = OCIClientConfigWithInstancePrincipal(
...     service_endpoint="my_service_endpoint",
... )
>>> ## Example 2. API Key from a config file (~/.oci/config)
>>> client_config = OCIClientConfigWithApiKey(
...     service_endpoint="my_service_endpoint",
...     auth_profile="DEFAULT",
...     _auth_file_location="~/.oci/config"
... )
>>> llm = OCIGenAIModel(
...     model_id="xai.grok-4",
...     client_config=client_config,
...     compartment_id="my_compartment_id",
... )  

Notes

When running under Oracle VPN, the connection to the OCIGenAI service requires running the model without any proxy. Therefore, make sure that neither the http_proxy nor the HTTP_PROXY environment variable is set, or unset them with unset http_proxy HTTP_PROXY.

Warning

If, when using INSTANCE_PRINCIPAL authentication, the model returns a 404 error, check whether the machine is listed in the dynamic group and has the required privileges; otherwise, ask someone with administrative privileges to verify this. To grant an OCI Compute instance the ability to authenticate as an Instance Principal, you need to define a Dynamic Group that includes the instance and create a policy that allows this dynamic group to manage OCI GenAI services.

property config: Dict[str, Any]#

Get the configuration dictionary for this model

property default_agent_template: PromptTemplate#
property default_chat_template: PromptTemplate#

OCI Client Config Classes for Authentication#

class wayflowcore.models.ociclientconfig.OCIClientConfigWithApiKey(service_endpoint, compartment_id=None, auth_profile=None, _auth_file_location=None)#

OCI client config class for authentication using API_KEY.

Parameters:
  • service_endpoint (str) – the endpoint of the OCI GenAI service.

  • compartment_id (str | None) – compartment id to use.

  • auth_profile (str | None) – name of the profile to use in the config file. Defaults to “DEFAULT”.

  • _auth_file_location (str | None) –

to_dict()#
Return type:

Dict[str, str | Dict[str, str]]

class wayflowcore.models.ociclientconfig.OCIClientConfigWithSecurityToken(service_endpoint, compartment_id=None, auth_profile=None, _auth_file_location=None)#

OCI client config class for authentication using SECURITY_TOKEN.

Parameters:
  • service_endpoint (str) – the endpoint of the OCI GenAI service.

  • compartment_id (str | None) – compartment id to use.

  • auth_profile (str | None) – name of the profile to use in the config file. Defaults to “DEFAULT”.

  • _auth_file_location (str | None) –

to_dict()#
Return type:

Dict[str, str | Dict[str, str]]

class wayflowcore.models.ociclientconfig.OCIClientConfigWithInstancePrincipal(service_endpoint, compartment_id=None)#

OCI client config class for authentication using INSTANCE_PRINCIPAL.

Parameters:
  • service_endpoint (str) – the endpoint of the OCI GenAI service.

  • compartment_id (str | None) – compartment id to use.

class wayflowcore.models.ociclientconfig.OCIClientConfigWithResourcePrincipal(service_endpoint, compartment_id=None)#

OCI client config class for authentication using RESOURCE_PRINCIPAL.

Parameters:
  • service_endpoint (str) – the endpoint of the OCI GenAI service.

  • compartment_id (str | None) – compartment id to use.

class wayflowcore.models.ociclientconfig.OCIClientConfigWithUserAuthentication(service_endpoint, user_config, compartment_id=None)#

OCI client config class for authentication using user-provided credentials (no config file).

Parameters:
  • service_endpoint (str) – the endpoint of the OCI GenAI service.

  • user_config (OCIUserAuthenticationConfig) – user authentication config holding the credentials (see OCIUserAuthenticationConfig below).

  • compartment_id (str | None) – compartment id to use.

to_dict()#
Return type:

Dict[str, str | Dict[str, Any]]

Important

OCIClientConfigWithUserAuthentication supports the same authentication type as OCIClientConfigWithApiKey but without a config file. Values that would otherwise be read from the config file are passed directly through the OCIUserAuthenticationConfig class below.

class wayflowcore.models.ociclientconfig.OCIUserAuthenticationConfig(user, key_content, fingerprint, tenancy, region)#

Create an OCI user authentication config, which can be passed to the OCIClientConfigWithUserAuthentication class to authenticate with the OCI service.

This class provides a way to authenticate with the OCI service without relying on a config file. In other words, it is equivalent to saving the config in a file and passing that file using the OCIClientConfigWithApiKey class.

Parameters:
  • user (str) – user OCID

  • key_content (str) – content of the private key

  • fingerprint (str) – fingerprint of your public key

  • tenancy (str) – tenancy OCID

  • region (str) – OCI region

Warning

This class contains sensitive information. Please make sure that the contents are not printed or logged.

classmethod from_dict(client_config)#
Parameters:

client_config (Dict[str, str]) –

Return type:

OCIUserAuthenticationConfig

to_dict()#
Return type:

Dict[str, str]
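
A construction sketch with placeholder credentials (never hard-code real values; see the warning above):

>>> from wayflowcore.models.ociclientconfig import (
...     OCIClientConfigWithUserAuthentication,
...     OCIUserAuthenticationConfig,
... )
>>> user_config = OCIUserAuthenticationConfig(
...     user="<USER_OCID>",
...     key_content="<PRIVATE_KEY_CONTENT>",
...     fingerprint="<PUBLIC_KEY_FINGERPRINT>",
...     tenancy="<TENANCY_OCID>",
...     region="<OCI_REGION>",
... )
>>> client_config = OCIClientConfigWithUserAuthentication(
...     service_endpoint="my_service_endpoint",
...     user_config=user_config,
... )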

Important

The serialization of this class is currently not supported since the values are sensitive information.