LLMs#
This page presents all APIs and classes related to LLM models.
LlmModel#
- class wayflowcore.models.llmmodel.LlmModel(model_id, generation_config, chat_template=None, agent_template=None, supports_structured_generation=None, supports_tool_calling=None, __metadata_info__=None, id=None, name=None, description=None)#
Base class for LLM models.
- Parameters:
model_id (str) – ID of the model.
generation_config (LlmGenerationConfig | None) – Parameters for LLM generation.
chat_template (PromptTemplate | None) – Default template for chat completion.
agent_template (PromptTemplate | None) – Default template for agents using this model.
supports_structured_generation (bool | None) – Whether the model supports structured generation. When set to None, the model is prompted with a response format to check whether it can use structured generation.
supports_tool_calling (bool | None) – Whether the model supports tool calling. When set to None, the model is prompted with a tool to check whether it can use tool calling.
id (str | None) – ID of the component.
name (str | None) – Name of the component.
description (str | None) – Description of the component.
__metadata_info__ (Dict[str, Any] | None) –
- abstract property config: Dict[str, Any]#
Get the configuration dictionary for the {VLlm/OpenAI/…} model
- property default_agent_template: PromptTemplate#
- property default_chat_template: PromptTemplate#
- generate(prompt, _conversation=None)#
Generates a new message based on a prompt using an LLM
- Parameters:
prompt (str | Prompt) – Prompt that contains the messages and other arguments to send to the LLM
_conversation (Conversation | None) –
- Return type:
LlmCompletion
Examples
>>> from wayflowcore.messagelist import Message
>>> from wayflowcore.models import Prompt
>>> prompt = Prompt(messages=[Message('What is the capital of Switzerland?')])
>>> completion = llm.generate(prompt)
>>> # LlmCompletion(message=Message(content='The capital of Switzerland is Bern'))
- async generate_async(prompt, _conversation=None)#
Asynchronous version of generate.
- Parameters:
prompt (str | Prompt) –
_conversation (Conversation | None) –
- Return type:
LlmCompletion
- get_total_token_consumption(conversation_id)#
Calculate and return the total token consumption for a given conversation.
This method computes the aggregate token usage for the specified conversation by summing the token usage of each LLM call in that conversation.
- Parameters:
conversation_id (str) – The unique identifier for the conversation whose token consumption is to be calculated.
- Return type:
A TokenUsage object that gathers all token usage information.
See also
TokenUsage
An object to gather all token usage information.
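A minimal usage sketch of get_total_token_consumption, assuming an already-configured llm and that the conversation object exposes its identifier as conversation.id (hypothetical attribute name, used here only for illustration):
>>> usage = llm.get_total_token_consumption(conversation_id=conversation.id)
>>> print(usage.input_tokens, usage.output_tokens, usage.total_tokens)
>>> # e.g. 120 35 155 (estimated counts unless usage.exact_count is True)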
- stream_generate(prompt, _conversation=None)#
Returns an iterator of message chunks
- Parameters:
prompt (str | Prompt) –
_conversation (Conversation | None) –
- Return type:
Iterable[Tuple[StreamChunkType, Message | None]]
- async stream_generate_async(prompt, _conversation=None)#
Returns an async iterator of message chunks
- Parameters:
prompt (str | Prompt) – Prompt that contains the messages and other arguments to send to the LLM
_conversation (Conversation | None) –
- Return type:
AsyncIterable[Tuple[StreamChunkType, Message | None]]
Examples
>>> import asyncio
>>> from wayflowcore.messagelist import Message, MessageType
>>> from wayflowcore.models import Prompt
>>> message = Message(content="What is the capital of Switzerland?", message_type=MessageType.USER)
>>> llm_stream = llm.stream_generate(
...     prompt=Prompt(messages=[message])
... )
>>> for chunk_type, chunk in llm_stream:
...     print(chunk)
>>> # Bern
>>> # is the
>>> # capital
>>> # of
>>> # Switzerland
>>> # Message(content='Bern is the capital of Switzerland', message_type=MessageType.AGENT)
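For the asynchronous variant, a minimal sketch reusing the same llm and message as above (if stream_generate_async is a coroutine returning the iterable rather than an async generator, the call may additionally need to be awaited before iteration):
>>> import asyncio
>>> from wayflowcore.models import Prompt
>>> async def stream_answer():
...     # print the chunks as they are produced by the model
...     async for chunk_type, chunk in llm.stream_generate_async(prompt=Prompt(messages=[message])):
...         print(chunk)
>>> asyncio.run(stream_answer())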
LlmModelFactory#
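Factory used throughout this page to build a concrete LlmModel from a plain configuration dictionary. A minimal sketch, mirroring the per-model examples further below (values are placeholders):
>>> from wayflowcore.models import LlmModelFactory
>>> llm = LlmModelFactory.from_config({"model_type": "openai", "model_id": "gpt-4o-mini"})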
Token Usage#
Class that is used to gather all token usage information.
- class wayflowcore.tokenusage.TokenUsage(input_tokens=0, output_tokens=0, cached_tokens=0, total_tokens=0, exact_count=False)#
Gathers all token usage information.
- Parameters:
input_tokens (int) – Number of tokens used as input/context.
cached_tokens (int) – Number of tokens in prompt that were cached.
output_tokens (int) – Number of tokens generated by the model.
exact_count (bool) – Whether these numbers are exact or were estimated using the 1 token ≈ 3/4 word rule
total_tokens (int) –
- cached_tokens: int = 0#
- exact_count: bool = False#
- input_tokens: int = 0#
- output_tokens: int = 0#
- total_tokens: int = 0#
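A minimal sketch of reading the fields of this dataclass (the counts below are illustrative):
>>> from wayflowcore.tokenusage import TokenUsage
>>> usage = TokenUsage(input_tokens=120, output_tokens=35, total_tokens=155)
>>> print(usage.total_tokens)
>>> # 155
>>> print(usage.exact_count)
>>> # False, meaning the counts may be estimated with the 1 token ≈ 3/4 word rule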
LLM Generation Config#
Parameters for LLM generation (max_tokens, temperature, top_p).
- class wayflowcore.models.llmgenerationconfig.LlmGenerationConfig(max_tokens=None, temperature=None, top_p=None, stop=None, frequency_penalty=None, extra_args=<factory>, *, id=<factory>, __metadata_info__=<factory>)#
Parameters for LLM generation
- Parameters:
max_tokens (int | None) – Maximum number of tokens to generate as output.
temperature (float | None) – What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
top_p (float | None) – An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
stop (List[str] | None) – List of stop words; the LLM stops generating when it encounters one of these words. This helps reduce hallucinations when using templates like ReAct. Some reasoning models (o3, o4-mini…) might not support it.
frequency_penalty (float | None) – float between -2.0 and 2.0 that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
extra_args (Dict[str, Any]) – Dictionary of extra arguments that can be used by specific model providers
Note
The extra parameters should never include sensitive information.
id (str) –
__metadata_info__ (Dict[str, Any]) –
- extra_args: Dict[str, Any]#
- frequency_penalty: float | None = None#
- static from_dict(config)#
- Parameters:
config (Dict[str, Any]) –
- Return type:
LlmGenerationConfig
- max_tokens: int | None = None#
- merge_config(overriding_config)#
- Parameters:
overriding_config (LlmGenerationConfig | None) –
- Return type:
LlmGenerationConfig
- stop: List[str] | None = None#
- temperature: float | None = None#
- to_dict()#
- Return type:
Dict[str, Any]
- top_p: float | None = None#
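A minimal sketch of building a generation config and attaching it to a model (the endpoint values are placeholders; the merge is assumed to let the overriding config take precedence, as the parameter name of merge_config suggests):
>>> from wayflowcore.models import OpenAICompatibleModel
>>> from wayflowcore.models.llmgenerationconfig import LlmGenerationConfig
>>> default_config = LlmGenerationConfig(max_tokens=512, temperature=0.2)
>>> llm = OpenAICompatibleModel(
...     model_id="<MODEL_NAME>",
...     base_url="<ENDPOINT_URL>",
...     generation_config=default_config,
... )
>>> # Override selected parameters for a specific use case:
>>> overridden = default_config.merge_config(LlmGenerationConfig(temperature=0.7))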
All models#
OpenAI Compatible Models#
- class wayflowcore.models.openaicompatiblemodel.OpenAICompatibleModel(model_id, base_url, proxy=None, api_key=None, generation_config=None, supports_structured_generation=True, supports_tool_calling=True, __metadata_info__=None, id=None, name=None, description=None)#
Model to use remote LLM endpoints that use OpenAI-compatible chat APIs.
- Parameters:
model_id (str) – Name of the model to use
base_url (str) – Base URL (hostname and port) of the server hosting the model. If you specify a URL ending with /completions it will be used as-is; otherwise, the path v1/chat/completions is appended to the base URL.
proxy (str | None) – Proxy to use to connect to the remote LLM endpoint
api_key (str | None) – API key to use for the request if needed. It will be formatted in the OpenAI format (as “Bearer API_KEY” in the request header)
generation_config (LlmGenerationConfig | None) – default parameters for text generation with this model
supports_structured_generation (bool | None) – Whether the model supports structured generation. When set to None, the model is prompted with a response format to check whether it can use structured generation.
supports_tool_calling (bool | None) – Whether the model supports tool calling. When set to None, the model is prompted with a tool to check whether it can use tool calling.
id (str | None) – ID of the component.
name (str | None) – Name of the component.
description (str | None) – Description of the component.
__metadata_info__ (Dict[str, Any] | None) –
Examples
>>> from wayflowcore.models import OpenAICompatibleModel
>>> llm = OpenAICompatibleModel(
...     model_id="<MODEL_NAME>",
...     base_url="<ENDPOINT_URL>",
...     api_key="<API_KEY_FOR_REMOTE_ENDPOINT>",
... )
- property config: Dict[str, Any]#
Get the configuration dictionary for the {VLlm/OpenAI/…} model
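Given the base_url handling described above, both forms below are accepted; a minimal sketch with placeholder endpoints:
>>> from wayflowcore.models import OpenAICompatibleModel
>>> # Path appended automatically: requests go to <ENDPOINT_URL>/v1/chat/completions
>>> llm = OpenAICompatibleModel(model_id="<MODEL_NAME>", base_url="<ENDPOINT_URL>")
>>> # A URL already ending in /completions is used as-is
>>> llm = OpenAICompatibleModel(model_id="<MODEL_NAME>", base_url="<ENDPOINT_URL>/v1/chat/completions")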
OpenAI Models#
- class wayflowcore.models.openaimodel.OpenAIModel(model_id='gpt-4o-mini', api_key=None, generation_config=None, proxy=None, __metadata_info__=None, id=None, name=None, description=None)#
Model powered by OpenAI.
- Parameters:
model_id (str) – Name of the model to use
api_key (str | None) – API key for the OpenAI endpoint. Overrides the existing OPENAI_API_KEY environment variable.
generation_config (LlmGenerationConfig | None) – Default parameters for text generation with this model
proxy (str | None) – proxy to access the remote model under VPN
id (str | None) – ID of the component.
name (str | None) – Name of the component.
description (str | None) – Description of the component.
__metadata_info__ (Dict[str, Any] | None) –
Important
When running under Oracle VPN, the connection to the OCIGenAI service requires running the model without any proxy. Therefore, make sure not to have any of the http_proxy or HTTP_PROXY environment variables set, or unset them with unset http_proxy HTTP_PROXY. Please also ensure that OPENAI_API_KEY is set beforehand to access this model. A list of available OpenAI models can be found at the following link: OpenAI Models
Examples
>>> from wayflowcore.models import LlmModelFactory
>>> OPENAI_CONFIG = {
...     "model_type": "openai",
...     "model_id": "gpt-4o-mini",
... }
>>> llm = LlmModelFactory.from_config(OPENAI_CONFIG)
Notes
When running with Oracle VPN, you need to specify an HTTPS proxy, either globally or at the model level:
>>> OPENAI_CONFIG = {
...     "model_type": "openai",
...     "model_id": "gpt-4o-mini",
...     "proxy": "<PROXY_ADDRESS>",
... }
- property config: Dict[str, Any]#
Get the configuration dictionary for the {VLlm/OpenAI/…} model
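Instead of the factory, the model can also be instantiated directly; a minimal sketch with placeholder values:
>>> from wayflowcore.models.openaimodel import OpenAIModel
>>> llm = OpenAIModel(
...     model_id="gpt-4o-mini",
...     api_key="<OPENAI_API_KEY>",  # optional, otherwise the OPENAI_API_KEY env variable is used
...     proxy="<PROXY_ADDRESS>",     # only when a proxy is needed, e.g. under Oracle VPN
... )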
Ollama Models#
- class wayflowcore.models.ollamamodel.OllamaModel(model_id, host_port='localhost:11434', proxy=None, generation_config=None, supports_structured_generation=True, supports_tool_calling=True, __metadata_info__=None, id=None, name=None, description=None)#
Model powered by a locally hosted Ollama server.
- Parameters:
model_id (str) – Name of the model to use. List of model names can be found here: https://ollama.com/search
host_port (str) – Hostname and port of the Ollama server where the model is hosted. By default, Ollama binds port 11434.
proxy (str | None) – Proxy to use to connect to the remote LLM endpoint
generation_config (LlmGenerationConfig | None) – default parameters for text generation with this model
supports_structured_generation (bool | None) – Whether the model supports structured generation. When set to None, the model is prompted with a response format to check whether it can use structured generation.
supports_tool_calling (bool | None) – Whether the model supports tool calling. When set to None, the model is prompted with a tool to check whether it can use tool calling.
id (str | None) – ID of the component.
name (str | None) – Name of the component.
description (str | None) – Description of the component.
__metadata_info__ (Dict[str, Any] | None) –
Examples
>>> from wayflowcore.models import LlmModelFactory
>>> OLLAMA_CONFIG = {
...     "model_type": "ollama",
...     "model_id": "<MODEL_NAME>",
... }
>>> llm = LlmModelFactory.from_config(OLLAMA_CONFIG)
Notes
As of November 2024, Ollama does not support tool calling with token streaming. To enable this functionality, we prepend and append some specific ReAct prompts and format tools with the ReAct prompting template when:
- the model should use tools
- the list of messages contains some tool_requests or tool_results
Be aware of this when you generate with tools or tool calls. To disable this behaviour, set use_tools to False and make sure the prompt doesn't contain tool_call and tool_result messages. See https://arxiv.org/abs/2210.03629 to learn more about the ReAct prompting technique.
- property config: Dict[str, Any]#
Get the configuration dictionary for the {VLlm/OpenAI/…} model
- property default_agent_template: PromptTemplate#
- property default_chat_template: PromptTemplate#
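Direct instantiation is also possible; a minimal sketch (the model name is a placeholder for a model already pulled into the local Ollama server):
>>> from wayflowcore.models.ollamamodel import OllamaModel
>>> llm = OllamaModel(
...     model_id="<MODEL_NAME>",
...     host_port="localhost:11434",  # default Ollama port
... )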
VLLM Models#
- class wayflowcore.models.vllmmodel.VllmModel(model_id, host_port, proxy=None, generation_config=None, supports_structured_generation=True, supports_tool_calling=True, __metadata_info__=None, id=None, name=None, description=None)#
Model powered by a model hosted on a VLLM server.
- Parameters:
model_id (str) – Name of the model to use
host_port (str) – Hostname and port of the vllm server where the model is hosted
proxy (str | None) – Proxy to use to connect to the remote LLM endpoint
generation_config (LlmGenerationConfig | None) – default parameters for text generation with this model
supports_structured_generation (bool | None) – Whether the model supports structured generation. When set to None, the model is prompted with a response format to check whether it can use structured generation.
supports_tool_calling (bool | None) – Whether the model supports tool calling. When set to None, the model is prompted with a tool to check whether it can use tool calling.
id (str | None) – ID of the component.
name (str | None) – Name of the component.
description (str | None) – Description of the component.
__metadata_info__ (Dict[str, Any] | None) –
Examples
>>> from wayflowcore.models import LlmModelFactory
>>> VLLM_CONFIG = {
...     "model_type": "vllm",
...     "host_port": "<HOSTNAME>",
...     "model_id": "<MODEL_NAME>",
... }
>>> llm = LlmModelFactory.from_config(VLLM_CONFIG)
Notes
Usually, VLLM models do not support tool calling. To enable this, we prepend and append some specific ReAct prompts and format tools with the ReAct prompting template when:
- the model should use tools
- the list of messages contains some tool_requests or tool_results
Be aware of this when you generate with tools or tool calls. To disable this behaviour, set use_tools to False and make sure the prompt doesn't contain tool_call and tool_result messages. See https://arxiv.org/abs/2210.03629 to learn more about the ReAct prompting technique.
Notes
When running under Oracle VPN, the connection to the OCIGenAI service requires running the model without any proxy. Therefore, make sure not to have any of the http_proxy or HTTP_PROXY environment variables set, or unset them with unset http_proxy HTTP_PROXY.
- property config: Dict[str, Any]#
Get the configuration dictionary for the {VLlm/OpenAI/…} model
- property default_agent_template: PromptTemplate#
- property default_chat_template: PromptTemplate#
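Direct instantiation is also possible; a minimal sketch with placeholder host and model names:
>>> from wayflowcore.models.vllmmodel import VllmModel
>>> llm = VllmModel(
...     model_id="<MODEL_NAME>",
...     host_port="<HOSTNAME>:<PORT>",
... )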
OCI GenAI Models#
- class wayflowcore.models.ocigenaimodel.OCIGenAIModel(*, model_id, compartment_id=None, client_config=None, serving_mode=None, provider=None, generation_config=None, id=None, name=None, description=None, __metadata_info__=None, service_endpoint=None, auth_type=None, auth_profile='DEFAULT')#
Model powered by OCIGenAI.
- Parameters:
model_id (str) – Name of the model to use.
compartment_id (str | None) – The compartment OCID. Can also be configured via the OCI_GENAI_COMPARTMENT environment variable.
client_config (OCIClientConfig | None) – OCI client config to authenticate the OCI service.
serving_mode (ServingMode | None) – OCI serving mode for the model. Either ServingMode.ON_DEMAND or ServingMode.DEDICATED. When set to None, it will be auto-detected based on the model_id.
provider (ModelProvider | None) – Name of the provider of the underlying model, to adapt the request. Needs to be specified in ServingMode.DEDICATED. Is auto-detected when in ServingMode.ON_DEMAND based on the model_id.
generation_config (LlmGenerationConfig | None) – Default parameters for text generation with this model
id (str | None) – ID of the component.
name (str | None) – Name of the component.
description (str | None) – Description of the component.
__metadata_info__ (Dict[str, Any] | None) –
service_endpoint (str | None) –
auth_type (str | None) –
auth_profile (str | None) –
Examples
>>> from wayflowcore.models.ocigenaimodel import OCIGenAIModel
>>> from wayflowcore.models.ociclientconfig import (
...     OCIClientConfigWithInstancePrincipal,
...     OCIClientConfigWithApiKey,
... )
>>> ## Example 1. Instance Principal
>>> client_config = OCIClientConfigWithInstancePrincipal(
...     service_endpoint="my_service_endpoint",
... )
>>> ## Example 2. API Key from a config file (~/.oci/config)
>>> client_config = OCIClientConfigWithApiKey(
...     service_endpoint="my_service_endpoint",
...     auth_profile="DEFAULT",
...     _auth_file_location="~/.oci/config",
... )
>>> llm = OCIGenAIModel(
...     model_id="xai.grok-4",
...     client_config=client_config,
...     compartment_id="my_compartment_id",
... )
Notes
When running under Oracle VPN, the connection to the OCIGenAI service requires running the model without any proxy. Therefore, make sure not to have any of the http_proxy or HTTP_PROXY environment variables set, or unset them with unset http_proxy HTTP_PROXY.
Warning
If, when using INSTANCE_PRINCIPAL authentication, the response of the model returns a 404 error, please check that the machine is listed in the dynamic group and has the right privileges. Otherwise, please ask someone with administrative privileges. To grant an OCI Compute instance the ability to authenticate as an Instance Principal, one needs to define a Dynamic Group that includes the instance and create a policy that allows this dynamic group to manage OCI GenAI services.
- property config: Dict[str, Any]#
Get the configuration dictionary for the {VLlm/OpenAI/…} model
- property default_agent_template: PromptTemplate#
- property default_chat_template: PromptTemplate#
OCI Client Config Classes for Authentication#
- class wayflowcore.models.ociclientconfig.OCIClientConfigWithApiKey(service_endpoint, compartment_id=None, auth_profile=None, _auth_file_location=None)#
OCI client config class for authentication using API_KEY.
- Parameters:
service_endpoint (str) – the endpoint of the OCI GenAI service.
compartment_id (str | None) – compartment id to use.
auth_profile (str | None) – name of the profile to use in the config file. Defaults to “DEFAULT”.
_auth_file_location (str | None) –
- to_dict()#
- Return type:
Dict[str, str | Dict[str, str]]
- class wayflowcore.models.ociclientconfig.OCIClientConfigWithSecurityToken(service_endpoint, compartment_id=None, auth_profile=None, _auth_file_location=None)#
OCI client config class for authentication using SECURITY_TOKEN.
- Parameters:
service_endpoint (str) – the endpoint of the OCI GenAI service.
compartment_id (str | None) – compartment id to use.
auth_profile (str | None) – name of the profile to use in the config file. Defaults to “DEFAULT”.
_auth_file_location (str | None) –
- to_dict()#
- Return type:
Dict[str, str | Dict[str, str]]
- class wayflowcore.models.ociclientconfig.OCIClientConfigWithInstancePrincipal(service_endpoint, compartment_id=None)#
OCI client config class for authentication using INSTANCE_PRINCIPAL.
- Parameters:
service_endpoint (str) – the endpoint of the OCI GenAI service.
compartment_id (str | None) – compartment id to use.
- class wayflowcore.models.ociclientconfig.OCIClientConfigWithResourcePrincipal(service_endpoint, compartment_id=None)#
OCI client config class for authentication using RESOURCE_PRINCIPAL.
- Parameters:
service_endpoint (str) – the endpoint of the OCI GenAI service.
compartment_id (str | None) – compartment id to use.
- class wayflowcore.models.ociclientconfig.OCIClientConfigWithUserAuthentication(service_endpoint, user_config, compartment_id=None)#
- Parameters:
service_endpoint (str) –
user_config (OCIUserAuthenticationConfig) –
compartment_id (str | None) –
- to_dict()#
- Return type:
Dict[str, str | Dict[str, Any]]
Important
OCIClientConfigWithUserAuthentication supports the same authentication type as OCIClientConfigWithApiKey but without a config file. Values in the config file are passed directly through the OCIUserAuthenticationConfig class below.
- class wayflowcore.models.ociclientconfig.OCIUserAuthenticationConfig(user, key_content, fingerprint, tenancy, region)#
Create an OCI user authentication config, which can be passed to the OCIClientConfigWithUserAuthentication class in order to authenticate the OCI service.
This class provides a way to authenticate the OCI service without relying on a config file. In other words, it is equivalent to saving the config in a file and passing the file using the OCIClientConfigWithApiKey class.
- Parameters:
user (str) – user OCID
key_content (str) – content of the private key
fingerprint (str) – fingerprint of your public key
tenancy (str) – tenancy OCID
region (str) – OCI region
Warning
This class contains sensitive information. Please make sure that the contents are not printed or logged.
- classmethod from_dict(client_config)#
- Parameters:
client_config (Dict[str, str]) –
- Return type:
OCIUserAuthenticationConfig
- to_dict()#
- Return type:
Dict[str, str]
Important
The serialization of this class is currently not supported since the values are sensitive information.
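A minimal sketch of authenticating without a config file, combining OCIUserAuthenticationConfig, OCIClientConfigWithUserAuthentication and OCIGenAIModel (all values are placeholders; never hard-code real credentials in source files):
>>> from wayflowcore.models.ociclientconfig import (
...     OCIClientConfigWithUserAuthentication,
...     OCIUserAuthenticationConfig,
... )
>>> from wayflowcore.models.ocigenaimodel import OCIGenAIModel
>>> user_config = OCIUserAuthenticationConfig(
...     user="<USER_OCID>",
...     key_content="<PRIVATE_KEY_CONTENT>",
...     fingerprint="<PUBLIC_KEY_FINGERPRINT>",
...     tenancy="<TENANCY_OCID>",
...     region="<OCI_REGION>",
... )
>>> client_config = OCIClientConfigWithUserAuthentication(
...     service_endpoint="my_service_endpoint",
...     user_config=user_config,
... )
>>> llm = OCIGenAIModel(
...     model_id="<MODEL_NAME>",
...     client_config=client_config,
...     compartment_id="my_compartment_id",
... )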