How to Use LLMs from Different LLM Providers#

Agent Spec supports several LLM providers. You can use LlmConfig directly, setting its api_provider field to describe any provider, or use a dedicated subclass for provider-specific configuration. The available LLM configurations are:

LlmConfig
OciGenAiConfig
OpenAiConfig
GeminiConfig
OpenAiCompatibleConfig
VllmConfig
OllamaConfig

Their configuration is specified directly in their respective class constructor. This guide will show you how to configure LLMs from different LLM providers with examples and notes on usage.

Configure retry behavior for remote LLM calls#

All LlmConfig subclasses accept an optional retry_policy parameter. Use it to configure retry attempts, per-request timeouts, and backoff behavior for transient failures when calling remote LLM endpoints.

For example, you can attach a retry policy directly to a VllmConfig:

from pyagentspec import RetryPolicy
from pyagentspec.llms import LlmGenerationConfig, VllmConfig

retry_policy = RetryPolicy(
    max_attempts=4,
    request_timeout=30.0,
    initial_retry_delay=0.5,
    max_retry_delay=8.0,
)

llm_config_with_retry_policy = VllmConfig(
    name="vllm-llama-4-maverick-with-retries",
    model_id="llama-4-maverick",
    url="http://url.to.my.vllm.server/llama4mav",
    default_generation_parameters=LlmGenerationConfig(
        max_tokens=512, temperature=1.0, top_p=1.0
    ),
    retry_policy=retry_policy,
)

API Reference: LlmConfig, RetryPolicy
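To build intuition for how these fields might interact, the sketch below computes one possible wait schedule between attempts. It assumes exponential backoff that doubles the delay after each failed attempt and caps it at max_retry_delay. The doubling factor and the helper name backoff_delays are assumptions made for illustration and are not part of pyagentspec; the actual behavior depends on the runtime that executes the configuration.

```python
def backoff_delays(max_attempts, initial_retry_delay, max_retry_delay, factor=2.0):
    """Return the waits (in seconds) between consecutive attempts.

    Hypothetical helper: assumes exponential backoff with a fixed factor.
    """
    delays = []
    delay = initial_retry_delay
    # There is one wait between each pair of attempts, so max_attempts - 1 waits.
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_retry_delay))
        delay *= factor
    return delays


# Same values as the RetryPolicy above: 4 attempts means at most 3 waits.
print(backoff_delays(max_attempts=4, initial_retry_delay=0.5, max_retry_delay=8.0))
# -> [0.5, 1.0, 2.0]
```

Under these assumptions, the cap only comes into play with more attempts: with max_attempts=7 the last two waits would both be 8.0 seconds.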

LlmConfig (Generic)#

LlmConfig can be used directly to describe any LLM without requiring a provider-specific subclass. This is useful when you want to describe an LLM from a provider that does not have a dedicated configuration class, or when you want a simple, portable configuration.

Parameters

model_id: str#: Identifier of the model to use, as expected by the selected API provider.

provider: str, null#: The model provider, i.e. who made the model (e.g. "openai", "meta", "anthropic", "cohere").

api_provider: str, null#: The API provider, i.e. who serves the API (e.g. "openai", "oci", "vllm", "ollama", "aws_bedrock", "vertex_ai").

api_type: str, null#: The API format to use to interact with the LLM (e.g. "chat_completions", "responses").

url: str, null#: URL of the API endpoint (e.g. "https://api.openai.com/v1").

api_key: str, null#: An optional API key for the remote LLM. When exported, the value is replaced by a reference.

default_generation_parameters: dict, null#: Default parameters for text generation with this model.

Examples

from pyagentspec.llms import LlmConfig
from pyagentspec.llms import LlmGenerationConfig

generation_config = LlmGenerationConfig(max_tokens=256, temperature=0.7)

llm = LlmConfig(
    name="openai-gpt4o",
    model_id="gpt-4o",
    provider="openai",
    api_provider="openai",
    api_type="chat_completions",
    default_generation_parameters=generation_config,
)

OciGenAiConfig#

OciGenAiConfig describes a model served by OCI Generative AI.

Parameters

model_id: str#: Name of the model to use. A list of the available models is given in Oracle OCI Documentation under the Model Retirement Dates (On-Demand Mode) section.

compartment_id: str#: The OCID (Oracle Cloud Identifier) of a compartment within your tenancy.

serving_mode: str#

How the specified model is served:

ON_DEMAND: the model is hosted in a shared environment;
DEDICATED: the model is deployed in a customer-dedicated environment.

default_generation_parameters: dict, null#

Default parameters for text generation with this model.

Example:

default_generation_parameters = LlmGenerationConfig(max_tokens=256, temperature=0.8)

client_config: OciClientConfig, null#: OCI client configuration used to authenticate with the OCI service. See the examples below for usage.

OCI Client Configuration#

OCI GenAI models require a client configuration that contains all the settings needed to perform the authentication to use OCI services. The OciClientConfig holds these settings.

Parameters

service_endpoint: str#: The endpoint URL for the OCI GenAI service. Make sure that the region in service_endpoint matches the region in which your private key was created.

auth_type: str#: The authentication type to use, e.g., API_KEY, SECURITY_TOKEN, INSTANCE_PRINCIPAL, RESOURCE_PRINCIPAL. With INSTANCE_PRINCIPAL, the code must be executed from a compartment enabled for OCI GenAI.

OciClientConfig is abstract and should not be used directly. Each authentication type has its own OciClientConfig subclass; the following sections describe the available subclasses and their specific parameters.

Examples

from pyagentspec.llms import OciGenAiConfig
from pyagentspec.llms import LlmGenerationConfig
from pyagentspec.llms.ociclientconfig import OciClientConfigWithApiKey

# Get the list of available models from:
# https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating.htm#
# under the "Model Retirement Dates (On-Demand Mode)" section.
OCIGENAI_MODEL_ID = "xai.grok-3"
# Typical service endpoint for OCI GenAI service inference
# <oci region> can be "us-chicago-1" and can also be found in your ~/.oci/config file
OCIGENAI_ENDPOINT = "https://inference.generativeai.<oci region>.oci.oraclecloud.com"
# <compartment_id> can be obtained from your personal OCI account (not the key config file).
# Please find it under "Identity > Compartments" on the OCI console website after logging in to your user account.
COMPARTMENT_ID = "ocid1.compartment.oc1..<compartment_id>"

generation_config = LlmGenerationConfig(max_tokens=256, temperature=0.8)

llm = OciGenAiConfig(
    name="oci-genai-grok3",
    model_id=OCIGENAI_MODEL_ID,
    compartment_id=COMPARTMENT_ID,
    client_config=OciClientConfigWithApiKey(
        name="client_config",
        service_endpoint=OCIGENAI_ENDPOINT,
        auth_file_location="~/.oci/config",
        auth_profile="DEFAULT",
    ),
    default_generation_parameters=generation_config,
)
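The endpoint in the example above depends on the region, which can be found in your ~/.oci/config file. As a rough sketch, the region can be read from that file with Python's configparser and substituted into the endpoint pattern shown above. The helper name endpoint_for_profile and the sample file contents are assumptions for illustration; the sketch parses an inline string so that it runs without a real OCI configuration.

```python
import configparser

# Endpoint pattern taken from the example above.
ENDPOINT_TEMPLATE = "https://inference.generativeai.{region}.oci.oraclecloud.com"


def endpoint_for_profile(config_text, profile="DEFAULT"):
    """Build the inference endpoint from the region of an OCI config profile.

    Hypothetical helper; config_text is the content of an OCI config file.
    """
    # Interpolation is disabled so that "%" characters in values are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    region = parser[profile]["region"]
    return ENDPOINT_TEMPLATE.format(region=region)


# Illustrative file content; a real file also holds key and fingerprint entries.
sample_config = """
[DEFAULT]
user=ocid1.user.oc1..<user_id>
region=us-chicago-1
"""

print(endpoint_for_profile(sample_config))
# -> https://inference.generativeai.us-chicago-1.oci.oraclecloud.com
```

To use a real file, read its content first, for example with pathlib.Path("~/.oci/config").expanduser().read_text().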

OciClientConfigWithSecurityToken#

Client configuration for authentication through a security token.

Parameters

auth_file_location: str#: The location of the authentication file from which the authentication information should be retrieved. The default location is ~/.oci/config.

auth_profile: str#: The name of the profile to use, among the ones defined in the authentication file. The default profile name is DEFAULT.

OciClientConfigWithApiKey#

Client configuration for authentication with an API key. It takes the same parameters as OciClientConfigWithSecurityToken.

OciClientConfigWithInstancePrincipal#

Client configuration for instance principal authentication. No additional parameters are required.

OciClientConfigWithResourcePrincipal#

Client configuration for resource principal authentication. No additional parameters are required.
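The relationship between the authentication types and the four client configurations can be summarized in a small lookup. The sketch below uses only the class names and parameters listed in this guide and returns them as plain strings and dictionaries, so it runs without pyagentspec installed. The helper name client_config_spec is hypothetical.

```python
# Class names as listed in this guide, keyed by authentication type.
AUTH_TYPE_TO_CLIENT_CONFIG = {
    "API_KEY": "OciClientConfigWithApiKey",
    "SECURITY_TOKEN": "OciClientConfigWithSecurityToken",
    "INSTANCE_PRINCIPAL": "OciClientConfigWithInstancePrincipal",
    "RESOURCE_PRINCIPAL": "OciClientConfigWithResourcePrincipal",
}

# Only these two authentication types read an authentication file and profile.
NEEDS_AUTH_FILE = {"API_KEY", "SECURITY_TOKEN"}


def client_config_spec(auth_type, service_endpoint,
                       auth_file_location="~/.oci/config", auth_profile="DEFAULT"):
    """Return (class name, constructor keyword arguments) for an auth type.

    Hypothetical helper for illustration only.
    """
    class_name = AUTH_TYPE_TO_CLIENT_CONFIG[auth_type]
    kwargs = {"name": "client_config", "service_endpoint": service_endpoint}
    if auth_type in NEEDS_AUTH_FILE:
        kwargs["auth_file_location"] = auth_file_location
        kwargs["auth_profile"] = auth_profile
    return class_name, kwargs


endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
print(client_config_spec("INSTANCE_PRINCIPAL", endpoint))
```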

OpenAiConfig#

OpenAI models are served by OpenAI. Use the OpenAiConfig component to refer to one of them.

Parameters

model_id: str#: Name of the model to use.

api_type: str#: The API type that should be used. Can be either chat_completions or responses.

api_key: str, null#: An optional API key for authenticating with the OpenAI endpoint.

default_generation_parameters: dict, null#: Default parameters for text generation with this model.

Important

Ensure that the OPENAI_API_KEY environment variable is set before accessing this model. A list of available OpenAI models can be found at the following link: OpenAI Models.
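A quick way to catch a missing key early is to check the environment before running an agent. The following sketch uses only the standard library; the helper name require_env is an assumption for illustration and is not part of pyagentspec.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        require_env("OPENAI_API_KEY")
        print("OPENAI_API_KEY is set")
    except RuntimeError as error:
        print(error)
```

The same check applies to GEMINI_API_KEY when using Google AI Studio authentication without an explicit api_key.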

Examples

from pyagentspec.llms import LlmGenerationConfig
from pyagentspec.llms import OpenAiConfig

generation_config = LlmGenerationConfig(max_tokens=256, temperature=0.7, top_p=0.9)

llm = OpenAiConfig(
    name="openai-gpt-5",
    model_id="gpt-5",
    default_generation_parameters=generation_config,
    api_key="optional_api_key",
)

GeminiConfig#

Gemini models can be configured through GeminiConfig. Agent Spec supports both Google AI Studio and Google Vertex AI authentication modes.

Gemini authentication is modeled as a nested auth component, similar to OCI client_config. The auth component itself remains inline during serialization. When api_key or credentials is provided explicitly, only that sensitive field is externalized and must be supplied through components_registry when loading the configuration back.

Parameters

model_id: str#: Name of the model to use, for example gemini-2.5-flash or gemini-2.0-flash-lite.

auth: GeminiAuthConfig#: Required authentication component for Gemini. As with other Agent Spec components, auth configs need a name. Use GeminiAIStudioAuthConfig(name="gemini-aistudio-auth") if you want runtimes to load GEMINI_API_KEY from the environment, or GeminiVertexAIAuthConfig(name="gemini-vertex-auth", ...) for Vertex AI. The auth component remains inline when serialized. If api_key or credentials is set explicitly, only that sensitive field is serialized as a reference.

default_generation_parameters: dict, null#: Default parameters for text generation with this model.

Google AI Studio authentication#

Use GeminiAIStudioAuthConfig when connecting through Google AI Studio.

Parameters

api_key: str, null#: Optional Gemini API key. If omitted, runtimes may load it from GEMINI_API_KEY. If provided explicitly, only the api_key field is externalized during serialization and must be supplied separately when deserializing.

Example

from pyagentspec.llms import GeminiConfig
from pyagentspec.llms import LlmGenerationConfig
from pyagentspec.llms.geminiauthconfig import GeminiAIStudioAuthConfig

generation_config = LlmGenerationConfig(max_tokens=256, temperature=0.7, top_p=0.9)

llm = GeminiConfig(
    name="gemini-aistudio-flash",
    model_id="gemini-2.5-flash",
    auth=GeminiAIStudioAuthConfig(
        name="gemini-aistudio-auth"
        # Optional: if api_key is omitted, runtimes may load GEMINI_API_KEY from the environment.
    ),
    default_generation_parameters=generation_config,
)

Vertex AI authentication#

Use GeminiVertexAIAuthConfig when connecting through Google Vertex AI.

Parameters

project_id: str, null#: Optional Google Cloud project identifier. In practice, you may still need to set this explicitly when ADC provides credentials but does not expose a default project.

location: str#: Vertex AI location or region. Defaults to global.

credentials: str | dict, null#: Optional local file path (str) to a Google Cloud JSON credential file, such as a service-account key file, or an inline dict containing the parsed JSON contents of that file. When omitted, runtimes may rely on Google Application Default Credentials (ADC), such as GOOGLE_APPLICATION_CREDENTIALS, credentials made available through the local Google Cloud environment, or an attached service account. See Google Cloud authentication docs for details. This does not guarantee that project_id can also be inferred automatically. If provided explicitly, only the credentials field is externalized during serialization. Non-secret auth settings such as project_id and location remain inline in the main config.

Example

from pyagentspec.llms import GeminiConfig
from pyagentspec.llms import LlmGenerationConfig
from pyagentspec.llms.geminiauthconfig import GeminiVertexAIAuthConfig

generation_config = LlmGenerationConfig(max_tokens=256, temperature=0.4, top_p=0.95)

llm = GeminiConfig(
    name="gemini-vertex-flash",
    model_id="gemini-2.0-flash-lite",
    auth=GeminiVertexAIAuthConfig(
        name="gemini-vertex-auth",
        # Often still required even when ADC supplies the credentials.
        project_id="my-gcp-project",
        location="global",
        # Optional: explicit credentials can be provided when ADC is not available.
    ),
    default_generation_parameters=generation_config,
)
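The credentials parameter accepts either a file path or an already parsed dictionary. The sketch below shows one way code consuming such a value might normalize the two forms into a dictionary. It is not taken from pyagentspec or any runtime; the helper name resolve_credentials is hypothetical, and returning None simply stands for "fall back to Application Default Credentials".

```python
import json
from pathlib import Path


def resolve_credentials(credentials):
    """Normalize a credentials value (None, dict, or file path) to a dict or None.

    Hypothetical helper for illustration only.
    """
    if credentials is None:
        # Nothing explicit was given: the caller would rely on ADC instead.
        return None
    if isinstance(credentials, dict):
        # Already the parsed JSON contents of a credential file.
        return credentials
    # Otherwise treat the value as a local path to a JSON credential file.
    path = Path(credentials).expanduser()
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


print(resolve_credentials({"type": "service_account", "project_id": "my-gcp-project"}))
```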

OpenAiCompatibleConfig#

OpenAI-compatible LLMs are models served through an OpenAI-compatible API, either chat_completions or responses. OpenAiCompatibleConfig lets you use such models in your agents and flows.

Parameters

model_id: str#: Name of the model to use.

url: str#: Hostname and port of the server exposing the OpenAI-compatible endpoint.

api_type: str#: The API type that should be used. Can be either chat_completions or responses.

api_key: str, null#: An optional API key, if the remote server requires it.

key_file: str, null#: Path to an optional client private key file in PEM format.

cert_file: str, null#: Path to an optional client certificate chain file in PEM format.

ca_file: str, null#: Path to an optional trusted CA certificate file in PEM format, used to verify the server.

default_generation_parameters: dict, null#: Default parameters for text generation with this model.

The certificate fields are useful when the remote endpoint is exposed over HTTPS with a private CA or when it requires mutual TLS (mTLS). Like api_key, these values are treated as sensitive fields during serialization.
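To illustrate the role of each certificate field, the sketch below builds a TLS context with Python's standard ssl module: ca_file verifies the server, while cert_file and key_file identify the client for mTLS. How a given runtime actually consumes these fields is not specified here; the helper name build_ssl_context is an assumption for illustration.

```python
import ssl


def build_ssl_context(ca_file=None, cert_file=None, key_file=None):
    """Create a client-side TLS context from optional PEM files.

    Hypothetical helper for illustration only.
    """
    # With ca_file=None, the system's default trusted CAs are used.
    context = ssl.create_default_context(cafile=ca_file)
    if cert_file is not None:
        # Present a client certificate (and private key) for mutual TLS.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


context = build_ssl_context()
# The default client context verifies the server certificate and hostname.
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

With a private CA and mTLS, the call would look like build_ssl_context("/path/to/ca.pem", "/path/to/client.pem", "/path/to/client.key"), using the same placeholder paths as in this section's example.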

Examples

from pyagentspec.llms import LlmGenerationConfig
from pyagentspec.llms import OpenAiCompatibleConfig
from pyagentspec.llms.openaicompatibleconfig import OpenAIAPIType

generation_config = LlmGenerationConfig(max_tokens=512, temperature=1.0, top_p=1.0)

llm = OpenAiCompatibleConfig(
    name="openai-compatible-llama-4-maverick",
    model_id="llama-4-maverick",
    url="https://url.to.my.openai.compatible.server/llama4mav",
    api_type=OpenAIAPIType.RESPONSES,
    api_key="optional_api_key",
    key_file="/path/to/client.key",
    cert_file="/path/to/client.pem",
    ca_file="/path/to/ca.pem",
    default_generation_parameters=generation_config,
)

VllmConfig#

vLLM models are models hosted on a vLLM server. VllmConfig lets you use such models in your agents and flows.

Parameters

model_id: str#: Name of the model to use.

url: str#: Hostname and port of the vLLM server where the model is hosted.

api_type: str#: The API type that should be used. Can be either chat_completions or responses.

default_generation_parameters: dict, null#: Default parameters for text generation with this model.

api_key: str, null#: An optional API key, if the remote vLLM server requires it.

The VllmConfig inherits from OpenAiCompatibleConfig, so it also supports the optional key_file, cert_file, and ca_file parameters for HTTPS and mTLS connections.

Examples

from pyagentspec.llms import LlmGenerationConfig
from pyagentspec.llms import VllmConfig

generation_config = LlmGenerationConfig(max_tokens=512, temperature=1.0, top_p=1.0)

llm = VllmConfig(
    name="vllm-llama-4-maverick",
    model_id="llama-4-maverick",
    url="http://url.to.my.vllm.server/llama4mav",
    default_generation_parameters=generation_config,
    api_key="optional_api_key",
)

OllamaConfig#

Ollama models are served by a locally hosted Ollama server. OllamaConfig lets you use such models in your agents and flows.

Parameters

model_id: str#: Name of the model to use.

url: str#: Hostname and port of the Ollama server where the model is hosted.

api_type: str#: The API type that should be used. Can be either chat_completions or responses.

default_generation_parameters: dict, null#: Default parameters for text generation with this model.

api_key: str, null#: An optional API key, if the Ollama server requires it.

Examples

from pyagentspec.llms import LlmGenerationConfig
from pyagentspec.llms import OllamaConfig

generation_config = LlmGenerationConfig(max_tokens=512, temperature=0.9, top_p=0.9)

llm = OllamaConfig(
    name="ollama-llama-4",
    model_id="llama-4-maverick",
    url="http://url.to.my.ollama.server/llama4mav",
    default_generation_parameters=generation_config,
    api_key="optional_api_key",
)

Recap#

This guide provides detailed descriptions of each model type supported by Agent Spec, demonstrating how to declare them using PyAgentSpec syntax.

Below is the complete code from this guide.

from pyagentspec.llms import LlmConfig
from pyagentspec.llms import LlmGenerationConfig

generation_config = LlmGenerationConfig(max_tokens=256, temperature=0.7)

llm = LlmConfig(
    name="openai-gpt4o",
    model_id="gpt-4o",
    provider="openai",
    api_provider="openai",
    api_type="chat_completions",
    default_generation_parameters=generation_config,
)

from pyagentspec.llms import OciGenAiConfig
from pyagentspec.llms import LlmGenerationConfig
from pyagentspec.llms.ociclientconfig import OciClientConfigWithApiKey

# Get the list of available models from:
# https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating.htm#
# under the "Model Retirement Dates (On-Demand Mode)" section.
OCIGENAI_MODEL_ID = "xai.grok-3"
# Typical service endpoint for OCI GenAI service inference
# <oci region> can be "us-chicago-1" and can also be found in your ~/.oci/config file
OCIGENAI_ENDPOINT = "https://inference.generativeai.<oci region>.oci.oraclecloud.com"
# <compartment_id> can be obtained from your personal OCI account (not the key config file).
# Please find it under "Identity > Compartments" on the OCI console website after logging in to your user account.
COMPARTMENT_ID = "ocid1.compartment.oc1..<compartment_id>"

generation_config = LlmGenerationConfig(max_tokens=256, temperature=0.8)

llm = OciGenAiConfig(
    name="oci-genai-grok3",
    model_id=OCIGENAI_MODEL_ID,
    compartment_id=COMPARTMENT_ID,
    client_config=OciClientConfigWithApiKey(
        name="client_config",
        service_endpoint=OCIGENAI_ENDPOINT,
        auth_file_location="~/.oci/config",
        auth_profile="DEFAULT",
    ),
    default_generation_parameters=generation_config,
)

from pyagentspec.llms import OpenAiCompatibleConfig
from pyagentspec.llms.openaicompatibleconfig import OpenAIAPIType

generation_config = LlmGenerationConfig(max_tokens=512, temperature=1.0, top_p=1.0)

llm = OpenAiCompatibleConfig(
    name="openai-compatible-llama-4-maverick",
    model_id="llama-4-maverick",
    url="https://url.to.my.openai.compatible.server/llama4mav",
    api_type=OpenAIAPIType.RESPONSES,
    api_key="optional_api_key",
    key_file="/path/to/client.key",
    cert_file="/path/to/client.pem",
    ca_file="/path/to/ca.pem",
    default_generation_parameters=generation_config,
)

from pyagentspec.llms import VllmConfig

generation_config = LlmGenerationConfig(max_tokens=512, temperature=1.0, top_p=1.0)

llm = VllmConfig(
    name="vllm-llama-4-maverick",
    model_id="llama-4-maverick",
    url="http://url.to.my.vllm.server/llama4mav",
    default_generation_parameters=generation_config,
)

from pyagentspec.llms import OpenAiConfig

generation_config = LlmGenerationConfig(max_tokens=256, temperature=0.7, top_p=0.9)

llm = OpenAiConfig(
    name="openai-gpt-5",
    model_id="gpt-5",
    default_generation_parameters=generation_config,
)

from pyagentspec.llms import GeminiConfig
from pyagentspec.llms.geminiauthconfig import GeminiAIStudioAuthConfig, GeminiVertexAIAuthConfig

generation_config = LlmGenerationConfig(max_tokens=256, temperature=0.7, top_p=0.9)

llm = GeminiConfig(
    name="gemini-aistudio-flash",
    model_id="gemini-2.5-flash",
    auth=GeminiAIStudioAuthConfig(
        name="gemini-aistudio-auth"
        # Optional: if api_key is omitted, runtimes may load GEMINI_API_KEY from the environment.
    ),
    default_generation_parameters=generation_config,
)

llm = GeminiConfig(
    name="gemini-vertex-flash",
    model_id="gemini-2.0-flash-lite",
    auth=GeminiVertexAIAuthConfig(
        name="gemini-vertex-auth",
        # Often still required even when ADC supplies the credentials.
        project_id="my-gcp-project",
        location="global",
        # Optional: explicit credentials can be provided when ADC is not available.
    ),
    default_generation_parameters=generation_config,
)

from pyagentspec.llms import OllamaConfig

generation_config = LlmGenerationConfig(max_tokens=512, temperature=0.9, top_p=0.9)

llm = OllamaConfig(
    name="ollama-llama-4",
    model_id="llama-4-maverick",
    url="http://url.to.my.ollama.server/llama4mav",
    default_generation_parameters=generation_config
)

Next steps#

Having learned how to configure LLMs from different providers, you may now proceed to: