How to Send Images to LLMs and Agents#


Overview#

Some Large Language Models (LLMs) can handle images in addition to text. WayFlow supports passing images alongside text in both direct prompt requests and full agent conversations using the ImageContent API.

This guide will show you:

  • How to create ImageContent in code.

  • How to run a prompt with image input directly with the model.

  • How to send image+text messages in an Agent conversation.

  • How to inspect and use model/agent outputs with image reasoning.

What is ImageContent?#

ImageContent is a type of message content that stores image bytes and format metadata. You can combine an image with additional TextContent in a single message.

Basic implementation#

First import what is needed for this guide:

import requests
from wayflowcore.agent import Agent
from wayflowcore.messagelist import ImageContent, Message, TextContent
from wayflowcore.models.llmmodel import Prompt

To follow this guide, you will need access to a multimodal large language model (LLM). WayFlow supports several LLM API providers; the example below configures an OCIGenAIModel (replace the placeholder values with your own):

from wayflowcore.models import OCIGenAIModel

if __name__ == "__main__":

    llm = OCIGenAIModel(
        model_id="provider.model-id",
        service_endpoint="https://url-to-service-endpoint.com",
        compartment_id="compartment-id",
        auth_type="API_KEY",
    )

Step 1: Creating a prompt with ImageContent#

Before sending requests to your vision-capable LLMs or agents, you need to construct a prompt containing both the image and text content. The example below demonstrates:

  • Downloading an image (here, the Oracle logo) via HTTP request

  • Creating an ImageContent object from the image bytes

  • Adding a TextContent question

  • Packing both into a Message, then into a Prompt

# Download the Oracle logo as PNG (publicly accessible image)
image_url = "https://www.oracle.com/a/ocom/img/oracle-logo.png"
response = requests.get(image_url)
response.raise_for_status()
image_bytes = response.content

# Create ImageContent: format must match the image (in this case: "png")
image_content = ImageContent.from_bytes(bytes_content=image_bytes, format="png")

# Compose a message with both image and question
text_content = TextContent(content="Which company's logo is this?")
user_message = Message(contents=[image_content, text_content], role="user")
prompt = Prompt(messages=[user_message])
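Before wrapping downloaded bytes in ImageContent, it can help to check that the HTTP response actually returned an image and to derive the format string from its Content-Type header. The helper below is a sketch, not part of the WayFlow API; the media-type mapping is an assumption:

```python
def format_from_content_type(content_type: str) -> str:
    """Map an HTTP Content-Type header value to a format string for ImageContent."""
    mapping = {
        "image/png": "png",
        "image/jpeg": "jpeg",
        "image/gif": "gif",
        "image/webp": "webp",
    }
    # The header may carry parameters, e.g. "image/png; charset=binary"
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in mapping:
        raise ValueError(f"Not a supported image type: {content_type!r}")
    return mapping[media_type]
```

With the requests call above, `format_from_content_type(response.headers.get("Content-Type", ""))` would be expected to yield `"png"` for a PNG download.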

Step 2: Sending image input to a vision-capable model#

You can send images directly to your LLM by constructing a prompt with both ImageContent and TextContent. Using the prompt built in Step 1, call generate and inspect the model's reply:

result = llm.generate(prompt)
print("Model output:", result.message.content)
# For the Oracle logo, output should mention "Oracle Corporation"

Expected output: The model should identify the company (e.g. “Oracle Corporation” or equivalent). If your model does not support images, you will get an error.

Step 3: Using images in Agent conversations#

You can pass images in an Agent-driven chat workflow. This allows assistants to process visual information alongside user dialog.

# Create an Agent configured for vision
agent = Agent(llm=llm)

# Start a new conversation
conversation = agent.start_conversation()

# Add a user message with both image and text as contents
conversation.append_message(Message(contents=[image_content, text_content], role="user"))

# Run agent logic for this input
conversation.execute()

# Retrieve and print the agent's last response
agent_output = conversation.get_last_message()
if agent_output is not None:
    print("Agent output:", agent_output.content)
# The output should mention "Oracle Corporation"

Expected output: The agent response should mention “Oracle Corporation”.

API Reference and Practical Information#

Supported Image Formats#

Most vision LLMs support PNG, JPG, JPEG, GIF, or WEBP. Always specify the correct format for ImageContent.
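When the format is not known ahead of time, it can be inferred from the image's leading magic bytes before calling ImageContent.from_bytes. The helper below is a sketch (not part of WayFlow) covering the formats listed above:

```python
def detect_image_format(data: bytes) -> str:
    """Infer a format string ("png", "jpeg", "gif", "webp") from magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP files are RIFF containers with "WEBP" at offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    raise ValueError("Unrecognized or unsupported image format")
```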

Agent Spec Exporting/Loading#

You can export the assistant to its Agent Spec configuration using the AgentSpecExporter.

from wayflowcore.agentspec import AgentSpecExporter

serialized_agent = AgentSpecExporter().to_json(agent)

Here is what the Agent Spec representation will look like:

{
  "component_type": "Agent",
  "id": "2fc0cb26-98db-4a53-869b-61587a784b1a",
  "name": "agent_df87a3d8",
  "description": "",
  "metadata": {
    "__metadata_info__": {
      "name": "agent_df87a3d8",
      "description": ""
    }
  },
  "inputs": [],
  "outputs": [],
  "llm_config": {
    "component_type": "VllmConfig",
    "id": "16d7437d-b510-4599-b1d4-51e8418043c4",
    "name": "GEMMA_MODEL_ID",
    "description": null,
    "metadata": {
      "__metadata_info__": {}
    },
    "default_generation_parameters": null,
    "url": "GEMMA_API_URL",
    "model_id": "GEMMA_MODEL_ID"
  },
  "system_prompt": "",
  "tools": [],
  "agentspec_version": "25.4.1"
}

You can then load the configuration back to an assistant using the AgentSpecLoader.

from wayflowcore.agentspec import AgentSpecLoader

agent = AgentSpecLoader().load_json(serialized_agent)

Next steps#

Having learned how to send images to LLMs and Agents, you may now proceed to:

Full code#

You can download the full code for this guide from the top of this page or copy the code below.

# Copyright © 2025 Oracle and/or its affiliates.
#
# This software is under the Universal Permissive License (UPL) 1.0
# (LICENSE-UPL or https://oss.oracle.com/licenses/upl) or Apache License
# 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0), at your option.

# %%[markdown]
# Code Example - How to use images in conversations
# -------------------------------------------------

# How to use:
# Create a new Python virtual environment and install the latest WayFlow version.
# ```bash
# python -m venv venv-wayflowcore
# source venv-wayflowcore/bin/activate
# pip install --upgrade pip
# pip install "wayflowcore==26.1"
# ```

# You can now run the script
# 1. As a Python file:
# ```bash
# python howto_imagecontent.py
# ```
# 2. As a Notebook (in VSCode):
# When viewing the file,
#  - press the keys Ctrl + Enter to run the selected cell
#  - or Shift + Enter to run the selected cell and move to the cell below


# %%[markdown]
## Imports

# %%
import requests
from wayflowcore.agent import Agent
from wayflowcore.messagelist import ImageContent, Message, TextContent
from wayflowcore.models.llmmodel import Prompt


# %%[markdown]
## Model configuration

# %%
from wayflowcore.models import VllmModel
llm = VllmModel(
    model_id="GEMMA_MODEL_ID",
    host_port="GEMMA_API_URL",
)

# %%[markdown]
## Create prompt

# %%
# Download the Oracle logo as PNG (publicly accessible image)
image_url = "https://www.oracle.com/a/ocom/img/oracle-logo.png"
response = requests.get(image_url)
response.raise_for_status()
image_bytes = response.content

# Create ImageContent: format must match the image (in this case: "png")
image_content = ImageContent.from_bytes(bytes_content=image_bytes, format="png")

# Compose a message with both image and question
text_content = TextContent(content="Which company's logo is this?")
user_message = Message(contents=[image_content, text_content], role="user")
prompt = Prompt(messages=[user_message])

# %%[markdown]
## Generate completion with an image as input

# %%
result = llm.generate(prompt)
print("Model output:", result.message.content)
# For the Oracle logo, output should mention "Oracle Corporation"

# %%[markdown]
## Pass an image to an agent as input

# %%
# Create an Agent configured for vision
agent = Agent(llm=llm)

# Start a new conversation
conversation = agent.start_conversation()

# Add a user message with both image and text as contents
conversation.append_message(Message(contents=[image_content, text_content], role="user"))

# Run agent logic for this input
conversation.execute()

# Retrieve and print the agent's last response
agent_output = conversation.get_last_message()
if agent_output is not None:
    print("Agent output:", agent_output.content)
# The output should mention "Oracle Corporation"

# %%[markdown]
## Export config to Agent Spec

# %%
from wayflowcore.agentspec import AgentSpecExporter

serialized_agent = AgentSpecExporter().to_json(agent)

# %%[markdown]
## Load Agent Spec config

# %%
from wayflowcore.agentspec import AgentSpecLoader

agent = AgentSpecLoader().load_json(serialized_agent)