How to Send Images to LLMs and Agents
Prerequisites
Familiarity with basic agent and prompt workflows.
Overview
Some Large Language Models (LLMs) can handle images in addition to text. WayFlow supports passing images alongside text in both direct prompt requests and full agent conversations using the ImageContent API.
This guide will show you:
How to create ImageContent in code.
How to run a prompt with image input directly with the model.
How to send image+text messages in an Agent conversation.
How to inspect and use the model and agent outputs produced from image input.
What is ImageContent?
ImageContent is a type of message content that stores image bytes and format metadata. You can combine an image with additional TextContent in a single message.
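This guide downloads its image over HTTP, but `ImageContent.from_bytes` only needs raw bytes and a format string, so a file on disk works just as well. The following is a minimal sketch of a hypothetical helper, `read_image_for_content`, that reads a local image and derives the format from the file extension. It assumes that the lowercase extension (for example `"png"`) is an accepted format value, as `"png"` is in this guide.

```python
from pathlib import Path

# Formats named in this guide. Assumption: the lowercase extension (without the dot)
# is an accepted ImageContent format value, as "png" is in this guide.
SUPPORTED_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "webp"}


def read_image_for_content(path):
    """Hypothetical helper: return (image_bytes, format) for a local image file."""
    file_path = Path(path)
    # ".PNG" -> "png", ".jpeg" -> "jpeg"
    image_format = file_path.suffix.lower().lstrip(".")
    if image_format not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported image extension: {file_path.suffix!r}")
    return file_path.read_bytes(), image_format


# Usage with WayFlow (assumes the imports used in this guide):
# image_bytes, image_format = read_image_for_content("logo.png")
# image_content = ImageContent.from_bytes(bytes_content=image_bytes, format=image_format)
```

The extension is only a hint; if files may be misnamed, checking the bytes themselves is more reliable.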
Basic implementation
First, import the modules needed for this guide:
import requests
from wayflowcore.agent import Agent
from wayflowcore.messagelist import ImageContent, Message, TextContent
from wayflowcore.models.llmmodel import Prompt
To follow this guide, you will need access to a multimodal (vision-capable) large language model (LLM). WayFlow supports several LLM API providers; select one of the options below:
OCI Generative AI:
from wayflowcore.models import OCIGenAIModel

if __name__ == "__main__":
    llm = OCIGenAIModel(
        model_id="provider.model-id",
        service_endpoint="https://url-to-service-endpoint.com",
        compartment_id="compartment-id",
        auth_type="API_KEY",
    )

vLLM:
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="model-id",
    host_port="VLLM_HOST_PORT",
)

Ollama:
from wayflowcore.models import OllamaModel

llm = OllamaModel(
    model_id="model-id",
)
Step 1: Creating a prompt with ImageContent
Before sending requests to a vision-capable LLM or agent, you need to construct a prompt that contains both the image and the text. The example below demonstrates:
Downloading an image (here, the Oracle logo) via HTTP request
Creating an ImageContent object from the image bytes
Adding a TextContent question
Packing both into a Message, then into a Prompt
# Download the Oracle logo as PNG (publicly accessible image)
image_url = "https://www.oracle.com/a/ocom/img/oracle-logo.png"
response = requests.get(image_url, timeout=30)
response.raise_for_status()
image_bytes = response.content
# Create ImageContent: format must match the image (in this case: "png")
image_content = ImageContent.from_bytes(bytes_content=image_bytes, format="png")
# Compose a message with both image and question
text_content = TextContent(content="Which company's logo is this?")
user_message = Message(contents=[image_content, text_content], role="user")
prompt = Prompt(messages=[user_message])
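This step hard-codes `format="png"` because the URL is known to serve a PNG. When the image type is not known in advance, one option is to derive the format from the HTTP `Content-Type` header of the response. The helper below, `format_from_content_type`, is hypothetical, and it assumes the same lowercase format names used in this guide.

```python
# Hypothetical mapping from standard image MIME types to ImageContent format strings.
# Assumption: these lowercase names are accepted, as "png" is in this guide.
MIME_TO_FORMAT = {
    "image/png": "png",
    "image/jpeg": "jpeg",
    "image/gif": "gif",
    "image/webp": "webp",
}


def format_from_content_type(content_type):
    """Map a Content-Type value such as 'image/png; charset=binary' to a format string."""
    # Drop any parameters after ';' and normalize case and whitespace.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    if mime_type not in MIME_TO_FORMAT:
        raise ValueError(f"Unsupported or non-image Content-Type: {content_type!r}")
    return MIME_TO_FORMAT[mime_type]


# Usage with the download from this step (not executed here):
# image_format = format_from_content_type(response.headers.get("Content-Type", ""))
# image_content = ImageContent.from_bytes(bytes_content=image_bytes, format=image_format)
```

Raising on unknown types surfaces problems such as an HTML error page being served instead of an image, before anything is sent to the model.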
Step 2: Sending image input to a vision-capable model
Once the prompt is built, pass it to the model's generate method. The model receives the Oracle logo together with the question and should answer with the company name.
result = llm.generate(prompt)
print("Model output:", result.message.content)
# For the Oracle logo, output should mention "Oracle Corporation"
Expected output: the model should identify the company (for example, “Oracle Corporation” or an equivalent answer). If your model does not support image input, the request will typically fail with an error.
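Because model answers vary in wording, an exact string comparison is brittle. If you want to turn this step into an automated smoke test, a minimal sketch is a case-insensitive whole-word check; the helper name `mentions_company` is hypothetical.

```python
import re


def mentions_company(answer, company="Oracle"):
    """Return True if `company` appears as a whole word in `answer`, ignoring case."""
    pattern = r"\b" + re.escape(company) + r"\b"
    return re.search(pattern, answer, flags=re.IGNORECASE) is not None


# Usage with the result from this step (not executed here):
# assert mentions_company(result.message.content), result.message.content
```

The same check can be applied to the agent's reply in the next step.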
Step 3: Using images in Agent conversations
You can also pass images to an Agent during a conversation, so the assistant can take visual information into account alongside the user's text.
# Create an Agent backed by the vision-capable LLM
agent = Agent(llm=llm)
# Start a new conversation
conversation = agent.start_conversation()
# Add a user message with both image and text as contents
conversation.append_message(Message(contents=[image_content, text_content], role="user"))
# Run agent logic for this input
conversation.execute()
# Retrieve and print the agent's last response
agent_output = conversation.get_last_message()
if agent_output is not None:
print("Agent output:", agent_output.content)
# The output should mention "Oracle Corporation"
Expected output: The agent response should mention “Oracle Corporation”.
API Reference and Practical Information
wayflowcore.messagelist.ImageContent
wayflowcore.messagelist.TextContent
Supported Image Formats
Most vision LLMs accept PNG, JPEG (.jpg or .jpeg), GIF, and WEBP images, but the exact list depends on the provider and model. Always pass ImageContent the format that matches the actual image bytes.
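Since a wrong format value can make a request fail or be misinterpreted, it can help to check the bytes themselves rather than trusting a file name or URL. The sketch below, `sniff_image_format`, is a hypothetical helper that inspects the standard file signatures of the formats listed above. JPG and JPEG files share one signature, so it returns `"jpeg"` for both; adjust this if your provider expects `"jpg"`.

```python
def sniff_image_format(data):
    """Hypothetical helper: guess the image format from its leading bytes (file signature)."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WEBP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    raise ValueError("Unrecognized image signature")


# Usage (not executed here):
# image_content = ImageContent.from_bytes(
#     bytes_content=image_bytes, format=sniff_image_format(image_bytes)
# )
```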
Agent Spec Exporting/Loading
You can export the assistant configuration to its Agent Spec representation using the AgentSpecExporter.
from wayflowcore.agentspec import AgentSpecExporter
serialized_agent = AgentSpecExporter().to_json(agent)
Here is what the Agent Spec representation looks like, first in JSON and then in YAML:
{
"component_type": "Agent",
"id": "2fc0cb26-98db-4a53-869b-61587a784b1a",
"name": "agent_df87a3d8",
"description": "",
"metadata": {
"__metadata_info__": {
"name": "agent_df87a3d8",
"description": ""
}
},
"inputs": [],
"outputs": [],
"llm_config": {
"component_type": "VllmConfig",
"id": "16d7437d-b510-4599-b1d4-51e8418043c4",
"name": "GEMMA_MODEL_ID",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"default_generation_parameters": null,
"url": "GEMMA_API_URL",
"model_id": "GEMMA_MODEL_ID"
},
"system_prompt": "",
"tools": [],
"agentspec_version": "25.4.1"
}
component_type: Agent
id: 2fc0cb26-98db-4a53-869b-61587a784b1a
name: agent_df87a3d8
description: ''
metadata:
__metadata_info__:
name: agent_df87a3d8
description: ''
inputs: []
outputs: []
llm_config:
component_type: VllmConfig
id: 16d7437d-b510-4599-b1d4-51e8418043c4
name: GEMMA_MODEL_ID
description: null
metadata:
__metadata_info__: {}
default_generation_parameters: null
url: GEMMA_API_URL
model_id: GEMMA_MODEL_ID
system_prompt: ''
tools: []
agentspec_version: 25.4.1
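Because the exported configuration is plain JSON, you can inspect it with the standard library before loading it back, for example to confirm which LLM configuration the agent will use. The following is a minimal sketch that runs on an abbreviated copy of the JSON above; `describe_llm_config` is a hypothetical helper.

```python
import json


def describe_llm_config(serialized_agent):
    """Return (component_type, model_id) of the agent's llm_config in an Agent Spec JSON string."""
    spec = json.loads(serialized_agent)
    llm_config = spec["llm_config"]
    return llm_config["component_type"], llm_config.get("model_id")


# Abbreviated version of the exported configuration shown above.
example_spec = """
{
  "component_type": "Agent",
  "name": "agent_df87a3d8",
  "llm_config": {
    "component_type": "VllmConfig",
    "url": "GEMMA_API_URL",
    "model_id": "GEMMA_MODEL_ID"
  },
  "agentspec_version": "25.4.1"
}
"""

print(describe_llm_config(example_spec))  # ('VllmConfig', 'GEMMA_MODEL_ID')
```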
You can then load the configuration back into an assistant using the AgentSpecLoader.
from wayflowcore.agentspec import AgentSpecLoader
agent = AgentSpecLoader().load_json(serialized_agent)
Next steps
Having learned how to send images to LLMs and Agents, you may now proceed to:
Full code
The complete code for this guide is listed below.
# Copyright © 2025 Oracle and/or its affiliates.
#
# This software is under the Universal Permissive License
# (UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl) or Apache License
# 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0), at your option.

# %%[markdown]
# Code Example - How to use images in conversations
# -------------------------------------------------

# How to use:
# Create a new Python virtual environment and install the latest WayFlow version.
# ```bash
# python -m venv venv-wayflowcore
# source venv-wayflowcore/bin/activate
# pip install --upgrade pip
# pip install "wayflowcore==26.1"
# ```

# You can now run the script
# 1. As a Python file:
# ```bash
# python howto_imagecontent.py
# ```
# 2. As a Notebook (in VSCode):
# When viewing the file,
# - press the keys Ctrl + Enter to run the selected cell
# - or Shift + Enter to run the selected cell and move to the cell below


# %%[markdown]
## Imports

# %%
import requests
from wayflowcore.agent import Agent
from wayflowcore.messagelist import ImageContent, Message, TextContent
from wayflowcore.models.llmmodel import Prompt


# %%[markdown]
## Model configuration

# %%
from wayflowcore.models import VllmModel
llm = VllmModel(
    model_id="GEMMA_MODEL_ID",
    host_port="GEMMA_API_URL",
)

# %%[markdown]
## Create prompt

# %%
# Download the Oracle logo as PNG (publicly accessible image)
image_url = "https://www.oracle.com/a/ocom/img/oracle-logo.png"
response = requests.get(image_url, timeout=30)
response.raise_for_status()
image_bytes = response.content

# Create ImageContent: format must match the image (in this case: "png")
image_content = ImageContent.from_bytes(bytes_content=image_bytes, format="png")

# Compose a message with both image and question
text_content = TextContent(content="Which company's logo is this?")
user_message = Message(contents=[image_content, text_content], role="user")
prompt = Prompt(messages=[user_message])

# %%[markdown]
## Generate completion with an image as input

# %%
result = llm.generate(prompt)
print("Model output:", result.message.content)
# For the Oracle logo, output should mention "Oracle Corporation"

# %%[markdown]
## Pass an image to an agent as input

# %%
# Create an Agent backed by the vision-capable LLM
agent = Agent(llm=llm)

# Start a new conversation
conversation = agent.start_conversation()

# Add a user message with both image and text as contents
conversation.append_message(Message(contents=[image_content, text_content], role="user"))

# Run agent logic for this input
conversation.execute()

# Retrieve and print the agent's last response
agent_output = conversation.get_last_message()
if agent_output is not None:
    print("Agent output:", agent_output.content)
# The output should mention "Oracle Corporation"

# %%[markdown]
## Export config to Agent Spec

# %%
from wayflowcore.agentspec import AgentSpecExporter

serialized_agent = AgentSpecExporter().to_json(agent)

# %%[markdown]
## Load Agent Spec config

# %%
from wayflowcore.agentspec import AgentSpecLoader

agent = AgentSpecLoader().load_json(serialized_agent)