How to Build RAG-Powered Assistants#
Retrieval-Augmented Generation (RAG) is a powerful technique that enhances AI assistants by connecting them to external knowledge sources. Instead of relying solely on their training data, RAG-enabled assistants can search through your specific documents, databases, or knowledge bases to provide accurate, up-to-date, and contextually relevant responses.
In this tutorial, you will:
Configure vector search to enable semantic similarity matching in your data.
Create a searchable datastore with embeddings for efficient retrieval.
Create a RAG-powered Agent that autonomously searches for information to fulfill user requests.
Build a RAG-powered Flow using SearchStep for structured retrieval workflows.
Control which fields are used for embeddings to optimize search relevance.
This tutorial demonstrates RAG using Oracle Database as a persistent, production-ready vector store. You will use OracleDatabaseDatastore and Oracle AI Vector Search throughout.
Concepts shown in this guide#
VectorRetrieverConfig and SearchConfig for configuring vector search
SearchToolBox for providing search capabilities to Agents
SearchStep for retrieval in Flows
VectorConfig and SerializerConfig for controlling embedding generation
Embedding models for converting text to vectors
Before you begin, you must connect to Oracle Database and create the table(s) needed for vector search. See below.
Step 0. OracleDatabaseDatastore: Connecting and Automated Table Preparation#
To use this guide, you need an Oracle Database with vector search capability. This step shows how to automate the connection and table setup directly from Python. All you need is a working connection to the database and permission to run the operations below.
Connection & Authentication
The code automatically detects either mTLS or simple TLS database connectivity using environment variables. The following environment variables must be set for your Oracle connection:
# For mTLS connection (Autonomous DB/Wallet)
export ADB_CONFIG_DIR=encrypted/wallet/config
export ADB_WALLET_DIR=encrypted/wallet
export ADB_WALLET_SECRET='supersecret'
export ADB_DB_USER=garage_user
export ADB_DB_PASSWORD=secret
export ADB_DSN="adb....oraclecloud.com"
# Or for TLS connection
export ADB_DB_USER=garage_user
export ADB_DB_PASSWORD=secret
export ADB_DSN="dbhost:port/servicename"
Reference: Oracle Database TLS setup guide
Warning
Using environment variables for storing sensitive connection details is not suitable for production environments.
The code will choose the most secure available connection automatically.
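Before connecting, it can help to see exactly which variables are missing. The following is a minimal sketch, not part of WayFlow: `missing_vars` and `describe_connection_mode` are hypothetical helper names, and unlike the setup code in this guide the sketch treats an empty value as missing.

```python
from typing import Iterable, List, Mapping

# Variable names taken from the export examples above.
TLS_VARS = ("ADB_DB_USER", "ADB_DB_PASSWORD", "ADB_DSN")
MTLS_VARS = ("ADB_CONFIG_DIR", "ADB_WALLET_DIR", "ADB_WALLET_SECRET") + TLS_VARS


def missing_vars(required: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def describe_connection_mode(env: Mapping[str, str]) -> str:
    """Hypothetical helper: report which connection mode the variables allow."""
    if not missing_vars(MTLS_VARS, env):
        return "mtls"  # every wallet variable is present, so prefer mTLS
    if not missing_vars(TLS_VARS, env):
        return "tls"
    raise RuntimeError(
        "Missing Oracle variables for TLS: " + ", ".join(missing_vars(TLS_VARS, env))
    )
```

Calling `describe_connection_mode(os.environ)` (after `import os`) before building the connection config gives an error message that names the missing variables instead of a generic failure.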
Table Schema Setup and DDL Execution
To retrieve from your data, you need it stored in a database. This guide stores the entities in Oracle Database and retrieves them with the vector search available in Oracle Database 23ai.
To connect to it, configure the client with oracledb and specify the schema of the data.
The schema for this example is:
CREATE TABLE motorcycles (
    owner_name VARCHAR2(255),
    model_name VARCHAR2(255),
    description VARCHAR2(255),
    hp INTEGER,
    serialized_text VARCHAR2(1023),
    embeddings VECTOR
);
This schema includes both a conventional text representation (serialized_text) and a VECTOR column for semantic search. The Python code handles both the creation (and dropping) of this table and the population of its data.
import os
import oracledb

from wayflowcore.datastore.oracle import MTlsOracleDatabaseConnectionConfig, TlsOracleDatabaseConnectionConfig


def environment_config():
    mtls_vars = (
        "ADB_CONFIG_DIR",
        "ADB_WALLET_DIR",
        "ADB_WALLET_SECRET",
        "ADB_DB_USER",
        "ADB_DB_PASSWORD",
        "ADB_DSN",
    )
    tls_vars = ("ADB_DB_USER", "ADB_DB_PASSWORD", "ADB_DSN")
    if all(v in os.environ for v in mtls_vars):
        return MTlsOracleDatabaseConnectionConfig(
            config_dir=os.environ["ADB_CONFIG_DIR"],
            wallet_location=os.environ["ADB_WALLET_DIR"],
            wallet_password=os.environ["ADB_WALLET_SECRET"],
            user=os.environ["ADB_DB_USER"],
            password=os.environ["ADB_DB_PASSWORD"],
            dsn=os.environ["ADB_DSN"],
            id="oracle_datastore_connection_config",
        )
    if all(v in os.environ for v in tls_vars):
        return TlsOracleDatabaseConnectionConfig(
            user=os.environ["ADB_DB_USER"],
            password=os.environ["ADB_DB_PASSWORD"],
            dsn=os.environ["ADB_DSN"],
            id="oracle_datastore_connection_config",
        )
    raise Exception("Required OracleDB environment variables not found")


connection_config = environment_config()

ORACLE_DB_DDL = """
    CREATE TABLE motorcycles (
        owner_name VARCHAR2(255),
        model_name VARCHAR2(255),
        description VARCHAR2(255),
        hp INTEGER,
        serialized_text VARCHAR2(1023),
        embeddings VECTOR
)"""

with connection_config.get_connection() as conn:
    with conn.cursor() as cursor:
        try:
            cursor.execute(ORACLE_DB_DDL)
        except oracledb.DatabaseError as e:
            print(f"DDL execution warning: {e}")
The code:
Detects your connection configuration (mTLS/TLS).
Creates the target table in Oracle using your credentials. The table has:
the table fields that match the entity schema (see next section)
an additional serialized_text VARCHAR2 field to store the string used for embeddings
an additional embeddings VECTOR field for the vector search
Note that if you already have a table configured with the same name, you will need to drop the table before running this code. Refer to the Cleaning Up section to see how this can be done.
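As a minimal sketch of the drop step, the helper below builds the statement for a validated table name. `drop_table_statement` is a hypothetical helper, and the `IF EXISTS` clause is assumed to be available (it is Oracle Database 23ai syntax); on older releases you would need a plain `DROP TABLE` and handle the error when the table is absent.

```python
import re

# Oracle unquoted identifiers: a letter, then letters, digits, _, $ or #.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,127}")


def drop_table_statement(table_name: str) -> str:
    """Build a DROP TABLE statement for a validated, unquoted table name."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"Unsafe table name: {table_name!r}")
    # Assumption: IF EXISTS is supported by the target database (23ai).
    # No trailing semicolon: the statement is passed to cursor.execute() as-is.
    return f"DROP TABLE IF EXISTS {table_name}"


# Usage with the connection config from this guide (requires a live database):
# with connection_config.get_connection() as conn:
#     with conn.cursor() as cursor:
#         cursor.execute(drop_table_statement("motorcycles"))
```

Validating the name first matters because table names cannot be passed as bind variables, so the statement has to be built with string formatting.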
Required Privileges
Make sure your user has privileges to drop, create, insert, update, and select on the target table (motorcycles).
Also make sure you have installed oracledb.
Note
You can install the required package using pip:
pip install oracledb
Setting Up RAG#
A Retrieval-Augmented Generation (RAG) system is composed of two core components: a retriever and an LLM (Large Language Model). The retriever searches your data for relevant information, while the LLM uses the retriever as a tool to supplement its responses with up-to-date knowledge. End-to-end RAG therefore needs both a retriever backed by an embedding model and an LLM.
Before creating these RAG-powered assistants, you will need to set up the data source which supports vector search capabilities.
Step 1. Configure models#
You need an embedding model for the retriever as it converts your text data into embeddings (vector representations). The retriever uses these embeddings to perform semantic searches, enabling the system to retrieve relevant information based on meaning rather than just keywords.
Configure the embedding model for vector search:
from wayflowcore.embeddingmodels import VllmEmbeddingModel

# Configure embedding model for vector search
embedding_model = VllmEmbeddingModel(base_url="EMBEDDING_API_URL", model_id="model-id")
Configure your LLM:
The LLM (Large Language Model) plays two crucial roles in RAG. First, it generates a suitable retrieval query to fetch relevant information from your datastore. Then, after retrieval, the LLM formats and integrates the retrieved text into a coherent, user-facing response. Understanding the role of the LLM is key to grasping why RAG involves both retrieval and generative capabilities: retrieval brings in up-to-date, domain-specific knowledge, while the LLM ensures information is expressed in conversational form for the user.
# Pick ONE of the options below; each assignment replaces the previous one.

# Option 1: OCI Generative AI
from wayflowcore.models import OCIGenAIModel, OCIClientConfigWithApiKey

llm = OCIGenAIModel(
    model_id="provider.model-id",
    compartment_id="compartment-id",
    client_config=OCIClientConfigWithApiKey(
        service_endpoint="https://url-to-service-endpoint.com",
    ),
)

# Option 2: a model served with vLLM
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="model-id",
    host_port="VLLM_HOST_PORT",
)

# Option 3: a model served with Ollama
from wayflowcore.models import OllamaModel

llm = OllamaModel(
    model_id="model-id",
)
Step 2. Define searchable data#
First, define the schema for your data. Note that the collection and property names defined below should match the table and column names configured in Oracle Database (see the table we created in step 0).
# Define the motorcycle entity schema
from wayflowcore.datastore import Entity
from wayflowcore.property import IntegerProperty, StringProperty, VectorProperty

motorcycles = Entity(
    description="Motorcycles in our garage",
    properties={
        "owner_name": StringProperty(description="Name of the motorcycle owner"),
        "model_name": StringProperty(description="Motorcycle model and brand"),
        "description": StringProperty(description="Detailed description of the motorcycle"),
        "hp": IntegerProperty(description="Horsepower of the motorcycle"),
        "serialized_text": StringProperty(description="Concatenated string of all columns"),
        "embeddings": VectorProperty(description="Generated embeddings for serialized_text"),
    },
)
Next, we configure a vector and search config for searching in this data. A few things to note:
If you have configured a vector index, ensure you put the same distance metric in the distance_metric parameter of the VectorRetrieverConfig. Without doing so, the approximate search will not work.
The embedding model passed in either the VectorConfig or the VectorRetrieverConfig should be the same as the model used to generate the corresponding embeddings column. If you specify an embedding model in both the classes, the embedding models must match.
You can configure the vectors parameter in the VectorRetrieverConfig to explicitly specify the vector column or VectorConfig you want to search.
If vectors is None, the vector column to search will be inferred by either an existing vector config with the same collection name or a vector column in the collection. If there are two or more matching vector configurations, an error will be raised.
If you do not specify a collection_name in the VectorConfig, the config is applicable to all collections in your datastore.
from wayflowcore.search import SearchConfig, VectorRetrieverConfig, VectorConfig

# Configure Vector Config for Search
vector_config = VectorConfig(
    model=embedding_model,
    collection_name="motorcycles",
    vector_property="embeddings",
)

# Configure vector search for semantic similarity matching
search_config = SearchConfig(
    name="motorcycle_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="cosine_distance",
    ),
)
Then, you can create the datastore with search capability by passing the search configuration. To fill the data, we perform serialization of fields using the ConcatSerializerConfig.
For each motorcycle entity, this will concatenate all fields and their values into a single string called serialized_text, which is then embedded by the model. The resulting embedding vector is assigned to the embeddings field.
This approach gives you control over what text is represented in your vector index and is transparent/easy to audit.
By default, all text fields in your entities are used to generate embeddings. However, you may want to exclude certain fields like IDs, prices, or metadata from the embedding calculation while still returning them in search results.
This can be achieved by configuring the columns_to_exclude parameter in ConcatSerializerConfig.
Note that datastore.create() is used here for demonstration only and is not the recommended way to load data into Oracle Database tables.
For real applications, populate your tables with SQL (e.g., bulk INSERT/UPDATE), then use the Datastore APIs to index, search, and take advantage of WayFlow features.
from wayflowcore.datastore import OracleDatabaseDatastore
from wayflowcore.search.config import ConcatSerializerConfig

# Create Oracle Database datastore with vector search capability
datastore = OracleDatabaseDatastore(
    connection_config=connection_config,
    schema={"motorcycles": motorcycles},
    search_configs=[search_config],
    vector_configs=[vector_config],
)

# Sample motorcycle data
motorcycle_data = [
    {
        "owner_name": "John Smith",
        "model_name": "Galaxion Thunderchief",
        "hp": 87,
        "description": "Classic American touring motorcycle with chrome details and comfortable seating.",
    },
    {
        "owner_name": "Sarah Johnson",
        "model_name": "Starlite Apex-R7",
        "hp": 118,
        "description": "High-performance supersport motorcycle designed for track racing.",
    },
    {
        "owner_name": "Mike Chen",
        "model_name": "Orion CX 1300 Helix",
        "hp": 136,
        "description": "Premium adventure touring motorcycle with advanced electronics.",
    },
    {
        "owner_name": "Emily Davis",
        "model_name": "Nebula Trailrunner 500",
        "hp": 45,
        "description": "Street-legal dirt bike perfect for off-road adventures.",
    },
    {
        "owner_name": "Carlos Rodriguez",
        "model_name": "Vortex Momentum X1",
        "hp": 214,
        "description": "Italian superbike with MotoGP-derived technology and stunning performance.",
    },
]

# Configure Serializer to serialize columns into a string
serializer = ConcatSerializerConfig()

# Generate serialized_text and embeddings
for entity in motorcycle_data:
    entity["serialized_text"] = serializer.serialize(entity)
    entity["embeddings"] = embedding_model.embed([entity["serialized_text"]])[0]

# Populate the OracleDB datastore
datastore.create(collection_name="motorcycles", entities=motorcycle_data)
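To make the concatenation and field exclusion described above concrete, here is a minimal sketch in plain Python. `concat_fields` is a hypothetical function, and the `name: value` output format is an assumption for illustration; it is not the exact string that ConcatSerializerConfig produces.

```python
from typing import Any, Dict, Iterable


def concat_fields(entity: Dict[str, Any], exclude: Iterable[str] = ()) -> str:
    """Join "name: value" pairs for every field not listed in `exclude`."""
    skipped = set(exclude)
    parts = [f"{name}: {value}" for name, value in entity.items() if name not in skipped]
    return ", ".join(parts)


bike = {"owner_name": "Emily Davis", "model_name": "Nebula Trailrunner 500", "hp": 45}

# Excluded fields stay in the entity (and in search results) but do not
# influence the text that gets embedded.
text = concat_fields(bike, exclude=["hp"])
print(text)
```

The excluded field is still stored in its own column, so it can be returned with each search hit even though it played no part in the embedding.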
Create a Vector Index for Efficient Vector Search
For production semantic search, it is also recommended that you create a vector index on the embeddings field using Oracle’s HNSW (or IVF) index. Having a vector index configured is not necessary for search to work, but it will speed things up as it will use approximate search rather than using exact search. Note that if you want to use the vector index as intended, the distance metric configured in the index should be the same as the distance metric used in the VectorRetrieverConfig (you can use SimilarityMetric for simplicity). The code below creates this index programmatically and commits it to your Oracle DB. (Skip this step for in-memory datastores.)
import oracledb

# Configure Vector Index
# Note: no trailing semicolon; python-oracledb expects SQL statements without one.
VECTOR_INDEX_DDL = """
    CREATE VECTOR INDEX hnsw_image
    ON motorcycles (embeddings)
    ORGANIZATION INMEMORY NEIGHBOR GRAPH
    DISTANCE COSINE
    WITH TARGET ACCURACY 95
"""
with connection_config.get_connection() as connection:
    with connection.cursor() as cursor:
        try:
            cursor.execute(VECTOR_INDEX_DDL)
            connection.commit()
        except oracledb.DatabaseError as e:
            print(f"Vector Index Creation warning: {e}")
You can test the search directly:
# Example of direct vector search
results = datastore.search(
    collection_name="motorcycles", query="high performance sport bike for racing", k=3
)

print("Direct search results:")
for result in results:
    print(f"- {result['model_name']}")

# Direct search results:
# - Starlite Apex-R7
# - Vortex Momentum X1
# - Nebula Trailrunner 500
With your RAG-ready datastore in place, the next step is to use it in real applications. In WayFlow, the two primary patterns for Retrieval-Augmented Generation are:
Integrating RAG capabilities into conversational Agents for dynamic, dialogue-driven retrieval
Building Flows for more structured and predictable retrieval workflows.
In the next sections, you’ll see hands-on how to use both approaches, starting with Agents.
RAG in Agents#
We’ll start by showing how to empower your Agents with retrieval capabilities, allowing them to proactively fetch and reason over domain-specific information as part of their decision-making. Agents provide a flexible approach to RAG by autonomously deciding when and how to search for information based on the conversation context.
Step 1. Create search tools for the Agent#
Convert your searchable datastore into tools that an Agent can use:
# Create search tools for the agent
search_toolbox = datastore.get_search_toolbox(k=3)
The get_search_toolbox method creates a SearchToolBox that:
Dynamically generates search tools for each collection
Respects the k parameter to limit result count
Returns results as JSON for easy parsing by the LLM
Step 2. Create the RAG Agent#
Create an Agent with search capabilities:
from textwrap import dedent
from wayflowcore.agent import Agent

# Create RAG-powered agent
rag_agent = Agent(
    tools=search_toolbox.get_tools(),
    llm=llm,
    custom_instruction=dedent(
        """
        You are a helpful motorcycle garage assistant with access to our motorcycle database.

        IMPORTANT:
        - Always search for relevant information before answering questions about motorcycles
        - Base your answers on the search results
        - If you can't find relevant information, say so clearly
        - Be specific and mention details from the search results

        You have access to search tools that can find information about:
        - Motorcycle models and specifications
        - Owners of motorcycles
        - Horsepower and performance details
        - Descriptions and features
        """
    ),
    initial_message="Hello! I'm your RAG-powered motorcycle assistant. I can search our database to answer your questions about the motorcycles in our garage.",
)
This Agent will:
Automatically use search tools when it needs information
Combine search results with its reasoning capabilities
Provide accurate answers based on your specific data
Test the Agent:
# Test the agent
agent_conversation = rag_agent.start_conversation(messages="Who owns the Orion motorcycle?")
status = agent_conversation.execute()
print(f"\nAgent Answer: {status.message.content}")

# Agent Answer: The Orion motorcycle is owned by Mike Chen. He owns a premium adventure touring motorcycle with advanced electronics, the Orion CX 1300 Helix, which has 136 horsepower.
The Agent autonomously decides when to search, what to search for, and how to use the results to answer questions.
RAG in Flows#
While Agents offer flexibility, Flows provide a structured approach to RAG with predictable retrieval workflows ideal for specific use cases.
Step 1. Create the RAG Flow#
Create a Flow that searches for relevant information before generating a response:
from textwrap import dedent
from wayflowcore.flow import Flow
from wayflowcore.steps import CompleteStep, InputMessageStep, PromptExecutionStep, StartStep
from wayflowcore.steps.searchstep import SearchStep

# Define flow steps for RAG
start_step = StartStep()

user_input_step = InputMessageStep(
    message_template=dedent(
        """
        Hello! I'm your motorcycle garage assistant powered by RAG.

        I have access to information about all motorcycles in our garage.
        What would you like to know?
        """
    )
)

search_step = SearchStep(
    datastore=datastore, collection_name="motorcycles", k=3, search_config="motorcycle_search"
)

llm_response_step = PromptExecutionStep(
    prompt_template=dedent(
        """
        You are a knowledgeable motorcycle garage assistant.
        Answer the user's question based ONLY on the retrieved motorcycle information.

        User's question: {{ user_query }}

        Retrieved motorcycle information:
        {% for doc in retrieved_documents %}
        - Model: {{ doc.model_name }}
          Owner: {{ doc.owner_name }}
          Horsepower: {{ doc.hp }} HP
          Description: {{ doc.description }}
        {% endfor %}

        Instructions:
        - Base your answer strictly on the retrieved information
        - If the information doesn't answer the question, say so clearly
        - Be specific and mention relevant details from the motorcycles
        """
    ),
    llm=llm,
)
Key points:
The SearchStep uses semantic search to find relevant documents based on the user’s query.
The k parameter limits the number of documents retrieved.
Retrieved documents are passed to the LLM along with the original query for contextualized responses.
Step 2. Build and test the Flow#
Build the complete Flow with control and data connections:
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge

# Build the RAG flow
complete_step = CompleteStep()

steps = {
    "start": start_step,
    "input": user_input_step,
    "search": search_step,
    "respond": llm_response_step,
    "complete": complete_step,
}

control_flow_edges = [
    ControlFlowEdge(source_step=start_step, destination_step=user_input_step),
    ControlFlowEdge(source_step=user_input_step, destination_step=search_step),
    ControlFlowEdge(source_step=search_step, destination_step=llm_response_step),
    ControlFlowEdge(source_step=llm_response_step, destination_step=complete_step),
]

data_flow_edges = [
    # Pass user query to search step
    DataFlowEdge(
        source_step=user_input_step,
        source_output=InputMessageStep.USER_PROVIDED_INPUT,
        destination_step=search_step,
        destination_input=SearchStep.QUERY,
    ),
    # Pass user query to LLM for context
    DataFlowEdge(
        source_step=user_input_step,
        source_output=InputMessageStep.USER_PROVIDED_INPUT,
        destination_step=llm_response_step,
        destination_input="user_query",
    ),
    # Pass retrieved documents to LLM
    DataFlowEdge(
        source_step=search_step,
        source_output=SearchStep.DOCUMENTS,
        destination_step=llm_response_step,
        destination_input="retrieved_documents",
    ),
]

rag_flow = Flow(
    begin_step=start_step,
    steps=steps,
    control_flow_edges=control_flow_edges,
    data_flow_edges=data_flow_edges,
)

# Test the flow
conversation = rag_flow.start_conversation()
conversation.execute()
conversation.append_user_message("Which motorcycle has the most horsepower?")
result = conversation.execute()
print(f"\nRAG Flow Answer: {result.output_values[PromptExecutionStep.OUTPUT]}")
# RAG Flow Answer: Based on the retrieved information, the motorcycle with the most horsepower is the Vortex Momentum X1, which has 214 HP.
# This Italian superbike features MotoGP-derived technology and stunning performance, indicating its high power output.
The Flow provides a predictable pipeline: user input → search → response generation.
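The response-generation stage of this pipeline can be pictured with plain Python. The sketch below approximates what the prompt template does with the question and the retrieved rows; `build_prompt` is a hypothetical function for illustration, not how PromptExecutionStep renders its template internally.

```python
from typing import Any, Dict, List


def build_prompt(user_query: str, retrieved_documents: List[Dict[str, Any]]) -> str:
    """Assemble a grounded prompt from the question and the retrieved rows."""
    lines = [f"User's question: {user_query}", "", "Retrieved motorcycle information:"]
    for doc in retrieved_documents:
        lines.append(f"- Model: {doc['model_name']}")
        lines.append(f"  Owner: {doc['owner_name']}")
        lines.append(f"  Horsepower: {doc['hp']} HP")
        lines.append(f"  Description: {doc['description']}")
    return "\n".join(lines)


docs = [
    {
        "model_name": "Vortex Momentum X1",
        "owner_name": "Carlos Rodriguez",
        "hp": 214,
        "description": "Italian superbike with MotoGP-derived technology and stunning performance.",
    }
]
prompt = build_prompt("Which motorcycle has the most horsepower?", docs)
print(prompt)
```

Because the LLM only sees the rows that the search returned, the value of k bounds what it can answer: a motorcycle outside the top k results is invisible to the response step.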
Advanced RAG Techniques#
Filtering search results#
You can filter search results based on metadata:
# Filter search results by owner
filtered_results = datastore.search(
    collection_name="motorcycles", query="sport bike", k=5, where={"owner_name": "Sarah Johnson"}
)
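Conceptually, a filtered search combines an exact-match condition with similarity ranking. The sketch below illustrates that idea on toy rows with made-up distances. `search_with_filter` is a hypothetical function, and applying the filter before ranking is an assumption of this sketch; the source does not state how the datastore orders the two steps.

```python
from typing import Any, Dict, List


def search_with_filter(
    rows: List[Dict[str, Any]], where: Dict[str, Any], k: int
) -> List[Dict[str, Any]]:
    """Keep rows matching every key/value in `where`, then return the k closest."""
    matching = [
        row for row in rows if all(row.get(key) == value for key, value in where.items())
    ]
    return sorted(matching, key=lambda row: row["distance"])[:k]


# Toy rows; the "distance" values are invented for illustration.
rows = [
    {"owner_name": "Sarah Johnson", "model_name": "Starlite Apex-R7", "distance": 0.12},
    {"owner_name": "Carlos Rodriguez", "model_name": "Vortex Momentum X1", "distance": 0.15},
    {"owner_name": "Emily Davis", "model_name": "Nebula Trailrunner 500", "distance": 0.40},
]

hits = search_with_filter(rows, where={"owner_name": "Sarah Johnson"}, k=5)
print([hit["model_name"] for hit in hits])
```

Note that a filter can leave fewer than k matches, so callers should not assume the result list always has k entries.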
Multiple search configurations#
Create specialized search configurations for different use cases:
from wayflowcore.datastore import OracleDatabaseDatastore
from wayflowcore.search import SearchConfig, VectorRetrieverConfig, VectorConfig

# Configure Vector Config for Search
vector_config = VectorConfig(model=embedding_model, collection_name="motorcycles", vector_property="embeddings")

# Multiple search configurations for different use cases
precise_search = SearchConfig(
    name="precise_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="cosine_distance",
    ),
)

broad_search = SearchConfig(
    name="broad_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="l2_distance",
        vectors=vector_config,  # You can put your vector config directly in the Vector Retriever Config
    ),
)

# Create OracleDB datastore with multiple search configs
multi_search_datastore = OracleDatabaseDatastore(
    connection_config=connection_config,
    schema={"motorcycles": motorcycles},
    search_configs=[precise_search, broad_search],
    vector_configs=[vector_config],
)
How multiple search configs work:
Each SearchConfig must have a unique name (auto-generated if not provided)
Search configs can target the same collection with different settings (distance metrics, vector configs)
Search configs can also target multiple collections if no collection name is specified, provided no other search config matches the collection name being searched.
When calling Datastore.search(), you specify which config to use via the search_config parameter
If no search_config is specified, the system looks for a default config for that collection, given that a collection_name is specified
The first config that matches the collection (or has no specific collection) becomes the default
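The selection rules in this list can be modelled in a few lines of plain Python. This is a sketch of the rules as described here, not WayFlow's implementation: `ToySearchConfig` and `resolve_config` are hypothetical names, and the sketch ignores the precedence between collection-specific and collection-agnostic configs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySearchConfig:
    """Stand-in for a search config; not the WayFlow class."""

    name: str
    collection_name: Optional[str] = None


def resolve_config(
    configs: List[ToySearchConfig],
    collection_name: str,
    search_config: Optional[str] = None,
) -> ToySearchConfig:
    """Pick a config by explicit name, else the first one that fits the collection."""
    if search_config is not None:
        for config in configs:
            if config.name == search_config:
                return config
        raise KeyError(f"No search config named {search_config!r}")
    for config in configs:
        # A config without a collection name applies to any collection.
        if config.collection_name in (None, collection_name):
            return config
    raise LookupError(f"No search config applies to {collection_name!r}")


configs = [
    ToySearchConfig("precise_search", "motorcycles"),
    ToySearchConfig("broad_search", "motorcycles"),
]
default = resolve_config(configs, "motorcycles")
explicit = resolve_config(configs, "motorcycles", search_config="broad_search")
```

In this model the order of the list passed as search_configs decides the default, which is why naming the config explicitly is the safer choice when several target the same collection.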
When to use each config
precise_search: Uses cosine similarity for semantic matching (best for meaning-based searches)
broad_search: Uses Euclidean distance for broader matches (considers all dimensions equally)
You explicitly choose which to use:
datastore.search(..., search_config="precise_search")
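The two metrics can rank the same candidates differently, which is also why the metric in a vector index must match the one used at query time. The sketch below shows this on toy 2-dimensional vectors with made-up values; it assumes non-zero vectors and does not use real embedding output.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: compares direction only, ignoring vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: sensitive to both direction and vector length."""
    return math.dist(a, b)


# Toy vectors (invented values, assumed non-zero).
query = [1.0, 1.0]
candidates = {
    "same_direction_far": [10.0, 10.0],
    "different_direction_near": [1.0, 0.0],
}

best_by_cosine = min(candidates, key=lambda name: cosine_distance(query, candidates[name]))
best_by_l2 = min(candidates, key=lambda name: l2_distance(query, candidates[name]))
print(best_by_cosine, best_by_l2)
```

Cosine distance prefers the vector pointing the same way as the query even though it is far away, while Euclidean distance prefers the nearby vector that points elsewhere. If your embedding model produces unit-length vectors, the two metrics give the same ranking.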
Customizing search behavior in Agents#
Create specialized search toolboxes with different parameters:
# Create specialized search toolboxes
detailed_search = datastore.get_search_toolbox(k=10)
quick_search = datastore.get_search_toolbox(k=1)

# Agent with multiple search strategies
advanced_agent = Agent(
    tools=[detailed_search, quick_search],
    llm=llm,
    custom_instruction=dedent(
        """
        You are an advanced motorcycle assistant with two search modes:
        - Use detailed search for comprehensive questions requiring multiple examples
        - Use quick search for simple factual questions about a specific motorcycle

        Choose the appropriate search mode based on the user's question.
        """
    ),
)
How specialized toolboxes work:
Each toolbox creates different search functions with fixed parameters
detailed_search: Always returns 10 results (k=10) for comprehensive analysis
quick_search: Always returns 1 result (k=1) for focused answers
The Agent sees these as different tools: search_motorcycles_detailed vs search_motorcycles_quick
When each toolbox is used:
The Agent autonomously decides based on:
The user’s question complexity
Instructions in custom_instruction
Context of the conversation
For “tell me about all sport bikes” → likely uses detailed_search
For “who owns the Vortex?” → likely uses quick_search
The Agent’s reasoning determines the choice, guided by your instructions
Manual Serialization of Fields for Embeddings#
In this example, we show a manual serialization approach that performs cross-field logic that cannot be expressed with ConcatSerializerConfig. Instead of merely concatenating fields, we:
Compute derived attributes (e.g., performance class and hp bands from numeric horsepower)
Conditionally weight salient tokens (repeat model name for high-HP bikes)
Inject domain keywords based on the description semantics
Reorder fields and output a structured, sectioned Markdown document
This goes beyond per-field preprocessing and simple separators; it uses the full entity structure at once and conditional logic across multiple fields.
# Advanced manual serialization that uses domain-specific, cross-field logic.
# This goes beyond simple concatenation and cannot be reproduced with ConcatSerializerConfig,
# which operates per-field and via string pre/post-processors without access to the full structured entity.
from typing import Dict, Any, List


def serialize_motorcycle_advanced(entity: Dict[str, Any]) -> str:
    """
    Produce a Markdown-formatted string with:
    - Conditional weighting: repeat model name tokens based on horsepower bands
    - Derived fields: performance class and hp_band computed from numeric hp
    - Conditional keyword injection from description semantics
    - Field re-ordering and sectioned formatting for domain salience
    """
    model = str(entity.get("model_name", "")).strip()
    desc = str(entity.get("description", "")).strip()
    owner = str(entity.get("owner_name", "")).strip()
    try:
        hp = int(entity.get("hp") or 0)
    except Exception:
        hp = 0

    # Derived performance class and weighting based on hp
    if hp >= 170:
        performance = "track-ready superbike"
        weight_repeats = 3
    elif hp >= 120:
        performance = "high-performance sport bike"
        weight_repeats = 2
    elif hp >= 70:
        performance = "standard road motorcycle"
        weight_repeats = 1
    else:
        performance = "lightweight commuter / trail bike"
        weight_repeats = 1

    # Keyword injection (conditional, cross-field)
    lower_desc = desc.lower()
    keywords: List[str] = []
    if "race" in lower_desc or "sport" in lower_desc or hp >= 150:
        keywords += ["sport bike", "supersport", "track-focused"]
    if "touring" in lower_desc or "comfortable" in lower_desc or "adventure" in lower_desc:
        keywords += ["touring", "long-distance", "comfort"]
    if "dirt" in lower_desc or "off-road" in lower_desc or "trail" in lower_desc:
        keywords += ["off-road", "dual-sport", "trail"]

    # Deduplicate while preserving order
    seen = set()
    deduped_keywords: List[str] = []
    for kw in keywords:
        if kw not in seen:
            deduped_keywords.append(kw)
            seen.add(kw)

    # Compose Markdown with intentional ordering and sections
    title = f"# {model}"
    # Token weighting via repetition (helps some embedding models emphasize salient tokens)
    if weight_repeats > 1 and model:
        title = title + (" " + model) * (weight_repeats - 1)

    body_lines: List[str] = [
        f"## Performance: {performance}",
        f"hp_band: {max(0, (hp // 10) * 10)}+ HP",
        f"owner: {owner}" if owner else "",
        "## Description",
        desc,
    ]
    if deduped_keywords:
        body_lines += ["## Keywords", ", ".join(deduped_keywords)]

    # Join non-empty lines
    body = "\n".join([line for line in body_lines if line and line.strip()])

    return f"{title}\n{body}"


# Example usage (when you want to manually control embeddings):
# for entity in motorcycle_data:
#     entity["serialized_text"] = serialize_motorcycle_advanced(entity)
#     entity["embeddings"] = embedding_model.embed([entity["serialized_text"]])[0]
Example usage when generating embeddings:
for entity in motorcycle_data:
    entity["serialized_text"] = serialize_motorcycle_advanced(entity)
    entity["embeddings"] = embedding_model.embed([entity["serialized_text"]])[0]
Why use explicit serialization?
Cross-field logic: derive fields (e.g., performance class from hp) and conditionally add keywords.
Conditional weighting: repeat or emphasize tokens under certain conditions (e.g., horsepower thresholds).
Structured formatting: generate Markdown sections and control field ordering for domain salience.
Auditable and deterministic: the exact text used for embeddings is transparent and reproducible.
Limitations of ConcatSerializerConfig and when to choose manual serialization:
ConcatSerializerConfig is powerful for per-field concatenation with simple pre/post processing and exclusion of columns.
It does not perform arbitrarily complex cross-field computations, conditional token weighting, or multi-field derived features.
Choose manual serialization whenever you need entity-level reasoning to craft the embedding text, beyond simple concatenation and formatting.
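Richer serialized text is also longer, and the example table stores it in a VARCHAR2(1023) column. The sketch below checks the length before inserting. `fits_varchar2` and `truncate_to_bytes` are hypothetical helpers, and the sketch assumes the column uses byte-length semantics with a UTF-8 database character set; adjust it if your database is configured differently.

```python
def fits_varchar2(text: str, max_bytes: int = 1023) -> bool:
    """True if `text` fits a VARCHAR2(max_bytes) column under byte semantics."""
    return len(text.encode("utf-8")) <= max_bytes


def truncate_to_bytes(text: str, max_bytes: int = 1023) -> str:
    """Cut `text` to at most `max_bytes` UTF-8 bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte character, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


sample = "é" * 600  # 600 characters, 1200 bytes in UTF-8
print(fits_varchar2(sample), len(truncate_to_bytes(sample)))
```

If you truncate, compute the embedding from the truncated string as well, so that the stored serialized_text remains the exact text behind the stored vector.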
Note
Selective field embedding—using serializers to specify which fields participate in embedding generation—is best supported and straightforward in the InMemoryDatastore backend (see its API for serializer support). For OracleDatabaseDatastore, you are responsible for constructing and storing the embeddings explicitly, and there is no out-of-the-box field-level selection. For configuring the serialized text and embeddings column externally, you can make use of ConcatSerializerConfig outside the Datastore while generating the serialized text for the embeddings. OracleDatabaseDatastore assumes that the embedding column has already been generated and does not implicitly create embeddings.
Note
For rapid prototyping, use InMemoryDatastore with custom serializers for full flexibility, then migrate to OracleDatabaseDatastore for production workloads that require persistence and scalability.
Agent Spec Exporting/Loading#
You can export the agent configuration to its Agent Spec configuration using the AgentSpecExporter.
# Export the RAG agent to Agent Spec JSON
from wayflowcore.agentspec import AgentSpecExporter
rag_agent_ir_json = AgentSpecExporter().to_json(rag_agent)
Here is what the Agent Spec representation will look like ↓
{
"component_type": "ExtendedAgent",
"id": "cefee4ec-cb9d-4bc5-8361-a34860ced665",
"name": "agent_52e70c67__auto",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [],
"outputs": [],
"llm_config": {
"component_type": "VllmConfig",
"id": "1d26dfa9-f35f-4e21-8c30-248213ac0601",
"name": "llm_70781625__auto",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"default_generation_parameters": {
"max_tokens": 512
},
"url": "host_urls.com",
"model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct"
},
"system_prompt": "\nYou are a helpful motorcycle garage assistant with access to our motorcycle database.\n\nIMPORTANT:\n- Always search for relevant information before answering questions about motorcycles\n- Base your answers on the search results\n- If you can't find relevant information, say so clearly\n- Be specific and mention details from the search results\n\nYou have access to search tools that can find information about:\n- Motorcycle models and specifications\n- Owners of motorcycles\n- Horsepower and performance details\n- Descriptions and features\n",
"tools": [
{
"component_type": "PluginToolFromToolBox",
"id": "5c4bf7fb-79ba-4e3e-a671-e2e2945b7600",
"name": "search_motorcycles",
"description": "Search for Motorcycles in our garage in the database using semantic similarity.\n\nThis tool searches the motorcycles collection for entities that match the given query.\nIt returns exactly 3 matching records with their properties and similarity scores.\nUse this tool when you need to find information about Motorcycles in our garage.\n\nParameters\n----------\nquery : str\n The search query string to find relevant Motorcycles in our garage.\n",
"metadata": {
"__metadata_info__": {}
},
"inputs": [],
"outputs": [],
"tool_name": "search_motorcycles",
"toolbox": {
"component_type": "PluginSearchToolBox",
"id": "54f6a02a-9dba-480a-a9f0-4d86fff937a5",
"name": "search_toolbox5f99358b__auto",
"description": null,
"metadata": {},
"collection_names": null,
"k": 3,
"datastore": {
"component_type": "PluginOracleDatabaseDatastore",
"id": "de87d17c-9654-47ce-a43f-0c827e52b5f6",
"name": "oracle_datastoreed7b27dc__auto",
"description": null,
"metadata": {},
"datastore_schema": {
"motorcycles": {
"description": "Motorcycles in our garage",
"title": "",
"properties": {
"description": {
"type": "string",
"description": "Detailed description of the motorcycle"
},
"owner_name": {
"description": "Name of the motorcycle owner",
"type": "string"
},
"model_name": {
"description": "Motorcycle model and brand",
"type": "string"
},
"hp": {
"description": "Horsepower of the motorcycle",
"type": "integer"
},
"serialized_text": {
"description": "Concatenated string of all columns",
"type": "string"
},
"embeddings": {
"description": "Generated embeddings for serialized_text",
"type": "array",
"items": {
"type": "number"
},
"x_vector_property": true
}
}
}
},
"connection_config": {
"component_type": "PluginTlsOracleDatabaseConnectionConfig",
"id": "8dbd3707-cd10-44f8-bbc1-15b69ac83c14",
"name": "PluginTlsOracleDatabaseConnectionConfig",
"description": null,
"metadata": {},
"user": "user",
"password": "password",
"dsn": "dsn",
"config_dir": null,
"component_plugin_name": "DatastorePlugin",
"component_plugin_version": "25.4.1"
},
"search_configs": [
{
"component_type": "PluginSearchConfig",
"id": "c983247b-bc7a-43c3-af16-b952fa9714e5",
"name": "motorcycle_search",
"description": null,
"metadata": {},
"retriever": {
"component_type": "PluginVectorRetrieverConfig",
"id": "e26ea01e-e501-4f09-b5f4-8a96cd3daa77",
"name": "motorcycles",
"description": null,
"metadata": {},
"model": {
"component_type": "PluginVllmEmbeddingConfig",
"id": "fe1e8f74-cf16-4dea-8ba5-08f4629aea0a",
"name": "embedding_modeledf13d6a__auto",
"description": null,
"metadata": {},
"url": "model_url.com",
"model_id": "intfloat/e5-large-v2",
"component_plugin_name": "EmbeddingModelPlugin",
"component_plugin_version": "25.4.1"
},
"collection_name": "motorcycles",
"vectors": null,
"distance_metric": "cosine_distance",
"index_params": {},
"component_plugin_name": "VectorRetrieverConfigPlugin",
"component_plugin_version": "25.4.1"
},
"component_plugin_name": "SearchConfigPlugin",
"component_plugin_version": "25.4.1"
}
],
"vector_configs": [],
"component_plugin_name": "DatastorePlugin",
"component_plugin_version": "25.4.1"
},
"component_plugin_name": "SearchToolBoxPlugin",
"component_plugin_version": "25.4.1"
},
"component_plugin_name": "ToolFromToolBoxPlugin",
"component_plugin_version": "25.4.1"
}
],
"toolboxes": [],
"context_providers": null,
"can_finish_conversation": false,
"max_iterations": 10,
"initial_message": "Hello! I'm your RAG-powered motorcycle assistant. I can search our database to answer your questions about the motorcycles in our garage.",
"caller_input_mode": "always",
"agents": [],
"flows": [],
"agent_template": {
"component_type": "PluginPromptTemplate",
"id": "c8d2fd47-ab10-468f-9c55-c8fa2c459c1a",
"name": "",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"messages": [
{
"role": "system",
"contents": [
{
"type": "text",
"content": "{%- if __TOOLS__ -%}\nEnvironment: ipython\nCutting Knowledge Date: December 2023\n\nYou are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.\n\nYou have access to the following functions. To call a function, please respond with JSON for a function call.\nRespond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.\nDo not use variables.\n\n[{% for tool in __TOOLS__%}{{tool.to_openai_format() | tojson}}{{', ' if not loop.last}}{% endfor %}]\n{%- endif -%}\n"
}
],
"tool_requests": null,
"tool_result": null,
"display_only": false,
"sender": null,
"recipients": [],
"time_created": "2025-10-29T10:19:45.987272+00:00",
"time_updated": "2025-10-29T10:19:45.987272+00:00"
},
{
"role": "system",
"contents": [
{
"type": "text",
"content": "{%- if custom_instruction -%}Additional instructions:\n{{custom_instruction}}{%- endif -%}"
}
],
"tool_requests": null,
"tool_result": null,
"display_only": false,
"sender": null,
"recipients": [],
"time_created": "2025-10-29T10:19:45.987302+00:00",
"time_updated": "2025-10-29T10:19:45.987302+00:00"
},
{
"role": "system",
"contents": [
{
"type": "text",
"content": "$$__CHAT_HISTORY_PLACEHOLDER__$$"
}
],
"tool_requests": null,
"tool_result": null,
"display_only": false,
"sender": null,
"recipients": [],
"time_created": "2025-10-29T10:19:45.983942+00:00",
"time_updated": "2025-10-29T10:19:45.983943+00:00"
},
{
"role": "system",
"contents": [
{
"type": "text",
"content": "{% if __PLAN__ %}The current plan you should follow is the following: \n{{__PLAN__}}{% endif %}"
}
],
"tool_requests": null,
"tool_result": null,
"display_only": false,
"sender": null,
"recipients": [],
"time_created": "2025-10-29T10:19:45.987326+00:00",
"time_updated": "2025-10-29T10:19:45.987326+00:00"
}
],
"output_parser": {
"component_type": "PluginJsonToolOutputParser",
"id": "e5ed717a-bb76-4c13-b443-640444b98d3b",
"name": "jsontool_outputparser",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"tools": null,
"component_plugin_name": "OutputParserPlugin",
"component_plugin_version": "25.4.1"
},
"inputs": [
{
"description": "\"__TOOLS__\" input variable for the template",
"title": "__TOOLS__"
},
{
"description": "\"custom_instruction\" input variable for the template",
"type": "string",
"title": "custom_instruction"
},
{
"description": "\"__PLAN__\" input variable for the template",
"type": "string",
"title": "__PLAN__",
"default": ""
},
{
"type": "array",
"items": {},
"title": "__CHAT_HISTORY__"
}
],
"pre_rendering_transforms": null,
"post_rendering_transforms": [
{
"component_type": "PluginRemoveEmptyNonUserMessageTransform",
"id": "73631f33-9ade-420f-8cc1-775a24dd47d3",
"name": "removeemptynonusermessage_messagetransform",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"component_plugin_name": "MessageTransformPlugin",
"component_plugin_version": "25.4.1"
},
{
"component_type": "PluginCoalesceSystemMessagesTransform",
"id": "9c65df01-2987-46e0-b2d1-082b79ee9a34",
"name": "coalescesystemmessage_messagetransform",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"component_plugin_name": "MessageTransformPlugin",
"component_plugin_version": "25.4.1"
},
{
"component_type": "PluginLlamaMergeToolRequestAndCallsTransform",
"id": "9f3e25ea-73e9-4cee-bcbc-60b95720c023",
"name": "llamamergetoolrequestandcalls_messagetransform",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"component_plugin_name": "MessageTransformPlugin",
"component_plugin_version": "25.4.1"
}
],
"tools": null,
"native_tool_calling": false,
"response_format": null,
"native_structured_generation": true,
"generation_config": null,
"component_plugin_name": "PromptTemplatePlugin",
"component_plugin_version": "25.4.1"
},
"component_plugin_name": "AgentPlugin",
"component_plugin_version": "25.4.1",
"agentspec_version": "25.4.1"
}
The same configuration exported as YAML:
component_type: ExtendedAgent
id: 6be99be6-1540-4a0f-897e-2036b7c459b0
name: agent_f607ea30__auto
description: ''
metadata:
__metadata_info__: {}
inputs: []
outputs: []
llm_config:
component_type: VllmConfig
id: 5461e0f4-7270-449b-983a-1fdb41e15845
name: llm_ce3b3e36__auto
description: null
metadata:
__metadata_info__: {}
default_generation_parameters:
max_tokens: 512
url: host_url.com
model_id: meta-llama/Meta-Llama-3.1-8B-Instruct
system_prompt: '
You are a helpful motorcycle garage assistant with access to our motorcycle database.
IMPORTANT:
- Always search for relevant information before answering questions about motorcycles
- Base your answers on the search results
- If you can''t find relevant information, say so clearly
- Be specific and mention details from the search results
You have access to search tools that can find information about:
- Motorcycle models and specifications
- Owners of motorcycles
- Horsepower and performance details
- Descriptions and features
'
tools:
- component_type: PluginToolFromToolBox
id: e6901838-1fd1-44fe-bfa3-214727719124
name: search_motorcycles
description: "Search for Motorcycles in our garage in the database using semantic\
\ similarity.\n\nThis tool searches the motorcycles collection for entities that\
\ match the given query.\nIt returns exactly 3 matching records with their properties\
\ and similarity scores.\nUse this tool when you need to find information about\
\ Motorcycles in our garage.\n\nParameters\n----------\nquery : str\n The search\
\ query string to find relevant Motorcycles in our garage.\n"
metadata:
__metadata_info__: {}
inputs: []
outputs: []
tool_name: search_motorcycles
toolbox:
component_type: PluginSearchToolBox
id: d09ff0dc-3ed5-40ae-8dd2-9d66fa08c460
name: search_toolbox6abbf352__auto
description: null
metadata: {}
collection_names: null
k: 3
datastore:
component_type: PluginOracleDatabaseDatastore
id: cea31a0d-ea61-4730-b982-18ee3572d036
name: oracle_datastorebac11430__auto
description: null
metadata: {}
datastore_schema:
motorcycles:
description: Motorcycles in our garage
title: ''
properties:
description:
type: string
description: Detailed description of the motorcycle
owner_name:
description: Name of the motorcycle owner
type: string
model_name:
description: Motorcycle model and brand
type: string
hp:
description: Horsepower of the motorcycle
type: integer
serialized_text:
description: Concatenated string of all columns
type: string
embeddings:
description: Generated embeddings for serialized_text
type: array
items:
type: number
x_vector_property: true
connection_config:
component_type: PluginTlsOracleDatabaseConnectionConfig
id: a4c8fc8a-6200-4b1b-88f2-489419b5a8cb
name: PluginTlsOracleDatabaseConnectionConfig
description: null
metadata: {}
user: user
password: password
dsn: dsn
config_dir: null
component_plugin_name: DatastorePlugin
component_plugin_version: 25.4.1
search_configs:
- component_type: PluginSearchConfig
id: a5f4ce97-150f-4b76-bae9-5e157d11d64a
name: motorcycle_search
description: null
metadata: {}
retriever:
component_type: PluginVectorRetrieverConfig
id: 893ddb4a-0350-45ad-a01b-2222a4bbb71f
name: motorcycles
description: null
metadata: {}
model:
component_type: PluginVllmEmbeddingConfig
id: df506ab3-5d76-47b5-80b8-19d9b293a067
name: embedding_modeld79566c9__auto
description: null
metadata: {}
url: model_url.com
model_id: intfloat/e5-large-v2
component_plugin_name: EmbeddingModelPlugin
component_plugin_version: 25.4.1
collection_name: motorcycles
vectors: null
distance_metric: cosine_distance
index_params: {}
component_plugin_name: VectorRetrieverConfigPlugin
component_plugin_version: 25.4.1
component_plugin_name: SearchConfigPlugin
component_plugin_version: 25.4.1
vector_configs: []
component_plugin_name: DatastorePlugin
component_plugin_version: 25.4.1
component_plugin_name: SearchToolBoxPlugin
component_plugin_version: 25.4.1
component_plugin_name: ToolFromToolBoxPlugin
component_plugin_version: 25.4.1
toolboxes: []
context_providers: null
can_finish_conversation: false
max_iterations: 10
initial_message: Hello! I'm your RAG-powered motorcycle assistant. I can search our
database to answer your questions about the motorcycles in our garage.
caller_input_mode: always
agents: []
flows: []
agent_template:
component_type: PluginPromptTemplate
id: 31ba4592-5627-4d8e-ba21-cd39e2a4cf56
name: ''
description: null
metadata:
__metadata_info__: {}
messages:
- role: system
contents:
- type: text
content: '{%- if __TOOLS__ -%}
Environment: ipython
Cutting Knowledge Date: December 2023
You are a helpful assistant with tool calling capabilities. Only reply with
a tool call if the function exists in the library provided by the user. If
it doesn''t exist, just reply directly in natural language. When you receive
a tool call response, use the output to format an answer to the original user
question.
You have access to the following functions. To call a function, please respond
with JSON for a function call.
Respond in the format {"name": function name, "parameters": dictionary of
argument name and its value}.
Do not use variables.
[{% for tool in __TOOLS__%}{{tool.to_openai_format() | tojson}}{{'', '' if
not loop.last}}{% endfor %}]
{%- endif -%}
'
tool_requests: null
tool_result: null
display_only: false
sender: null
recipients: []
time_created: '2025-10-29T10:21:46.793553+00:00'
time_updated: '2025-10-29T10:21:46.793554+00:00'
- role: system
contents:
- type: text
content: '{%- if custom_instruction -%}Additional instructions:
{{custom_instruction}}{%- endif -%}'
tool_requests: null
tool_result: null
display_only: false
sender: null
recipients: []
time_created: '2025-10-29T10:21:46.793585+00:00'
time_updated: '2025-10-29T10:21:46.793585+00:00'
- role: system
contents:
- type: text
content: $$__CHAT_HISTORY_PLACEHOLDER__$$
tool_requests: null
tool_result: null
display_only: false
sender: null
recipients: []
time_created: '2025-10-29T10:21:46.790207+00:00'
time_updated: '2025-10-29T10:21:46.790208+00:00'
- role: system
contents:
- type: text
content: "{% if __PLAN__ %}The current plan you should follow is the following:\
\ \n{{__PLAN__}}{% endif %}"
tool_requests: null
tool_result: null
display_only: false
sender: null
recipients: []
time_created: '2025-10-29T10:21:46.793609+00:00'
time_updated: '2025-10-29T10:21:46.793609+00:00'
output_parser:
component_type: PluginJsonToolOutputParser
id: b7249231-a601-42b9-8a6d-61ec5d9d4799
name: jsontool_outputparser
description: null
metadata:
__metadata_info__: {}
tools: null
component_plugin_name: OutputParserPlugin
component_plugin_version: 25.4.1
inputs:
- description: '"__TOOLS__" input variable for the template'
title: __TOOLS__
- description: '"custom_instruction" input variable for the template'
type: string
title: custom_instruction
- description: '"__PLAN__" input variable for the template'
type: string
title: __PLAN__
default: ''
- type: array
items: {}
title: __CHAT_HISTORY__
pre_rendering_transforms: null
post_rendering_transforms:
- component_type: PluginRemoveEmptyNonUserMessageTransform
id: 929d8caf-ef98-4328-b961-658f9b027603
name: removeemptynonusermessage_messagetransform
description: null
metadata:
__metadata_info__: {}
component_plugin_name: MessageTransformPlugin
component_plugin_version: 25.4.1
- component_type: PluginCoalesceSystemMessagesTransform
id: 68ff19d3-c152-47a1-98fa-0597fbe6fd8c
name: coalescesystemmessage_messagetransform
description: null
metadata:
__metadata_info__: {}
component_plugin_name: MessageTransformPlugin
component_plugin_version: 25.4.1
- component_type: PluginLlamaMergeToolRequestAndCallsTransform
id: 68024b92-c2fd-4ec4-ada8-42589c54b480
name: llamamergetoolrequestandcalls_messagetransform
description: null
metadata:
__metadata_info__: {}
component_plugin_name: MessageTransformPlugin
component_plugin_version: 25.4.1
tools: null
native_tool_calling: false
response_format: null
native_structured_generation: true
generation_config: null
component_plugin_name: PromptTemplatePlugin
component_plugin_version: 25.4.1
component_plugin_name: AgentPlugin
component_plugin_version: 25.4.1
agentspec_version: 25.4.1
Warning
The Oracle Database connection config objects contain several sensitive values
(such as the username, password, and wallet location) that are not serialized by the AgentSpecExporter.
They are exported as references that must be resolved at loading time, by providing the values
of these sensitive fields through the components_registry argument of the loader:
component_registry = {
    # We map the ID of the sensitive fields in the connection config to their values
    "oracle_datastore_connection_config.user": "<db user>",  # Replace with your DB user
    "oracle_datastore_connection_config.password": "<db password>",  # Replace with your DB password  # nosec: this is just a placeholder
    "oracle_datastore_connection_config.dsn": "<db connection string>",  # e.g. "(description=(retry_count=2)..."
}
You can then load the configuration back to an assistant using the AgentSpecLoader.
# Load an agent from Agent Spec JSON
from wayflowcore.agentspec import AgentSpecLoader
tool_registry = {tool.name: tool for tool in search_toolbox.get_tools()}
new_rag_agent = AgentSpecLoader(tool_registry=tool_registry).load_json(rag_agent_ir_json, components_registry=component_registry)
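Rather than hardcoding credentials, the registry of sensitive values can be assembled from environment variables. The sketch below assumes the ADB_DB_USER, ADB_DB_PASSWORD, and ADB_DSN variables used elsewhere in this guide and the connection config ID oracle_datastore_connection_config; the helper build_component_registry is a hypothetical name introduced for this example, not a WayFlow function.

```python
import os
from typing import Dict, Mapping


def build_component_registry(
    env: Mapping[str, str] = os.environ,
    config_id: str = "oracle_datastore_connection_config",
) -> Dict[str, str]:
    """Map '<config id>.<field>' keys to values read from environment variables."""
    field_to_env_var = {
        "user": "ADB_DB_USER",
        "password": "ADB_DB_PASSWORD",
        "dsn": "ADB_DSN",
    }
    # Fail early with a clear message instead of loading a half-configured agent
    missing = [var for var in field_to_env_var.values() if var not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {f"{config_id}.{field}": env[var] for field, var in field_to_env_var.items()}


# Demonstration with an explicit mapping; omit the argument to read os.environ
demo_registry = build_component_registry(
    {"ADB_DB_USER": "demo_user", "ADB_DB_PASSWORD": "demo_password", "ADB_DSN": "demo_dsn"}
)
print(sorted(demo_registry))
```

The returned dictionary has the same shape as the component_registry shown above, so it can be passed to the loader in the same way.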
Note
This guide uses the following extension/plugin Agent Spec components:
See the list of available Agent Spec extension/plugin components in the API Reference
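To check which extension/plugin components a given export relies on, you can walk the exported JSON. The sketch below assumes, based on the export shown above, that plugin components carry a component_plugin_name key; the helper name and the abbreviated sample document are hypothetical and only illustrate the idea.

```python
import json
from typing import Any, Set


def collect_plugin_component_types(node: Any) -> Set[str]:
    """Recursively collect component_type values of plugin-backed components."""
    found: Set[str] = set()
    if isinstance(node, dict):
        # Assumption: plugin components are marked by a component_plugin_name key
        if "component_plugin_name" in node and isinstance(node.get("component_type"), str):
            found.add(node["component_type"])
        for value in node.values():
            found |= collect_plugin_component_types(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_plugin_component_types(item)
    return found


# Abbreviated stand-in for the string returned by AgentSpecExporter().to_json(...)
sample_spec_json = """
{
  "component_type": "ExtendedAgent",
  "component_plugin_name": "AgentPlugin",
  "llm_config": {"component_type": "VllmConfig"},
  "tools": [
    {
      "component_type": "PluginToolFromToolBox",
      "component_plugin_name": "ToolFromToolBoxPlugin",
      "toolbox": {
        "component_type": "PluginSearchToolBox",
        "component_plugin_name": "SearchToolBoxPlugin"
      }
    }
  ]
}
"""
plugin_types = sorted(collect_plugin_component_types(json.loads(sample_spec_json)))
print(plugin_types)
# ['ExtendedAgent', 'PluginSearchToolBox', 'PluginToolFromToolBox']
```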
Cleaning Up Datastore#
Before moving on, you may want to clean up the table created in Oracle Database for this tutorial.
The following code drops the motorcycles table from your Oracle Database, using the environment_config function defined in the Setting Up section.
ORACLE_DB_CLEANUP = "DROP TABLE IF EXISTS motorcycles cascade constraints"

def cleanup_oracle_datastore():
    connection_config = environment_config()
    # Context managers close the cursor and the connection even if the DROP fails
    with connection_config.get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(ORACLE_DB_CLEANUP)

cleanup_oracle_datastore()
Recap#
In this guide, you learned how to build RAG-powered assistants using WayFlow:
The key difference between Agents and Flows for RAG:
Agents offer dynamic, autonomous retrieval based on the conversation context - ideal when you want the AI to decide when and what to search
Flows provide predictable, structured retrieval workflows - ideal when you want consistent behavior for specific use cases
Key techniques covered:
Basic RAG: Using all fields for embeddings and search
Filtered search: Limiting results based on metadata
Multiple search configs: Different strategies for different use cases with explicit selection
Multiple toolboxes: Allowing Agents to choose between different search strategies autonomously
Important
Before deploying your RAG application to production, you MUST:
Configure Oracle AI Vector Search for scalable vector operations
Test performance with production-scale data
Implement proper error handling and monitoring
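As one possible starting point for error handling and monitoring, the sketch below wraps a search call with retries, backoff, and logging. The helper search_with_retry, its parameters, and the logger name are hypothetical; it accepts any callable, so it makes no assumption about the datastore beyond the search call shown earlier in this guide.

```python
import logging
import time
from typing import Any, Callable, List

logger = logging.getLogger("rag.search")


def search_with_retry(
    search_fn: Callable[[str], List[Any]],
    query: str,
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
) -> List[Any]:
    """Call search_fn(query), retrying on failure with a linearly growing delay."""
    for attempt in range(1, max_attempts + 1):
        start = time.perf_counter()
        try:
            results = search_fn(query)
        except Exception as exc:
            logger.warning("Search attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            time.sleep(backoff_seconds * attempt)
        else:
            elapsed = time.perf_counter() - start
            logger.info("Search returned %d results in %.3fs", len(results), elapsed)
            return results
    return []  # only reached when max_attempts < 1


# Demonstration with a stub that fails twice before succeeding
calls = {"count": 0}


def flaky_search(query: str) -> List[Any]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return [{"model_name": "Starlite Apex-R7"}]


results = search_with_retry(flaky_search, "sport bike for racing", backoff_seconds=0.0)
print(results)
```

With a real datastore, search_fn could be a small lambda around datastore.search(collection_name="motorcycles", query=..., k=3).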
For development and testing, you can use InMemoryDatastore; the same APIs work with both datastores:
# Development (NOT for production)
datastore = InMemoryDatastore(schema={"motorcycles": motorcycles})
# Production (use this instead)
datastore = OracleDatabaseDatastore(
    connection_config=connection_config,  # e.g. the object returned by environment_config()
    schema={"motorcycles": motorcycles},
)
See the OracleDatabaseDatastore guide for complete migration instructions.
Next steps#
Deployment Considerations: Now your application is backed by OracleDatabaseDatastore from the start. Your setup is production-ready, persistent, and scalable using Oracle AI Vector Search.
Always test with your own database connection and schema for production.
Ensure your Oracle user has all necessary table privileges.
For advanced vector functionality, see the OracleDatabaseDatastore API guide.
Full code#
Click on the card at the top of this page to download the full code for this guide or copy the code below.
# Copyright © 2025 Oracle and/or its affiliates.
#
# This software is under the Apache License 2.0
# (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0) or Universal Permissive License
# (UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl), at your option.

# %%[markdown]
# WayFlow Code Example - How to build RAG-Powered Assistants
# ----------------------------------------------------------

# How to use:
# Create a new Python virtual environment and install the latest WayFlow version.
# ```bash
# python -m venv venv-wayflowcore
# source venv-wayflowcore/bin/activate
# pip install --upgrade pip
# pip install "wayflowcore==26.2.0.dev0"
# ```

# You can now run the script
# 1. As a Python file:
# ```bash
# python howto_rag.py
# ```
# 2. As a Notebook (in VSCode):
# When viewing the file,
#  - press the keys Ctrl + Enter to run the selected cell
#  - or Shift + Enter to run the selected cell and move to the cell below

# %%[markdown]
## Embedding-config

# %%
from wayflowcore.embeddingmodels import VllmEmbeddingModel
# Configure embedding model for vector search
embedding_model = VllmEmbeddingModel(base_url="EMBEDDING_API_URL", model_id="model-id")


# %%[markdown]
## Llm-config

# %%
# Configure LLM
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="model-id",
    host_port="VLLM_HOST_PORT",
)

# %%[markdown]
## Entity-define

# %%
# Define the motorcycle entity schema
from wayflowcore.datastore import Entity
from wayflowcore.property import IntegerProperty, StringProperty, VectorProperty

motorcycles = Entity(
    description="Motorcycles in our garage",
    properties={
        "owner_name": StringProperty(description="Name of the motorcycle owner"),
        "model_name": StringProperty(description="Motorcycle model and brand"),
        "description": StringProperty(description="Detailed description of the motorcycle"),
        "hp": IntegerProperty(description="Horsepower of the motorcycle"),
        "serialized_text": StringProperty(description="Concatenated string of all columns"),
        "embeddings": VectorProperty(description="Generated embeddings for serialized_text"),
    },
)


# %%[markdown]
## Search-config

# %%
from wayflowcore.search import SearchConfig, VectorRetrieverConfig, VectorConfig

# Configure Vector Config for Search
vector_config = VectorConfig(
    model=embedding_model,
    collection_name="motorcycles",
    vector_property="embeddings"
)

# Configure vector search for semantic similarity matching
search_config = SearchConfig(
    name="motorcycle_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="cosine_distance",
    ),
)

# %%[markdown]
## Oracle-connection

# %%
import os
import oracledb

from wayflowcore.datastore.oracle import MTlsOracleDatabaseConnectionConfig, TlsOracleDatabaseConnectionConfig

def environment_config():
    mtls_vars = (
        "ADB_CONFIG_DIR",
        "ADB_WALLET_DIR",
        "ADB_WALLET_SECRET",
        "ADB_DB_USER",
        "ADB_DB_PASSWORD",
        "ADB_DSN",
    )
    tls_vars = ("ADB_DB_USER", "ADB_DB_PASSWORD", "ADB_DSN")
    if all(v in os.environ for v in mtls_vars):
        return MTlsOracleDatabaseConnectionConfig(
            config_dir=os.environ["ADB_CONFIG_DIR"],
            wallet_location=os.environ["ADB_WALLET_DIR"],
            wallet_password=os.environ["ADB_WALLET_SECRET"],
            user=os.environ["ADB_DB_USER"],
            password=os.environ["ADB_DB_PASSWORD"],
            dsn=os.environ["ADB_DSN"],
            id="oracle_datastore_connection_config",
        )
    if all(v in os.environ for v in tls_vars):
        return TlsOracleDatabaseConnectionConfig(
            user=os.environ["ADB_DB_USER"],
            password=os.environ["ADB_DB_PASSWORD"],
            dsn=os.environ["ADB_DSN"],
            id="oracle_datastore_connection_config",
        )
    raise Exception("Required OracleDB environment variables not found")


connection_config = environment_config()

ORACLE_DB_DDL = """
    CREATE TABLE motorcycles (
    owner_name VARCHAR2(255),
    model_name VARCHAR2(255),
    description VARCHAR2(255),
    hp INTEGER,
    serialized_text VARCHAR2(1023),
    embeddings VECTOR
)"""

with connection_config.get_connection() as conn:
    with conn.cursor() as cursor:
        try:
            cursor.execute(ORACLE_DB_DDL)
        except oracledb.DatabaseError as e:
            print(f"DDL execution warning: {e}")

# %%[markdown]
## Datastore-create-rag

# %%
from wayflowcore.datastore import OracleDatabaseDatastore
from wayflowcore.search.config import ConcatSerializerConfig

# Create Oracle Database datastore with vector search capability
datastore = OracleDatabaseDatastore(
    connection_config=connection_config,
    schema={"motorcycles": motorcycles},
    search_configs=[search_config],
    vector_configs=[vector_config],
)

# Sample motorcycle data
motorcycle_data = [
    {
        "owner_name": "John Smith",
        "model_name": "Galaxion Thunderchief",
        "hp": 87,
        "description": "Classic American touring motorcycle with chrome details and comfortable seating.",
    },
    {
        "owner_name": "Sarah Johnson",
        "model_name": "Starlite Apex-R7",
        "hp": 118,
        "description": "High-performance supersport motorcycle designed for track racing.",
    },
    {
        "owner_name": "Mike Chen",
        "model_name": "Orion CX 1300 Helix",
        "hp": 136,
        "description": "Premium adventure touring motorcycle with advanced electronics.",
    },
    {
        "owner_name": "Emily Davis",
        "model_name": "Nebula Trailrunner 500",
        "hp": 45,
        "description": "Street-legal dirt bike perfect for off-road adventures.",
    },
    {
        "owner_name": "Carlos Rodriguez",
        "model_name": "Vortex Momentum X1",
        "hp": 214,
        "description": "Italian superbike with MotoGP-derived technology and stunning performance.",
    },
]
# Configure Serializer to serialize columns into a string
serializer = ConcatSerializerConfig()
# Generate serialized_text and embeddings
for entity in motorcycle_data:
    entity["serialized_text"] = serializer.serialize(entity)
    entity["embeddings"] = embedding_model.embed([entity["serialized_text"]])[0]

# Populate the OracleDB datastore
datastore.create(collection_name="motorcycles", entities=motorcycle_data)

# %%[markdown]
## Create-vector-index

# %%
import oracledb

# Configure Vector Index
# No trailing semicolon: python-oracledb rejects SQL statements that end with one
VECTOR_INDEX_DDL = """
    CREATE VECTOR INDEX hnsw_image
    ON motorcycles (embeddings)
    ORGANIZATION INMEMORY NEIGHBOR GRAPH
    DISTANCE COSINE
    WITH TARGET ACCURACY 95
"""
with connection_config.get_connection() as connection:
    with connection.cursor() as cursor:
        try:
            cursor.execute(VECTOR_INDEX_DDL)
            connection.commit()
        except oracledb.DatabaseError as e:
            print(f"Vector Index Creation warning: {e}")


# %%[markdown]
## Direct-search-example

# %%
# Example of direct vector search
results = datastore.search(
    collection_name="motorcycles", query="high performance sport bike for racing", k=3
)

print("Direct search results:")
for result in results:
    print(f"- {result['model_name']}")

# Direct search results:
# - Starlite Apex-R7
# - Vortex Momentum X1
# - Nebula Trailrunner 500

# RAG AGENT IMPLEMENTATION

# %%[markdown]
## Agent Tools Rag

# %%
# Create search tools for the agent
search_toolbox = datastore.get_search_toolbox(k=3)


# %%[markdown]
## Agent Create Rag

# %%
from textwrap import dedent
from wayflowcore.agent import Agent

# Create RAG-powered agent
rag_agent = Agent(
    tools=search_toolbox.get_tools(),
    llm=llm,
    custom_instruction=dedent(
        """
        You are a helpful motorcycle garage assistant with access to our motorcycle database.

        IMPORTANT:
        - Always search for relevant information before answering questions about motorcycles
        - Base your answers on the search results
        - If you can't find relevant information, say so clearly
        - Be specific and mention details from the search results

        You have access to search tools that can find information about:
        - Motorcycle models and specifications
        - Owners of motorcycles
        - Horsepower and performance details
        - Descriptions and features
        """
    ),
    initial_message="Hello! I'm your RAG-powered motorcycle assistant. I can search our database to answer your questions about the motorcycles in our garage.",
)


# %%[markdown]
## Agent Test Rag

# %%
# Test the agent
agent_conversation = rag_agent.start_conversation(messages="Who owns the Orion motorcycle?")
status = agent_conversation.execute()
print(f"\nAgent Answer: {status.message.content}")

# Agent Answer: The Orion motorcycle is owned by Mike Chen. He owns a premium adventure touring motorcycle with advanced electronics, the Orion CX 1300 Helix, which has 136 horsepower.

# %%[markdown]
## Export Config to Agent Spec

# %%
# Export the RAG agent to Agent Spec JSON
from wayflowcore.agentspec import AgentSpecExporter

rag_agent_ir_json = AgentSpecExporter().to_json(rag_agent)

# %%[markdown]
## Provide sensitive information when loading the Agent Spec config

# %%
component_registry = {
    # We map the ID of the sensitive fields in the connection config to their values
    "oracle_datastore_connection_config.user": "<db user>",  # Replace with your DB user
    "oracle_datastore_connection_config.password": "<db password>",  # Replace with your DB password  # nosec: this is just a placeholder
    "oracle_datastore_connection_config.dsn": "<db connection string>",  # e.g. "(description=(retry_count=2)..."
}

# %%[markdown]
## Load Agent Spec Config

# %%
# Load an agent from Agent Spec JSON
from wayflowcore.agentspec import AgentSpecLoader

tool_registry = {tool.name: tool for tool in search_toolbox.get_tools()}
new_rag_agent = AgentSpecLoader(tool_registry=tool_registry).load_json(rag_agent_ir_json, components_registry=component_registry)

# RAG FLOW IMPLEMENTATION

# %%[markdown]
## Flow Steps Rag

# %%
from textwrap import dedent
from wayflowcore.flow import Flow
from wayflowcore.steps import CompleteStep, InputMessageStep, PromptExecutionStep, StartStep
from wayflowcore.steps.searchstep import SearchStep
# Define flow steps for RAG
start_step = StartStep()

user_input_step = InputMessageStep(
    message_template=dedent(
        """
        Hello! I'm your motorcycle garage assistant powered by RAG.

        I have access to information about all motorcycles in our garage.
        What would you like to know?
        """
    )
)

search_step = SearchStep(
    datastore=datastore, collection_name="motorcycles", k=3, search_config="motorcycle_search"
)

llm_response_step = PromptExecutionStep(
    prompt_template=dedent(
        """
        You are a knowledgeable motorcycle garage assistant.
        Answer the user's question based ONLY on the retrieved motorcycle information.

        User's question: {{ user_query }}

        Retrieved motorcycle information:
        {% for doc in retrieved_documents %}
        - Model: {{ doc.model_name }}
          Owner: {{ doc.owner_name }}
          Horsepower: {{ doc.hp }} HP
          Description: {{ doc.description }}
        {% endfor %}

        Instructions:
        - Base your answer strictly on the retrieved information
        - If the information doesn't answer the question, say so clearly
        - Be specific and mention relevant details from the motorcycles
        """
    ),
    llm=llm,
)

# %%[markdown]
## Flow Build Rag

# %%
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge

# Build the RAG flow
complete_step = CompleteStep()

steps = {
    "start": start_step,
    "input": user_input_step,
    "search": search_step,
    "respond": llm_response_step,
    "complete": complete_step,
}

control_flow_edges = [
    ControlFlowEdge(source_step=start_step, destination_step=user_input_step),
    ControlFlowEdge(source_step=user_input_step, destination_step=search_step),
    ControlFlowEdge(source_step=search_step, destination_step=llm_response_step),
    ControlFlowEdge(source_step=llm_response_step, destination_step=complete_step),
]

data_flow_edges = [
    # Pass user query to search step
    DataFlowEdge(
        source_step=user_input_step,
        source_output=InputMessageStep.USER_PROVIDED_INPUT,
        destination_step=search_step,
        destination_input=SearchStep.QUERY,
    ),
    # Pass user query to LLM for context
    DataFlowEdge(
        source_step=user_input_step,
        source_output=InputMessageStep.USER_PROVIDED_INPUT,
        destination_step=llm_response_step,
        destination_input="user_query",
    ),
    # Pass retrieved documents to LLM
    DataFlowEdge(
        source_step=search_step,
        source_output=SearchStep.DOCUMENTS,
        destination_step=llm_response_step,
        destination_input="retrieved_documents",
    ),
]

rag_flow = Flow(
    begin_step=start_step,
    steps=steps,
    control_flow_edges=control_flow_edges,
    data_flow_edges=data_flow_edges,
)

# Test the flow
conversation = rag_flow.start_conversation()
conversation.execute()
conversation.append_user_message("Which motorcycle has the most horsepower?")
result = conversation.execute()
print(f"\nRAG Flow Answer: {result.output_values[PromptExecutionStep.OUTPUT]}")
# RAG Flow Answer: Based on the retrieved information, the motorcycle with the most horsepower is the Vortex Momentum X1, which has 214 HP.
# This Italian superbike features MotoGP-derived technology and stunning performance, indicating its high power output.

# ADVANCED RAG TECHNIQUES


# %%[markdown]
## Advanced Filtering

# %%
# Filter search results by owner
filtered_results = datastore.search(
    collection_name="motorcycles", query="sport bike", k=5, where={"owner_name": "Sarah Johnson"}
)
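
# %%[markdown]
## Advanced Filtering: conceptual sketch

# %%
# A minimal, self-contained sketch of what a filtered vector search does conceptually:
# keep only the rows that match the `where` equality filter, rank them by distance to the
# query vector, and return the top k. Assumptions: the helper names (`toy_cosine_distance`,
# `toy_filtered_search`), the toy rows, and the 2-D vectors are hypothetical and are not part
# of wayflowcore. The real datastore delegates filtering and ranking to Oracle Database, and
# this sketch makes no claim about the order in which it applies them internally.
import math


def toy_cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def toy_filtered_search(rows, query_vector, k, where=None):
    """Apply an equality filter, then return the k rows closest to the query vector."""
    where = where or {}
    candidates = [
        row for row in rows if all(row.get(field) == value for field, value in where.items())
    ]
    candidates.sort(key=lambda row: toy_cosine_distance(row["embeddings"], query_vector))
    return candidates[:k]


toy_rows = [
    {"model_name": "A", "owner_name": "Sarah Johnson", "embeddings": [1.0, 0.0]},
    {"model_name": "B", "owner_name": "Mike Chen", "embeddings": [0.9, 0.1]},
    {"model_name": "C", "owner_name": "Sarah Johnson", "embeddings": [0.0, 1.0]},
]

# Row "B" is excluded by the owner filter even though it is close to the query vector.
toy_hits = toy_filtered_search(
    toy_rows, query_vector=[1.0, 0.0], k=5, where={"owner_name": "Sarah Johnson"}
)
print([hit["model_name"] for hit in toy_hits])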


# %%[markdown]
## Advanced Multi Config

# %%
from wayflowcore.datastore import OracleDatabaseDatastore
from wayflowcore.search import SearchConfig, VectorRetrieverConfig, VectorConfig

# Configure Vector Config for Search
vector_config = VectorConfig(
    model=embedding_model, collection_name="motorcycles", vector_property="embeddings"
)

# Multiple search configurations for different use cases
precise_search = SearchConfig(
    name="precise_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="cosine_distance",
    ),
)

broad_search = SearchConfig(
    name="broad_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="l2_distance",
        vectors=vector_config,  # You can put your vector config directly in the Vector Retriever Config
    ),
)

# Create OracleDB datastore with multiple search configs
multi_search_datastore = OracleDatabaseDatastore(
    connection_config=connection_config,
    schema={"motorcycles": motorcycles},
    search_configs=[precise_search, broad_search],
    vector_configs=[vector_config],
)
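
# %%[markdown]
## Distance metrics: illustrative sketch

# %%
# The two search configs above use different distance metrics ("cosine_distance" and
# "l2_distance"). This pure-Python sketch illustrates, on toy 2-D vectors, how the two
# metrics can rank the same candidates differently. Assumptions: the helper names and the
# vectors are hypothetical and only illustrate the math; they are not part of wayflowcore
# or of Oracle AI Vector Search.
import math


def cosine_distance_sketch(a, b):
    """Return 1 - cosine similarity: compares direction only, ignoring vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def l2_distance_sketch(a, b):
    """Return the Euclidean distance: sensitive to both direction and vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


toy_query = [1.0, 0.0]
same_direction_but_long = [3.0, 0.0]
nearby_but_tilted = [1.0, 0.5]

# Cosine distance prefers the vector that points the same way as the query...
cosine_prefers_long = cosine_distance_sketch(
    toy_query, same_direction_but_long
) < cosine_distance_sketch(toy_query, nearby_but_tilted)
# ...while L2 distance prefers the vector that is closest in absolute terms.
l2_prefers_nearby = l2_distance_sketch(toy_query, nearby_but_tilted) < l2_distance_sketch(
    toy_query, same_direction_but_long
)
print(cosine_prefers_long, l2_prefers_nearby)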


# %%[markdown]
## Advanced Custom Toolbox

# %%
# Create specialized search toolboxes
detailed_search = datastore.get_search_toolbox(k=10)
quick_search = datastore.get_search_toolbox(k=1)

# Agent with multiple search strategies
advanced_agent = Agent(
    tools=[detailed_search, quick_search],
    llm=llm,
    custom_instruction=dedent(
        """
        You are an advanced motorcycle assistant with two search modes:
        - Use detailed search for comprehensive questions requiring multiple examples
        - Use quick search for simple factual questions about a specific motorcycle

        Choose the appropriate search mode based on the user's question.
        """
    ),
)


# %%[markdown]
## Manual Serialization Advanced

# %%
# Advanced manual serialization that uses domain-specific, cross-field logic.
# This goes beyond simple concatenation and cannot be reproduced with ConcatSerializerConfig,
# which operates per-field and via string pre/post-processors without access to the full structured entity.
from typing import Dict, Any, List


def serialize_motorcycle_advanced(entity: Dict[str, Any]) -> str:
    """
    Produce a Markdown-formatted string with:
    - Conditional weighting: repeat model name tokens based on horsepower bands
    - Derived fields: performance class and hp_band computed from numeric hp
    - Conditional keyword injection from description semantics
    - Field re-ordering and sectioned formatting for domain salience
    """
    model = str(entity.get("model_name", "")).strip()
    desc = str(entity.get("description", "")).strip()
    owner = str(entity.get("owner_name", "")).strip()
    try:
        hp = int(entity.get("hp") or 0)
    except Exception:
        hp = 0

    # Derived performance class and weighting based on hp
    if hp >= 170:
        performance = "track-ready superbike"
        weight_repeats = 3
    elif hp >= 120:
        performance = "high-performance sport bike"
        weight_repeats = 2
    elif hp >= 70:
        performance = "standard road motorcycle"
        weight_repeats = 1
    else:
        performance = "lightweight commuter / trail bike"
        weight_repeats = 1

    # Keyword injection (conditional, cross-field)
    lower_desc = desc.lower()
    keywords: List[str] = []
    if "race" in lower_desc or "sport" in lower_desc or hp >= 150:
        keywords += ["sport bike", "supersport", "track-focused"]
    if "touring" in lower_desc or "comfortable" in lower_desc or "adventure" in lower_desc:
        keywords += ["touring", "long-distance", "comfort"]
    if "dirt" in lower_desc or "off-road" in lower_desc or "trail" in lower_desc:
        keywords += ["off-road", "dual-sport", "trail"]

    # Deduplicate while preserving order
    seen = set()
    deduped_keywords: List[str] = []
    for kw in keywords:
        if kw not in seen:
            deduped_keywords.append(kw)
            seen.add(kw)

    # Compose Markdown with intentional ordering and sections
    title = f"# {model}"
    # Token weighting via repetition (helps some embedding models emphasize salient tokens)
    if weight_repeats > 1 and model:
        title = title + (" " + model) * (weight_repeats - 1)

    body_lines: List[str] = [
        f"## Performance: {performance}",
        f"hp_band: {max(0, (hp // 10) * 10)}+ HP",
        f"owner: {owner}" if owner else "",
        "## Description",
        desc,
    ]
    if deduped_keywords:
        body_lines += ["## Keywords", ", ".join(deduped_keywords)]

    # Join non-empty lines
    body = "\n".join([line for line in body_lines if line and line.strip()])

    return f"{title}\n{body}"


# Example usage (when you want to manually control embeddings):
# for entity in motorcycle_data:
#     entity["serialized_text"] = serialize_motorcycle_advanced(entity)
#     entity["embeddings"] = embedding_model.embed([entity["serialized_text"]])[0]


# %%[markdown]
## Cleanup datastore

# %%
# Note: DROP TABLE IF EXISTS requires Oracle Database 23ai or later.
ORACLE_DB_CLEANUP = "DROP TABLE IF EXISTS motorcycles cascade constraints"


def cleanup_oracle_datastore():
    connection_config = environment_config()
    conn = connection_config.get_connection()
    try:
        conn.cursor().execute(ORACLE_DB_CLEANUP)
    finally:
        # Close the connection even if the DROP statement fails
        conn.close()


cleanup_oracle_datastore()