How to Build RAG-Powered Assistants#

Prerequisites

This guide assumes familiarity with

Retrieval-Augmented Generation (RAG) is a powerful technique that enhances AI assistants by connecting them to external knowledge sources. Instead of relying solely on their training data, RAG-enabled assistants can search through your specific documents, databases, or knowledge bases to provide accurate, up-to-date, and contextually relevant responses.

In this tutorial, you will:

  • Configure vector search to enable semantic similarity matching in your data.

  • Create a searchable datastore with embeddings for efficient retrieval.

  • Create a RAG-powered Agent that autonomously searches for information to fulfill user requests.

  • Build a RAG-powered Flow using SearchStep for structured retrieval workflows.

  • Control which fields are used for embeddings to optimize search relevance.

This tutorial demonstrates RAG using Oracle Database as a persistent, production-ready vector store. You will use OracleDatabaseDatastore and Oracle AI Vector Search throughout.

Concepts shown in this guide#

Before you begin, you must connect to Oracle Database and create the table(s) needed for vector search. See below.

Step 0. OracleDatabaseDatastore: Connecting and Automated Table Preparation#

To use this guide, you need an Oracle Database with vector search capability. This tutorial shows how to automate the connection and table setup directly from Python. To follow along, you only need a working connection to Oracle Database and permission to run operations on it.

Connection & Authentication

The code automatically detects either mTLS or simple TLS database connectivity using environment variables. The following environment variables must be set for your Oracle connection:

# For mTLS connection (Autonomous DB/Wallet)
export ADB_CONFIG_DIR=encrypted/wallet/config
export ADB_WALLET_DIR=encrypted/wallet
export ADB_WALLET_SECRET='supersecret'
export ADB_DB_USER=garage_user
export ADB_DB_PASSWORD=secret
export ADB_DSN="adb....oraclecloud.com"

# Or for TLS connection
export ADB_DB_USER=garage_user
export ADB_DB_PASSWORD=secret
export ADB_DSN="dbhost:port/servicename"

Reference: Oracle Database TLS setup guide

Warning

Using environment variables for storing sensitive connection details is not suitable for production environments.

The code will choose the most secure available connection automatically.
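
Before running the connection code, it can help to check which variables are actually set. The helper below is a hypothetical pre-flight check and not part of WayFlow. It mirrors the detection order described above (mTLS first, then TLS) and takes the environment as a plain mapping, so it can be exercised without real credentials. Unlike a simple presence check, it treats empty values as missing, which is an assumption of this sketch.

```python
from typing import Mapping

# Variable names are taken from the export examples above.
MTLS_VARS = (
    "ADB_CONFIG_DIR",
    "ADB_WALLET_DIR",
    "ADB_WALLET_SECRET",
    "ADB_DB_USER",
    "ADB_DB_PASSWORD",
    "ADB_DSN",
)
TLS_VARS = ("ADB_DB_USER", "ADB_DB_PASSWORD", "ADB_DSN")


def detect_connection_mode(env: Mapping[str, str]) -> str:
    """Return "mtls" or "tls", or raise an error naming the missing TLS variables.

    Hypothetical helper for illustration only.
    """
    if all(env.get(name) for name in MTLS_VARS):
        return "mtls"
    if all(env.get(name) for name in TLS_VARS):
        return "tls"
    missing = [name for name in TLS_VARS if not env.get(name)]
    raise RuntimeError(f"Missing Oracle connection variables: {', '.join(missing)}")
```

In a script you would typically call `detect_connection_mode(os.environ)` once at startup and fail early with the error message.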

Table Schema Setup and DDL Execution

To retrieve from your data, it must be stored in a database. This guide stores the entities in Oracle Database and retrieves them with Oracle Database 23ai vector search. To connect, configure the client with oracledb and specify the schema of the data. The schema for this example is:

CREATE TABLE motorcycles (
  owner_name VARCHAR2(255),
  model_name VARCHAR2(255),
  description VARCHAR2(255),
  hp INTEGER,
  serialized_text VARCHAR2(1023),
  embeddings VECTOR
);

This schema includes both a conventional text representation (serialized_text) and a VECTOR column for semantic search. The Python code handles both the creation (and dropping) of this table and the population of its data.

import os
import oracledb

from wayflowcore.datastore.oracle import MTlsOracleDatabaseConnectionConfig, TlsOracleDatabaseConnectionConfig

def environment_config():
    mtls_vars = (
        "ADB_CONFIG_DIR",
        "ADB_WALLET_DIR",
        "ADB_WALLET_SECRET",
        "ADB_DB_USER",
        "ADB_DB_PASSWORD",
        "ADB_DSN",
    )
    tls_vars = ("ADB_DB_USER", "ADB_DB_PASSWORD", "ADB_DSN")
    if all(v in os.environ for v in mtls_vars):
        return MTlsOracleDatabaseConnectionConfig(
            config_dir=os.environ["ADB_CONFIG_DIR"],
            wallet_location=os.environ["ADB_WALLET_DIR"],
            wallet_password=os.environ["ADB_WALLET_SECRET"],
            user=os.environ["ADB_DB_USER"],
            password=os.environ["ADB_DB_PASSWORD"],
            dsn=os.environ["ADB_DSN"],
            id="oracle_datastore_connection_config",
        )
    if all(v in os.environ for v in tls_vars):
        return TlsOracleDatabaseConnectionConfig(
            user=os.environ["ADB_DB_USER"],
            password=os.environ["ADB_DB_PASSWORD"],
            dsn=os.environ["ADB_DSN"],
            id="oracle_datastore_connection_config",
        )
    raise Exception("Required OracleDB environment variables not found")


connection_config = environment_config()

ORACLE_DB_DDL = """
    CREATE TABLE motorcycles (
    owner_name VARCHAR2(255),
    model_name VARCHAR2(255),
    description VARCHAR2(255),
    hp INTEGER,
    serialized_text VARCHAR2(1023),
    embeddings VECTOR
)"""

with connection_config.get_connection() as conn:
    with conn.cursor() as cursor:
        try:
            cursor.execute(ORACLE_DB_DDL)
        except oracledb.DatabaseError as e:
            print(f"DDL execution warning: {e}")

The code:

  • Detects your connection configuration (mTLS/TLS)

  • Creates the target table in Oracle using your credentials with:
    • the table fields to match the entity schema (see next section)

    • an additional serialized_text VARCHAR2 field to store the string used for embeddings

    • an additional embeddings VECTOR field for the vector search

Note that if you already have a table configured with the same name, you will need to drop the table before running this code. Refer to the Cleaning Up section to see how this can be done.

Required Privileges

Make sure your user has privileges to drop, create, insert, update, and select on the target table (motorcycles).

Also make sure you have installed oracledb.

Note

You can install the required package using pip:

pip install oracledb

Setting Up RAG#

A Retrieval-Augmented Generation (RAG) system is composed of two core components: a retriever and an LLM (Large Language Model). The retriever is responsible for searching your data for relevant information, while the LLM uses the retriever as a tool to supplement its responses with up-to-date knowledge. To perform end-to-end RAG, you therefore need both a retriever backed by an embedding model and an LLM.

Before creating these RAG-powered assistants, you will need to set up the data source which supports vector search capabilities.

Step 1. Configure models#

You need an embedding model for the retriever as it converts your text data into embeddings (vector representations). The retriever uses these embeddings to perform semantic searches, enabling the system to retrieve relevant information based on meaning rather than just keywords.

Configure the embedding model for vector search:

from wayflowcore.embeddingmodels import VllmEmbeddingModel
# Configure embedding model for vector search
embedding_model = VllmEmbeddingModel(base_url="EMBEDDING_API_URL", model_id="model-id")
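
To make "search by meaning" concrete, the sketch below ranks documents by cosine distance between vectors in plain Python. The three-dimensional vectors are made up for illustration. A real embedding model returns much longer vectors, and with Oracle AI Vector Search the ranking happens inside the database rather than in application code.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the names of the k documents closest to the query vector."""
    return sorted(docs, key=lambda name: cosine_distance(query, docs[name]))[:k]


# Toy vectors (made up); the axes loosely mean sportiness, comfort, off-road.
toy_embeddings = {
    "supersport": [0.9, 0.1, 0.0],
    "tourer": [0.2, 0.9, 0.1],
    "dirt bike": [0.1, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.1]  # stands in for an embedded "fast track bike" query
closest = top_k(query_vector, toy_embeddings, k=2)
print(closest)
```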

Configure your LLM:

The LLM (Large Language Model) plays two crucial roles in RAG. First, it generates a suitable retrieval query to fetch relevant information from your datastore. Then, after retrieval, the LLM formats and integrates the retrieved text into a coherent, user-facing response. Understanding the role of the LLM is key to grasping why RAG involves both retrieval and generative capabilities: retrieval brings in up-to-date, domain-specific knowledge, while the LLM ensures information is expressed in conversational form for the user.

from wayflowcore.models import OCIGenAIModel, OCIClientConfigWithApiKey

llm = OCIGenAIModel(
    model_id="provider.model-id",
    compartment_id="compartment-id",
    client_config=OCIClientConfigWithApiKey(
        service_endpoint="https://url-to-service-endpoint.com",
    ),
)

Step 2. Define searchable data#

First, define the schema for your data. Note that the collection and property names defined below should match the table and column names configured in Oracle Database (see the table we created in step 0).

# Define the motorcycle entity schema
from wayflowcore.datastore import Entity
from wayflowcore.property import IntegerProperty, StringProperty, VectorProperty

motorcycles = Entity(
    description="Motorcycles in our garage",
    properties={
        "owner_name": StringProperty(description="Name of the motorcycle owner"),
        "model_name": StringProperty(description="Motorcycle model and brand"),
        "description": StringProperty(description="Detailed description of the motorcycle"),
        "hp": IntegerProperty(description="Horsepower of the motorcycle"),
        "serialized_text": StringProperty(description="Concatenated string of all columns"),
        "embeddings": VectorProperty(description="Generated embeddings for serialized_text"),
    },
)

Next, we configure a vector and search config for searching in this data. A few things to note:

  • If you have configured a vector index, ensure you put the same distance metric in the distance_metric parameter of the VectorRetrieverConfig. Without doing so, the approximate search will not work.

  • The embedding model passed in either the VectorConfig or the VectorRetrieverConfig should be the same as the model used to generate the corresponding embeddings column. If you specify an embedding model in both the classes, the embedding models must match.

  • You can configure the vectors parameter in the VectorRetrieverConfig to explicitly specify the vector column or VectorConfig you want to search.

  • If vectors is None, the vector column to search will be inferred by either an existing vector config with the same collection name or a vector column in the collection. If there are two or more matching vector configurations, an error will be raised.

  • If you do not specify a collection_name in the VectorConfig, the config is applicable to all collections in your datastore.

from wayflowcore.search import SearchConfig, VectorRetrieverConfig, VectorConfig

# Configure Vector Config for Search
vector_config = VectorConfig(
    model=embedding_model,
    collection_name="motorcycles",
    vector_property="embeddings"
)

# Configure vector search for semantic similarity matching
search_config = SearchConfig(
    name="motorcycle_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="cosine_distance",
    ),
)

Then, you can create the datastore with search capability by passing the search configuration. To fill the data, we perform serialization of fields using the ConcatSerializerConfig. For each motorcycle entity, this will concatenate all fields and their values into a single string called serialized_text, which is then embedded by the model. The resulting embedding vector is assigned to the embeddings field. This approach gives you control over what text is represented in your vector index and is transparent/easy to audit.

By default, all text fields in your entities are used to generate embeddings. However, you may want to exclude certain fields like IDs, prices, or metadata from the embedding calculation while still returning them in search results. This can be achieved by configuring the columns_to_exclude parameter in ConcatSerializerConfig.
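
The effect of excluding a column can be pictured with a small stand-in. The `concat_fields` helper below is hypothetical and is not the ConcatSerializerConfig implementation, whose exact separators and formatting are not shown in this guide. It only illustrates that an excluded field stays in the stored entity while being left out of the text that gets embedded.

```python
from typing import Any, Dict, Iterable


def concat_fields(entity: Dict[str, Any], exclude: Iterable[str] = ()) -> str:
    """Join "name: value" pairs, skipping excluded columns.

    Hypothetical stand-in for illustration; the real serializer's
    output format may differ.
    """
    excluded = set(exclude)
    return ", ".join(
        f"{name}: {value}" for name, value in entity.items() if name not in excluded
    )


bike = {
    "owner_name": "Emily Davis",
    "model_name": "Nebula Trailrunner 500",
    "hp": 45,
    "description": "Street-legal dirt bike perfect for off-road adventures.",
}
# Leave the owner out of the embedded text; the field itself is untouched.
text_for_embedding = concat_fields(bike, exclude=["owner_name"])
print(text_for_embedding)
```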

Note that datastore.create() is used here for demonstration only and is not the recommended way to load data into Oracle Database tables. For real applications, populate your tables with SQL (e.g., bulk INSERT/UPDATE), then use the Datastore APIs to index, search, and take advantage of WayFlow features.

from wayflowcore.datastore import OracleDatabaseDatastore
from wayflowcore.search.config import ConcatSerializerConfig

# Create Oracle Database datastore with vector search capability
datastore = OracleDatabaseDatastore(
    connection_config=connection_config,
    schema={"motorcycles": motorcycles},
    search_configs=[search_config],
    vector_configs=[vector_config],
)

# Sample motorcycle data
motorcycle_data = [
    {
        "owner_name": "John Smith",
        "model_name": "Galaxion Thunderchief",
        "hp": 87,
        "description": "Classic American touring motorcycle with chrome details and comfortable seating.",
    },
    {
        "owner_name": "Sarah Johnson",
        "model_name": "Starlite Apex-R7",
        "hp": 118,
        "description": "High-performance supersport motorcycle designed for track racing.",
    },
    {
        "owner_name": "Mike Chen",
        "model_name": "Orion CX 1300 Helix",
        "hp": 136,
        "description": "Premium adventure touring motorcycle with advanced electronics.",
    },
    {
        "owner_name": "Emily Davis",
        "model_name": "Nebula Trailrunner 500",
        "hp": 45,
        "description": "Street-legal dirt bike perfect for off-road adventures.",
    },
    {
        "owner_name": "Carlos Rodriguez",
        "model_name": "Vortex Momentum X1",
        "hp": 214,
        "description": "Italian superbike with MotoGP-derived technology and stunning performance.",
    },
]
# Configure Serializer to serialize columns into a string
serializer = ConcatSerializerConfig()
# Generate serialized_text and embeddings
for entity in motorcycle_data:
    entity["serialized_text"] = serializer.serialize(entity)
    entity["embeddings"] = embedding_model.embed([entity["serialized_text"]])[0]

# Populate the OracleDB datastore
datastore.create(collection_name="motorcycles", entities=motorcycle_data)

Create a Vector Index for Efficient Vector Search

For production semantic search, it is also recommended that you create a vector index on the embeddings field using Oracle’s HNSW (or IVF) index. Having a vector index configured is not necessary for search to work, but it will speed things up as it will use approximate search rather than using exact search. Note that if you want to use the vector index as intended, the distance metric configured in the index should be the same as the distance metric used in the VectorRetrieverConfig (you can use SimilarityMetric for simplicity). The code below creates this index programmatically and commits it to your Oracle DB. (Skip this step for in-memory datastores.)

import oracledb

# Configure Vector Index
# Note: no trailing semicolon; python-oracledb expects SQL statements without one.
VECTOR_INDEX_DDL = """
    CREATE VECTOR INDEX hnsw_image
    ON motorcycles (embeddings)
    ORGANIZATION INMEMORY NEIGHBOR GRAPH
    DISTANCE COSINE
    WITH TARGET ACCURACY 95
"""
with connection_config.get_connection() as connection:
    with connection.cursor() as cursor:
        try:
            cursor.execute(VECTOR_INDEX_DDL)
            connection.commit()
        except oracledb.DatabaseError as e:
            print(f"Vector Index Creation warning: {e}")
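
Because the index metric and the retriever metric have to agree, one option is to derive the DDL from the same string you pass as distance_metric. The mapping and the function name below are assumptions made for illustration and are not part of WayFlow. Only the two metric names used in this guide are covered, and mapping "l2_distance" to Oracle's EUCLIDEAN keyword is itself an assumption you should verify for your setup.

```python
# Maps the retriever metric names used in this guide to Oracle DISTANCE keywords.
# Assumption: only these two metrics are needed; extend the mapping as required.
ORACLE_DISTANCE_KEYWORD = {
    "cosine_distance": "COSINE",
    "l2_distance": "EUCLIDEAN",
}


def build_hnsw_index_ddl(
    index_name: str,
    table: str,
    column: str,
    distance_metric: str,
    target_accuracy: int = 95,
) -> str:
    """Build a CREATE VECTOR INDEX statement (hypothetical helper).

    Identifiers are interpolated directly, so pass trusted names only.
    """
    try:
        keyword = ORACLE_DISTANCE_KEYWORD[distance_metric]
    except KeyError:
        raise ValueError(f"Unsupported distance metric: {distance_metric!r}") from None
    return (
        f"CREATE VECTOR INDEX {index_name} "
        f"ON {table} ({column}) "
        "ORGANIZATION INMEMORY NEIGHBOR GRAPH "
        f"DISTANCE {keyword} "
        f"WITH TARGET ACCURACY {target_accuracy}"
    )


ddl = build_hnsw_index_ddl("hnsw_image", "motorcycles", "embeddings", "cosine_distance")
print(ddl)
```

The generated statement has no trailing semicolon, so it can be passed to cursor.execute() as is.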

You can test the search directly:

# Example of direct vector search
results = datastore.search(
    collection_name="motorcycles", query="high performance sport bike for racing", k=3
)

print("Direct search results:")
for result in results:
    print(f"- {result['model_name']}")

# Direct search results:
# - Starlite Apex-R7
# - Vortex Momentum X1
# - Nebula Trailrunner 500

With your RAG-ready datastore in place, the next step is to use it in real applications. In WayFlow, the two primary patterns for Retrieval-Augmented Generation are:

  • Integrating RAG capabilities into conversational Agents for dynamic, dialogue-driven retrieval

  • Building Flows for more structured and predictable retrieval workflows.

In the next sections, you’ll see hands-on how to use both approaches, starting with Agents.

RAG in Agents#

We’ll start by showing how to empower your Agents with retrieval capabilities, allowing them to proactively fetch and reason over domain-specific information as part of their decision-making. Agents provide a flexible approach to RAG by autonomously deciding when and how to search for information based on the conversation context.

Step 1. Create search tools for the Agent#

Convert your searchable datastore into tools that an Agent can use:

# Create search tools for the agent
search_toolbox = datastore.get_search_toolbox(k=3)

The get_search_toolbox method creates a SearchToolBox that:

  • Dynamically generates search tools for each collection

  • Respects the k parameter to limit result count

  • Returns results as JSON for easy parsing by the LLM

Step 2. Create the RAG Agent#

Create an Agent with search capabilities:

from textwrap import dedent
from wayflowcore.agent import Agent

# Create RAG-powered agent
rag_agent = Agent(
    tools=search_toolbox.get_tools(),
    llm=llm,
    custom_instruction=dedent(
        """
        You are a helpful motorcycle garage assistant with access to our motorcycle database.

        IMPORTANT:
        - Always search for relevant information before answering questions about motorcycles
        - Base your answers on the search results
        - If you can't find relevant information, say so clearly
        - Be specific and mention details from the search results

        You have access to search tools that can find information about:
        - Motorcycle models and specifications
        - Owners of motorcycles
        - Horsepower and performance details
        - Descriptions and features
        """
    ),
    initial_message="Hello! I'm your RAG-powered motorcycle assistant. I can search our database to answer your questions about the motorcycles in our garage.",
)

This Agent will:

  • Automatically use search tools when it needs information

  • Combine search results with its reasoning capabilities

  • Provide accurate answers based on your specific data

Test the Agent:

# Test the agent
agent_conversation = rag_agent.start_conversation(messages="Who owns the Orion motorcycle?")
status = agent_conversation.execute()
print(f"\nAgent Answer: {status.message.content}")

# Agent Answer: The Orion motorcycle is owned by Mike Chen. He owns a premium adventure touring motorcycle with advanced electronics, the Orion CX 1300 Helix, which has 136 horsepower.

The Agent autonomously decides when to search, what to search for, and how to use the results to answer questions.

RAG in Flows#

While Agents offer flexibility, Flows provide a structured approach to RAG with predictable retrieval workflows ideal for specific use cases.

Step 1. Create the RAG Flow#

Create a Flow that searches for relevant information before generating a response:

from textwrap import dedent
from wayflowcore.flow import Flow
from wayflowcore.steps import CompleteStep, InputMessageStep, PromptExecutionStep, StartStep
from wayflowcore.steps.searchstep import SearchStep
# Define flow steps for RAG
start_step = StartStep()

user_input_step = InputMessageStep(
    message_template=dedent(
        """
        Hello! I'm your motorcycle garage assistant powered by RAG.

        I have access to information about all motorcycles in our garage.
        What would you like to know?
        """
    )
)

search_step = SearchStep(
    datastore=datastore, collection_name="motorcycles", k=3, search_config="motorcycle_search"
)

llm_response_step = PromptExecutionStep(
    prompt_template=dedent(
        """
        You are a knowledgeable motorcycle garage assistant.
        Answer the user's question based ONLY on the retrieved motorcycle information.

        User's question: {{ user_query }}

        Retrieved motorcycle information:
        {% for doc in retrieved_documents %}
        - Model: {{ doc.model_name }}
        Owner: {{ doc.owner_name }}
        Horsepower: {{ doc.hp }} HP
        Description: {{ doc.description }}
        {% endfor %}

        Instructions:
        - Base your answer strictly on the retrieved information
        - If the information doesn't answer the question, say so clearly
        - Be specific and mention relevant details from the motorcycles
        """
    ),
    llm=llm,
)

Key points:

  • The SearchStep uses semantic search to find relevant documents based on the user’s query.

  • The k parameter limits the number of documents retrieved.

  • Retrieved documents are passed to the LLM along with the original query for contextualized responses.
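
For intuition about what the LLM ends up seeing, the sketch below assembles a comparable prompt with plain Python string formatting. WayFlow renders the template itself, so this helper is illustrative only: the function name is hypothetical, and the documents are two rows from the sample data.

```python
from typing import Any, Dict, List


def build_rag_prompt(user_query: str, retrieved_documents: List[Dict[str, Any]]) -> str:
    """Format retrieved rows into a grounded prompt (illustrative only)."""
    doc_lines = []
    for doc in retrieved_documents:
        doc_lines.append(
            f"- Model: {doc['model_name']}\n"
            f"  Owner: {doc['owner_name']}\n"
            f"  Horsepower: {doc['hp']} HP\n"
            f"  Description: {doc['description']}"
        )
    # Make an empty retrieval explicit so the model can say it found nothing.
    context = "\n".join(doc_lines) if doc_lines else "(no documents retrieved)"
    return (
        "Answer the user's question based ONLY on the retrieved motorcycle information.\n\n"
        f"User's question: {user_query}\n\n"
        f"Retrieved motorcycle information:\n{context}"
    )


docs = [
    {
        "model_name": "Orion CX 1300 Helix",
        "owner_name": "Mike Chen",
        "hp": 136,
        "description": "Premium adventure touring motorcycle with advanced electronics.",
    },
    {
        "model_name": "Galaxion Thunderchief",
        "owner_name": "John Smith",
        "hp": 87,
        "description": "Classic American touring motorcycle with chrome details and comfortable seating.",
    },
]
prompt = build_rag_prompt("Who owns the Orion motorcycle?", docs)
print(prompt)
```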

Step 2. Build and test the Flow#

Build the complete Flow with control and data connections:

from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge

# Build the RAG flow
complete_step = CompleteStep()

steps = {
    "start": start_step,
    "input": user_input_step,
    "search": search_step,
    "respond": llm_response_step,
    "complete": complete_step,
}

control_flow_edges = [
    ControlFlowEdge(source_step=start_step, destination_step=user_input_step),
    ControlFlowEdge(source_step=user_input_step, destination_step=search_step),
    ControlFlowEdge(source_step=search_step, destination_step=llm_response_step),
    ControlFlowEdge(source_step=llm_response_step, destination_step=complete_step),
]

data_flow_edges = [
    # Pass user query to search step
    DataFlowEdge(
        source_step=user_input_step,
        source_output=InputMessageStep.USER_PROVIDED_INPUT,
        destination_step=search_step,
        destination_input=SearchStep.QUERY,
    ),
    # Pass user query to LLM for context
    DataFlowEdge(
        source_step=user_input_step,
        source_output=InputMessageStep.USER_PROVIDED_INPUT,
        destination_step=llm_response_step,
        destination_input="user_query",
    ),
    # Pass retrieved documents to LLM
    DataFlowEdge(
        source_step=search_step,
        source_output=SearchStep.DOCUMENTS,
        destination_step=llm_response_step,
        destination_input="retrieved_documents",
    ),
]

rag_flow = Flow(
    begin_step=start_step,
    steps=steps,
    control_flow_edges=control_flow_edges,
    data_flow_edges=data_flow_edges,
)

# Test the flow
conversation = rag_flow.start_conversation()
conversation.execute()
conversation.append_user_message("Which motorcycle has the most horsepower?")
result = conversation.execute()
print(f"\nRAG Flow Answer: {result.output_values[PromptExecutionStep.OUTPUT]}")
# RAG Flow Answer: Based on the retrieved information, the motorcycle with the most horsepower is the Vortex Momentum X1, which has 214 HP.
# This Italian superbike features MotoGP-derived technology and stunning performance, indicating its high power output.

The Flow provides a predictable pipeline: user input → search → response generation.

Advanced RAG Techniques#

Filtering search results#

You can filter search results based on metadata:

# Filter search results by owner
filtered_results = datastore.search(
    collection_name="motorcycles", query="sport bike", k=5, where={"owner_name": "Sarah Johnson"}
)
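
Conceptually, a where filter restricts the candidate rows by exact metadata match, and the vector ranking then applies to what remains. The sketch below mimics that in plain Python with made-up two-dimensional vectors. It is an assumption about the semantics for illustration, not a description of how the datastore executes the query, and `filtered_search` is a hypothetical name.

```python
import math
from typing import Any, Dict, List, Sequence


def filtered_search(
    rows: List[Dict[str, Any]],
    query_vector: Sequence[float],
    k: int,
    where: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Keep rows matching every key/value in `where`, then rank by L2 distance."""
    candidates = [
        row for row in rows
        if all(row.get(field) == value for field, value in where.items())
    ]
    candidates.sort(key=lambda row: math.dist(row["embeddings"], query_vector))
    return candidates[:k]


# Made-up 2-D vectors; real embeddings come from the embedding model.
rows = [
    {"owner_name": "Sarah Johnson", "model_name": "Starlite Apex-R7", "embeddings": [0.9, 0.1]},
    {"owner_name": "Carlos Rodriguez", "model_name": "Vortex Momentum X1", "embeddings": [0.95, 0.05]},
    {"owner_name": "Emily Davis", "model_name": "Nebula Trailrunner 500", "embeddings": [0.1, 0.9]},
]
query_vector = [1.0, 0.0]

hits = filtered_search(rows, query_vector, k=5, where={"owner_name": "Sarah Johnson"})
print([hit["model_name"] for hit in hits])
```

Note that k is an upper bound: with a restrictive filter, fewer than k rows may come back.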

Multiple search configurations#

Create specialized search configurations for different use cases:

from wayflowcore.datastore import OracleDatabaseDatastore
from wayflowcore.search import SearchConfig, VectorRetrieverConfig, VectorConfig

# Configure Vector Config for Search
vector_config = VectorConfig(model=embedding_model, collection_name="motorcycles", vector_property="embeddings")

# Multiple search configurations for different use cases
precise_search = SearchConfig(
    name="precise_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="cosine_distance",
    ),
)

broad_search = SearchConfig(
    name="broad_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="l2_distance",
        vectors=vector_config,  # You can put your vector config directly in the Vector Retriever Config
    ),
)

# Create OracleDB datastore with multiple search configs
multi_search_datastore = OracleDatabaseDatastore(
    connection_config=connection_config,
    schema={"motorcycles": motorcycles},
    search_configs=[precise_search, broad_search],
    vector_configs=[vector_config],
)

How multiple search configs work:

  • Each SearchConfig must have a unique name (auto-generated if not provided)

  • Search configs can target the same collection with different settings (distance metrics, vector configs)

  • Search configs can also target multiple collections if no collection name is specified, provided no other search config matches the collection being searched.

  • When calling Datastore.search(), you specify which config to use via the search_config parameter

  • If no search_config is specified, the system looks for a default config for that collection, given that a collection_name is specified

  • The first config that matches the collection (or has no specific collection) becomes the default
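
The default-selection rule in the last bullets can be pictured as a small lookup. The dataclass and function below are hypothetical stand-ins and not WayFlow classes. They model only the "explicit name wins, otherwise the first applicable config is the default" behaviour described above, and the actual resolution logic may handle more cases.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySearchConfig:
    """Hypothetical stand-in for SearchConfig, reduced to two fields."""

    name: str
    collection_name: Optional[str] = None  # None: not tied to one collection


def resolve_search_config(
    configs: List[ToySearchConfig],
    collection_name: str,
    search_config: Optional[str] = None,
) -> ToySearchConfig:
    """Pick a config by explicit name, else the first one that applies."""
    if search_config is not None:
        for config in configs:
            if config.name == search_config:
                return config
        raise KeyError(f"No search config named {search_config!r}")
    for config in configs:
        if config.collection_name in (None, collection_name):
            return config
    raise LookupError(f"No search config applies to {collection_name!r}")


configs = [
    ToySearchConfig("precise_search", "motorcycles"),
    ToySearchConfig("broad_search", "motorcycles"),
]
default_name = resolve_search_config(configs, "motorcycles").name
explicit_name = resolve_search_config(configs, "motorcycles", "broad_search").name
print(default_name, explicit_name)
```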

When to use each config

  • precise_search: Uses cosine similarity for semantic matching (best for meaning-based searches)

  • broad_search: Uses Euclidean distance for broader matches (considers all dimensions equally)

  • You explicitly choose which to use: datastore.search(..., search_config="precise_search")
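
The two metrics can rank the same vectors differently when vector lengths vary: cosine distance looks only at direction, while Euclidean distance also reacts to magnitude. The toy vectors below are made up to show that effect. For unit-length embeddings the two orderings coincide, so the choice matters most when your embedding model does not normalize its output.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


query = [1.0, 1.0]
candidates = {
    "same direction, much longer": [10.0, 10.0],
    "nearby, different direction": [1.0, 0.0],
}

by_cosine = sorted(candidates, key=lambda name: cosine_distance(query, candidates[name]))
by_l2 = sorted(candidates, key=lambda name: math.dist(query, candidates[name]))

print("cosine ranking:", by_cosine)
print("L2 ranking:    ", by_l2)
```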

Customizing search behavior in Agents#

Create specialized search toolboxes with different parameters:

# Create specialized search toolboxes
detailed_search = datastore.get_search_toolbox(k=10)
quick_search = datastore.get_search_toolbox(k=1)

# Agent with multiple search strategies
advanced_agent = Agent(
    tools=[detailed_search, quick_search],
    llm=llm,
    custom_instruction=dedent(
    """
    You are an advanced motorcycle assistant with two search modes:
    - Use detailed search for comprehensive questions requiring multiple examples
    - Use quick search for simple factual questions about a specific motorcycle

    Choose the appropriate search mode based on the user's question.
    """
    ),
)

How specialized toolboxes work:

  • Each toolbox creates different search functions with fixed parameters

  • detailed_search: Always returns 10 results (k=10) for comprehensive analysis

  • quick_search: Always returns 1 result (k=1) for focused answers

  • The Agent sees these as different tools: search_motorcycles_detailed vs search_motorcycles_quick

When each toolbox is used:

  • The Agent autonomously decides based on:

    • The user’s question complexity

    • Instructions in custom_instruction

    • Context of the conversation

  • For “tell me about all sport bikes” → likely uses detailed_search

  • For “who owns the Vortex?” → likely uses quick_search

  • The Agent’s reasoning determines the choice, guided by your instructions

Manual Serialization of Fields for Embeddings#

In this example, we show a manual serialization approach that performs cross-field logic that cannot be expressed with ConcatSerializerConfig. Instead of merely concatenating fields, we:

  • Compute derived attributes (e.g., performance class and hp bands from numeric horsepower)

  • Conditionally weight salient tokens (repeat model name for high-HP bikes)

  • Inject domain keywords based on the description semantics

  • Reorder fields and output a structured, sectioned Markdown document

This goes beyond per-field preprocessing and simple separators; it uses the full entity structure at once and conditional logic across multiple fields.

# Advanced manual serialization that uses domain-specific, cross-field logic.
# This goes beyond simple concatenation and cannot be reproduced with ConcatSerializerConfig,
# which operates per-field and via string pre/post-processors without access to the full structured entity.
from typing import Dict, Any, List

def serialize_motorcycle_advanced(entity: Dict[str, Any]) -> str:
    """
    Produce a Markdown-formatted string with:
    - Conditional weighting: repeat model name tokens based on horsepower bands
    - Derived fields: performance class and hp_band computed from numeric hp
    - Conditional keyword injection from description semantics
    - Field re-ordering and sectioned formatting for domain salience
    """
    model = str(entity.get("model_name", "")).strip()
    desc = str(entity.get("description", "")).strip()
    owner = str(entity.get("owner_name", "")).strip()
    try:
        hp = int(entity.get("hp") or 0)
    except Exception:
        hp = 0

    # Derived performance class and weighting based on hp
    if hp >= 170:
        performance = "track-ready superbike"
        weight_repeats = 3
    elif hp >= 120:
        performance = "high-performance sport bike"
        weight_repeats = 2
    elif hp >= 70:
        performance = "standard road motorcycle"
        weight_repeats = 1
    else:
        performance = "lightweight commuter / trail bike"
        weight_repeats = 1

    # Keyword injection (conditional, cross-field)
    lower_desc = desc.lower()
    keywords: List[str] = []
    if "race" in lower_desc or "sport" in lower_desc or hp >= 150:
        keywords += ["sport bike", "supersport", "track-focused"]
    if "touring" in lower_desc or "comfortable" in lower_desc or "adventure" in lower_desc:
        keywords += ["touring", "long-distance", "comfort"]
    if "dirt" in lower_desc or "off-road" in lower_desc or "trail" in lower_desc:
        keywords += ["off-road", "dual-sport", "trail"]

    # Deduplicate while preserving order
    seen = set()
    deduped_keywords: List[str] = []
    for kw in keywords:
        if kw not in seen:
            deduped_keywords.append(kw)
            seen.add(kw)

    # Compose Markdown with intentional ordering and sections
    title = f"# {model}"
    # Token weighting via repetition (helps some embedding models emphasize salient tokens)
    if weight_repeats > 1 and model:
        title = title + (" " + model) * (weight_repeats - 1)

    body_lines: List[str] = [
        f"## Performance: {performance}",
        f"hp_band: {max(0, (hp // 10) * 10)}+ HP",
        f"owner: {owner}" if owner else "",
        "## Description",
        desc,
    ]
    if deduped_keywords:
        body_lines += ["## Keywords", ", ".join(deduped_keywords)]

    # Join non-empty lines
    body = "\n".join([line for line in body_lines if line and line.strip()])

    return f"{title}\n{body}"

Example usage when generating embeddings:

for entity in motorcycle_data:
    entity["serialized_text"] = serialize_motorcycle_advanced(entity)
    entity["embeddings"] = embedding_model.embed([entity["serialized_text"]])[0]

Why use explicit serialization?

  • Cross-field logic: derive fields (e.g., performance class from hp) and conditionally add keywords.

  • Conditional weighting: repeat or emphasize tokens under certain conditions (e.g., horsepower thresholds).

  • Structured formatting: generate Markdown sections and control field ordering for domain salience.

  • Auditable and deterministic: the exact text used for embeddings is transparent and reproducible.

Limitations of ConcatSerializerConfig and when to choose manual serialization:

  • ConcatSerializerConfig is powerful for per-field concatenation with simple pre/post processing and exclusion of columns.

  • It does not perform arbitrarily complex cross-field computations, conditional token weighting, or multi-field derived features.

  • Choose manual serialization whenever you need entity-level reasoning to craft the embedding text, beyond simple concatenation and formatting.
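The contrast can be sketched in plain Python. In this sketch, `concat_fields` is a hypothetical stand-in for simple per-field concatenation (it is not WayFlow's ConcatSerializerConfig), `serialize_with_rules` shows the kind of cross-field logic that calls for manual serialization, and the 100 hp threshold is an arbitrary assumption.

```python
from typing import Any, Dict, Iterable


def concat_fields(entity: Dict[str, Any], exclude: Iterable[str] = ()) -> str:
    """Hypothetical stand-in for simple concatenation: 'field: value' pairs in order."""
    skipped = set(exclude)
    return "\n".join(f"{key}: {value}" for key, value in entity.items() if key not in skipped)


def serialize_with_rules(entity: Dict[str, Any], hp_threshold: int = 100) -> str:
    """Manual serialization: derive a feature from one field and use it to weight another."""
    hp = int(entity.get("hp", 0))
    performance = "high" if hp >= hp_threshold else "standard"  # derived feature
    model = entity.get("model_name", "")
    # Conditional weighting: repeat the model name only for high-performance bikes
    title = f"{model} {model}" if performance == "high" else model
    return f"{title}\nperformance: {performance}\n{entity.get('description', '')}"


bike = {
    "owner_name": "Sarah Johnson",
    "model_name": "Starlite Apex-R7",
    "hp": 118,
    "description": "High-performance supersport motorcycle designed for track racing.",
}

print(concat_fields(bike, exclude=["owner_name"]))
print(serialize_with_rules(bike))
```

The first function only needs to look at one field at a time; the second needs the whole entity, which is the point at which manual serialization becomes the better fit.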

Note

Selective field embedding (using serializers to specify which fields participate in embedding generation) is supported most directly by the InMemoryDatastore backend; see its API for serializer support. OracleDatabaseDatastore does not create embeddings implicitly and offers no out-of-the-box field-level selection: it assumes the embedding column has already been populated, so you are responsible for constructing and storing the embeddings yourself. You can still use ConcatSerializerConfig outside the datastore to produce the serialized text from which you generate the embeddings.
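Because OracleDatabaseDatastore expects ready-made embeddings, field selection happens in your own code before insertion. The following is a minimal sketch of that step. `StubEmbeddingModel` is a hypothetical placeholder that only mimics the `embed(list_of_texts)` call shape used in this guide, and `EMBED_FIELDS` is an assumed choice of fields.

```python
from typing import Any, Dict, List

EMBED_FIELDS = ("model_name", "description")  # assumed selection; owner_name and hp are left out


class StubEmbeddingModel:
    """Hypothetical placeholder: returns a tiny fake vector per input text."""

    def embed(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(text)), float(text.count(" "))] for text in texts]


def prepare_rows(entities: List[Dict[str, Any]], model: StubEmbeddingModel) -> List[Dict[str, Any]]:
    """Return copies of the entities with serialized_text and embeddings filled in."""
    rows = []
    for entity in entities:
        # Only the selected fields contribute to the text that gets embedded
        text = "\n".join(f"{field}: {entity[field]}" for field in EMBED_FIELDS if field in entity)
        rows.append({**entity, "serialized_text": text})
    # Embed all texts in one batched call, then attach each vector to its row
    vectors = model.embed([row["serialized_text"] for row in rows])
    for row, vector in zip(rows, vectors):
        row["embeddings"] = vector
    return rows


rows = prepare_rows(
    [
        {
            "owner_name": "Emily Davis",
            "model_name": "Nebula Trailrunner 500",
            "hp": 45,
            "description": "Street-legal dirt bike perfect for off-road adventures.",
        }
    ],
    StubEmbeddingModel(),
)
print(rows[0]["serialized_text"])
```

With a real embedding model in place of the stub, rows prepared this way could be passed to datastore.create(collection_name="motorcycles", entities=rows), as done elsewhere in this guide.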

Note

For rapid prototyping, use InMemoryDatastore with custom serializers for full flexibility, then migrate to OracleDatabaseDatastore for production workloads that require persistence and scalability.

Agent Spec Exporting/Loading#

You can export the agent to its Agent Spec configuration using the AgentSpecExporter.

# Export the RAG agent to Agent Spec JSON
from wayflowcore.agentspec import AgentSpecExporter

rag_agent_ir_json = AgentSpecExporter().to_json(rag_agent)

Here is what the Agent Spec representation looks like:
{
    "component_type": "ExtendedAgent",
    "id": "cefee4ec-cb9d-4bc5-8361-a34860ced665",
    "name": "agent_52e70c67__auto",
    "description": "",
    "metadata": {
        "__metadata_info__": {}
    },
    "inputs": [],
    "outputs": [],
    "llm_config": {
        "component_type": "VllmConfig",
        "id": "1d26dfa9-f35f-4e21-8c30-248213ac0601",
        "name": "llm_70781625__auto",
        "description": null,
        "metadata": {
            "__metadata_info__": {}
        },
        "default_generation_parameters": {
            "max_tokens": 512
        },
        "url": "host_urls.com",
        "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct"
    },
    "system_prompt": "\nYou are a helpful motorcycle garage assistant with access to our motorcycle database.\n\nIMPORTANT:\n- Always search for relevant information before answering questions about motorcycles\n- Base your answers on the search results\n- If you can't find relevant information, say so clearly\n- Be specific and mention details from the search results\n\nYou have access to search tools that can find information about:\n- Motorcycle models and specifications\n- Owners of motorcycles\n- Horsepower and performance details\n- Descriptions and features\n",
    "tools": [
        {
            "component_type": "PluginToolFromToolBox",
            "id": "5c4bf7fb-79ba-4e3e-a671-e2e2945b7600",
            "name": "search_motorcycles",
            "description": "Search for Motorcycles in our garage in the database using semantic similarity.\n\nThis tool searches the motorcycles collection for entities that match the given query.\nIt returns exactly 3 matching records with their properties and similarity scores.\nUse this tool when you need to find information about Motorcycles in our garage.\n\nParameters\n----------\nquery : str\n    The search query string to find relevant Motorcycles in our garage.\n",
            "metadata": {
                "__metadata_info__": {}
            },
            "inputs": [],
            "outputs": [],
            "tool_name": "search_motorcycles",
            "toolbox": {
                "component_type": "PluginSearchToolBox",
                "id": "54f6a02a-9dba-480a-a9f0-4d86fff937a5",
                "name": "search_toolbox5f99358b__auto",
                "description": null,
                "metadata": {},
                "collection_names": null,
                "k": 3,
                "datastore": {
                    "component_type": "PluginOracleDatabaseDatastore",
                    "id": "de87d17c-9654-47ce-a43f-0c827e52b5f6",
                    "name": "oracle_datastoreed7b27dc__auto",
                    "description": null,
                    "metadata": {},
                    "datastore_schema": {
                        "motorcycles": {
                            "description": "Motorcycles in our garage",
                            "title": "",
                            "properties": {
                                "description": {
                                    "type": "string",
                                    "description": "Detailed description of the motorcycle"
                                },
                                "owner_name": {
                                    "description": "Name of the motorcycle owner",
                                    "type": "string"
                                },
                                "model_name": {
                                    "description": "Motorcycle model and brand",
                                    "type": "string"
                                },
                                "hp": {
                                    "description": "Horsepower of the motorcycle",
                                    "type": "integer"
                                },
                                "serialized_text": {
                                    "description": "Concatenated string of all columns",
                                    "type": "string"
                                },
                                "embeddings": {
                                    "description": "Generated embeddings for serialized_text",
                                    "type": "array",
                                    "items": {
                                        "type": "number"
                                    },
                                    "x_vector_property": true
                                }
                            }
                        }
                    },
                    "connection_config": {
                        "component_type": "PluginTlsOracleDatabaseConnectionConfig",
                        "id": "8dbd3707-cd10-44f8-bbc1-15b69ac83c14",
                        "name": "PluginTlsOracleDatabaseConnectionConfig",
                        "description": null,
                        "metadata": {},
                        "user": "user",
                        "password": "password",
                        "dsn": "dsn",
                        "config_dir": null,
                        "component_plugin_name": "DatastorePlugin",
                        "component_plugin_version": "25.4.1"
                    },
                    "search_configs": [
                        {
                            "component_type": "PluginSearchConfig",
                            "id": "c983247b-bc7a-43c3-af16-b952fa9714e5",
                            "name": "motorcycle_search",
                            "description": null,
                            "metadata": {},
                            "retriever": {
                                "component_type": "PluginVectorRetrieverConfig",
                                "id": "e26ea01e-e501-4f09-b5f4-8a96cd3daa77",
                                "name": "motorcycles",
                                "description": null,
                                "metadata": {},
                                "model": {
                                    "component_type": "PluginVllmEmbeddingConfig",
                                    "id": "fe1e8f74-cf16-4dea-8ba5-08f4629aea0a",
                                    "name": "embedding_modeledf13d6a__auto",
                                    "description": null,
                                    "metadata": {},
                                    "url": "model_url.com",
                                    "model_id": "intfloat/e5-large-v2",
                                    "component_plugin_name": "EmbeddingModelPlugin",
                                    "component_plugin_version": "25.4.1"
                                },
                                "collection_name": "motorcycles",
                                "vectors": null,
                                "distance_metric": "cosine_distance",
                                "index_params": {},
                                "component_plugin_name": "VectorRetrieverConfigPlugin",
                                "component_plugin_version": "25.4.1"
                            },
                            "component_plugin_name": "SearchConfigPlugin",
                            "component_plugin_version": "25.4.1"
                        }
                    ],
                    "vector_configs": [],
                    "component_plugin_name": "DatastorePlugin",
                    "component_plugin_version": "25.4.1"
                },
                "component_plugin_name": "SearchToolBoxPlugin",
                "component_plugin_version": "25.4.1"
            },
            "component_plugin_name": "ToolFromToolBoxPlugin",
            "component_plugin_version": "25.4.1"
        }
    ],
    "toolboxes": [],
    "context_providers": null,
    "can_finish_conversation": false,
    "max_iterations": 10,
    "initial_message": "Hello! I'm your RAG-powered motorcycle assistant. I can search our database to answer your questions about the motorcycles in our garage.",
    "caller_input_mode": "always",
    "agents": [],
    "flows": [],
    "agent_template": {
        "component_type": "PluginPromptTemplate",
        "id": "c8d2fd47-ab10-468f-9c55-c8fa2c459c1a",
        "name": "",
        "description": null,
        "metadata": {
            "__metadata_info__": {}
        },
        "messages": [
            {
                "role": "system",
                "contents": [
                    {
                        "type": "text",
                        "content": "{%- if __TOOLS__ -%}\nEnvironment: ipython\nCutting Knowledge Date: December 2023\n\nYou are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.\n\nYou have access to the following functions. To call a function, please respond with JSON for a function call.\nRespond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.\nDo not use variables.\n\n[{% for tool in __TOOLS__%}{{tool.to_openai_format() | tojson}}{{', ' if not loop.last}}{% endfor %}]\n{%- endif -%}\n"
                    }
                ],
                "tool_requests": null,
                "tool_result": null,
                "display_only": false,
                "sender": null,
                "recipients": [],
                "time_created": "2025-10-29T10:19:45.987272+00:00",
                "time_updated": "2025-10-29T10:19:45.987272+00:00"
            },
            {
                "role": "system",
                "contents": [
                    {
                        "type": "text",
                        "content": "{%- if custom_instruction -%}Additional instructions:\n{{custom_instruction}}{%- endif -%}"
                    }
                ],
                "tool_requests": null,
                "tool_result": null,
                "display_only": false,
                "sender": null,
                "recipients": [],
                "time_created": "2025-10-29T10:19:45.987302+00:00",
                "time_updated": "2025-10-29T10:19:45.987302+00:00"
            },
            {
                "role": "system",
                "contents": [
                    {
                        "type": "text",
                        "content": "$$__CHAT_HISTORY_PLACEHOLDER__$$"
                    }
                ],
                "tool_requests": null,
                "tool_result": null,
                "display_only": false,
                "sender": null,
                "recipients": [],
                "time_created": "2025-10-29T10:19:45.983942+00:00",
                "time_updated": "2025-10-29T10:19:45.983943+00:00"
            },
            {
                "role": "system",
                "contents": [
                    {
                        "type": "text",
                        "content": "{% if __PLAN__ %}The current plan you should follow is the following: \n{{__PLAN__}}{% endif %}"
                    }
                ],
                "tool_requests": null,
                "tool_result": null,
                "display_only": false,
                "sender": null,
                "recipients": [],
                "time_created": "2025-10-29T10:19:45.987326+00:00",
                "time_updated": "2025-10-29T10:19:45.987326+00:00"
            }
        ],
        "output_parser": {
            "component_type": "PluginJsonToolOutputParser",
            "id": "e5ed717a-bb76-4c13-b443-640444b98d3b",
            "name": "jsontool_outputparser",
            "description": null,
            "metadata": {
                "__metadata_info__": {}
            },
            "tools": null,
            "component_plugin_name": "OutputParserPlugin",
            "component_plugin_version": "25.4.1"
        },
        "inputs": [
            {
                "description": "\"__TOOLS__\" input variable for the template",
                "title": "__TOOLS__"
            },
            {
                "description": "\"custom_instruction\" input variable for the template",
                "type": "string",
                "title": "custom_instruction"
            },
            {
                "description": "\"__PLAN__\" input variable for the template",
                "type": "string",
                "title": "__PLAN__",
                "default": ""
            },
            {
                "type": "array",
                "items": {},
                "title": "__CHAT_HISTORY__"
            }
        ],
        "pre_rendering_transforms": null,
        "post_rendering_transforms": [
            {
                "component_type": "PluginRemoveEmptyNonUserMessageTransform",
                "id": "73631f33-9ade-420f-8cc1-775a24dd47d3",
                "name": "removeemptynonusermessage_messagetransform",
                "description": null,
                "metadata": {
                    "__metadata_info__": {}
                },
                "component_plugin_name": "MessageTransformPlugin",
                "component_plugin_version": "25.4.1"
            },
            {
                "component_type": "PluginCoalesceSystemMessagesTransform",
                "id": "9c65df01-2987-46e0-b2d1-082b79ee9a34",
                "name": "coalescesystemmessage_messagetransform",
                "description": null,
                "metadata": {
                    "__metadata_info__": {}
                },
                "component_plugin_name": "MessageTransformPlugin",
                "component_plugin_version": "25.4.1"
            },
            {
                "component_type": "PluginLlamaMergeToolRequestAndCallsTransform",
                "id": "9f3e25ea-73e9-4cee-bcbc-60b95720c023",
                "name": "llamamergetoolrequestandcalls_messagetransform",
                "description": null,
                "metadata": {
                    "__metadata_info__": {}
                },
                "component_plugin_name": "MessageTransformPlugin",
                "component_plugin_version": "25.4.1"
            }
        ],
        "tools": null,
        "native_tool_calling": false,
        "response_format": null,
        "native_structured_generation": true,
        "generation_config": null,
        "component_plugin_name": "PromptTemplatePlugin",
        "component_plugin_version": "25.4.1"
    },
    "component_plugin_name": "AgentPlugin",
    "component_plugin_version": "25.4.1",
    "agentspec_version": "25.4.1"
}

Warning

The Oracle Database Connection Config objects contain several sensitive values (such as the username, password, and wallet location) that the AgentSpecExporter does not serialize. They are exported as references that must be resolved at loading time, by passing the values of these sensitive fields in the components_registry argument of the loader's load_json method:

component_registry = {
    # We map the ID of the sensitive fields in the connection config to their values
    "oracle_datastore_connection_config.user": "<db user>",  # Replace with your DB user
    "oracle_datastore_connection_config.password": "<db password>",  # Replace with your DB password  # nosec: this is just a placeholder
    "oracle_datastore_connection_config.dsn": "<db connection string>",  # e.g. "(description=(retry_count=2)..."
}
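Rather than hardcoding secrets, the registry could be built from the same environment variables used for the database connection earlier in this guide. This is a minimal sketch for the TLS fields only, assuming the connection config was created with id="oracle_datastore_connection_config" as in this guide; `build_component_registry` is a hypothetical helper, not part of WayFlow.

```python
import os
from typing import Dict, Mapping

CONFIG_ID = "oracle_datastore_connection_config"  # id given to the connection config in this guide

# Maps each sensitive field name to the environment variable that holds its value
FIELD_TO_ENV_VAR = {"user": "ADB_DB_USER", "password": "ADB_DB_PASSWORD", "dsn": "ADB_DSN"}


def build_component_registry(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: build '<config id>.<field>' -> value, failing on missing variables."""
    missing = [var for var in FIELD_TO_ENV_VAR.values() if var not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {f"{CONFIG_ID}.{field}": env[var] for field, var in FIELD_TO_ENV_VAR.items()}


# Example with an explicit mapping instead of the real environment
example_env = {"ADB_DB_USER": "demo_user", "ADB_DB_PASSWORD": "demo_password", "ADB_DSN": "demo_dsn"}
example_registry = build_component_registry(example_env)
print(sorted(example_registry))
```

Failing early on a missing variable gives a clearer error than a reference that cannot be resolved while loading.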

You can then load the configuration back to an assistant using the AgentSpecLoader.

# Load an agent from Agent Spec JSON
from wayflowcore.agentspec import AgentSpecLoader

tool_registry = {tool.name: tool for tool in search_toolbox.get_tools()}
new_rag_agent = AgentSpecLoader(tool_registry=tool_registry).load_json(rag_agent_ir_json, components_registry=component_registry)

Cleaning Up Datastore#

Before moving on, you may want to clean up the table created in Oracle Database for this tutorial. The following code drops the motorcycles table from your Oracle Database, using the environment_config function defined in the Setting Up section.

ORACLE_DB_CLEANUP = "DROP TABLE IF EXISTS motorcycles cascade constraints"


def cleanup_oracle_datastore():
    connection_config = environment_config()
    # The context managers close the cursor and connection even if the DROP fails
    with connection_config.get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(ORACLE_DB_CLEANUP)

cleanup_oracle_datastore()

Recap#

In this guide, you learned how to build RAG-powered assistants using WayFlow.

The key difference between Agents and Flows for RAG:

  • Agents offer dynamic, autonomous retrieval based on the conversation context - ideal when you want the AI to decide when and what to search

  • Flows provide predictable, structured retrieval workflows - ideal when you want consistent behavior for specific use cases

Key techniques covered:

  • Basic RAG: Using all fields for embeddings and search

  • Filtered search: Limiting results based on metadata

  • Multiple search configs: Different strategies for different use cases with explicit selection

  • Multiple toolboxes: Allowing Agents to choose between different search strategies autonomously
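To make the retrieval behind these techniques concrete, here is a toy sketch of cosine-distance ranking with an optional metadata filter, written in plain Python. The 2-dimensional vectors are made up for illustration and `top_k` is a hypothetical helper; in this guide the actual ranking is performed by the datastore's vector search.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (smaller is closer); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top_k(
    query: Sequence[float],
    rows: List[Row],
    k: int = 3,
    where: Optional[Callable[[Row], bool]] = None,
) -> List[Row]:
    """Hypothetical helper: filter on metadata first, then keep the k closest rows."""
    candidates = [row for row in rows if where is None or where(row)]
    return sorted(candidates, key=lambda row: cosine_distance(query, row["embeddings"]))[:k]


rows = [
    {"model_name": "Starlite Apex-R7", "hp": 118, "embeddings": [0.9, 0.1]},
    {"model_name": "Nebula Trailrunner 500", "hp": 45, "embeddings": [0.1, 0.9]},
    {"model_name": "Vortex Momentum X1", "hp": 214, "embeddings": [0.8, 0.3]},
]
query_vector = [1.0, 0.0]

basic = [row["model_name"] for row in top_k(query_vector, rows, k=2)]
filtered = [row["model_name"] for row in top_k(query_vector, rows, k=2, where=lambda r: r["hp"] < 200)]
print(basic)     # ['Starlite Apex-R7', 'Vortex Momentum X1']
print(filtered)  # ['Starlite Apex-R7', 'Nebula Trailrunner 500']
```

Filtering before ranking means the k results always satisfy the metadata condition, even when closer vectors exist outside the filter.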

Important

Before deploying your RAG application to production, you MUST:

  1. Configure Oracle AI Vector Search for scalable vector operations

  2. Test performance with production-scale data

  3. Implement proper error handling and monitoring

For development and testing, you can use InMemoryDatastore instead; the same APIs work with both datastores:

# Development (NOT for production)
datastore = InMemoryDatastore(schema={"motorcycles": motorcycles})

# Production (use this instead)
datastore = OracleDatabaseDatastore(
    connection_config=connection_config,  # e.g. the object returned by environment_config()
    schema={"motorcycles": motorcycles},
)

See the OracleDatabaseDatastore guide for complete migration instructions.

Next steps#

Deployment considerations: this guide uses OracleDatabaseDatastore from the start, so your application is already backed by a persistent, scalable vector store built on Oracle AI Vector Search. Keep the following in mind:

  • Always test with your own database connection and schema for production.

  • Ensure your Oracle user has all necessary table privileges.

  • For advanced vector functionality, see the OracleDatabaseDatastore API guide.

Full code#

Click on the card at the top of this page to download the full code for this guide or copy the code below.

  1# Copyright © 2025 Oracle and/or its affiliates.
  2#
  3# This software is under the Apache License 2.0
  4# %%[markdown]
  5# WayFlow Code Example - How to build RAG-Powered Assistants
  6# ----------------------------------------------------------
  7
  8# How to use:
  9# Create a new Python virtual environment and install the latest WayFlow version.
 10# ```bash
 11# python -m venv venv-wayflowcore
 12# source venv-wayflowcore/bin/activate
 13# pip install --upgrade pip
 14# pip install "wayflowcore==26.2.0.dev0" 
 15# ```
 16
 17# You can now run the script
 18# 1. As a Python file:
 19# ```bash
 20# python howto_rag.py
 21# ```
 22# 2. As a Notebook (in VSCode):
 23# When viewing the file,
 24#  - press the keys Ctrl + Enter to run the selected cell
 25#  - or Shift + Enter to run the selected cell and move to the cell below# (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0) or Universal Permissive License
 26# (UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl), at your option.
 27
 28
 29
 30
 31
 32# %%[markdown]
 33## Embedding-config
 34
 35# %%
 36from wayflowcore.embeddingmodels import VllmEmbeddingModel
 37# Configure embedding model for vector search
 38embedding_model = VllmEmbeddingModel(base_url="EMBEDDING_API_URL", model_id="model-id")
 39
 40
 41
 42# %%[markdown]
 43## Llm-config
 44
 45# %%
 46# Configure LLM
 47from wayflowcore.models import VllmModel
 48
 49llm = VllmModel(
 50    model_id="model-id",
 51    host_port="VLLM_HOST_PORT",
 52)
 53
 54
 55
 56# %%[markdown]
 57## Entity-define
 58
 59# %%
 60# Define the motorcycle entity schema
 61from wayflowcore.datastore import Entity
 62from wayflowcore.property import IntegerProperty, StringProperty, VectorProperty
 63
 64motorcycles = Entity(
 65    description="Motorcycles in our garage",
 66    properties={
 67        "owner_name": StringProperty(description="Name of the motorcycle owner"),
 68        "model_name": StringProperty(description="Motorcycle model and brand"),
 69        "description": StringProperty(description="Detailed description of the motorcycle"),
 70        "hp": IntegerProperty(description="Horsepower of the motorcycle"),
 71        "serialized_text": StringProperty(description="Concatenated string of all columns"),
 72        "embeddings": VectorProperty(description="Generated embeddings for serialized_text"),
 73    },
 74)
 75
 76
 77
 78# %%[markdown]
 79## Search-config
 80
 81# %%
 82from wayflowcore.search import SearchConfig, VectorRetrieverConfig, VectorConfig
 83
 84# Configure Vector Config for Search
 85vector_config = VectorConfig(
 86    model=embedding_model,
 87    collection_name="motorcycles",
 88    vector_property="embeddings"
 89)
 90
 91# Configure vector search for semantic similarity matching
 92search_config = SearchConfig(
 93    name="motorcycle_search",
 94    retriever=VectorRetrieverConfig(
 95        model=embedding_model,
 96        collection_name="motorcycles",
 97        distance_metric="cosine_distance",
 98    ),
 99)
100
101
102
103# %%[markdown]
104## Oracle-connection
105
106# %%
107import os
108import oracledb
109
110from wayflowcore.datastore.oracle import MTlsOracleDatabaseConnectionConfig, TlsOracleDatabaseConnectionConfig
111
112def environment_config():
113    mtls_vars = (
114        "ADB_CONFIG_DIR",
115        "ADB_WALLET_DIR",
116        "ADB_WALLET_SECRET",
117        "ADB_DB_USER",
118        "ADB_DB_PASSWORD",
119        "ADB_DSN",
120    )
121    tls_vars = ("ADB_DB_USER", "ADB_DB_PASSWORD", "ADB_DSN")
122    if all(v in os.environ for v in mtls_vars):
123        return MTlsOracleDatabaseConnectionConfig(
124            config_dir=os.environ["ADB_CONFIG_DIR"],
125            wallet_location=os.environ["ADB_WALLET_DIR"],
126            wallet_password=os.environ["ADB_WALLET_SECRET"],
127            user=os.environ["ADB_DB_USER"],
128            password=os.environ["ADB_DB_PASSWORD"],
129            dsn=os.environ["ADB_DSN"],
130            id="oracle_datastore_connection_config",
131        )
132    if all(v in os.environ for v in tls_vars):
133        return TlsOracleDatabaseConnectionConfig(
134            user=os.environ["ADB_DB_USER"],
135            password=os.environ["ADB_DB_PASSWORD"],
136            dsn=os.environ["ADB_DSN"],
137            id="oracle_datastore_connection_config",
138        )
139    raise Exception("Required OracleDB environment variables not found")
140
141
142connection_config = environment_config()
143
144ORACLE_DB_DDL = """
145    CREATE TABLE motorcycles (
146    owner_name VARCHAR2(255),
147    model_name VARCHAR2(255),
148    description VARCHAR2(255),
149    hp INTEGER,
150    serialized_text VARCHAR2(1023),
151    embeddings VECTOR
152)"""
153
154with connection_config.get_connection() as conn:
155    with conn.cursor() as cursor:
156        try:
157            cursor.execute(ORACLE_DB_DDL)
158        except oracledb.DatabaseError as e:
159            print(f"DDL execution warning: {e}")
160
161
162# %%[markdown]
163## Datastore-create-rag
164
165# %%
166from wayflowcore.datastore import OracleDatabaseDatastore
167from wayflowcore.search.config import ConcatSerializerConfig
168
169# Create Oracle Database datastore with vector search capability
170datastore = OracleDatabaseDatastore(
171    connection_config=connection_config,
172    schema={"motorcycles": motorcycles},
173    search_configs=[search_config],
174    vector_configs=[vector_config],
175)
176
177# Sample motorcycle data
178motorcycle_data = [
179    {
180        "owner_name": "John Smith",
181        "model_name": "Galaxion Thunderchief",
182        "hp": 87,
183        "description": "Classic American touring motorcycle with chrome details and comfortable seating.",
184    },
185    {
186        "owner_name": "Sarah Johnson",
187        "model_name": "Starlite Apex-R7",
188        "hp": 118,
189        "description": "High-performance supersport motorcycle designed for track racing.",
190    },
191    {
192        "owner_name": "Mike Chen",
193        "model_name": "Orion CX 1300 Helix",
194        "hp": 136,
195        "description": "Premium adventure touring motorcycle with advanced electronics.",
196    },
197    {
198        "owner_name": "Emily Davis",
199        "model_name": "Nebula Trailrunner 500",
200        "hp": 45,
201        "description": "Street-legal dirt bike perfect for off-road adventures.",
202    },
203    {
204        "owner_name": "Carlos Rodriguez",
205        "model_name": "Vortex Momentum X1",
206        "hp": 214,
207        "description": "Italian superbike with MotoGP-derived technology and stunning performance.",
208    },
209]
210# Configure Serializer to serialize columns into a string
211serializer = ConcatSerializerConfig()
212# Generate serialized_text and embeddings
213for entity in motorcycle_data:
214    entity["serialized_text"] = serializer.serialize(entity)
215    entity["embeddings"] = embedding_model.embed([entity["serialized_text"]])[0]
216
217# Populate the OracleDB datastore
218datastore.create(collection_name="motorcycles", entities=motorcycle_data)
219
220
221# %%[markdown]
222## Create-vector-index
223
224# %%
225import oracledb
226
227# Configure Vector Index
228VECTOR_INDEX_DDL = """
229    CREATE VECTOR INDEX hnsw_image
230    ON motorcycles (embeddings)
231    ORGANIZATION INMEMORY NEIGHBOR GRAPH
232    DISTANCE COSINE
233    WITH TARGET ACCURACY 95;
234"""
235with connection_config.get_connection() as connection:
236    with connection.cursor() as cursor:
237        try:
238            cursor.execute(VECTOR_INDEX_DDL)
239            connection.commit()
240        except oracledb.DatabaseError as e:
241            print(f"Vector Index Creation warning: {e}")
242
243
244# %%[markdown]
245## Direct-search-example
246
247# %%
248# Example of direct vector search
249results = datastore.search(
250    collection_name="motorcycles", query="high performance sport bike for racing", k=3
251)
252
253print("Direct search results:")
254for result in results:
255    print(f"- {result['model_name']}")
256
257# Direct search results:
258# - Starlite Apex-R7
259# - Vortex Momentum X1
260# - Nebula Trailrunner 500
261
262# RAG AGENT IMPLEMENTATION
263
264
265# %%[markdown]
266## Agent Tools Rag
267
268# %%
269# Create search tools for the agent
270search_toolbox = datastore.get_search_toolbox(k=3)
271
272
273
274# %%[markdown]
275## Agent Create Rag
276
277# %%
278from textwrap import dedent
279from wayflowcore.agent import Agent
280
281# Create RAG-powered agent
282rag_agent = Agent(
283    tools=search_toolbox.get_tools(),
284    llm=llm,
285    custom_instruction=dedent(
286        """
287        You are a helpful motorcycle garage assistant with access to our motorcycle database.
288
289        IMPORTANT:
290        - Always search for relevant information before answering questions about motorcycles
291        - Base your answers on the search results
292        - If you can't find relevant information, say so clearly
293        - Be specific and mention details from the search results
294
295        You have access to search tools that can find information about:
296        - Motorcycle models and specifications
297        - Owners of motorcycles
298        - Horsepower and performance details
299        - Descriptions and features
300        """
301    ),
302    initial_message="Hello! I'm your RAG-powered motorcycle assistant. I can search our database to answer your questions about the motorcycles in our garage.",
303)
304
305
306
307# %%[markdown]
308## Agent Test Rag
309
310# %%
311# Test the agent
312agent_conversation = rag_agent.start_conversation(messages="Who owns the Orion motorcycle?")
313status = agent_conversation.execute()
314print(f"\nAgent Answer: {status.message.content}")
315
316# Agent Answer: The Orion motorcycle is owned by Mike Chen. He owns a premium adventure touring motorcycle with advanced electronics, the Orion CX 1300 Helix, which has 136 horsepower.
317
318
319# %%[markdown]
320## Export Config to Agent Spec
321
322# %%
323# Export the RAG agent to Agent Spec JSON
324from wayflowcore.agentspec import AgentSpecExporter
325
326rag_agent_ir_json = AgentSpecExporter().to_json(rag_agent)
327
328# %%[markdown]
329## Provide sensitive information when loading the Agent Spec config
330
331# %%
332component_registry = {
333    # We map the ID of the sensitive fields in the connection config to their values
334    "oracle_datastore_connection_config.user": "<db user>",  # Replace with your DB user
335    "oracle_datastore_connection_config.password": "<db password>",  # Replace with your DB password  # nosec: this is just a placeholder
336    "oracle_datastore_connection_config.dsn": "<db connection string>",  # e.g. "(description=(retry_count=2)..."
337}
338
339# %%[markdown]
340## Load Agent Spec Config
341
342# %%
343# Load an agent from Agent Spec JSON
344from wayflowcore.agentspec import AgentSpecLoader
345
346tool_registry = {tool.name: tool for tool in search_toolbox.get_tools()}
347new_rag_agent = AgentSpecLoader(tool_registry=tool_registry).load_json(rag_agent_ir_json, components_registry=component_registry)
348
349# RAG FLOW IMPLEMENTATION
350
351
352# %%[markdown]
353## Flow Steps Rag
354
355# %%
356from textwrap import dedent
357from wayflowcore.flow import Flow
358from wayflowcore.steps import CompleteStep, InputMessageStep, PromptExecutionStep, StartStep
359from wayflowcore.steps.searchstep import SearchStep
360# Define flow steps for RAG
start_step = StartStep()

user_input_step = InputMessageStep(
    message_template=dedent(
        """
        Hello! I'm your motorcycle garage assistant powered by RAG.

        I have access to information about all motorcycles in our garage.
        What would you like to know?
        """
    )
)

search_step = SearchStep(
    datastore=datastore, collection_name="motorcycles", k=3, search_config="motorcycle_search"
)

llm_response_step = PromptExecutionStep(
    prompt_template=dedent(
        """
        You are a knowledgeable motorcycle garage assistant.
        Answer the user's question based ONLY on the retrieved motorcycle information.

        User's question: {{ user_query }}

        Retrieved motorcycle information:
        {% for doc in retrieved_documents %}
        - Model: {{ doc.model_name }}
        Owner: {{ doc.owner_name }}
        Horsepower: {{ doc.hp }} HP
        Description: {{ doc.description }}
        {% endfor %}

        Instructions:
        - Base your answer strictly on the retrieved information
        - If the information doesn't answer the question, say so clearly
        - Be specific and mention relevant details from the motorcycles
        """
    ),
    llm=llm,
)


# %%[markdown]
## Flow Build Rag

# %%
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge

# Build the RAG flow
complete_step = CompleteStep()

steps = {
    "start": start_step,
    "input": user_input_step,
    "search": search_step,
    "respond": llm_response_step,
    "complete": complete_step,
}

control_flow_edges = [
    ControlFlowEdge(source_step=start_step, destination_step=user_input_step),
    ControlFlowEdge(source_step=user_input_step, destination_step=search_step),
    ControlFlowEdge(source_step=search_step, destination_step=llm_response_step),
    ControlFlowEdge(source_step=llm_response_step, destination_step=complete_step),
]

data_flow_edges = [
    # Pass user query to search step
    DataFlowEdge(
        source_step=user_input_step,
        source_output=InputMessageStep.USER_PROVIDED_INPUT,
        destination_step=search_step,
        destination_input=SearchStep.QUERY,
    ),
    # Pass user query to LLM for context
    DataFlowEdge(
        source_step=user_input_step,
        source_output=InputMessageStep.USER_PROVIDED_INPUT,
        destination_step=llm_response_step,
        destination_input="user_query",
    ),
    # Pass retrieved documents to LLM
    DataFlowEdge(
        source_step=search_step,
        source_output=SearchStep.DOCUMENTS,
        destination_step=llm_response_step,
        destination_input="retrieved_documents",
    ),
]

rag_flow = Flow(
    begin_step=start_step,
    steps=steps,
    control_flow_edges=control_flow_edges,
    data_flow_edges=data_flow_edges,
)

# Test the flow
conversation = rag_flow.start_conversation()
conversation.execute()
conversation.append_user_message("Which motorcycle has the most horsepower?")
result = conversation.execute()
print(f"\nRAG Flow Answer: {result.output_values[PromptExecutionStep.OUTPUT]}")
# RAG Flow Answer: Based on the retrieved information, the motorcycle with the most horsepower is the Vortex Momentum X1, which has 214 HP.
# This Italian superbike features MotoGP-derived technology and stunning performance, indicating its high power output.

# ADVANCED RAG TECHNIQUES


# %%[markdown]
## Advanced Filtering

# %%
# Filter search results by owner
filtered_results = datastore.search(
    collection_name="motorcycles", query="sport bike", k=5, where={"owner_name": "Sarah Johnson"}
)
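# A minimal sketch, not part of the original script: one way to read the `where`
# argument above is as an exact-match filter on entity properties.
# ASSUMPTION: the equality semantics and the helper `matches_where` are illustrative
# only; the datastore applies the real filter itself. The model names are placeholders.
from typing import Any, Dict, List


def matches_where(entity: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True when every key in `where` is present in `entity` with an equal value."""
    return all(key in entity and entity[key] == value for key, value in where.items())


sample_rows: List[Dict[str, Any]] = [
    {"model_name": "Bike A", "owner_name": "Sarah Johnson"},
    {"model_name": "Bike B", "owner_name": "Mike Chen"},
]
sample_matches = [
    row for row in sample_rows if matches_where(row, {"owner_name": "Sarah Johnson"})
]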


# %%[markdown]
## Advanced Multi Config

# %%
from wayflowcore.datastore import OracleDatabaseDatastore
from wayflowcore.search import SearchConfig, VectorRetrieverConfig, VectorConfig

# Configure Vector Config for Search
vector_config = VectorConfig(model=embedding_model, collection_name="motorcycles", vector_property="embeddings")

# Multiple search configurations for different use cases
precise_search = SearchConfig(
    name="precise_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="cosine_distance",
    ),
)

broad_search = SearchConfig(
    name="broad_search",
    retriever=VectorRetrieverConfig(
        model=embedding_model,
        collection_name="motorcycles",
        distance_metric="l2_distance",
        vectors=vector_config,  # You can put your vector config directly in the Vector Retriever Config
    ),
)

# Create OracleDB datastore with multiple search configs
multi_search_datastore = OracleDatabaseDatastore(
    connection_config=connection_config,
    schema={"motorcycles": motorcycles},
    search_configs=[precise_search, broad_search],
    vector_configs=[vector_config],
)
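# A minimal sketch, not part of the original script: why the two metrics configured
# above can rank the same candidates differently. Cosine distance compares direction
# only, while L2 (Euclidean) distance is also sensitive to vector magnitude.
# ASSUMPTION: the 2-D vectors and helper functions below are made up for illustration;
# in the tutorial the distance is computed by the database, not in Python.
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query_vec = [1.0, 0.0]
same_direction_far = [10.0, 0.0]  # same direction as the query, much larger magnitude
other_direction_near = [0.8, 0.6]  # different direction, magnitude close to the query

# Cosine distance ranks `same_direction_far` first; L2 distance ranks `other_direction_near` first.
cosine_prefers_far = cosine_distance(query_vec, same_direction_far) < cosine_distance(
    query_vec, other_direction_near
)
l2_prefers_near = l2_distance(query_vec, other_direction_near) < l2_distance(
    query_vec, same_direction_far
)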


# %%[markdown]
## Advanced Custom Toolbox

# %%
# Create specialized search toolboxes
detailed_search = datastore.get_search_toolbox(k=10)
quick_search = datastore.get_search_toolbox(k=1)

# Agent with multiple search strategies
advanced_agent = Agent(
    tools=[detailed_search, quick_search],
    llm=llm,
    custom_instruction=dedent(
    """
    You are an advanced motorcycle assistant with two search modes:
    - Use detailed search for comprehensive questions requiring multiple examples
    - Use quick search for simple factual questions about a specific motorcycle

    Choose the appropriate search mode based on the user's question.
    """
    ),
)


# %%[markdown]
## Manual Serialization Advanced

# %%
# Advanced manual serialization that uses domain-specific, cross-field logic.
# This goes beyond simple concatenation and cannot be reproduced with ConcatSerializerConfig,
# which operates per-field and via string pre/post-processors without access to the full structured entity.
from typing import Any, Dict, List


def serialize_motorcycle_advanced(entity: Dict[str, Any]) -> str:
    """
    Produce a Markdown-formatted string with:
    - Conditional weighting: repeat model name tokens based on horsepower bands
    - Derived fields: performance class and hp_band computed from numeric hp
    - Conditional keyword injection from description semantics
    - Field re-ordering and sectioned formatting for domain salience
    """
    model = str(entity.get("model_name", "")).strip()
    desc = str(entity.get("description", "")).strip()
    owner = str(entity.get("owner_name", "")).strip()
    try:
        hp = int(entity.get("hp") or 0)
    except (TypeError, ValueError):
        # Fall back to 0 when hp is present but not convertible to an integer
        hp = 0

    # Derived performance class and weighting based on hp
    if hp >= 170:
        performance = "track-ready superbike"
        weight_repeats = 3
    elif hp >= 120:
        performance = "high-performance sport bike"
        weight_repeats = 2
    elif hp >= 70:
        performance = "standard road motorcycle"
        weight_repeats = 1
    else:
        performance = "lightweight commuter / trail bike"
        weight_repeats = 1

    # Keyword injection (conditional, cross-field)
    lower_desc = desc.lower()
    keywords: List[str] = []
    if "race" in lower_desc or "sport" in lower_desc or hp >= 150:
        keywords += ["sport bike", "supersport", "track-focused"]
    if "touring" in lower_desc or "comfortable" in lower_desc or "adventure" in lower_desc:
        keywords += ["touring", "long-distance", "comfort"]
    if "dirt" in lower_desc or "off-road" in lower_desc or "trail" in lower_desc:
        keywords += ["off-road", "dual-sport", "trail"]

    # Deduplicate while preserving order
    seen = set()
    deduped_keywords: List[str] = []
    for kw in keywords:
        if kw not in seen:
            deduped_keywords.append(kw)
            seen.add(kw)

    # Compose Markdown with intentional ordering and sections
    title = f"# {model}"
    # Token weighting via repetition (helps some embedding models emphasize salient tokens)
    if weight_repeats > 1 and model:
        title = title + (" " + model) * (weight_repeats - 1)

    body_lines: List[str] = [
        f"## Performance: {performance}",
        f"hp_band: {max(0, (hp // 10) * 10)}+ HP",
        f"owner: {owner}" if owner else "",
        "## Description",
        desc,
    ]
    if deduped_keywords:
        body_lines += ["## Keywords", ", ".join(deduped_keywords)]

    # Join non-empty lines
    body = "\n".join([line for line in body_lines if line and line.strip()])

    return f"{title}\n{body}"

# Example usage (when you want to manually control embeddings):
# for entity in motorcycle_data:
#     entity["serialized_text"] = serialize_motorcycle_advanced(entity)
#     entity["embeddings"] = embedding_model.embed([entity["serialized_text"]])[0]


# %%[markdown]
## Cleanup datastore

# %%
ORACLE_DB_CLEANUP = "DROP TABLE IF EXISTS motorcycles cascade constraints"


def cleanup_oracle_datastore():
    connection_config = environment_config()
    conn = connection_config.get_connection()
    try:
        # Release the cursor and the connection even if the DROP statement fails
        with conn.cursor() as cursor:
            cursor.execute(ORACLE_DB_CLEANUP)
    finally:
        conn.close()


cleanup_oracle_datastore()