Object Storage Embedding
Overview
Creating a vector store from documents stored in OCI Object Storage is a two-step API workflow:
- Download objects from an OCI bucket to the server’s temporary staging area.
- Embed the downloaded files into a new vector store.
This separation is intentional — you can accumulate files from multiple downloads (or mix in files from other sources like local uploads) before triggering the embed step.
Step 1: Download Objects from OCI Object Storage
Download one or more objects from an OCI Object Storage bucket to the server’s staging directory.
Endpoint: POST /v1/oci/objects/download/{bucket_name}/{auth_profile}
| Parameter | Location | Description |
|---|---|---|
| bucket_name | Path | Name of the OCI Object Storage bucket |
| auth_profile | Path | OCI profile name (case-insensitive), as configured on the server |
| client | Header | Client identifier for scoping temp storage (default: server) |
| Request body | Body | JSON array of object key strings to download |
Response: JSON array of downloaded filenames.
Example
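A minimal sketch of the download call using only the Python standard library. The base URL, bucket name, profile name, object keys, and client value are placeholders chosen for illustration, and any authentication your server requires is not shown here and would need to be added.

```python
import json
import urllib.parse
import urllib.request

# Assumption: replace with the address of your own server.
BASE_URL = "http://localhost:8000"


def build_download_request(bucket_name, auth_profile, object_keys, client="server"):
    """Build the POST request that stages objects from an OCI bucket."""
    path = "/v1/oci/objects/download/{}/{}".format(
        urllib.parse.quote(bucket_name, safe=""),
        urllib.parse.quote(auth_profile, safe=""),
    )
    return urllib.request.Request(
        BASE_URL + path,
        # The request body is a JSON array of object key strings.
        data=json.dumps(object_keys).encode("utf-8"),
        headers={"Content-Type": "application/json", "client": client},
        method="POST",
    )


def download_objects(bucket_name, auth_profile, object_keys, client="server"):
    """Send the request and return the JSON array of downloaded filenames."""
    request = build_download_request(bucket_name, auth_profile, object_keys, client)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical usage (bucket, profile, and keys are placeholders):
# filenames = download_objects("my-bucket", "DEFAULT", ["docs/guide.pdf"], client="session-1")
```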
You can call this endpoint multiple times to accumulate files from the same or different buckets before proceeding to Step 2.
Step 2: Create and Populate the Vector Store
Process all staged files — splitting them into chunks, generating embeddings, and populating the vector store.
Endpoint: POST /v1/embed
| Parameter | Location | Description |
|---|---|---|
| rate_limit | Query | Embedding API rate limit in requests per minute (default: 0 for unlimited) |
| client | Header | Must match the client value used in Step 1 |
| Request body | Body | VectorStoreConfig JSON object (see below) |
VectorStoreConfig Fields
| Field | Type | Description |
|---|---|---|
| alias | string | Identifiable alias for the vector store |
| description | string | Human-readable description of the table contents |
| embedding_model | object | {"provider": "...", "id": "..."} (the embedding model to use) |
| chunk_size | integer | Maximum chunk size in characters (0 for default) |
| chunk_overlap | integer | Overlap between chunks in characters (0 for default) |
| distance_strategy | string | One of: COSINE, EUCLIDEAN_DISTANCE, DOT_PRODUCT |
| index_type | string | Vector index type: HNSW, IVF, or HYB |
| parsing_mode | string | Document parsing mode: fast or deep |
Response: EmbedProcessingResult JSON object:
| Field | Type | Description |
|---|---|---|
| message | string | Status message |
| total_chunks | integer | Number of chunks created |
| processed_files | array | List of successfully processed files |
| skipped_files | array | List of files that were skipped |
Example
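A minimal sketch of the embed call, again using only the Python standard library. The base URL, alias, description, and embedding model provider and id are placeholders; substitute a model that is actually configured on your server. Authentication is omitted and may be required by your deployment.

```python
import json
import urllib.parse
import urllib.request

# Assumption: replace with the address of your own server.
BASE_URL = "http://localhost:8000"

# Placeholder VectorStoreConfig; the model provider and id are hypothetical.
example_config = {
    "alias": "product_docs",
    "description": "Product manuals staged from Object Storage",
    "embedding_model": {"provider": "example-provider", "id": "example-embedding-model"},
    "chunk_size": 0,       # 0 selects the server default
    "chunk_overlap": 0,    # 0 selects the server default
    "distance_strategy": "COSINE",
    "index_type": "HNSW",
    "parsing_mode": "fast",
}


def build_embed_request(config, client="server", rate_limit=0):
    """Build the POST /v1/embed request for the staged files."""
    query = urllib.parse.urlencode({"rate_limit": rate_limit})
    return urllib.request.Request(
        "{}/v1/embed?{}".format(BASE_URL, query),
        data=json.dumps(config).encode("utf-8"),
        # The client header must match the value used when downloading.
        headers={"Content-Type": "application/json", "client": client},
        method="POST",
    )


def embed_staged_files(config, client="server", rate_limit=0):
    """Send the request and return the EmbedProcessingResult as a dict."""
    request = build_embed_request(config, client, rate_limit)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_result(result):
    """Format the documented response fields as a one-line summary."""
    return "{}: {} chunks, {} processed, {} skipped".format(
        result["message"],
        result["total_chunks"],
        len(result["processed_files"]),
        len(result["skipped_files"]),
    )
```

Checking skipped_files in the response is a simple way to spot staged files that were not embedded.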
Complete Example
A full end-to-end workflow downloading from two buckets and embedding:
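The sketch below strings the two steps together under the same assumptions as the individual calls: the base URL, bucket names, profile name, and object keys are placeholders, and authentication is not shown. The `send` parameter is a hypothetical hook added for illustration so the HTTP layer can be swapped out, for example in tests.

```python
import json
import urllib.parse
import urllib.request

# Assumption: replace with the address of your own server.
BASE_URL = "http://localhost:8000"


def post_json(path, payload, client, query=None):
    """POST a JSON payload with the client header and return the decoded reply."""
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "client": client},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_workflow(downloads, config, client, auth_profile="DEFAULT",
                 rate_limit=0, send=post_json):
    """Stage objects from each bucket, then embed everything that was staged.

    downloads is a list of (bucket_name, object_keys) pairs.
    """
    staged = []
    for bucket_name, object_keys in downloads:
        path = "/v1/oci/objects/download/{}/{}".format(
            urllib.parse.quote(bucket_name, safe=""),
            urllib.parse.quote(auth_profile, safe=""),
        )
        # Step 1: each call adds files to the staging area for this client.
        staged.extend(send(path, object_keys, client))
    # Step 2: one embed call processes all staged files for the same client.
    result = send("/v1/embed", config, client, {"rate_limit": rate_limit})
    return staged, result


# Hypothetical usage (all names are placeholders):
# staged, result = run_workflow(
#     [("bucket-a", ["manuals/a.pdf"]), ("bucket-b", ["notes/b.txt"])],
#     {"alias": "combined_docs", "distance_strategy": "COSINE"},
#     client="session-1",
# )
```

Using one client value for every call keeps both downloads and the embed step pointed at the same staging directory.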
Notes
- File cleanup: Staged files are automatically cleaned up after the embed endpoint completes, whether it succeeds or fails.
- Mixing sources: Files from multiple sources can be accumulated before embedding. In addition to OCI Object Storage downloads, you can upload local files via POST /v1/embed/local/store or scrape web content; all files are staged in the same directory, scoped by the client header.
- Client scoping: The client header isolates temporary storage between different sessions. Use a consistent value across your download and embed calls within a single workflow.