1. `SyntheticDataAttributes`

Attributes that control the generation of synthetic data.

Parameters:

object_name (str) – Name of the table to populate with synthetic data
object_list (List[Mapping]) – List of table mappings; use this to generate synthetic data for multiple tables
owner_name (str) – Database user who owns the referenced object. Defaults to the connected user’s schema
params (SyntheticDataParams) – Optional parameters that control generation, as passed in the examples on this page
record_count (int) – Number of records to generate
user_prompt (str) – Prompt that guides the generation of synthetic data, for example “the release date for the movies should be in 2019”
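
As a minimal sketch of how the `object_list` mappings might be assembled, the helper below builds one entry per table. The keys (`owner`, `name`, `record_count`, `user_prompt`) are the ones used in the multi-table examples on this page. The helper name `make_object_entry`, its validation rule, and the `ADMIN` owner are assumptions made for illustration; none of them is part of `select_ai`.

```python
from typing import Any, Dict, List, Optional


def make_object_entry(
    owner: str,
    name: str,
    record_count: int,
    user_prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one object_list mapping (hypothetical helper, not a select_ai API)."""
    if record_count <= 0:
        raise ValueError("record_count must be a positive integer")
    entry: Dict[str, Any] = {
        "owner": owner,
        "name": name,
        "record_count": record_count,
    }
    # Include user_prompt only when a prompt was supplied
    if user_prompt:
        entry["user_prompt"] = user_prompt
    return entry


# "ADMIN" is a placeholder owner used only for this sketch
object_list: List[Dict[str, Any]] = [
    make_object_entry(
        "ADMIN",
        "MOVIE",
        100,
        "the release date for the movies should be in 2019",
    ),
    make_object_entry("ADMIN", "ACTOR", 10),
]
```

A list built this way has the same shape as the `object_list` passed to `SyntheticDataAttributes` in the multi-table examples.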

2. `SyntheticDataParams`

class select_ai.SyntheticDataParams(sample_rows: int | None = None, table_statistics: bool | None = False, priority: str | None = 'HIGH', comments: bool | None = False)

Optional parameters that control the generation of synthetic data.

Parameters:

sample_rows (int) – Number of rows from the table to use as a sample to guide the LLM in data generation
table_statistics (bool) – Enable or disable the use of table statistics. Default value is False
priority (str) – Priority that determines the number of parallel requests sent to the LLM when generating synthetic data. Tasks with a higher priority consume more database resources and complete faster. Possible values are HIGH, MEDIUM, and LOW. Default value is HIGH
comments (bool) – Enable or disable sending comments to the LLM to guide data generation. Default value is False

See also the `generate_synthetic_data` PL/SQL API (`DBMS_CLOUD_AI.GENERATE_SYNTHETIC_DATA`) for attribute details.
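
The parameter rules above can be checked before a request is sent. The following is a minimal sketch under stated assumptions: `params_kwargs` and `VALID_PRIORITIES` are hypothetical names introduced here, and the checks (upper-casing the priority, rejecting a negative `sample_rows`) are illustrative choices rather than behavior documented for `select_ai`.

```python
from typing import Any, Dict, Optional

# Priority values listed in the parameter description
VALID_PRIORITIES = ("HIGH", "MEDIUM", "LOW")


def params_kwargs(
    sample_rows: Optional[int] = None,
    table_statistics: bool = False,
    priority: str = "HIGH",
    comments: bool = False,
) -> Dict[str, Any]:
    """Validate and collect keyword arguments (hypothetical helper)."""
    normalized = priority.upper()
    if normalized not in VALID_PRIORITIES:
        raise ValueError(
            f"priority must be one of {VALID_PRIORITIES}, got {priority!r}"
        )
    if sample_rows is not None and sample_rows < 0:
        raise ValueError("sample_rows must not be negative")
    return {
        "sample_rows": sample_rows,
        "table_statistics": table_statistics,
        "priority": normalized,
        "comments": comments,
    }


kwargs = params_kwargs(sample_rows=100, table_statistics=True, priority="medium")
```

Because the keys match the constructor signature shown above, the resulting dictionary could be unpacked as `select_ai.SyntheticDataParams(**kwargs)`.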

3. Single-table synthetic data

The following example generates synthetic data for a single table.

3.1. Sync API

import os

import select_ai

# Read the connection details from environment variables
user = os.getenv("SELECT_AI_USER")
password = os.getenv("SELECT_AI_PASSWORD")
dsn = os.getenv("SELECT_AI_DB_CONNECT_STRING")

# Connect to the database and reference the AI profile by name
select_ai.connect(user=user, password=password, dsn=dsn)
profile = select_ai.Profile(profile_name="oci_ai_profile")

# Optional tuning: sample 100 existing rows and use table statistics
synthetic_data_params = select_ai.SyntheticDataParams(
    sample_rows=100, table_statistics=True, priority="HIGH"
)

# Request 100 records for the MOVIE table, guided by the prompt
synthetic_data_attributes = select_ai.SyntheticDataAttributes(
    object_name="MOVIE",
    user_prompt="the release date for the movies should be in 2019",
    params=synthetic_data_params,
    record_count=100,
)
profile.generate_synthetic_data(
    synthetic_data_attributes=synthetic_data_attributes
)

output:

SQL> select count(*) from movie;

  COUNT(*)
----------
       100

3.2. Async API

import asyncio
import os

import select_ai

user = os.getenv("SELECT_AI_USER")
password = os.getenv("SELECT_AI_PASSWORD")
dsn = os.getenv("SELECT_AI_DB_CONNECT_STRING")


async def main():
    # Open an asynchronous connection and reference the AI profile by name
    await select_ai.async_connect(user=user, password=password, dsn=dsn)
    async_profile = await select_ai.AsyncProfile(
        profile_name="async_oci_ai_profile",
    )
    # Optional tuning: sample 100 existing rows and use table statistics
    synthetic_data_params = select_ai.SyntheticDataParams(
        sample_rows=100, table_statistics=True, priority="HIGH"
    )
    # Request 100 records for the MOVIE table, guided by the prompt
    synthetic_data_attributes = select_ai.SyntheticDataAttributes(
        object_name="MOVIE",
        user_prompt="the release date for the movies should be in 2019",
        params=synthetic_data_params,
        record_count=100,
    )
    await async_profile.generate_synthetic_data(
        synthetic_data_attributes=synthetic_data_attributes
    )


asyncio.run(main())

output:

SQL> select count(*) from movie;

  COUNT(*)
----------
       100

4. Multi-table synthetic data

The following example generates synthetic data for multiple tables in a single call.

4.1. Sync API

import os

import select_ai

user = os.getenv("SELECT_AI_USER")
password = os.getenv("SELECT_AI_PASSWORD")
dsn = os.getenv("SELECT_AI_DB_CONNECT_STRING")

# Connect to the database and reference the AI profile by name
select_ai.connect(user=user, password=password, dsn=dsn)
profile = select_ai.Profile(profile_name="oci_ai_profile")

# Optional tuning: sample 100 existing rows and use table statistics
synthetic_data_params = select_ai.SyntheticDataParams(
    sample_rows=100, table_statistics=True, priority="HIGH"
)

# One mapping per table; user_prompt is given only for MOVIE
object_list = [
    {
        "owner": user,
        "name": "MOVIE",
        "record_count": 100,
        "user_prompt": "the release date for the movies should be in 2019",
    },
    {"owner": user, "name": "ACTOR", "record_count": 10},
    {"owner": user, "name": "DIRECTOR", "record_count": 5},
]
synthetic_data_attributes = select_ai.SyntheticDataAttributes(
    object_list=object_list, params=synthetic_data_params
)
profile.generate_synthetic_data(
    synthetic_data_attributes=synthetic_data_attributes
)

output:

SQL> select count(*) from actor;

  COUNT(*)
----------
        40

SQL> select count(*) from director;

  COUNT(*)
----------
        13

SQL> select count(*) from movie;

  COUNT(*)
----------
       300

4.2. Async API

import asyncio
import os

import select_ai

user = os.getenv("SELECT_AI_USER")
password = os.getenv("SELECT_AI_PASSWORD")
dsn = os.getenv("SELECT_AI_DB_CONNECT_STRING")


async def main():
    # Open an asynchronous connection and reference the AI profile by name
    await select_ai.async_connect(user=user, password=password, dsn=dsn)
    async_profile = await select_ai.AsyncProfile(
        profile_name="async_oci_ai_profile",
    )
    # Optional tuning: sample 100 existing rows and use table statistics
    synthetic_data_params = select_ai.SyntheticDataParams(
        sample_rows=100, table_statistics=True, priority="HIGH"
    )
    # One mapping per table; user_prompt is given only for MOVIE
    object_list = [
        {
            "owner": user,
            "name": "MOVIE",
            "record_count": 100,
            "user_prompt": "the release date for the movies should be in 2019",
        },
        {"owner": user, "name": "ACTOR", "record_count": 10},
        {"owner": user, "name": "DIRECTOR", "record_count": 5},
    ]
    synthetic_data_attributes = select_ai.SyntheticDataAttributes(
        object_list=object_list, params=synthetic_data_params
    )
    await async_profile.generate_synthetic_data(
        synthetic_data_attributes=synthetic_data_attributes
    )


asyncio.run(main())

output:

SQL> select count(*) from actor;

  COUNT(*)
----------
        40

SQL> select count(*) from director;

  COUNT(*)
----------
        13

SQL> select count(*) from movie;

  COUNT(*)
----------
       300
