Observability

The AI Optimizer Server can emit OpenTelemetry traces and logs to any OTLP-compatible backend (e.g. SigNoz, Jaeger, Grafana Tempo). Telemetry is opt-in: disabled by default and activated entirely via environment variables.

What Is Instrumented

Telemetry covers HTTP traffic, LangChain/LangGraph orchestration, LLM invocations, and application logs on the server:

| Source | Signal | What's Captured |
|---|---|---|
| FastAPI | Trace | One SERVER span per inbound HTTP request, with route, method, status code, peer info |
| LangChain / LangGraph | Trace | One span per chain, agent, tool, retriever, or graph node invocation. LLM calls (including those routed via langchain-litellm) carry semantic attributes: model name, prompt, response, prompt/completion token counts |
| httpx | Trace | One CLIENT span per outbound call (LLM provider APIs, MCP, etc.) |
| requests | Trace | One CLIENT span per outbound call (used by the OCI SDK) |
| Python logging | Log | All log records emitted by application code, uvicorn, and dependencies, automatically correlated to the active trace and span |

A typical chat request produces a tree like:

POST /v1/chat                            SERVER
└── LangGraph invocation                 INTERNAL
    ├── retrieve node                    INTERNAL
    └── generate node                    INTERNAL
        └── ChatLiteLLM.invoke           INTERNAL  ← model, tokens, prompt/response
            └── HTTP POST <provider>     CLIENT    ← transport timing

The LangChain LLM span carries semantic LLM information (model, tokens, content); the child httpx span carries transport-level information (URL, status, duration). They are complementary, not duplicates.

Log records emitted during the lifetime of any span carry that span’s trace_id and span_id, so a backend can show the logs from a specific span when its trace is opened.

The Streamlit Client is not instrumented; it is a thin REST client whose work is reflected in the server-side traces it triggers.

Enabling

1. Install the otel extra

The OpenTelemetry packages live in an optional dependency group. They are not installed by default.

pip install -e ".[server,otel]"

For container builds, include otel in the extras passed to your install step.
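For example, a minimal Dockerfile sketch (base image, paths, and build flow are illustrative, not the project's actual containerfile):

# Illustrative sketch only; the real image build may differ.
FROM python:3.12-slim
WORKDIR /app
COPY . .
# Include otel alongside server so the OpenTelemetry SDK and
# OTLP exporters are baked into the image.
RUN pip install -e ".[server,otel]"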

2. Configure the exporter

Two paths, chosen with OTEL_TRACES_EXPORTER:

Backend export (production)

Point the server at any OTLP receiver:

# In .env.dev (or .env.prd, etc.)
OTEL_EXPORTER_OTLP_ENDPOINT=http://signoz-otel-collector:4317

The default protocol is gRPC (port 4317). For HTTP/protobuf (port 4318):

OTEL_EXPORTER_OTLP_ENDPOINT=http://signoz-otel-collector:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

Console export (debugging, no backend required)

Dumps each span as JSON to stdout. Useful for verifying instrumentation locally without standing up a backend.

OTEL_TRACES_EXPORTER=console
Console exporter cost

The console exporter flushes synchronously per span and can produce significant log volume. Use it for local debugging only; do not enable it in production.
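For reference, the output resembles the OpenTelemetry Python SDK's console format. An abbreviated, illustrative span follows; real output also includes timestamps, status, events, and resource attributes:

{
    "name": "POST /v1/chat",
    "context": {
        "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
        "span_id": "0x5fb397be34d26b51"
    },
    "kind": "SpanKind.SERVER",
    "attributes": {
        "http.method": "POST",
        "http.status_code": 200
    }
}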

3. Start the server and verify

./src/entrypoint.py server
curl http://localhost:8000/v1/healthz

In console mode, span JSON is printed to stdout immediately. In backend mode, the server logs OTel telemetry initialized: service=ai-optimizer-server exporters=['otlp'] at startup; traces and logs then appear in the backend UI within a few seconds.

If you do not see this log line, telemetry did not initialize; check the Troubleshooting section below.

Console mode and logs

Log export to OTLP only activates when the OTLP trace exporter is active. With OTEL_TRACES_EXPORTER=console (debug mode), application logs continue to stream to stdout via the existing logging configuration; they are not duplicated to OTLP.

Log export is opt-in

Application log export to OTLP is disabled by default, even when tracing is configured. Enable it with AIO_OTEL_LOGS_ENABLED=true only for backends intended to retain application logs.
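For example, to ship both traces and application logs to the same collector:

# Traces to the backend, plus application logs (opt-in)
OTEL_EXPORTER_OTLP_ENDPOINT=http://signoz-otel-collector:4317
AIO_OTEL_LOGS_ENABLED=true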

Span attribute visibility is controlled separately through the OpenInference settings below.

Environment Variable Reference

The AI Optimizer honors the standard OpenTelemetry SDK environment variables. The most relevant ones for operators:

Endpoint and protocol

| Variable | Description | Default |
|---|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP receiver URL (all signals) | (unset = OTLP disabled) |
| OTEL_EXPORTER_OTLP_TRACES_ENDPOINT | Trace-specific endpoint (overrides the generic one) | (unset) |
| OTEL_EXPORTER_OTLP_LOGS_ENDPOINT | Log-specific endpoint (overrides the generic one) | (unset) |
| OTEL_EXPORTER_OTLP_PROTOCOL | grpc or http/protobuf (all signals) | grpc |
| OTEL_EXPORTER_OTLP_TRACES_PROTOCOL | Trace-specific protocol (overrides the generic one) | (unset) |
| OTEL_EXPORTER_OTLP_LOGS_PROTOCOL | Log-specific protocol (overrides the generic one) | (unset) |
| OTEL_EXPORTER_OTLP_INSECURE | If true, skips TLS for gRPC OTLP. Required for plaintext local SigNoz. | false |
| OTEL_EXPORTER_OTLP_HEADERS | Comma-separated k=v headers, e.g. for vendor auth tokens (see example below) | (unset) |
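For example, a vendor backend authenticated via a header token (header name and endpoint are illustrative; consult your vendor's documentation for the actual header):

OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.example-vendor.com:4317
OTEL_EXPORTER_OTLP_HEADERS=api-key=<your-token>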

Exporter selection

| Variable | Description | Default |
|---|---|---|
| OTEL_TRACES_EXPORTER | Comma-separated list. Supported values: otlp, console, none. Unsupported values are ignored. | otlp |
| AIO_OTEL_LOGS_ENABLED | Application log export to OTLP (opt-in). See the callout above before enabling. | false |
| OTEL_LOGS_EXPORTER | Set to none to disable log export when AIO_OTEL_LOGS_ENABLED=true is in effect. | otlp |
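For example, to keep the backend export while also mirroring spans to stdout (the combined form referenced in Troubleshooting below):

OTEL_TRACES_EXPORTER=otlp,console
OTEL_EXPORTER_OTLP_ENDPOINT=http://signoz-otel-collector:4317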

Resource attributes

| Variable | Description | Default |
|---|---|---|
| OTEL_SERVICE_NAME | Service name shown in the backend | ai-optimizer-server |
| OTEL_RESOURCE_ATTRIBUTES | Comma-separated k=v attributes attached to every span (e.g. deployment.environment=prd,service.namespace=ai) | (unset) |

Sampling

| Variable | Description | Default |
|---|---|---|
| OTEL_TRACES_SAMPLER | Sampler. Common: parentbased_always_on (default), parentbased_traceidratio (probabilistic) | parentbased_always_on |
| OTEL_TRACES_SAMPLER_ARG | Sampler argument (for the ratio sampler: 0.0–1.0, e.g. 0.1 for 10%) | (none) |
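For example, a probabilistic configuration that records roughly 10% of new traces (child spans follow the parent's sampling decision):

OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1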

Batch span processor tuning

The OTEL_BSP_* family (OTEL_BSP_MAX_QUEUE_SIZE, OTEL_BSP_SCHEDULE_DELAY, etc.) is honored by the SDK without code changes; tune in production if you observe span drops or back-pressure. See the SDK configuration spec for the full list.
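An illustrative sketch of the kind of tuning involved (values are examples, not recommendations; the SDK defaults are usually adequate):

# Larger queue and faster flushes for high span volume
OTEL_BSP_MAX_QUEUE_SIZE=4096
OTEL_BSP_SCHEDULE_DELAY=2000   # flush interval in milliseconds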

Resource Attributes Set by the Application

The server sets the following attributes by default. All can be overridden via OTEL_RESOURCE_ATTRIBUTES or the dedicated env var.

| Attribute | Source | Example |
|---|---|---|
| service.name | OTEL_SERVICE_NAME env, else built-in default | ai-optimizer-server |
| service.version | Application version (from package metadata) | 2.2.1 |
| deployment.environment | AIO_ENV env (default dev) | prd |
| service.instance.id | HOSTNAME env, else a per-process UUID | ai-optimizer-server-7c5b9-fzmp2 |

Operator-supplied values via OTEL_RESOURCE_ATTRIBUTES always take precedence over these defaults.
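For example, to relabel a canary run (names illustrative), letting the operator-supplied values win over the derived defaults:

OTEL_SERVICE_NAME=ai-optimizer-server-canary
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=staging,service.namespace=ai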

Deployment Patterns

Bare-metal / VM

Set the variables in .env.dev (or whichever .env.{AIO_ENV} file is loaded), or export them in the shell before running the entrypoint:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
./src/entrypoint.py server

Container (Docker / Podman)

Pass the variables at run time:

docker run --rm \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://signoz-otel-collector:4317 \
  -e OTEL_RESOURCE_ATTRIBUTES=deployment.environment=staging \
  ai-optimizer:latest server

Or via an env-file (--env-file).
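For example, with a hypothetical otel.env:

# otel.env (filename illustrative)
OTEL_EXPORTER_OTLP_ENDPOINT=http://signoz-otel-collector:4317
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=staging

docker run --rm --env-file otel.env ai-optimizer:latest server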

If running SigNoz on the same Docker host (e.g. via the SigNoz docker-compose), point at the SigNoz collector container by service name on the shared network.

Kubernetes / Helm

The bundled Helm chart exposes OpenTelemetry settings under server.otel. Two paths:

1. Bring your own OTLP collector โ€” point endpoint at a separately-deployed SigNoz, Jaeger, Tempo, or vendor agent:

# values overlay
server:
  otel:
    enabled: true
    endpoint: http://signoz-otel-collector.observability.svc.cluster.local:4317
    insecure: true   # plaintext gRPC inside the cluster
    resourceAttributes:
      service.namespace: ai-optimizer
    # logsEnabled: true   # opt-in; review backend retention first

2. Install SigNoz alongside the application โ€” flip signoz.enabled=true and the chart deploys SigNoz as a subchart. The server’s OTLP endpoint is then auto-defaulted to the in-cluster collector service URL; you only configure the OTel-side switches:

signoz:
  enabled: true
server:
  otel:
    enabled: true
    insecure: true   # the in-chart collector serves plaintext gRPC

The published image already includes the [otel] extra, so enabled: true works against the default image. Within Kubernetes, service.instance.id is auto-populated from HOSTNAME (the pod name), giving stable per-pod identity in the backend. deployment.environment is set automatically from the chart’s global.env value.

If enabled: true is set without an endpoint (and without tracesExporter: console for local debugging or signoz.enabled=true to use the in-chart collector), helm template / helm install fails fast rather than silently producing zero telemetry.
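For reference, a minimal console-debug overlay that satisfies that rule might look like the following (a sketch using the tracesExporter value named above; local debugging only):

server:
  otel:
    enabled: true
    tracesExporter: console   # no OTLP endpoint required in this mode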

Troubleshooting

| Symptom | Likely Cause | Remedy |
|---|---|---|
| No OTel telemetry initialized log line at startup | The [otel] extra is not installed, or no exporter is configured | Re-run pip install -e ".[server,otel]"; set OTEL_EXPORTER_OTLP_ENDPOINT or OTEL_TRACES_EXPORTER=console |
| Log line appears, but no traces in the backend | gRPC TLS handshake failing against a plaintext collector | Set OTEL_EXPORTER_OTLP_INSECURE=true, or switch to OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf with an http:// URL |
| OTLP grpc exporter requested but not installed warning | The opentelemetry-exporter-otlp package is missing or partially installed | Reinstall with the [otel] extra |
| Operator-set OTEL_RESOURCE_ATTRIBUTES value not visible on spans | Malformed list: the value must be comma-delimited, not space- or semicolon-delimited | OTEL_RESOURCE_ATTRIBUTES=k1=v1,k2=v2 |
| OTEL_TRACES_EXPORTER set to a value other than otlp, console, or none | Unsupported values are silently dropped | Use a supported value, or combine them: OTEL_TRACES_EXPORTER=otlp,console |
| Per-request overhead but no spans recorded | OTLP exporter package missing while OTEL_TRACES_EXPORTER=otlp; the SDK bails before installing instrumentation in this case | Reinstall with [otel]; check startup logs for the not installed; skipping warning |
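To confirm initialization quickly, grep the startup output for the log line documented above:

./src/entrypoint.py server 2>&1 | grep -i "otel telemetry"
# expected on success:
# OTel telemetry initialized: service=ai-optimizer-server exporters=['otlp']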

Backend-Specific Quickstarts

For step-by-step instructions on standing up a specific OTLP backend and pointing the server at it, see: