
Responses Proxy architecture

Responses Proxy runs a FastAPI app, a Redis-backed live state store, a telemetry worker, and SQL persistence. In the Docker service container, Redis, the worker, and the API process run together. In local development, they can run as separate processes.

Runtime components

| Component | Responsibility |
| --- | --- |
| FastAPI app | Creates routes, initializes SQL, wires proxy, health, dashboard, admin, and registry handlers. |
| Proxy routes | Forward /v1/responses traffic, choose resources, rewrite upstream auth, and emit telemetry. |
| Load balancer | Applies response affinity, session affinity, and weighted least-connections selection. |
| Health checker | Calls the upstream health path for each resource and updates Redis health state. |
| Redis state store | Tracks resources, counters, affinity mappings, recent decisions, and telemetry queue depth. |
| Telemetry worker | Consumes Redis stream events and writes durable SQL records. |
| SQL database | Stores historical sessions, requests, token usage, decisions, events, and registry records. |
| Dashboard | Serves a single HTML page backed by admin state and durable history endpoints. |

Request flow

For a normal POST /v1/responses request:

  1. The proxy reads the body and parses JSON when possible.
  2. It extracts session id candidates, session name, pool id, and previous response id.
  3. Response affinity is checked first when a resp_* id is available.
  4. Session affinity is checked next.
  5. If affinity does not apply, the load balancer selects by weighted least connections.
  6. The selected OpenAI resource key replaces the incoming Authorization header.
  7. The request is forwarded to PROXY_UPSTREAM_BASE_URL.
  8. Resource health is updated from the upstream status code.
  9. Response affinity is stored when the upstream body contains a response id.
  10. Completion telemetry is published and then persisted by the worker.
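Steps 3 to 5 (and the bookkeeping in step 9) can be sketched in Python as follows. This is a minimal in-memory sketch, not the proxy's implementation: plain dicts stand in for the Redis affinity mappings, `Resource`, `Router`, `select`, and `remember_response` are hypothetical names, and three behaviors are assumptions: unhealthy resources are skipped during affinity lookup, "degraded" is folded into "not healthy", and a session is pinned to whatever resource least-connections picks for it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Resource:
    name: str
    weight: float = 1.0   # relative capacity; a higher weight absorbs more in-flight requests
    active: int = 0       # current in-flight requests (a Redis counter in the real proxy)
    healthy: bool = True  # simplification: "degraded" and "unhealthy" both map to False here


@dataclass
class Router:
    resources: Dict[str, Resource]
    # Hypothetical in-memory stand-ins for the Redis affinity mappings.
    response_affinity: Dict[str, str] = field(default_factory=dict)  # resp_* id -> resource name
    session_affinity: Dict[str, str] = field(default_factory=dict)   # session id -> resource name

    def _usable(self, name: Optional[str]) -> Optional[Resource]:
        res = self.resources.get(name) if name else None
        return res if res is not None and res.healthy else None

    def select(self, previous_response_id: Optional[str] = None,
               session_id: Optional[str] = None) -> Tuple[Resource, str]:
        # Step 3: response affinity, only for resp_* ids.
        if previous_response_id and previous_response_id.startswith("resp_"):
            res = self._usable(self.response_affinity.get(previous_response_id))
            if res is not None:
                return res, "response_affinity"
        # Step 4: session affinity.
        if session_id:
            res = self._usable(self.session_affinity.get(session_id))
            if res is not None:
                return res, "session_affinity"
        # Step 5: weighted least connections (lowest active/weight; ties broken by name).
        candidates = [r for r in self.resources.values() if r.healthy and r.weight > 0]
        if not candidates:
            raise RuntimeError("no healthy resources available")
        res = min(candidates, key=lambda r: (r.active / r.weight, r.name))
        if session_id:
            self.session_affinity[session_id] = res.name  # assumption: pin the session
        return res, "least_connections"

    def remember_response(self, response_id: str, resource: Resource) -> None:
        # Step 9: map the upstream response id to the resource that produced it.
        self.response_affinity[response_id] = resource.name
```

Dividing in-flight requests by weight lets a resource with weight 2 carry twice the concurrent load of a weight-1 resource before it stops being preferred.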

Streaming and WebSocket traffic

Streaming is detected when the request body contains:

{"stream": true}

The proxy streams upstream bytes to the client as they arrive, captures data up to PROXY_MAX_CAPTURE_BYTES, and emits telemetry when the stream closes.
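A minimal sketch of the detection rule and the bounded capture follows, assuming a synchronous iterator of byte chunks (a FastAPI streaming handler would more likely use an async iterator). `is_streaming`, `CappedCapture`, and `relay` are hypothetical names, and `max_bytes` stands in for PROXY_MAX_CAPTURE_BYTES.

```python
import json
from typing import Callable, Iterable, Iterator


def is_streaming(body: bytes) -> bool:
    """Return True when the body is a JSON object containing "stream": true."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return False
    return isinstance(payload, dict) and payload.get("stream") is True


class CappedCapture:
    """Keeps at most max_bytes of a stream (max_bytes plays the role of PROXY_MAX_CAPTURE_BYTES)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.buf = bytearray()
        self.total = 0          # bytes seen, including those not captured
        self.truncated = False

    def feed(self, chunk: bytes) -> None:
        self.total += len(chunk)
        room = self.max_bytes - len(self.buf)
        self.buf += chunk[:room]
        if len(chunk) > room:
            self.truncated = True


def relay(chunks: Iterable[bytes], capture: CappedCapture,
          on_close: Callable[[CappedCapture], None]) -> Iterator[bytes]:
    """Yield upstream chunks unchanged while capturing a bounded copy."""
    try:
        for chunk in chunks:
            capture.feed(chunk)
            yield chunk  # forwarded to the client as soon as it arrives
    finally:
        on_close(capture)  # runs on normal end, error, or early close by the client
```

Because the telemetry callback sits in the generator's `finally` clause, it also fires when the generator is closed early, for example after a client disconnect. The client always receives every byte; only the captured copy is truncated.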

The WebSocket route is:

WS /v1/responses

The proxy selects a resource using header-based session and pool hints, opens an upstream WebSocket, bridges messages in both directions, and records completion telemetry when either side closes or errors.
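The bridging step can be sketched with asyncio: two pump tasks copy messages in opposite directions, and whichever finishes first (a close or an error) ends the session and triggers completion telemetry. The receive/send callables, the convention that a receive returning `None` means the peer closed, and the `bridge` name are assumptions for illustration, not the proxy's WebSocket API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

Recv = Callable[[], Awaitable[Optional[str]]]  # hypothetical: returns None when the peer closes
Send = Callable[[str], Awaitable[None]]


async def bridge(client_recv: Recv, client_send: Send,
                 upstream_recv: Recv, upstream_send: Send,
                 on_complete: Callable[[Dict[str, Any]], None]) -> Dict[str, Any]:
    stats: Dict[str, Any] = {"client_to_upstream": 0, "upstream_to_client": 0, "error": None}

    async def pump(recv: Recv, send: Send, key: str) -> None:
        while True:
            msg = await recv()
            if msg is None:  # this side closed
                return
            await send(msg)
            stats[key] += 1

    tasks = [
        asyncio.create_task(pump(client_recv, upstream_send, "client_to_upstream")),
        asyncio.create_task(pump(upstream_recv, client_send, "upstream_to_client")),
    ]
    try:
        # Either side closing or erroring ends the session.
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            exc = task.exception()
            if exc is not None:
                stats["error"] = repr(exc)
    finally:
        # Stop the other direction; a real bridge would also close both sockets here.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        on_complete(stats)  # completion telemetry
    return stats
```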

Health states

The health checker calls:

{PROXY_UPSTREAM_BASE_URL}/{PROXY_HEALTH_CHECK_PATH}

With the default settings, this resolves to:

https://api.openai.com/v1/models

Status classification:

  • 2xx: healthy
  • 401 or 403: unhealthy
  • 429 or >=500: degraded
  • anything else: unhealthy
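The URL template and the classification list translate into two small functions. Reading the default as base URL https://api.openai.com/v1 plus path models is an inference from the default URL shown above; the slash normalization and the function names `health_check_url` and `classify` are hypothetical.

```python
def health_check_url(base_url: str, path: str) -> str:
    """Join PROXY_UPSTREAM_BASE_URL and PROXY_HEALTH_CHECK_PATH with exactly one slash."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def classify(status: int) -> str:
    """Map an upstream health-check status code to a health state."""
    if 200 <= status < 300:
        return "healthy"
    if status in (401, 403):  # auth failure, e.g. a bad or revoked key
        return "unhealthy"
    if status == 429 or status >= 500:  # rate limited or server-side trouble
        return "degraded"
    return "unhealthy"  # anything else
```

The 401/403 branch returns the same state as the fallback, but keeping it explicit mirrors the list above and makes the intent visible.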

Trigger a manual check:

curl -X POST https://agent-responses-proxy.svc.cloud.morph.so/admin/health/check

Persistence boundary

Use Redis for live decisions and SQL for durable analysis. Redis counters, affinity mappings, queue depth, and live health can reset on redeploy. SQL-backed sessions, requests, decisions, token usage, and registry records persist across redeploys as long as PROXY_DATABASE_URL points to the same database.
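To illustrate the boundary, the sketch below drains a live event queue into SQL the way the telemetry worker moves stream events into durable records. A `collections.deque` stands in for the Redis stream and an in-memory SQLite database for PROXY_DATABASE_URL; the `requests` table layout, the event fields, and the `drain` function are hypothetical. Events leave the queue only after the SQL transaction commits, so a failed write leaves them in place for a retry.

```python
import sqlite3
from collections import deque
from typing import Any, Deque, Dict

# Hypothetical table; the proxy's real schema is not described here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS requests (
    request_id    TEXT PRIMARY KEY,
    resource      TEXT NOT NULL,
    status        INTEGER,
    input_tokens  INTEGER,
    output_tokens INTEGER
)
"""


def drain(stream: Deque[Dict[str, Any]], conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Copy up to batch_size queued telemetry events into SQL, then remove them from the queue."""
    batch = [stream[i] for i in range(min(batch_size, len(stream)))]
    rows = [
        (e["request_id"], e["resource"], e["status"],
         e.get("input_tokens"), e.get("output_tokens"))
        for e in batch
    ]
    with conn:  # commits on success, rolls back (and re-raises) on error
        # INSERT OR IGNORE makes redelivered events harmless.
        conn.executemany("INSERT OR IGNORE INTO requests VALUES (?, ?, ?, ?, ?)", rows)
    for _ in batch:
        stream.popleft()  # acknowledge only after the commit succeeded
    return len(batch)
```

With a real Redis stream, the equivalent pattern is typically a consumer group that acknowledges each entry with XACK only after the SQL commit.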

Routing boundary

Runtime routing currently uses only the OpenAI resources configured through environment variables. /admin/providers and /admin/models are the durable management plane for future provider, model, agent, and compound-agent routing; records created there do not yet change which resource serves a request.