Responses Proxy architecture
Responses Proxy runs a FastAPI app, a Redis-backed live state store, a telemetry worker, and SQL persistence. In the Docker service container, Redis, the worker, and the API process run together. In local development, they can run as separate processes.
Runtime components
| Component | Responsibility |
|---|---|
| FastAPI app | Creates routes, initializes SQL, and wires the proxy, health, dashboard, admin, and registry handlers. |
| Proxy routes | Forward /v1/responses traffic, choose resources, rewrite upstream auth, and emit telemetry. |
| Load balancer | Applies response affinity, session affinity, and weighted least-connections selection. |
| Health checker | Calls the upstream health path for each resource and updates Redis health state. |
| Redis state store | Tracks resources, counters, affinity mappings, recent decisions, and telemetry queue depth. |
| Telemetry worker | Consumes Redis stream events and writes durable SQL records. |
| SQL database | Stores historical sessions, requests, token usage, decisions, events, and registry records. |
| Dashboard | Serves a single HTML page backed by admin state and durable history endpoints. |
Request flow
For a normal POST /v1/responses request:
- The proxy reads the body and parses JSON when possible.
- It extracts session id candidates, session name, pool id, and previous response id.
- Response affinity is checked first when a resp_* id is available.
- Session affinity is checked next.
- If affinity does not apply, the load balancer selects by weighted least connections.
- The selected OpenAI resource key replaces the incoming Authorization header.
- The request is forwarded to PROXY_UPSTREAM_BASE_URL.
- Resource health is updated from the upstream status code.
- Response affinity is stored when the upstream body contains a response id.
- Completion telemetry is published and then persisted by the worker.
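The selection order in the steps above (response affinity, then session affinity, then weighted least connections) can be sketched roughly as follows. This is an illustration only, not the proxy's actual code: the Resource fields, the select_resource name, the plain dicts standing in for the Redis affinity mappings, and the choice to skip unhealthy resources are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    # Hypothetical record shape; the real resource state may differ.
    name: str
    weight: float = 1.0
    active: int = 0  # in-flight connections
    healthy: bool = True


def select_resource(
    resources: List[Resource],
    response_affinity: Dict[str, str],
    session_affinity: Dict[str, str],
    previous_response_id: Optional[str] = None,
    session_id: Optional[str] = None,
) -> Optional[Resource]:
    # Assumption: only healthy resources with a positive weight are eligible.
    eligible = {r.name: r for r in resources if r.healthy and r.weight > 0}

    # 1. Response affinity: reuse the resource that produced the previous response.
    if previous_response_id:
        name = response_affinity.get(previous_response_id)
        if name in eligible:
            return eligible[name]

    # 2. Session affinity: reuse the resource already bound to this session.
    if session_id:
        name = session_affinity.get(session_id)
        if name in eligible:
            return eligible[name]

    # 3. Weighted least connections: lowest active-to-weight ratio wins.
    if not eligible:
        return None
    return min(eligible.values(), key=lambda r: r.active / r.weight)
```

With this ratio, a resource with weight 4 and four in-flight requests is preferred over a resource with weight 1 and two in-flight requests, since 4/4 is lower than 2/1.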
Streaming and WebSocket traffic
Streaming is detected when the request body contains:
{"stream": true}
The proxy streams upstream bytes to the client as they arrive, captures data up to PROXY_MAX_CAPTURE_BYTES, and emits telemetry when the stream closes.
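A minimal sketch of the two ideas in this section, stream detection and capped capture, assuming a synchronous iterator of byte chunks for simplicity (the real proxy is presumably asynchronous). The names is_streaming_request, CappedCapture, and relay are hypothetical; max_bytes stands in for PROXY_MAX_CAPTURE_BYTES.

```python
import json
from typing import Iterable, Iterator


def is_streaming_request(body: bytes) -> bool:
    """Return True only when the body is a JSON object with "stream": true."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return False
    return isinstance(payload, dict) and payload.get("stream") is True


class CappedCapture:
    """Keeps at most max_bytes of the stream for telemetry."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.buffer = bytearray()
        self.truncated = False

    def feed(self, chunk: bytes) -> None:
        remaining = max(self.max_bytes - len(self.buffer), 0)
        self.buffer.extend(chunk[:remaining])
        if len(chunk) > remaining:
            self.truncated = True


def relay(chunks: Iterable[bytes], capture: CappedCapture) -> Iterator[bytes]:
    # Every chunk is forwarded unchanged; only the captured copy is capped.
    for chunk in chunks:
        capture.feed(chunk)
        yield chunk
```

The cap applies only to the copy kept for telemetry, so a long stream still reaches the client in full.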
The WebSocket route is:
WS /v1/responses
The proxy selects a resource using header-based session and pool hints, opens an upstream WebSocket, bridges messages in both directions, and records completion telemetry when either side closes or errors.
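The bidirectional bridge can be pictured with plain asyncio, as in the sketch below. It is not tied to any WebSocket library: each side is represented by a hypothetical pair of async recv/send callables, a None message stands in for a close, and on_complete is an assumed hook where completion telemetry would be recorded.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

Recv = Callable[[], Awaitable[Any]]
Send = Callable[[Any], Awaitable[None]]


async def pump(recv: Recv, send: Send) -> None:
    """Copy messages one way until the source closes (signalled by None)."""
    while True:
        message = await recv()
        if message is None:
            return
        await send(message)


async def bridge(
    client_recv: Recv,
    client_send: Send,
    upstream_recv: Recv,
    upstream_send: Send,
    on_complete: Callable[[Optional[BaseException]], None],
) -> None:
    tasks = [
        asyncio.create_task(pump(client_recv, upstream_send)),
        asyncio.create_task(pump(upstream_recv, client_send)),
    ]
    # Stop as soon as either direction closes or raises.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    error = next((t.exception() for t in done if t.exception()), None)
    on_complete(error)  # hypothetical telemetry hook
```

Waiting with FIRST_COMPLETED mirrors the behavior described above: the bridge ends, and telemetry is recorded, as soon as either side closes or errors.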
Health states
The health checker calls:
{PROXY_UPSTREAM_BASE_URL}/{PROXY_HEALTH_CHECK_PATH}
Default:
https://api.openai.com/v1/models
Status classification:
- 2xx: healthy
- 401 or 403: unhealthy
- 429 or >=500: degraded
- anything else: unhealthy
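The URL construction and the classification rules above translate directly into a small sketch. The function names are hypothetical, and the split of the default URL into a base of https://api.openai.com/v1 and a path of models is an assumption; only the combined default appears in this document.

```python
def health_check_url(base_url: str, path: str) -> str:
    """Join base and path with exactly one slash between them."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def classify_health(status_code: int) -> str:
    """Map an upstream status code to a health state using the rules above."""
    if 200 <= status_code < 300:
        return "healthy"
    if status_code in (401, 403):
        return "unhealthy"
    if status_code == 429 or status_code >= 500:
        return "degraded"
    return "unhealthy"
```

Rate limits and server errors map to degraded rather than unhealthy, while credential failures (401, 403) and any other unexpected status map to unhealthy.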
Trigger a manual check:
curl -X POST https://agent-responses-proxy.svc.cloud.morph.so/admin/health/check
Persistence boundary
Use Redis for live decisions and SQL for durable analysis. Redis counters, affinity mappings, queue depth, and live health can reset on redeploy. SQL-backed sessions, requests, decisions, token usage, and registry records persist when PROXY_DATABASE_URL points to the same database.
Routing boundary
Runtime routing currently uses only environment-configured OpenAI resources. /admin/providers and /admin/models are the durable management plane for future provider, model, agent, and compound-agent routing.