Responses Proxy concepts

Responses Proxy has a simple product boundary: clients send OpenAI Responses-compatible traffic to one Morph endpoint, and the proxy chooses an upstream resource while recording enough telemetry to explain what happened.

For most users, the mental model is:

  1. Send a request to the hosted /v1/responses endpoint.
  2. Include a stable session id when requests belong to the same agent session.
  3. Let the proxy preserve affinity and choose a healthy resource.
  4. Inspect sessions, requests, and routing decisions from the dashboard or admin history APIs.
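
The first two steps can be sketched with the Python standard library. This is a minimal illustration, not the official client: the host name, the bearer-token header, and the model name are assumptions, and only the /v1/responses path and the X-Proxy-Session-ID header come from this page.

```python
import json
import urllib.request

# Hypothetical host: replace with the base URL of your deployment.
BASE_URL = "https://proxy.example.com"


def build_request(api_key: str, session_id: str, prompt: str,
                  model: str = "gpt-4.1-mini") -> urllib.request.Request:
    """Build a POST to /v1/responses that carries a stable session id."""
    body = {"model": model, "input": prompt}
    return urllib.request.Request(
        f"{BASE_URL}/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the proxy accepts a bearer token from the client.
            "Authorization": f"Bearer {api_key}",
            # Reuse the same value for every request in one agent session.
            "X-Proxy-Session-ID": session_id,
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling build_request with the same session_id for every turn of an agent run is what gives the proxy something to key affinity on.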

Operators should also understand the two planes behind this model: a live data plane in Redis that handles routing, and a durable analytics plane in SQL that stores telemetry and registry records.

Core concepts

Concept | What it means
Resource | A live upstream target created from an OpenAI API key.
Resource pool | A named group of resources used to separate traffic classes.
Session affinity | A Redis mapping that keeps a logical conversation on the same resource.
Response affinity | A Redis mapping from an upstream resp_* id to the resource that produced it.
Telemetry event | A Redis stream event emitted for routing decisions and completed requests.
Durable history | SQL records for sessions, requests, token usage, decisions, and raw events.
Provider registry | Durable provider metadata exposed through /admin/providers.
Model or agent registry | Durable aliases exposed through /admin/models.

Resources

In the current runtime, live routing uses OpenAI keys from environment variables:

OPENAI_API_KEY=sk-...
OPENAI_API_KEYS=sk-1,sk-2
OPENAI_API_KEY_1=sk-...
OPENAI_API_KEY_2=sk-...

Each parsed key becomes a resource such as openai-1 or openai-2. Resources keep live health, in-flight count, total request count, error count, weight, and pool id in Redis.
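
As a rough illustration of how those variables could turn into named resources, the sketch below parses an environment mapping. The parse order (single key, then the comma-separated list, then numbered variables), the de-duplication, and the function name parse_resources are assumptions, not the proxy's documented behavior.

```python
import re
from typing import Dict, List, Mapping, Tuple


def parse_resources(env: Mapping[str, str]) -> Dict[str, str]:
    """Map resource names such as openai-1 to API keys found in env."""
    keys: List[str] = []

    # Single key variable.
    single = env.get("OPENAI_API_KEY", "").strip()
    if single:
        keys.append(single)

    # Comma-separated list variable.
    for part in env.get("OPENAI_API_KEYS", "").split(","):
        if part.strip():
            keys.append(part.strip())

    # Numbered variables, sorted by their numeric suffix.
    numbered: List[Tuple[int, str]] = []
    for name, value in env.items():
        match = re.fullmatch(r"OPENAI_API_KEY_(\d+)", name)
        if match and value.strip():
            numbered.append((int(match.group(1)), value.strip()))
    keys.extend(value for _, value in sorted(numbered))

    # Assumption: a key listed twice yields one resource, first position wins.
    unique = list(dict.fromkeys(keys))
    return {f"openai-{i}": key for i, key in enumerate(unique, start=1)}
```

In a real process the argument would be os.environ; passing a plain dict keeps the sketch easy to test.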

Resource pools

Pools let callers direct workloads to different key groups. Pool selection can come from:

  • X-Proxy-Pool-ID
  • X-Pool-ID
  • metadata.pool_id

If no pool is requested, all configured resources are eligible.
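
The sketch below shows one way pool selection and filtering could fit together. The precedence (headers in the order listed, then metadata.pool_id) and the dictionary shape of a resource are assumptions made for illustration.

```python
from typing import Any, Dict, List, Mapping, Optional

# Assumption: headers are checked in this order before the body metadata.
POOL_HEADERS = ("x-proxy-pool-id", "x-pool-id")


def select_pool(headers: Mapping[str, str], body: Mapping[str, Any]) -> Optional[str]:
    """Return the requested pool id, or None when the caller did not ask for one."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in POOL_HEADERS:
        value = lowered.get(name, "").strip()
        if value:
            return value
    metadata = body.get("metadata")
    if isinstance(metadata, dict):
        value = metadata.get("pool_id")
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def eligible_resources(resources: List[Dict[str, str]],
                       pool_id: Optional[str]) -> List[Dict[str, str]]:
    """With no pool requested, every configured resource stays eligible."""
    if pool_id is None:
        return list(resources)
    return [r for r in resources if r.get("pool_id") == pool_id]
```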

Affinity

Session affinity keeps follow-up traffic on the same resource. The proxy accepts session ids from headers, metadata fields, and conversation fields. Common choices are:

X-Proxy-Session-ID
metadata.session_id
metadata.conversation_id
conversation.id

Response affinity maps upstream resp_* ids back to the selected resource. The proxy reads response ids from previous_response_id, response route paths, and upstream response bodies.
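
A simplified view of affinity resolution is sketched below, with a plain dict standing in for Redis. The key prefixes, the lookup order among session id sources, and the choice to check response affinity before session affinity are all assumptions; the page does not specify them.

```python
from typing import Any, Mapping, Optional


def extract_session_id(headers: Mapping[str, str], body: Mapping[str, Any]) -> Optional[str]:
    """Return the first usable session id from the common sources."""
    lowered = {name.lower(): value for name, value in headers.items()}
    metadata = body.get("metadata")
    metadata = metadata if isinstance(metadata, dict) else {}
    conversation = body.get("conversation")
    conversation = conversation if isinstance(conversation, dict) else {}
    candidates = [
        lowered.get("x-proxy-session-id"),
        metadata.get("session_id"),
        metadata.get("conversation_id"),
        conversation.get("id"),
    ]
    for value in candidates:
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def resolve_affinity(store: Mapping[str, str], headers: Mapping[str, str],
                     body: Mapping[str, Any]) -> Optional[str]:
    """Return the resource pinned by response or session affinity, if any."""
    previous = body.get("previous_response_id")
    # Hypothetical key layout: "response:<id>" and "session:<id>".
    if isinstance(previous, str) and f"response:{previous}" in store:
        return store[f"response:{previous}"]
    session_id = extract_session_id(headers, body)
    if session_id is not None:
        return store.get(f"session:{session_id}")
    return None
```

When resolve_affinity returns None, the request falls through to load balancing.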

Load balancing

When no usable affinity applies, the proxy uses weighted least connections: each eligible resource gets a score, and the resource with the lowest score is selected. The score is:

in_flight / max(weight, 0.01)

Unhealthy resources are excluded. Degraded resources are skipped while healthy candidates exist, then become eligible again after the configured cooldown.
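
The selection rule can be sketched as follows. The Resource dataclass and its state values are hypothetical stand-ins for the live Redis fields, and cooldown handling is left out, so a degraded resource here is only used when no healthy one exists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    in_flight: int
    weight: float = 1.0
    state: str = "healthy"  # hypothetical values: healthy, degraded, unhealthy


def score(resource: Resource) -> float:
    """Lower is better; the 0.01 floor avoids dividing by a zero weight."""
    return resource.in_flight / max(resource.weight, 0.01)


def pick(resources: List[Resource]) -> Optional[Resource]:
    """Prefer healthy resources, fall back to degraded, never use unhealthy."""
    healthy = [r for r in resources if r.state == "healthy"]
    candidates = healthy or [r for r in resources if r.state == "degraded"]
    if not candidates:
        return None
    return min(candidates, key=score)
```

For example, a resource with 4 requests in flight and weight 2 scores 2.0, so it is chosen over one with 3 in flight and weight 1, which scores 3.0.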

Telemetry

The API publishes telemetry to Redis. The worker persists it to SQL.

Main event types:

  • lb_decision: selected resource, pool, session, and reason.
  • request_completed: status, latency, payload capture, token usage, and error state.

Token usage is extracted from JSON responses and captured SSE payloads when usage fields are present.
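
As an illustration of that extraction, the sketch below reads usage from a JSON body and from a captured SSE payload. The field names follow the OpenAI Responses format (a usage object with input_tokens, output_tokens, and total_tokens, nested under response in streamed events); the function names and the rule that the last event with usage wins are assumptions.

```python
import json
from typing import Any, Dict, Optional

USAGE_FIELDS = ("input_tokens", "output_tokens", "total_tokens")


def usage_from_json(payload: Dict[str, Any]) -> Optional[Dict[str, int]]:
    """Read usage from a response object or from an event that wraps one."""
    usage = payload.get("usage")
    if not isinstance(usage, dict):
        nested = payload.get("response")
        usage = nested.get("usage") if isinstance(nested, dict) else None
    if not isinstance(usage, dict):
        return None
    found = {field: usage[field] for field in USAGE_FIELDS if field in usage}
    return found or None


def usage_from_sse(raw: str) -> Optional[Dict[str, int]]:
    """Scan SSE data lines and keep the last event that carries usage."""
    latest = None
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if not data or data == "[DONE]":
            continue
        try:
            payload = json.loads(data)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON data lines
        if isinstance(payload, dict):
            usage = usage_from_json(payload)
            if usage:
                latest = usage
    return latest
```

Both functions return None when no usage fields are present, which matches the "when usage fields are present" condition above.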

Live state versus durable state

Redis stores live operational state:

  • in-flight counters
  • resource health
  • affinity mappings
  • recent decisions
  • telemetry stream queue

SQL stores durable history:

  • resources
  • sessions
  • requests and responses
  • token usage
  • load-balancer decisions
  • raw telemetry events
  • registered providers
  • registered model or agent aliases

Live Redis values can reset on redeploy. SQL history persists when each deploy uses the same PROXY_DATABASE_URL.

Provider and model registry

ServiceProvider records describe backend services for future routing and dashboard workflows. Registered models and agents are durable aliases that point to providers and describe route hints.

Current limit: registry records are not loaded into the live resource pool yet. Configure OPENAI_API_KEY_* variables for runtime traffic.