
Auto-Scale-to-Zero Agent Workspaces

This guide shows how to combine TTL (time-to-live) with wake-on-request to create agent workspaces that automatically pause ("scale to zero") after a period of inactivity and instantly rehydrate on the next HTTP request or SSH attempt.

Overview

Auto-scaling agent workspaces behave like "serverless VMs":

  • Zero compute charges while idle
  • Instant resume on demand with preserved memory state
  • Automatic pause after configurable inactivity period
  • Seamless rehydration for users

Configuration Components

1. TTL (Time-to-Live)

Configure how long an instance runs before taking action:

  • ttl_seconds: Duration before the instance expires
  • ttl_action: What the instance does when it expires, one of:
    • "pause" - Pauses the instance with full memory state, allowing for instant resume.
    • "stop" - Deletes the instance.

TTL can be set when starting an instance or updated on a running instance.
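
For example, here is a minimal sketch of refreshing the TTL on an instance that is already running (the instance ID below is a placeholder):

import os
from morphcloud.api import MorphCloudClient

client = MorphCloudClient(api_key=os.getenv("MORPH_API_KEY"))

# Fetch the running instance; replace the placeholder with your real instance ID.
instance = client.instances.get("morphvm_your_instance_id")

# Extend the inactivity window to 30 minutes and pause on expiry.
instance.set_ttl(ttl_seconds=30 * 60, ttl_action="pause")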

2. Wake-on-Request

Enable automatic resume when a paused instance receives:

  • HTTP requests to an exposed service
  • SSH connection attempts

Both wake_on_http and wake_on_ssh can be configured independently.
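
For example, to wake only on HTTP traffic while leaving SSH wake disabled, a minimal sketch (assuming an existing instance object) looks like this:

# Wake on HTTP requests only; SSH attempts will not resume a paused instance.
instance.set_wake_on(wake_on_http=True, wake_on_ssh=False)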

Tip

For wake-on-HTTP to work, you must expose a service from inside the instance (e.g., your agent's web server). The platform provides a public URL when you expose a service.

Implementation Recipe

Step 1: Start with TTL

Start your agent workspace with a TTL window and pause action:

from morphcloud.api import MorphCloudClient
import os

client = MorphCloudClient(api_key=os.getenv("MORPH_API_KEY"))

# Start instance with 15-minute TTL that pauses on expiry
instance = client.instances.start(
    snapshot_id="snapshot_your_snapshot_id",
    ttl_seconds=15 * 60,  # 15 minutes
    ttl_action="pause"
)

print(f"Instance started: {instance.id}")

Step 2: Expose HTTP Service

Expose your agent's HTTP service to get a public URL:

# Expose agent service running on port 8080
service = instance.expose_http_service("agent", 8080)
print(f"Agent URL: {service.url}")

Step 3: Enable Wake-on-Request

Configure automatic wake for both HTTP and SSH:

# Enable wake-on for both HTTP and SSH
instance.set_wake_on(wake_on_http=True, wake_on_ssh=True)

Step 4: Implement Sliding TTL (Optional)

Keep the instance warm during active use by refreshing TTL on each interaction:

def touch_ttl(instance, seconds=15 * 60):
    """
    Refresh TTL after handling user requests to keep session warm.
    Instance only pauses after a period of inactivity.
    """
    instance.set_ttl(ttl_seconds=seconds, ttl_action="pause")

# Example usage in your request handler:
def handle_request(request):
    # Process the request...
    result = process_agent_request(request)

    # Refresh TTL to keep instance warm
    touch_ttl(instance)

    return result

Complete Example

Here's a full implementation of an auto-scaling agent workspace:

import os
from morphcloud.api import MorphCloudClient

class AutoScalingAgent:
    def __init__(self, snapshot_id: str, ttl_minutes: int = 15):
        self.client = MorphCloudClient(api_key=os.getenv("MORPH_API_KEY"))
        self.snapshot_id = snapshot_id
        self.ttl_seconds = ttl_minutes * 60
        self.instance = None

    def start(self):
        """Start the agent workspace with auto-scaling configuration."""
        # 1. Start instance with TTL
        self.instance = self.client.instances.start(
            snapshot_id=self.snapshot_id,
            ttl_seconds=self.ttl_seconds,
            ttl_action="pause"
        )
        print(f"Instance started: {self.instance.id}")

        # 2. Expose HTTP service (agent on port 8080)
        service = self.instance.expose_http_service("agent", 8080)
        print(f"Agent URL: {service.url}")

        # 3. Enable wake-on-request
        self.instance.set_wake_on(
            wake_on_http=True,
            wake_on_ssh=True
        )
        print("Wake-on-request enabled for HTTP and SSH")

        return service.url

    def refresh_ttl(self):
        """Refresh TTL to keep instance warm during activity."""
        if self.instance:
            self.instance.set_ttl(
                ttl_seconds=self.ttl_seconds,
                ttl_action="pause"
            )

# Usage
agent = AutoScalingAgent("snapshot_your_snapshot_id", ttl_minutes=15)
agent_url = agent.start()

# In your request handler, refresh TTL on each interaction:
# agent.refresh_ttl()

Best Practices

Expose Services for Wake-on-HTTP

Wake-on-HTTP requires an exposed service. If no service is exposed, the platform has nothing to route requests to and cannot trigger a wake.

Implement Sliding TTL for Activity-Based Scaling

Call the TTL update endpoint after each successful interaction so the VM pauses only after a genuine period of inactivity rather than at the end of a fixed window.
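
As a sketch, assuming a FastAPI app running alongside the AutoScalingAgent class from the complete example above, a middleware can refresh the TTL after every handled request:

from fastapi import FastAPI, Request

app = FastAPI()

# Assumes AutoScalingAgent from the complete example above is available here.
agent = AutoScalingAgent("snapshot_your_snapshot_id", ttl_minutes=15)
agent.start()

@app.middleware("http")
async def refresh_ttl_after_request(request: Request, call_next):
    response = await call_next(request)
    # Push the inactivity window forward after each handled request.
    agent.refresh_ttl()
    return response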

Monitor Instance State

Keep track of your instance state to handle edge cases gracefully:

# Check instance state before operations
instance = client.instances.get(instance_id)
if instance.status == "paused":
print("Instance is paused, will wake on next request")
elif instance.status == "running":
print("Instance is active")

API Reference

Update TTL

  • Endpoint: POST /instance/:instance_id/ttl
  • Purpose: Update the TTL of a running instance

Configure Wake-on-Request

  • Endpoint: POST /instance/:instance_id/wake-on
  • Purpose: Enable wake on HTTP/SSH for a paused instance

Expose HTTP Service

  • Endpoint: POST /instance/:instance_id/http
  • Purpose: Create a public URL for your agent service
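
If you prefer calling the REST API directly instead of the SDK, the TTL endpoint might be invoked as in the hypothetical sketch below; the base URL, bearer-token auth, and JSON field names are assumptions mirroring the SDK parameters, so treat the API reference as authoritative:

import os
import requests

instance_id = "morphvm_your_instance_id"  # placeholder

# Hypothetical request shape: base URL, auth header, and body fields are
# assumptions inferred from the SDK's ttl_seconds / ttl_action parameters.
resp = requests.post(
    f"https://cloud.morph.so/api/instance/{instance_id}/ttl",
    headers={"Authorization": f"Bearer {os.getenv('MORPH_API_KEY')}"},
    json={"ttl_seconds": 15 * 60, "ttl_action": "pause"},
)
resp.raise_for_status()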

Troubleshooting

Instance Not Waking on HTTP Request

  • Verify that you've exposed an HTTP service using expose_http_service()
  • Check that wake_on_http is enabled
  • Ensure the service inside the instance is running on the exposed port
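
If those checks don't reveal the problem, a quick diagnostic sketch (assuming the client from the examples above and a placeholder instance ID) is to fetch the instance and re-assert the wake configuration:

# Report current status and re-apply the wake settings in case they were never enabled.
instance = client.instances.get("morphvm_your_instance_id")
print(f"Status: {instance.status}")
instance.set_wake_on(wake_on_http=True, wake_on_ssh=True)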

Instance Pausing Too Quickly

  • Increase the ttl_seconds value
  • Implement sliding TTL to refresh on each user interaction

High Latency on First Request After Pause

  • This is normal for the first request that triggers a wake
  • Consider keeping instances warm during peak hours with longer TTL values
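
One way to do the latter, sketched here with the agent object from the complete example and an illustrative business-hours window, is to pick the TTL based on the time of day:

from datetime import datetime

def ttl_seconds_for_now(peak_minutes=60, off_peak_minutes=15):
    """Use a longer inactivity window during assumed business hours (9:00-18:00)."""
    hour = datetime.now().hour
    return (peak_minutes if 9 <= hour < 18 else off_peak_minutes) * 60

# agent comes from the complete example above; refresh with the time-aware TTL.
agent.instance.set_ttl(ttl_seconds=ttl_seconds_for_now(), ttl_action="pause")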

With this setup, your agent workspaces behave like serverless infrastructure: zero compute costs while idle, with instant resume on demand.