
Auto-Scale-to-Zero Agent Workspaces

This guide shows how to combine TTL (time-to-live) with wake-on-request to create agent workspaces that automatically pause ("scale to zero") after a period of inactivity and instantly rehydrate on the next HTTP request or SSH attempt.

Overview

Auto-scaling agent workspaces behave like "serverless VMs":

  • Zero compute charges while idle
  • Instant resume on demand with preserved memory state
  • Automatic pause after configurable inactivity period
  • Seamless rehydration for users

Configuration Components

1. TTL (Time-to-Live)

Configure how long an instance runs before taking action:

  • ttl_seconds: Duration before the instance expires
  • ttl_action: What the instance does when it expires, one of:
    • "pause" - Pauses the instance with full memory state, allowing for instant resume.
    • "stop" - Deletes the instance.

TTL can be set when starting an instance or updated on a running instance.
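
For example, here is a minimal sketch of refreshing the TTL on an instance that is already running (the instance ID below is a placeholder):

import os
from morphcloud.api import MorphCloudClient

client = MorphCloudClient(api_key=os.getenv("MORPH_API_KEY"))

# Fetch the running instance; replace the placeholder with your real instance ID.
instance = client.instances.get("morphvm_your_instance_id")

# Extend the inactivity window to 30 minutes and pause on expiry.
instance.set_ttl(ttl_seconds=30 * 60, ttl_action="pause")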

2. Wake-on-Request

Enable automatic resume when a paused instance receives:

  • HTTP requests to an exposed service
  • SSH connection attempts

Both wake_on_http and wake_on_ssh can be configured independently.
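
For example, to wake only on HTTP traffic while leaving SSH wake disabled, a minimal sketch (assuming an existing instance object) looks like this:

# Wake on HTTP requests only; SSH attempts will not resume a paused instance.
instance.set_wake_on(wake_on_http=True, wake_on_ssh=False)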

Tip

For wake-on-HTTP to work, you must expose a service from inside the instance (e.g., your agent's web server). The platform provides a public URL when you expose a service.

Implementation Recipe

Step 1: Start with TTL

Start your agent workspace with a TTL window and pause action:

from morphcloud.api import MorphCloudClient
import os

client = MorphCloudClient(api_key=os.getenv("MORPH_API_KEY"))

# Start instance with 15-minute TTL that pauses on expiry
instance = client.instances.start(
    snapshot_id="snapshot_your_snapshot_id",
    ttl_seconds=15 * 60,  # 15 minutes
    ttl_action="pause"
)

print(f"Instance started: {instance.id}")

Step 2: Expose HTTP Service

Expose your agent's HTTP service to get a public URL:

# Expose agent service running on port 8080
service = instance.expose_http_service("agent", 8080)
print(f"Agent URL: {service.url}")

Step 3: Enable Wake-on-Request

Configure automatic wake for both HTTP and SSH:

# Enable wake-on for both HTTP and SSH
instance.set_wake_on(wake_on_http=True, wake_on_ssh=True)

Step 4: Implement Sliding TTL (Optional)

Keep the instance warm during active use by refreshing TTL on each interaction:

def touch_ttl(instance, seconds=15 * 60):
    """
    Refresh TTL after handling user requests to keep session warm.
    Instance only pauses after a period of inactivity.
    """
    instance.set_ttl(ttl_seconds=seconds, ttl_action="pause")

# Example usage in your request handler:
def handle_request(request):
    # Process the request...
    result = process_agent_request(request)

    # Refresh TTL to keep instance warm
    touch_ttl(instance)

    return result

Complete Example

Here's a full implementation of an auto-scaling agent workspace:

import os
from morphcloud.api import MorphCloudClient

class AutoScalingAgent:
    def __init__(self, snapshot_id: str, ttl_minutes: int = 15):
        self.client = MorphCloudClient(api_key=os.getenv("MORPH_API_KEY"))
        self.snapshot_id = snapshot_id
        self.ttl_seconds = ttl_minutes * 60
        self.instance = None

    def start(self):
        """Start the agent workspace with auto-scaling configuration."""
        # 1. Start instance with TTL
        self.instance = self.client.instances.start(
            snapshot_id=self.snapshot_id,
            ttl_seconds=self.ttl_seconds,
            ttl_action="pause"
        )
        print(f"Instance started: {self.instance.id}")

        # 2. Expose HTTP service (agent on port 8080)
        service = self.instance.expose_http_service("agent", 8080)
        print(f"Agent URL: {service.url}")

        # 3. Enable wake-on-request
        self.instance.set_wake_on(
            wake_on_http=True,
            wake_on_ssh=True
        )
        print("Wake-on-request enabled for HTTP and SSH")

        return service.url

    def refresh_ttl(self):
        """Refresh TTL to keep instance warm during activity."""
        if self.instance:
            self.instance.set_ttl(
                ttl_seconds=self.ttl_seconds,
                ttl_action="pause"
            )

# Usage
agent = AutoScalingAgent("snapshot_your_snapshot_id", ttl_minutes=15)
agent_url = agent.start()

# In your request handler, refresh TTL on each interaction:
# agent.refresh_ttl()

Best Practices

Expose Services for Wake-on-HTTP

Wake-on-HTTP requires an exposed service. If no service is exposed, the platform has nothing to route requests to and cannot trigger a wake.

Implement Sliding TTL for Activity-Based Scaling

Call the TTL update endpoint after each successful interaction so the VM pauses only after a genuine period of inactivity rather than at the end of a fixed window.
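
As a sketch, assuming a FastAPI app running alongside the AutoScalingAgent class from the complete example above, a middleware can refresh the TTL after every handled request:

from fastapi import FastAPI, Request

app = FastAPI()

# Assumes AutoScalingAgent from the complete example above is available here.
agent = AutoScalingAgent("snapshot_your_snapshot_id", ttl_minutes=15)
agent.start()

@app.middleware("http")
async def refresh_ttl_after_request(request: Request, call_next):
    response = await call_next(request)
    # Push the inactivity window forward after each handled request.
    agent.refresh_ttl()
    return response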

Monitor Instance State

Keep track of your instance state to handle edge cases gracefully:

# Check instance state before operations
instance = client.instances.get(instance_id)
if instance.status == "paused":
print("Instance is paused, will wake on next request")
elif instance.status == "running":
print("Instance is active")

API Reference

Update TTL

  • Endpoint: POST /instance/:instance_id/ttl
  • Purpose: Update the TTL of a running instance

Configure Wake-on-Request

  • Endpoint: POST /instance/:instance_id/wake-on
  • Purpose: Enable wake on HTTP/SSH for a paused instance

Expose HTTP Service

  • Endpoint: POST /instance/:instance_id/http
  • Purpose: Create a public URL for your agent service
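
If you prefer calling the REST API directly instead of the SDK, the TTL endpoint might be invoked as in the hypothetical sketch below; the base URL, bearer-token auth, and JSON field names are assumptions mirroring the SDK parameters, so treat the API reference as authoritative:

import os
import requests

instance_id = "morphvm_your_instance_id"  # placeholder

# Hypothetical request shape: base URL, auth header, and body fields are
# assumptions inferred from the SDK's ttl_seconds / ttl_action parameters.
resp = requests.post(
    f"https://cloud.morph.so/api/instance/{instance_id}/ttl",
    headers={"Authorization": f"Bearer {os.getenv('MORPH_API_KEY')}"},
    json={"ttl_seconds": 15 * 60, "ttl_action": "pause"},
)
resp.raise_for_status()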

Troubleshooting

Instance Not Waking on HTTP Request

  • Verify that you've exposed an HTTP service using expose_http_service()
  • Check that wake_on_http is enabled
  • Ensure the service inside the instance is running on the exposed port
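
If those checks don't reveal the problem, a quick diagnostic sketch (assuming the client from the examples above and a placeholder instance ID) is to fetch the instance and re-assert the wake configuration:

# Report current status and re-apply the wake settings in case they were never enabled.
instance = client.instances.get("morphvm_your_instance_id")
print(f"Status: {instance.status}")
instance.set_wake_on(wake_on_http=True, wake_on_ssh=True)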

Instance Pausing Too Quickly

  • Increase the ttl_seconds value
  • Implement sliding TTL to refresh on each user interaction

High Latency on First Request After Pause

  • This is normal for the first request that triggers a wake
  • Consider keeping instances warm during peak hours with longer TTL values
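
One way to do the latter, sketched here with the agent object from the complete example and an illustrative business-hours window, is to pick the TTL based on the time of day:

from datetime import datetime

def ttl_seconds_for_now(peak_minutes=60, off_peak_minutes=15):
    """Use a longer inactivity window during assumed business hours (9:00-18:00)."""
    hour = datetime.now().hour
    return (peak_minutes if 9 <= hour < 18 else off_peak_minutes) * 60

# agent comes from the complete example above; refresh with the time-aware TTL.
agent.instance.set_ttl(ttl_seconds=ttl_seconds_for_now(), ttl_action="pause")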

With this setup, your agent workspaces behave like serverless infrastructure: zero compute costs while idle, with instant resume on demand.