Auto-Scale-to-Zero Agent Workspaces
This guide shows how to combine TTL (time-to-live) with wake-on-request to create agent workspaces that automatically pause ("scale to zero") after a period of inactivity and instantly rehydrate on the next HTTP request or SSH attempt.
Overview
Auto-scaling agent workspaces behave like "serverless VMs":
- Zero compute charges while idle
- Instant resume on demand with preserved memory state
- Automatic pause after configurable inactivity period
- Seamless rehydration for users
Configuration Components
1. TTL (Time-to-Live)
Configure how long an instance runs before taking action:
- ttl_seconds: Duration before the instance expires
- ttl_action: What the instance does when it expires, one of:
  - "pause" - Pauses the instance with full memory state, allowing for instant resume.
  - "stop" - Deletes the instance.
TTL can be set when starting an instance or updated on a running instance.
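For example, to extend the window on an instance that is already running (a minimal sketch reusing the set_ttl call shown in the recipe below):
# Give a running instance another 30 minutes before it pauses
instance.set_ttl(ttl_seconds=30 * 60, ttl_action="pause")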
2. Wake-on-Request
Enable automatic resume when a paused instance receives:
- HTTP requests to an exposed service
- SSH connection attempts
Both wake_on_http and wake_on_ssh can be configured independently.
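For example, to resume only on HTTP traffic (a minimal sketch; the False value is illustrative):
# Wake on HTTP requests only; SSH attempts will not resume the instance
instance.set_wake_on(wake_on_http=True, wake_on_ssh=False)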
For wake-on-HTTP to work, you must expose a service from inside the instance (e.g., your agent's web server). The platform provides a public URL when you expose a service.
Implementation Recipe
Step 1: Start with TTL
Start your agent workspace with a TTL window and pause action:
- Python
- TypeScript
from morphcloud.api import MorphCloudClient
import os
client = MorphCloudClient(api_key=os.getenv("MORPH_API_KEY"))
# Start instance with 15-minute TTL that pauses on expiry
instance = client.instances.start(
snapshot_id="snapshot_your_snapshot_id",
ttl_seconds=15 * 60, # 15 minutes
ttl_action="pause"
)
print(f"Instance started: {instance.id}")
import { MorphCloudClient } from 'morphcloud';
const client = new MorphCloudClient({
apiKey: process.env.MORPH_API_KEY
});
async function startAgentWorkspace() {
// Start instance with 15-minute TTL that pauses on expiry
const instance = await client.instances.start({
snapshotId: "snapshot_your_snapshot_id",
ttlSeconds: 15 * 60, // 15 minutes
ttlAction: "pause"
});
console.log(`Instance started: ${instance.id}`);
return instance;
}
Step 2: Expose HTTP Service
Expose your agent's HTTP service to get a public URL:
- Python
- TypeScript
# Expose agent service running on port 8080
service = instance.expose_http_service("agent", 8080)
print(f"Agent URL: {service.url}")
// Expose agent service running on port 8080
const service = await instance.exposeHttpService("agent", 8080);
console.log(`Agent URL: ${service.url}`);
Step 3: Enable Wake-on-Request
Configure automatic wake for both HTTP and SSH:
- Python
- TypeScript
# Enable wake-on for both HTTP and SSH
instance.set_wake_on(wake_on_http=True, wake_on_ssh=True)
// Enable wake-on for both HTTP and SSH
await instance.setWakeOn({
wakeOnHttp: true,
wakeOnSsh: true
});
Step 4: Implement Sliding TTL (Optional)
Keep the instance warm during active use by refreshing TTL on each interaction:
- Python
- TypeScript
def touch_ttl(instance, seconds=15 * 60):
"""
Refresh TTL after handling user requests to keep session warm.
Instance only pauses after a period of inactivity.
"""
instance.set_ttl(ttl_seconds=seconds, ttl_action="pause")
# Example usage in your request handler:
def handle_request(request):
# Process the request...
result = process_agent_request(request)
# Refresh TTL to keep instance warm
touch_ttl(instance)
return result
async function touchTTL(instance: Instance, seconds = 15 * 60) {
/**
* Refresh TTL after handling user requests to keep session warm.
* Instance only pauses after a period of inactivity.
*/
await instance.setTtl({
ttlSeconds: seconds,
ttlAction: "pause"
});
}
// Example usage in your request handler:
async function handleRequest(request: Request) {
// Process the request...
const result = await processAgentRequest(request);
// Refresh TTL to keep instance warm
await touchTTL(instance);
return result;
}
Complete Example
Here's a full implementation of an auto-scaling agent workspace:
- Python
- TypeScript
import os
from morphcloud.api import MorphCloudClient
class AutoScalingAgent:
def __init__(self, snapshot_id: str, ttl_minutes: int = 15):
self.client = MorphCloudClient(api_key=os.getenv("MORPH_API_KEY"))
self.snapshot_id = snapshot_id
self.ttl_seconds = ttl_minutes * 60
self.instance = None
def start(self):
"""Start the agent workspace with auto-scaling configuration."""
# 1. Start instance with TTL
self.instance = self.client.instances.start(
snapshot_id=self.snapshot_id,
ttl_seconds=self.ttl_seconds,
ttl_action="pause"
)
print(f"Instance started: {self.instance.id}")
# 2. Expose HTTP service (agent on port 8080)
service = self.instance.expose_http_service("agent", 8080)
print(f"Agent URL: {service.url}")
# 3. Enable wake-on-request
self.instance.set_wake_on(
wake_on_http=True,
wake_on_ssh=True
)
print("Wake-on-request enabled for HTTP and SSH")
return service.url
def refresh_ttl(self):
"""Refresh TTL to keep instance warm during activity."""
if self.instance:
self.instance.set_ttl(
ttl_seconds=self.ttl_seconds,
ttl_action="pause"
)
# Usage
agent = AutoScalingAgent("snapshot_your_snapshot_id", ttl_minutes=15)
agent_url = agent.start()
# In your request handler, refresh TTL on each interaction:
# agent.refresh_ttl()
import { MorphCloudClient, Instance } from 'morphcloud';
class AutoScalingAgent {
private client: MorphCloudClient;
private snapshotId: string;
private ttlSeconds: number;
private instance?: Instance;
constructor(snapshotId: string, ttlMinutes: number = 15) {
this.client = new MorphCloudClient({
apiKey: process.env.MORPH_API_KEY
});
this.snapshotId = snapshotId;
this.ttlSeconds = ttlMinutes * 60;
}
async start(): Promise<string> {
// 1. Start instance with TTL
this.instance = await this.client.instances.start({
snapshotId: this.snapshotId,
ttlSeconds: this.ttlSeconds,
ttlAction: "pause"
});
console.log(`Instance started: ${this.instance.id}`);
// 2. Expose HTTP service (agent on port 8080)
const service = await this.instance.exposeHttpService("agent", 8080);
console.log(`Agent URL: ${service.url}`);
// 3. Enable wake-on-request
await this.instance.setWakeOn({
wakeOnHttp: true,
wakeOnSsh: true
});
console.log("Wake-on-request enabled for HTTP and SSH");
return service.url;
}
async refreshTTL(): Promise<void> {
if (this.instance) {
await this.instance.setTtl({
ttlSeconds: this.ttlSeconds,
ttlAction: "pause"
});
}
}
}
// Usage
const agent = new AutoScalingAgent("snapshot_your_snapshot_id", 15);
const agentUrl = await agent.start();
// In your request handler, refresh TTL on each interaction:
// await agent.refreshTTL();
Best Practices
Expose Services for Wake-on-HTTP
Wake-on-HTTP requires an exposed service. If no service is exposed, the platform has nothing to route requests to and cannot trigger a wake.
Implement Sliding TTL for Activity-Based Scaling
Call the TTL update endpoint after each successful interaction so the instance pauses only after a period of inactivity, rather than at a fixed time after it started, as sketched below.
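A minimal sketch of this pattern, assuming a Flask app fronts the agent and instance is the workspace started earlier in this guide:
from flask import Flask

app = Flask(__name__)

@app.after_request
def refresh_ttl(response):
    # Push the pause deadline back another 15 minutes after every handled request
    instance.set_ttl(ttl_seconds=15 * 60, ttl_action="pause")
    return response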
Monitor Instance State
Keep track of your instance state to handle edge cases gracefully:
# Check instance state before operations
instance = client.instances.get(instance_id)
if instance.status == "paused":
print("Instance is paused, will wake on next request")
elif instance.status == "running":
print("Instance is active")
API Reference
Update TTL
- Endpoint: POST /instance/:instance_id/ttl
- Purpose: Update the TTL of a running instance
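If you call the REST API directly instead of using an SDK, the request looks roughly like the sketch below; the base URL, bearer-token header, and JSON field names mirror the SDK parameters and are assumptions, so check the full API reference for the exact shape.
import os
import requests

# Assumed: MORPH_BASE_URL points at the API root and MORPH_API_KEY is a bearer token
base_url = os.environ["MORPH_BASE_URL"]
instance_id = "instance_your_instance_id"

resp = requests.post(
    f"{base_url}/instance/{instance_id}/ttl",
    headers={"Authorization": f"Bearer {os.environ['MORPH_API_KEY']}"},
    json={"ttl_seconds": 15 * 60, "ttl_action": "pause"},  # field names assumed from the SDK
)
resp.raise_for_status()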
Configure Wake-on-Request
- Endpoint: POST /instance/:instance_id/wake-on
- Purpose: Enable wake on HTTP/SSH for a paused instance
Expose HTTP Service
- Endpoint: POST /instance/:instance_id/http
- Purpose: Create a public URL for your agent service
Troubleshooting
Instance Not Waking on HTTP Request
- Verify that you've exposed an HTTP service using expose_http_service()
- Check that wake_on_http is enabled
- Ensure the service inside the instance is running on the exposed port
Instance Pausing Too Quickly
- Increase the ttl_seconds value
- Implement sliding TTL to refresh on each user interaction
High Latency on First Request After Pause
- This is normal for the first request that triggers a wake
- Consider keeping instances warm during peak hours with longer TTL values, as in the sketch below
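A minimal sketch of peak-hour-aware TTL selection; the 9:00-18:00 window is an illustrative assumption:
from datetime import datetime

def pick_ttl_seconds() -> int:
    """Use a longer TTL during assumed peak hours (9:00-18:00 local time)."""
    hour = datetime.now().hour
    return 60 * 60 if 9 <= hour < 18 else 15 * 60

# Refresh with a time-of-day-aware window instead of a fixed 15 minutes
instance.set_ttl(ttl_seconds=pick_ttl_seconds(), ttl_action="pause")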
With this setup, your agent workspaces behave like serverless infrastructure: zero compute costs while idle, with instant resume on demand.