Platform Engineering

Multi-Tenant AI Agent Platform

Zero open ports. Per-tenant isolation. Production AI agents.

2

AWS regions in production

87%

IaC quality score

0

Open inbound ports

$70/mo

Per-tenant cost

Overview

Designed and deployed a production cloud platform to host OpenClaw, an autonomous AI agent that connects LLMs (Claude, GPT-4) with messaging and productivity channels including Slack, Telegram, WhatsApp, Jira, and GitHub. The platform evolved across two major versions — from a single-tenant EC2 deployment to a Docker-containerized multi-tenant architecture — with a formal IaC security review driving every architectural decision.

The Problem

The team needed to run an autonomous AI agent for multiple isolated tenants simultaneously. Each tenant required its own LLM API credentials, messaging tokens, and tool integrations — completely isolated from other tenants. The naive approach of open SSH ports and config files on EC2 was not viable for production: it exposed credentials, created an unacceptable attack surface, and made onboarding and secret rotation manual and error-prone at scale.

The Solution

v1 established the foundation: Pulumi TypeScript IaC with 6 components (networking, secrets, IAM, storage, compute, observability). SSH access is restricted by a required sshCidr parameter, and 0.0.0.0/0 is rejected at deploy time. API keys flow exclusively through Secrets Manager → IAM Instance Profile → agent config. EBS and EFS are encrypted at rest. A kill switch uses a CloudWatch CPU alarm to auto-stop the instance. A formal IaC security review identified 5 blockers, all resolved before production.

v2 introduced Docker containerization and true multi-tenancy. A 7th component (MossAccess) added per-tenant IAM groups with least-privilege permissions. SSM port-forwarding replaced all open SSH access, leaving zero open inbound ports. provision.py fully automates tenant onboarding, and moss-harden.sh runs non-destructive security audits on live instances. 9 documented security controls (P1–P9) are enforced across all tenants.
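The v1 guard on sshCidr can be illustrated with a short sketch. The platform implements this check in Pulumi TypeScript; the Python version below only mirrors the logic described above, and the function name, the error messages, and the extra rejection of the IPv6 equivalent (::/0) are assumptions made for illustration.

```python
import ipaddress


def validate_ssh_cidr(ssh_cidr: str) -> str:
    """Return the normalized CIDR, or raise ValueError if it is
    missing, malformed, or open to the whole internet.

    Hypothetical helper: the name and messages are illustrative only.
    """
    if not ssh_cidr or not ssh_cidr.strip():
        raise ValueError("sshCidr is required")

    # strict=False accepts host bits (e.g. 203.0.113.7/24) and normalizes
    # them to the network address. Malformed input raises ValueError.
    network = ipaddress.ip_network(ssh_cidr.strip(), strict=False)

    # A prefix length of 0 matches every address (0.0.0.0/0 or ::/0).
    if network.prefixlen == 0:
        raise ValueError(
            f"sshCidr {ssh_cidr!r} allows the whole internet; refusing to deploy"
        )
    return str(network)
```

Failing fast on the parameter, before any security group is created, means an overly broad rule never reaches the cloud account in the first place.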

The Results

Two tenants deployed to production across two AWS regions (us-east-1, us-west-2). Zero open inbound ports: all access goes through SSM port-forwarding. 87% IaC quality score against security and cloud criteria. Operational cost of approximately $68–72 per tenant per month. Secret rotation and SSH key updates require no infrastructure redeployment. Spec-driven development process with 10 PRs merged and a full security control inventory documented.
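SSM port-forwarding is usually driven through the AWS CLI's start-session command with the AWS-StartPortForwardingSession document. The sketch below only builds that command and does not run it. The function name, instance ID, and port numbers are placeholder assumptions rather than values from the platform, and running the resulting command requires the Session Manager plugin for the AWS CLI.

```python
import json
import shlex


def ssm_port_forward_command(
    instance_id: str, remote_port: int, local_port: int, region: str
) -> list[str]:
    """Build the AWS CLI argument list for an SSM port-forwarding session.

    Hypothetical helper: it builds the command but never executes it.
    """
    # The SSM document expects each parameter as a list of strings.
    parameters = {
        "portNumber": [str(remote_port)],
        "localPortNumber": [str(local_port)],
    }
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", json.dumps(parameters),
        "--region", region,
    ]


if __name__ == "__main__":
    # Placeholder instance ID and ports, for illustration only.
    cmd = ssm_port_forward_command("i-0123456789abcdef0", 22, 2222, "us-east-1")
    print(shlex.join(cmd))
```

Because the session is opened outbound by the SSM agent on the instance, the security group needs no inbound rule for the forwarded port.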

Key Takeaways

→ SSM port-forwarding eliminates the open SSH attack surface without sacrificing operational access; zero open ports is achievable in production without workflow pain
→ Per-tenant Secrets Manager paths with least-privilege IAM groups provide strong isolation; credential rotation never requires a redeploy
→ Formal IaC security review before the first tenant goes live is worth the investment: the v1 audit caught 5 production-blocking findings
→ Docker containerization of the agent layer made multi-tenancy far simpler than per-process isolation on bare EC2; first-boot build time is the tradeoff
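The per-tenant isolation takeaway can be made concrete with a sketch of a least-privilege IAM policy scoped to one tenant's secret path. The path layout (openclaw/<tenant>/...), the function name, the Sid, and the exact action list are assumptions for illustration; the platform's real naming scheme is not given here.

```python
import re


def tenant_secrets_policy(
    tenant_id: str, region: str, account_id: str, prefix: str = "openclaw"
) -> dict:
    """Return an IAM policy document that allows reading only the secrets
    stored under one tenant's path.

    Hypothetical helper: the "<prefix>/<tenant_id>/" layout is an assumption.
    """
    # Reject anything that could widen the resource pattern (e.g. "*" or "/").
    if not re.fullmatch(r"[a-z0-9-]+", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")

    resource = (
        f"arn:aws:secretsmanager:{region}:{account_id}:"
        f"secret:{prefix}/{tenant_id}/*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOwnTenantSecrets",
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": resource,
            }
        ],
    }
```

Since the policy refers to a path rather than to individual secret versions, rotating a credential changes only the stored value, which is consistent with rotation not requiring a redeploy.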

Tools & Technologies

Pulumi TypeScript, AWS EC2, Docker, AWS EFS, AWS SSM, Secrets Manager, CloudWatch, SNS, IAM, Python
