Platform Engineering
Multi-Tenant AI Agent Platform
Zero open ports. Per-tenant isolation. Production AI agents.
Overview
Designed and deployed a production cloud platform to host OpenClaw, an autonomous AI agent that connects LLMs (Claude, GPT-4) with messaging and productivity channels including Slack, Telegram, WhatsApp, Jira, and GitHub. The platform evolved across two major versions, from a single-tenant EC2 deployment to a Docker-containerized multi-tenant architecture, with a formal IaC security review driving every architectural decision.
The Problem
The team needed to run an autonomous AI agent for multiple isolated tenants simultaneously. Each tenant required its own LLM API credentials, messaging tokens, and tool integrations, completely isolated from other tenants. The naive approach of open SSH ports and config files on EC2 was not viable for production: it exposed credentials, created an unacceptable attack surface, and made onboarding and secret rotation manual and error-prone at scale.
The Solution
v1 established the foundation: Pulumi TypeScript IaC with 6 components (networking, secrets, IAM, storage, compute, observability). SSH was restricted by a required sshCidr parameter, with 0.0.0.0/0 rejected at deploy time. API keys flow exclusively through Secrets Manager → IAM Instance Profile → agent config. EBS and EFS are encrypted at rest. A kill switch stops the instance automatically when a CloudWatch CPU alarm fires. A formal IaC security review identified 5 blockers, all resolved before production.
v2 introduced Docker containerization and true multi-tenancy: a 7th component (MossAccess) added per-tenant IAM groups with least-privilege permissions. SSM port-forwarding replaced all open SSH access, leaving zero open inbound ports. provision.py fully automates tenant onboarding. moss-harden.sh runs non-destructive security audits on live instances. 9 documented security controls (P1–P9) are enforced across all tenants.
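The deploy-time rejection of 0.0.0.0/0 can be illustrated with a short sketch. The platform's actual check lives in the Pulumi TypeScript program and is not shown here; the Python below only mirrors the described behavior, and the function name validate_ssh_cidr is hypothetical.

```python
import ipaddress


def validate_ssh_cidr(ssh_cidr: str) -> str:
    """Return the normalized CIDR, or raise ValueError if it is missing,
    malformed, or open to every address.

    Hypothetical helper: it mirrors the behavior described for the required
    sshCidr parameter, not the platform's actual code.
    """
    if not ssh_cidr:
        raise ValueError("sshCidr is required")

    # strict=False accepts host bits (e.g. 203.0.113.7/24) and normalizes them.
    # Malformed input raises ValueError from the ipaddress module itself.
    network = ipaddress.ip_network(ssh_cidr, strict=False)

    # A zero-length prefix (0.0.0.0/0 or ::/0) matches every address.
    if network.prefixlen == 0:
        raise ValueError(f"sshCidr {ssh_cidr!r} allows all addresses; use a narrower range")

    return str(network)
```

Failing fast at deploy time means a misconfigured stack never creates the security group rule in the first place, rather than relying on a later audit to catch it.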
The Results
Two tenants deployed to production across two AWS regions (us-east-1, us-west-2). Zero open inbound ports, with all access via SSM port-forwarding. 87% IaC quality score against security and cloud criteria. Operational cost of approximately $68–72 per tenant per month. Secret rotation and SSH key updates without infrastructure redeployment. Spec-driven development process with 10 PRs merged and a full security control inventory documented.
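As a rough illustration of how access works with no inbound ports, the sketch below builds an AWS CLI invocation for a Session Manager port-forwarding session. The instance ID and port numbers are placeholders, and the helper name is hypothetical; the AWS-StartPortForwardingSession document and its portNumber and localPortNumber parameters are standard AWS, and the Session Manager plugin must be installed locally for the command to run.

```python
import json
from typing import List


def ssm_port_forward_command(
    instance_id: str, remote_port: int, local_port: int, region: str
) -> List[str]:
    """Build the argument list for an SSM port-forwarding session.

    Hypothetical helper for illustration; it only constructs the command
    and does not execute it.
    """
    # SSM document parameters are passed as lists of strings.
    parameters = {
        "portNumber": [str(remote_port)],
        "localPortNumber": [str(local_port)],
    }
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", json.dumps(parameters),
        "--region", region,
    ]


# Placeholder values: the instance ID and ports are not from the platform.
command = ssm_port_forward_command("i-0123456789abcdef0", 8080, 8080, "us-east-1")
```

The resulting list can be passed to subprocess.run. Because the session is initiated outbound by the SSM agent on the instance, the security group needs no inbound rule for it.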
Key Takeaways
- SSM port-forwarding eliminates the open SSH attack surface without sacrificing operational access; zero open ports is achievable in production without workflow pain
- Per-tenant Secrets Manager paths with least-privilege IAM groups provide strong isolation; credential rotation never requires a redeploy
- Formal IaC security review before the first tenant goes live is worth the investment: the v1 audit caught 5 production-blocking findings
- Docker containerization of the agent layer made multi-tenancy far simpler than per-process isolation on bare EC2; first-boot build time is the tradeoff
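The per-tenant secret isolation noted in the takeaways can be sketched as an IAM policy document scoped to a single tenant's Secrets Manager path. This is a minimal sketch under stated assumptions: the openclaw/<tenant>/ path layout, the account ID, and the function name are all hypothetical, and the platform's real policies may differ.

```python
import re

# Restrict tenant IDs so a value like "*" can never widen the resource pattern.
TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]{0,30}")


def tenant_secrets_policy(tenant_id: str, region: str, account_id: str) -> dict:
    """Return a read-only IAM policy limited to one tenant's secret path.

    Assumption: secrets are named openclaw/<tenant_id>/<name>. This layout
    is illustrative and not taken from the platform.
    """
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")

    resource = (
        f"arn:aws:secretsmanager:{region}:{account_id}:"
        f"secret:openclaw/{tenant_id}/*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOwnTenantSecrets",
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": resource,
            }
        ],
    }


# Placeholder account ID and tenant names.
policy_a = tenant_secrets_policy("tenant-a", "us-east-1", "123456789012")
policy_b = tenant_secrets_policy("tenant-b", "us-west-2", "123456789012")
```

Because the policy grants only read access under one path prefix, rotating a secret's value changes nothing in IAM or the infrastructure code, which is consistent with rotation not requiring a redeploy.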
Tools & Technologies