One designer. Nine AI agents.
A 24/7 autonomous office.
A fully autonomous multi-agent AI system: 9 specialized agents running 24/7 on real infrastructure. One paid coordinator brain, eight free local workers, connected across cloud and GPU hardware via VPN.
I applied the same principles I use to design product teams to orchestrate AI agents: clear roles, communication protocols, information hierarchy, and a single source of truth.
Built from scratch. Running in production. Not a demo.
Chapter 1
The next frontier of AI isn't better chat. It's autonomous execution with memory, structure, and defined roles. The Frodo Project is my exploration into how multiple agents can form a coherent system with strategic oversight, clear delegation, and measurable outcomes: a digital team rather than a collection of isolated assistants.
The goal was not another chatbot. It was an intelligent operational layer with strategic awareness, persistent memory, and real autonomy. A team, not a tool.
Chapter 2
Building a multi-agent system isn't a prompt engineering exercise. It's a systems design problem. The same kind of problem I solve when I'm designing product teams, information architectures, or interaction flows, except the "team members" are AI models with very specific capabilities and limitations.
Chapter 3
Building an autonomous multi-agent system meant solving problems no documentation covered. Each milestone came with a failure that reshaped the architecture.
Started with a VPS (cloud server) and connected it to my home network using Tailscale VPN. Installed OpenClaw as the agent framework. Got the first agent, Frodo, running and responding to commands. The foundation was a Linux server that could talk to the outside world and run AI models.
Designed 9 specialized agents, each with a distinct identity, role, and skill set. Scout hunts jobs. Quill writes cover letters. Echo handles outreach. Forge builds code. Pixel designs. Every agent got an IDENTITY.md file defining its personality, responsibilities, and boundaries: the same way you'd write a design system, but for AI behavior.
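To make that concrete, here's a minimal sketch of how those identity files could be checked at startup. The directory layout, section headers, and agent folder names are my own assumptions for illustration, not OpenClaw's actual schema.

```python
# Sketch: validate each agent's IDENTITY.md before the system starts.
# Directory layout and required section headers are illustrative
# assumptions, not OpenClaw's real schema.
from pathlib import Path

REQUIRED_SECTIONS = ["## Personality", "## Responsibilities", "## Boundaries"]
AGENTS = ["frodo", "scout", "quill", "echo", "forge", "pixel"]

def validate_identity(agent_dir: Path) -> list[str]:
    """Return a list of problems found in one agent's IDENTITY.md."""
    problems = []
    identity = agent_dir / "IDENTITY.md"
    if not identity.exists():
        return [f"{agent_dir.name}: IDENTITY.md missing"]
    text = identity.read_text(encoding="utf-8")
    for section in REQUIRED_SECTIONS:
        if section not in text:
            problems.append(f"{agent_dir.name}: missing '{section}' section")
    return problems

if __name__ == "__main__":
    for name in AGENTS:
        for issue in validate_identity(Path("agents") / name):
            print("IDENTITY CHECK:", issue)
```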
Independent scheduled jobs with no shared memory meant hundreds of API requests per day, millions of tokens burned, and costs spiraling from processes running without oversight. The cloud bill doubled overnight.
Each session started from scratch. Agents re-researched completed work, repeated tasks, asked the same questions. No persistent context, no deduplication. Every morning was their first day on the job.
A social media agent was running every 30 minutes with no coordinator oversight. The coordinator would say "waiting for reports" while background crons fired independently, burning through API credits without its knowledge.
Root cause: cron jobs hiding in two layers, the app scheduler and the system crontab. Agents bypassing the coordinator entirely. No oversight, no deduplication. The equivalent of half your team working nights on projects nobody asked for.
Deleted every independent cron. Established one rule: Frodo controls everything. Sub-agents never self-trigger. Only the coordinator reads state, decides what needs to happen, spawns the right agent, reviews the output, and updates state.
This is the same principle that makes great product teams work. One decision-maker. Clear delegation. Every action traceable back to a decision.
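In code terms, the rule reduces to a single loop. A minimal sketch, where read_state, spawn_agent, and the rest are hypothetical stand-ins rather than OpenClaw's real API; the shape of the loop is the point.

```python
# Sketch of the single-coordinator pattern: only the coordinator reads
# state, decides, spawns a worker, reviews the output, and writes state
# back. All function bodies are illustrative stubs, not OpenClaw calls.

def read_state() -> list[dict]:
    # In the real system this parses the shared markdown state files.
    return [{"role": "scout", "task": "find new job postings", "done": False}]

def write_state(tasks: list[dict]) -> None:
    # Persist back to the single source of truth (stubbed out here).
    print("state updated:", tasks)

def spawn_agent(role: str, task: str) -> str:
    # Stand-in for launching one worker session and awaiting its output.
    return f"[{role}] completed: {task}"

def review(output: str) -> bool:
    # Coordinator sanity-checks the worker's output before accepting it.
    return bool(output.strip())

def coordinator_tick() -> None:
    tasks = read_state()
    pending = [t for t in tasks if not t["done"]]
    if not pending:
        return                      # nothing to do: no rogue crons firing
    task = pending[0]               # one decision, one delegation
    output = spawn_agent(task["role"], task["task"])
    if review(output):
        task["done"] = True
    write_state(tasks)

if __name__ == "__main__":
    coordinator_tick()              # in production: the only scheduled entry point
```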
Set up "DevMan", an AMD Ryzen 9 5900HX with RTX 3080 (16GB VRAM), as a dedicated local inference server. Ollama for model serving, connected to the VPS via Tailscale VPN, configured as a model provider in OpenClaw.
Result: worker agents run on free local models, only the coordinator uses the paid cloud API. Daily operating cost dropped from dollars to cents.
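Conceptually, the routing table is tiny. A rough sketch, where the hostname, model IDs, and dict format are my own shorthand rather than OpenClaw's actual provider configuration:

```python
# Sketch of hybrid inference routing: the coordinator gets the paid
# cloud model, every worker gets the free local Ollama endpoint over
# the tailnet. Hostname and config format are illustrative.
OLLAMA_URL = "http://devman:11434"   # GPU rig reachable via Tailscale (assumed hostname)

MODEL_ROUTES = {
    "frodo": {"provider": "gemini", "model": "gemini-2.5-flash"},  # paid brain
    "scout": {"provider": "ollama", "model": "gpt-oss:20b", "base_url": OLLAMA_URL},
    "quill": {"provider": "ollama", "model": "gpt-oss:20b", "base_url": OLLAMA_URL},
    "forge": {"provider": "ollama", "model": "qwen2.5-coder:32b", "base_url": OLLAMA_URL},
    # ...remaining workers all route to local models
}

def resolve_model(agent: str) -> dict:
    # Default any unknown agent to the free local tier, never the paid API.
    return MODEL_ROUTES.get(
        agent,
        {"provider": "ollama", "model": "gpt-oss:20b", "base_url": OLLAMA_URL},
    )
```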
Tested every model I could find. phi4:14b had no tool support at all, returning "400 does not support tools." gpt-oss:20b worked on the first tool call, then degraded, hallucinating function names like "container.exec" that don't exist. Kimi K2.5 struggled to follow multi-step commands.
The workaround: design tasks as single-shot shell scripts. If the model only needs to succeed at one tool call instead of five sequential ones, reliability goes way up. Design the system around the model's limitations, not against them.
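For example, a five-step research task can be compiled into one script the agent only has to execute and report on. A hedged sketch of that packaging step; paths, filenames, and the script's contents are illustrative.

```python
# Sketch: collapse a multi-step job into one shell script so a weak
# local model only has to succeed at a single tool call ("run this").
from pathlib import Path
import stat

def build_single_shot_task(urls: list[str], out_dir: str = "/tmp/scout") -> Path:
    """Write one script that does the fetching, filtering, and logging
    in a single invocation, instead of five sequential tool calls."""
    script = Path("/tmp/scout_task.sh")
    lines = ["#!/usr/bin/env bash", "set -euo pipefail", f"mkdir -p {out_dir}"]
    for i, url in enumerate(urls):
        lines.append(f"curl -sL '{url}' -o {out_dir}/page_{i}.html")
    lines.append(f"grep -ril 'product designer' {out_dir} >> job-board.md")
    script.write_text("\n".join(lines) + "\n")
    script.chmod(script.stat().st_mode | stat.S_IEXEC)
    return script

# The agent's prompt then reduces to: "run /tmp/scout_task.sh and report
# whether it exited 0" -- one tool call, one chance to fail.
```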
GPU memory conflicts between image generation and LLM serving caused Ollama's models to get evicted from VRAM. VPN connections went cold after inactivity. In both cases, the system silently fell back to the paid cloud API with no visible error. Everything looked fine on the surface while costs climbed invisibly.
The fix was layered: connection keepalives every 2 minutes, model usage monitoring (not just configuration), and alerts for unexpected fallbacks. The lesson: design for silent failures from the start.
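A minimal sketch of that monitoring layer, assuming Ollama's standard /api/tags endpoint on the GPU rig; the hostname, usage-log format, and alert hook are assumptions.

```python
# Sketch: keepalive + silent-fallback detection for the local inference path.
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://devman:11434"     # GPU rig over the tailnet (assumed hostname)
USAGE_LOG = Path("logs/model-usage.jsonl")   # hypothetical per-call log
LOCAL_WORKERS = {"scout", "quill", "echo", "forge", "pixel"}

def alert(message: str) -> None:
    print("ALERT:", message)           # swap in Telegram/email in practice

def keepalive() -> None:
    """Touch the Ollama API so the VPN path and the server stay warm."""
    try:
        with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=10) as resp:
            json.load(resp)
    except OSError as exc:
        alert(f"Ollama unreachable over VPN: {exc}")

def check_for_silent_fallbacks() -> None:
    """Flag any worker call that ended up on the paid cloud model."""
    if not USAGE_LOG.exists():
        return
    for line in USAGE_LOG.read_text().splitlines():
        entry = json.loads(line)
        if entry["agent"] in LOCAL_WORKERS and entry["provider"] != "ollama":
            alert(f"{entry['agent']} silently fell back to {entry['provider']}")

if __name__ == "__main__":             # run every 2 minutes from the coordinator
    keepalive()
    check_for_silent_fallbacks()
```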
Built a real-time Next.js monitoring dashboard: agent status, cron health, activity feeds, model routing, and cost tracking. The recurring lesson from every roadblock was the same: you can't manage what you can't see. So I made everything visible.
Chapter 4
The final architecture follows a pattern I've used in every product team I've designed for: one clear leader, specialized roles, defined communication channels, and a single source of truth.
The design principle: One brain pays for quality. Eight workers run for free. The coordinator keeps reasoning-heavy decisions for itself and delegates volume tasks to local models. It's the same budget philosophy you'd apply to any team: senior talent on strategy, junior talent on execution.
Chapter 5
Every hard problem in this project mapped directly to a design principle I already use in product work. The skills that make great product designers transfer directly to AI systems architecture.
Each agent has an IDENTITY.md, a structured definition of personality, responsibilities, communication style, and boundaries. It's a design system for AI behavior. The same way design tokens prevent visual drift across a product, identity files prevent role drift across agents. Without them, agents slowly blend into generic assistants that overlap and conflict.
The single-coordinator pattern is information hierarchy applied to AI. One focal point. One decision-maker. Sub-agents report up, not sideways. It's the same reason a well-designed dashboard has one primary action per screen: clarity comes from constraint.
Agents don't dump raw data. They summarize, prioritize, and flag only what requires human attention. One daily Telegram recap instead of per-agent notifications. It's the same principle behind good notification design: respect the user's attention as a finite resource.
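A sketch of how that daily recap might be assembled and sent, using the standard Telegram Bot API sendMessage method; the summary-file layout and environment variable names are assumptions.

```python
# Sketch: one daily digest instead of per-agent notifications.
import os
import urllib.parse
import urllib.request
from pathlib import Path

def build_digest(summary_dir: str = "state/daily") -> str:
    """Collapse each agent's one-line summary into a single message."""
    parts = []
    for f in sorted(Path(summary_dir).glob("*.md")):
        first_line = (f.read_text().strip().splitlines() or ["(empty)"])[0]
        parts.append(f"- {f.stem}: {first_line}")
    if not parts:
        return "Daily recap: all quiet."
    return "Daily recap\n" + "\n".join(parts)

def send_telegram(text: str) -> None:
    token = os.environ["TELEGRAM_BOT_TOKEN"]
    chat_id = os.environ["TELEGRAM_CHAT_ID"]
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    urllib.request.urlopen(
        f"https://api.telegram.org/bot{token}/sendMessage", data=data, timeout=10
    )

if __name__ == "__main__":
    send_telegram(build_digest())   # scheduled once a day by the coordinator
```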
Markdown files (job-board.md, applications-log.md, bounties-log.md) serve as the system's single source of truth. Every agent reads from and writes to the same state files. No conflicting copies. No stale data. Same principle as a design system's token library: one source, many consumers.
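A sketch of the write path, with a simple file lock so two agents never produce conflicting copies; the file names come from the system above, but the locking approach and directory are my assumptions.

```python
# Sketch: markdown files as the single source of truth. Every write goes
# through one append helper guarded by an advisory lock, so agents never
# hold conflicting copies. fcntl is POSIX-only (the VPS runs Linux).
import fcntl
from datetime import date
from pathlib import Path

STATE_DIR = Path("state")

def append_entry(filename: str, entry: str) -> None:
    """Append one markdown line to a shared state file, under a lock."""
    path = STATE_DIR / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)          # one writer at a time
        f.write(f"- {date.today()}: {entry}\n")
        fcntl.flock(f, fcntl.LOCK_UN)

if __name__ == "__main__":
    # Example: Scout logs a lead; Quill later reads the same file.
    append_entry("job-board.md", "Senior Product Designer @ ExampleCo (remote)")
```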
Every roadblock taught the same lesson: assume things will break silently. Model evicted from VRAM? Silent fallback to the paid API. VPN goes cold? Silent timeout. Model can't handle tools? Silent degradation. The system needed observability baked in, the same way good UX needs error states designed up front, not treated as an afterthought.
Chapter 6
This isn't a wrapper around ChatGPT. It's real infrastructure that I built, configured, and maintain.
OpenClaw
Open-source multi-agent platform. Session management, tool calling, cron scheduling, agent spawning.
Gemini 2.5 Flash
Google's fast reasoning model. Reliable multi-step tool calling. ~$0.02-0.08/call with thinking controls.
Ollama + gpt-oss:20b
Free inference on local GPU. Also running phi4:14b, deepseek-r1:32b, qwen2.5-coder:32b, qwen3-coder:30b.
RTX 3080 16GB VRAM
AMD Ryzen 9 5900HX, 32GB RAM. Running Ollama for model serving + ComfyUI for image generation.
Linux VPS
Runs OpenClaw gateway, agent sessions, cron scheduler. Connected to GPU rig via Tailscale VPN.
Tailscale VPN
Mesh VPN connecting VPS, GPU rig, and development machine. Private network, no exposed ports.
Next.js + pm2
Real-time monitoring dashboard. Agent status, cron health, activity feeds. Process-managed for uptime.
ComfyUI + FLUX Schnell
Local AI image generation pipeline. CLIP-L, T5-XXL, VAE. Runs on the same GPU rig.
Hybrid Architecture
Paid API for brain only. Free local models for workers. Token limits, thinking-level controls, concurrent session caps.
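As a rough illustration of those guardrails, per-role limits might be expressed like this; the keys and values are illustrative placeholders, not OpenClaw's real configuration schema.

```python
# Sketch of the cost guardrails: token limits, thinking-level controls,
# and a concurrent-session cap per role. Values are illustrative.
LIMITS = {
    "frodo": {"provider": "gemini", "max_output_tokens": 4096,
              "thinking": "low", "max_concurrent_sessions": 1},
    "workers": {"provider": "ollama", "max_output_tokens": 2048,
                "thinking": None, "max_concurrent_sessions": 2},
}

def within_budget(role: str, active_sessions: int) -> bool:
    """Refuse to spawn a session once a role hits its concurrency cap."""
    cfg = LIMITS["frodo"] if role == "frodo" else LIMITS["workers"]
    return active_sessions < cfg["max_concurrent_sessions"]
```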
Chapter 7
Frodo is one system. But the problems it solves are universal.
Frodo proves that coordinator-pattern architecture, hybrid inference routing, and behavioral governance make autonomous AI systems viable for small teams and solo operators. The same patterns apply at enterprise scale.
Chapter 8
What started as an experiment became a fully operational system. Here's what it delivered.
What I built: A scalable coordinator architecture with hybrid inference routing, an AI identity and governance system that prevents role drift, a full observability layer for silent failure detection, and a cost engineering framework that reduced operating expenses by 95%. The system scales. Adding new agents requires no architectural redesign.
What's next: Browser automation for real-world execution. The infrastructure, coordination model, and agent team are production-ready. The next phase is giving agents the ability to interact directly with the open web, moving from analysis and generation into full autonomous action.
Chapter 9
I named the project after Frodo Baggins. One small person carrying something way too big for them, walking straight into the unknown anyway. That felt right when I started. It still does.
The hardest part of agentic AI isn't the technology. It's the system design. Defining clear roles, establishing communication protocols, knowing when to delegate versus decide, building for failure, designing information hierarchy so the right signal reaches the right person at the right time.
Orchestrating intelligence at scale is not an engineering trick. It is a design problem.