Engineering
September 5, 2025
The Complete Edge Architecture Guide (Part 2): How We Fit an AI Platform in 128MB
This is Part 2 of our four-part series on building AI-powered applications on the edge. [Part 1](./part-1-architecture.md) covered the foundational Cloudflare architecture. In this post, we'll dive into our framework choice and memory optimization strategies. Part 3 explores [our sub-10ms AI pipeline implementation](./part-3-ai-pipeline.md), and Part 4 details [our journey from LangGraph to Mastra](./part-4-from-langgraph-to-mastra.md) for AI orchestration.
---
So you've decided to go all-in on Cloudflare Workers. You've got your R2 buckets, your queues, your Durable Objects. But now you need to actually build an API. And if you're like us, your first instinct is to reach for Express.
Here's the thing about running on the edge -- everything you know about Node.js frameworks goes out the window. That comfortable Express setup with its 572KB bundle size? It's going to cause a large memory overhead once you add your actual application code. Fastify with its plugin ecosystem? Those plugins assume Node.js APIs that don't exist in Workers.
Why Framework Choice Matters on the Edge
When you're running in a traditional Node.js environment, framework overhead is a rounding error. Your server has gigabytes of RAM, persistent connections, and all the time in the world to initialize. Bundle size? Who cares when you have a 100Mbps connection to your CDN.
But on the edge, every kilobyte counts. Every millisecond of initialization time is multiplied across thousands of cold starts. Every dependency is code that has to be parsed, compiled, and held in memory -- memory you desperately need for your actual application.
Consider this: Cloudflare Workers gives you 128MB of memory. Total. For everything. Your framework, your application code, your in-flight requests, your temporary data structures. Everything.
Now consider that Express alone -- just the framework, no middleware, no actual application -- uses 120MB of memory when running. See the problem?
The Express Problem: When 572KB Is Too Much
Let's be specific about why Express (and Fastify, and Koa, and Hapi) don't work on the edge:
Bundle Size Reality Check
The Polyfill Tax
Since Workers aren't Node.js, you need polyfills for Node.js-specific APIs. But here's the thing -- those polyfills aren't free:
By the time you've polyfilled enough to make Express happy, you've added 200KB+ to your bundle. For functionality you don't even need.
The Memory Problem
Express was designed for long-running servers. It caches routes, maintains middleware state, and builds up internal data structures over time. In a serverless environment where every request might be a cold start, this is pure waste:
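Here's a simplified sketch of that long-running-server pattern (not Express's actual internals, just the shape of the problem):

```typescript
// State accumulates at module scope for the life of the process.
const routeCache = new Map<string, (req: Request) => Response>();

function register(path: string, handler: (req: Request) => Response) {
  routeCache.set(path, handler); // held forever on a traditional server
}

register('/users', () => new Response('users'));
register('/orders', () => new Response('orders'));

// On a long-lived server this cache pays off across millions of requests.
// On Workers, a cold start rebuilds all of it -- possibly to serve
// exactly one request before the isolate is evicted.
```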
Enter Hono: Built for the Edge
Hono takes a completely different approach. Instead of trying to be Node.js-compatible, it embraces web standards. Instead of bundling everything, it's modular. Instead of assuming persistent memory, it's stateless.
| | Express | Fastify | Koa | Hono |
|---|---|---|---|---|
| Bundle Size | 572KB | 189KB | 90KB | <14KB |
| Memory Usage | 120MB | 85MB | 60MB | 18MB |
| Cold Start | 450ms | 380ms | 210ms | 120ms |
| Dependencies | 50+ | 30+ | 20+ | 0 |
| Edge Workers | ❌ Needs adapter | ❌ Needs adapter | ❌ Needs adapter | ✅ Native |
But it's not just about being smaller. It's about being designed for this environment.
Web Standards First
Hono is built entirely on Web Standards APIs -- the same APIs that Cloudflare Workers implements natively:
Zero Dependencies
Everything Hono needs is either part of the web standards (which Workers provides) or bundled in that tiny 14KB package. No dependency hell. No security vulnerabilities from nested dependencies. No surprises.
TypeScript Native
While Express requires @types/express (and prayers that they match the actual version), Hono is written in TypeScript:
Cloudflare Bindings Integration
As if the efficiency benefits weren't enough: direct, type-safe access to all Cloudflare services.
Middleware Composition
Hono's middleware system is both powerful and efficient:
Dynamic Loading: Managing Memory in a 128MB World
Here's a reality check -- Cloudflare Workers gives you 128MB of memory per execution. That's it. When you're running an AI orchestration framework (Mastra), handling 50+ route modules, and processing real-time data, that 128MB disappears fast.
So how do we make it work? Dynamic loading.
The Problem: Everything Everywhere All at Once
In a traditional Node.js app with Express, you'd import everything at startup:
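Simulated here with stubs (the module names are illustrative): every route module is initialized before the first request, whether or not this invocation needs any of them.

```typescript
// Eager, Express-style startup: all cost is paid up front.
const initialized: string[] = [];
const load = (name: string) => {
  initialized.push(name); // parse + init cost paid immediately
  return { name };
};

const modules = [
  'auth', 'organizations', 'repositories', 'billing', 'chat',
].map(load);

// Even a single /health request on a fresh isolate pays for all of it.
console.log(`initialized ${modules.length} modules before first request`);
```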
Do this in Workers and you'll blow through your memory limit before handling a single request.
Our Solution: Load Only What You Need, When You Need It
We implemented a dynamic loading system that treats memory as the precious resource it is:
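Here's the core idea in miniature. The loaders below are stubs so the sketch is self-contained; in production they are dynamic `import()` calls:

```typescript
// A module is loaded on the first request that needs it, then reused
// for the life of the isolate.
const loaders: Record<string, () => Promise<{ handle: () => string }>> = {
  auth: async () => ({ handle: () => 'auth ok' }),
  chat: async () => ({ handle: () => 'chat ok' }),
};

const loaded: Record<string, { handle: () => string }> = {};

async function getModule(name: string) {
  // Pay the parse/initialize cost only when the route is actually hit.
  return (loaded[name] ??= await loaders[name]());
}
```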
Per-Request AI Initialization
Here's the clever bit -- Mastra (our AI framework) isn't initialized globally. It's created on-demand, only for routes that actually need it:
The Route Module System
Our route module system is configuration-driven and optimized for edge constraints:
Route Configuration
The Loading Strategy
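Sketched below as a configuration-driven manifest. The prefixes mirror the routes listed in the next section; the loaders are stubs standing in for dynamic `import()` calls, and the `needsAI` flag is an illustrative name:

```typescript
type RouteConfig = {
  prefix: string;
  needsAI: boolean; // gates per-request AI initialization
  load: () => Promise<(req: Request) => Response>;
};

const manifest: RouteConfig[] = [
  { prefix: '/auth', needsAI: false, load: async () => () => new Response('auth') },
  { prefix: '/billing', needsAI: false, load: async () => () => new Response('billing') },
  { prefix: '/chat', needsAI: true, load: async () => () => new Response('chat') },
];

async function dispatch(req: Request): Promise<Response> {
  const path = new URL(req.url).pathname;
  const route = manifest.find((r) => path.startsWith(r.prefix));
  if (!route) return new Response('not found', { status: 404 });
  if (route.needsAI) {
    // Hypothetical hook: spin up the AI layer only for routes that need it.
  }
  const handler = await route.load(); // module loads on first hit
  return handler(req);
}
```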
Routes That Skip AI Initialization
Most routes don't need Mastra at all:
- Authentication (`/auth/*`): JWT validation, session management
- Organizations (`/organizations/*`): CRUD operations
- Repositories (`/repositories/*`): GitHub configuration
- Billing (`/billing/*`): Stripe integration
- API Keys (`/api-keys/*`): Key management
- Notifications (`/notifications/*`): Preference management
- Health Checks (`/health/*`): System status
Only these routes initialize Mastra:
- Chat (`/chat/*`): AI-powered conversations
- Bug Analysis (`/bug-analysis/*`): Chrome extension analysis
- Analytics Enrichment (`/v1/analytics-enrichment/*`): AI insights
- GitHub Workflow (queue consumers): Event processing
Production Patterns That Emerged
After six months in production, here are the patterns that actually work:
1. Lazy Everything
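The shape of it is a generic `lazy()` helper (a sketch, not our exact code): nothing is constructed until first use, and the result is reused within the isolate.

```typescript
function lazy<T>(init: () => T): () => T {
  let value: T | undefined;
  return () => (value ??= init());
}

// Illustrative heavyweight dependency.
let constructed = 0;
const getParser = lazy(() => {
  constructed++;
  return { parse: (s: string) => s.trim() };
});

// Cost is zero until a request actually needs the parser...
getParser().parse(' hi ');
getParser(); // ...and the second call reuses the same instance.
```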
2. Request-Scoped Initialization
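In sketch form (the `db` client here is a stub): heavyweight clients hang off the request context rather than module scope, so each request's memory is reclaimable the moment the response is sent.

```typescript
type Ctx = { db?: { query: (q: string) => Promise<string[]> } };

function getDb(ctx: Ctx) {
  // One instance per request, not per isolate.
  return (ctx.db ??= { query: async () => ['row'] });
}

async function handle(_req: Request): Promise<Response> {
  const ctx: Ctx = {}; // fresh context per request
  const rows = await getDb(ctx).query('select 1');
  return Response.json({ rows }); // ctx (and the client) is now collectable
}
```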
3. Granular Module Boundaries
4. Memory-Aware Caching
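The principle, sketched as a small bounded LRU cache: evict the least-recently-used entry instead of growing without limit inside the 128MB budget.

```typescript
class BoundedCache<K, V> {
  private map = new Map<K, V>();
  constructor(private max: number) {}

  get(key: K): V | undefined {
    const v = this.map.get(key);
    if (v !== undefined) {
      this.map.delete(key); // re-insert to mark as most recently used
      this.map.set(key, v);
    }
    return v;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Map iterates in insertion order, so the first key is the LRU.
      this.map.delete(this.map.keys().next().value as K);
    }
  }
}

const cache = new BoundedCache<string, string>(2);
cache.set('a', '1');
cache.set('b', '2');
cache.set('c', '3'); // 'a' is evicted; memory stays bounded
```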
Lessons Learned
After building a production AI platform with Hono and dynamic loading, here's what we learned:
1. Constraints Drive Innovation
The 128MB limit seemed impossible at first. But it forced us to build a better architecture than we would have with unlimited memory. Every constraint became an opportunity to optimize.
2. Not All Frameworks Are Created Equal
Framework choice matters exponentially more on the edge than on traditional servers. The difference between Hono and Express isn't incremental -- it's the difference between working and not working.
3. Dynamic Loading Is a Superpower
Loading code on-demand isn't just about saving memory. It's about:

- Faster cold starts (less to parse)
- Better isolation (routes don't interfere)
- Easier debugging (smaller surface area)
- Natural code splitting (enforced boundaries)
4. Type Safety Enables Velocity
Hono's TypeScript-first design with CloudflareEnv bindings caught so many bugs at compile time. When your bindings are typed, your routes are typed, and your responses are typed, you ship with confidence.
5. Web Standards Are the Future
By building on Request/Response instead of Node.js APIs, our code is portable. We could move to Deno, Bun, or any future runtime that implements web standards. No lock-in.
The Bottom Line
Hono + dynamic loading let us fit an entire AI platform -- 50+ endpoints, real-time chat, workflow orchestration, semantic search -- into 128MB of memory with sub-50ms response times globally.
Could we have built this with Express? No. The memory constraints alone would have killed the project.
Could we have built it with another edge framework? Maybe, but none match Hono's combination of performance, size, and developer experience.
The future of web development isn't just about moving to the edge. It's about embracing the constraints of the edge and using frameworks designed for this new world. For us, that framework is Hono.
---
Next in the Series
**[Part 3: How We Built a Sub-10ms AI Pipeline on the Edge](./part-3-ai-pipeline.md)** - Dive deep into our AI infrastructure implementation, featuring Voyage AI embeddings, pgvector for semantic search, and the economic implications of edge computing for AI startups.
**[Part 4: From LangGraph to Mastra - Our AI Orchestration Journey](./part-4-from-langgraph-to-mastra.md)** - Learn why we migrated from LangGraph to Mastra for AI workflow orchestration, and how this TypeScript-first framework transformed our development velocity.
---
_Want to see Hono in action? Check out our [open-source code](https://github.com/kasava/kasava) or read more about [our parallel indexing architecture](/blog/parallel-indexing-architecture) that processes 10,000+ files in under 5 minutes._


