Solutions Engineering Study Guide

Cloudflare Developer Platform

Fundamentals, architectural patterns, runnable code, and customer talk tracks for the SaaS, ISV, and "build on the edge" conversation.

Audience: Solutions Engineers | Format: Self-contained reference | Sourced from official Cloudflare docs

1. Why the Developer Platform Exists

Most cloud platforms ship a region. Cloudflare ships a network. The Developer Platform is the set of primitives that lets customers run code, store data, and do AI inference inside that network — no region selection, no idle servers, no egress fees.

The three problems it solves for customers

Latency. Code and data already live close to the end user. No "us-east-1 → São Paulo" round trip.
Operational tax. No VPCs, no autoscaling groups, no Kubernetes operators. The platform is the ops team.
Egress economics. R2 has zero egress. Workers and storage are pay-per-request. The cost model rewards usage, not provisioning.

How Cloudflare delivers it

V8 isolates, not containers. <1ms cold starts. A Worker is a single JS/TS module evaluated once and reused.
One global deployment. wrangler deploy publishes to 300+ cities at once. There is no "region."
Bindings, not SDKs. Storage and services are first-class objects on env. No connection strings, no IAM gymnastics.
Web standards. fetch, Request, Response, URL, Headers, WebSocket, Streams — code is portable.

SE framing: When a customer says "I want to move workloads closer to users," they may not realize they're describing the Developer Platform. The pitch is not "another serverless." It's "the network is the runtime."

2. The Mental Model: Compute + Storage + AI

Group the platform into three families. Customers typically combine pieces from each.

Compute

Workers, Pages, Durable Objects, Workflows, Queues, Cron Triggers, Containers.

Storage / Data

KV, D1, R2, Hyperdrive, DO Storage (SQLite), Queues, Vectorize, Secrets Store.

AI

Workers AI (50+ models), Vectorize, AI Gateway, AI Search, Agents SDK.

Decision tree for picking storage

Need	Pick	Why
High-read config / sessions / feature flags	KV	<10ms global reads, eventually consistent
Relational data, joins, transactions	D1	SQLite per database, 10 GB, Time Travel recovery
Files, video, backups, S3 migration	R2	S3 API, zero egress, strong consistency
Strong consistency on a single entity (room, user, document)	Durable Objects	Single-threaded, co-located storage, ~1K req/s
Async jobs / event fan-out	Queues	At-least-once delivery, batching, DLQ
Vector embeddings (RAG / search)	Vectorize	10M vectors/index, cosine/euclidean/dot
Existing Postgres / MySQL	Hyperdrive	Connection pooling + caching to your DB

Decision tree for picking compute

Need	Pick
HTTP API / edge logic / auth	Workers
Full-stack web app with Git deploys	Pages (or Workers Static Assets)
Real-time coordination (chat, presence, locks)	Durable Objects
Multi-step durable jobs (minutes → weeks)	Workflows
Async fan-out	Queues
Scheduled tasks	Cron Triggers
Long-running stateful processes / heavy deps	Containers

3. Workers — the Compute Primitive

A Worker is a JS/TS/Rust/Python module deployed globally that exports handlers. The most common is fetch — invoked on every HTTP request to your route.

What a Worker actually is

A Worker is a piece of code that runs inside Cloudflare's edge network instead of on a server you rent or operate. When a user hits your domain, the request lands at the nearest Cloudflare data center (one of 300+ globally), and a Worker is invoked in that data center. There is no region, no autoscaling group, no load balancer to configure — the network is the runtime.

Under the hood, Workers run on V8 isolates, the same lightweight sandbox technology that powers each browser tab in Chrome. An isolate is not a container, not a VM, not a process — it's a small slice of a long-running V8 instance. Cloudflare can boot thousands of them per second on a single machine, which is why cold starts are measured in microseconds rather than seconds, and why the platform doesn't bill you for idle time.

The execution model

Every Worker request executes in three distinct phases:

Request lands at the edge. Cloudflare's anycast network routes the user to the closest PoP. The Worker isolate is either already warm (most cases) or cold-started in <1ms.
Your handler runs. The default export's fetch(request, env, ctx) method is invoked. request is a standard Request object, env contains your bindings (more on this below), and ctx is the execution context.
Response returns + background work continues. Your Response goes back to the user. Anything passed to ctx.waitUntil() keeps running after the response — perfect for logging, cache warming, queue sends, or analytics.
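
The ordering in the third phase is easy to misread, so here is a minimal sketch of it. FakeContext is a hypothetical stand-in for the execution context that models only waitUntil, and drain() is a helper invented for this sketch, not part of the real API. The handler returns a plain string rather than a Response so the block runs anywhere.

```typescript
// Minimal stand-in for the execution context; only waitUntil is modelled.
class FakeContext {
  private pending: Promise<unknown>[] = [];

  waitUntil(p: Promise<unknown>): void {
    this.pending.push(p);
  }

  // Helper for this sketch only (not part of the real API).
  async drain(): Promise<void> {
    await Promise.all(this.pending);
  }
}

// Simulated slow background task, e.g. shipping a log line.
async function writeLog(line: string, events: string[]): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  events.push(`log written: ${line}`);
}

// Phase 2: the handler schedules background work and returns immediately.
async function handle(path: string, ctx: FakeContext, events: string[]): Promise<string> {
  ctx.waitUntil(writeLog(path, events)); // deliberately not awaited
  events.push("response returned");
  return "ok";
}

// Phase 3: the response is already out while background work finishes.
async function demo(): Promise<string[]> {
  const events: string[] = [];
  const ctx = new FakeContext();
  await handle("/health", ctx, events);
  await ctx.drain();
  return events;
}
```

Calling demo() records "response returned" before the log line is written, which is the property that makes waitUntil suitable for logging and analytics.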

Bindings — the killer feature

A binding is a typed reference to another Cloudflare resource (a KV namespace, a D1 database, an R2 bucket, an AI model, another Worker) that appears as a property on env. There are no connection strings, no API keys to manage, no SDKs to import — the platform wires it up at deploy time.

This matters because many cloud security incidents trace back to leaked credentials or misconfigured IAM. With bindings, the Worker only has access to what you explicitly grant in wrangler.jsonc, and that grant is the credential. Customers used to AWS will recognize this as "IAM done right."

Web standards, not Node.js

Workers implement the Web Platform APIs — fetch, Request, Response, Headers, URL, WebSocket, Streams, crypto.subtle, TextEncoder. The same APIs you'd use in a browser. This is deliberate: code written for Workers is portable, and developers don't have to learn a proprietary runtime.

Node.js built-ins (fs, net, buffer, etc.) are not available by default, because Workers don't have a filesystem or persistent process. If a customer needs them (often for npm package compatibility), enable the nodejs_compat compatibility flag — it adds a polyfilled subset.

Languages supported

JavaScript and TypeScript are first-class. Python, Rust, and any language compilable to WebAssembly (Go, C, C++, .NET via Blazor) are also supported. Most production Workers are TypeScript because the binding type system is best-in-class there.

Pricing & performance posture

Workers are billed on requests + CPU time, not wall-clock time. A Worker that's awaiting a slow database call is not billing you for those milliseconds — only the time V8 actually spent executing your code. This is a fundamentally different cost model from Lambda or Cloud Run, and it tends to be 5–20x cheaper for typical edge workloads.
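
To make the CPU-time point concrete, the sketch below splits one request's timeline into CPU and I/O-wait segments and sums each. The Segment shape and the sample durations are illustrative assumptions, not platform measurements or prices.

```typescript
// Illustrative model only: the Segment shape and durations are assumptions.
type Segment = { kind: "cpu" | "io_wait"; ms: number };

// Wall-clock time counts every segment.
function wallClockMs(timeline: Segment[]): number {
  return timeline.reduce((sum, s) => sum + s.ms, 0);
}

// CPU time counts only the segments where code is actually executing.
function cpuMs(timeline: Segment[]): number {
  return timeline
    .filter((s) => s.kind === "cpu")
    .reduce((sum, s) => sum + s.ms, 0);
}

// A request that parses input, waits on a slow database, then serializes.
const timeline: Segment[] = [
  { kind: "cpu", ms: 2 },
  { kind: "io_wait", ms: 180 },
  { kind: "cpu", ms: 3 },
];

console.log(`wall clock: ${wallClockMs(timeline)} ms, CPU: ${cpuMs(timeline)} ms`);
```

Under a wall-clock model this request would count as 185 ms; under a CPU-time model it counts as 5 ms. The gap grows with how I/O-bound the workload is.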

Fundamentals — quick reference

Runs on V8 isolates — no containers, no VMs. Cold start <1ms.
Three handler arguments: request, env (bindings), ctx (execution context).
Use ctx.waitUntil(promise) to keep work alive after the response is sent (logging, cache writes, queue sends).
Web platform APIs only — no Node built-ins by default. Enable nodejs_compat flag if you need them.
Other handlers: scheduled (cron), queue (queue consumer), tail (log stream), email (Email Workers).

Hello World

typescript — src/index.ts

export interface Env {
  MY_KV: KVNamespace;
  DB: D1Database;
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname === "/health") {
      return Response.json({ ok: true, ts: Date.now() });
    }

    return new Response("Hello from the edge", { status: 200 });
  },
} satisfies ExportedHandler<Env>;

wrangler.jsonc — the deployment manifest

jsonc — wrangler.jsonc

{
  "$schema": "node_modules/wrangler/config-schema.json",
  "name": "edge-api",
  "main": "src/index.ts",
  "compatibility_date": "2026-01-01",
  "compatibility_flags": ["nodejs_compat"],
  "observability": { "enabled": true },

  "kv_namespaces": [
    { "binding": "MY_KV", "id": "abc123..." }
  ],
  "d1_databases": [
    { "binding": "DB", "database_name": "app", "database_id": "..." }
  ],
  "r2_buckets": [
    { "binding": "FILES", "bucket_name": "uploads" }
  ],
  "ai": { "binding": "AI" }
}

Essential commands

bash

npm create cloudflare@latest my-worker -- --type hello-world
cd my-worker

npx wrangler dev                        # local dev
npx wrangler dev --remote               # use real bindings
npx wrangler deploy                     # publish globally
npx wrangler tail                       # stream live logs
npx wrangler secret put STRIPE_KEY      # set encrypted secret

Customer signal: "We need an API gateway / BFF / auth layer / rate limiter in front of our origin." → Workers is almost always the right answer. It's the front door.

4. Building APIs with Hono

Hono is the de facto Workers-native web framework. It's tiny, type-safe, and feels like Express. Use it any time you have more than two routes.

Why a framework matters here

The raw Workers API gives you a single fetch handler that receives a Request. For any non-trivial API, you'll quickly want routing, middleware, body parsing, validation, error handling, and response helpers. You can write all of that yourself, but every team ends up rebuilding the same primitives.

Hono is the framework Cloudflare itself recommends. It was built for edge runtimes from day one (not retrofitted from Node), so it has zero dependencies, sub-millisecond router latency, and a tiny bundle size that doesn't eat your Worker's startup budget.

What you get

RegExpRouter — one of the fastest routers in the JavaScript ecosystem. It compiles the registered routes into a single regular expression before dispatching.
Typed environment — your bindings flow through c.env with full TypeScript inference.
Middleware ecosystem — built-in CORS, logger, JWT, basic auth, cache, compression, ETag, secure headers, CSRF.
Validators — Zod, Valibot, and TypeBox integrations turn invalid requests into automatic 400 responses.
RPC mode — share types between your Worker and your frontend client, end-to-end type safety without code generation.

When to reach for it

Almost always. The exceptions: a tiny single-purpose Worker (one route, no body parsing) or a pure proxy. For any SaaS backend, BFF, or public API, Hono pays for itself within an hour.

Alternatives worth knowing

itty-router — even smaller, less ergonomic, fine for very small APIs. Hattip — adapter layer that lets you target multiple edge runtimes. For full-stack apps with frontend frameworks, you'd usually use the framework's own router (Next.js, SvelteKit, Astro, Remix) and reserve Hono for standalone APIs.

typescript — src/index.ts

import { Hono } from "hono";
import { cors } from "hono/cors";
import { logger } from "hono/logger";
import { zValidator } from "@hono/zod-validator";
import { z } from "zod";

type Bindings = {
  DB: D1Database;
  MY_KV: KVNamespace;
};

const app = new Hono<{ Bindings: Bindings }>();

app.use("*", logger());
app.use("/api/*", cors({ origin: "*" }));

// Auth middleware
app.use("/api/admin/*", async (c, next) => {
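  const auth = c.req.header("Authorization");
  // KV reads are async: await the token, and reject if it is missing.
  const token = await c.env.MY_KV.get("admin-token");
  if (!token || auth !== `Bearer ${token}`) {
    return c.json({ error: "unauthorized" }, 401);
  }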
  await next();
});

// Validated POST
const createUser = z.object({
  email: z.string().email(),
  name: z.string().min(1),
});

app.post("/api/users", zValidator("json", createUser), async (c) => {
  const { email, name } = c.req.valid("json");
  const result = await c.env.DB
    .prepare("INSERT INTO users (email, name) VALUES (?, ?) RETURNING id")
    .bind(email, name)
    .first<{ id: number }>();

  return c.json({ id: result?.id, email, name }, 201);
});

app.onError((err, c) => {
  console.error(err);
  return c.json({ error: err.message }, 500);
});

export default app;

Pattern: Hono + Zod + D1 + KV is the standard "edge API" stack. If a customer is building a SaaS backend, this is your demo template.

5. KV — Edge Key-Value

Globally distributed, eventually consistent KV store. Read-optimized — values cached at the edge after first read. Best for config, sessions, feature flags, and read-heavy caches.

What KV actually is

KV is best understood as a read-through cache layered on top of a globally replicated store. The authoritative copy of every key lives in a small number of central data centers. The first time a Worker reads a key in a given PoP, KV fetches it from the central store and caches it in that PoP's local memory. Subsequent reads from the same PoP are essentially free — sub-10ms, often sub-millisecond.

This architecture is why KV is so fast for reads and why writes take time to propagate: a write must invalidate or refresh that cached copy in every PoP that's seen the key. Up to 60 seconds globally is normal.

The consistency model — why this matters

KV is eventually consistent. Concretely:

A write you just made may not be visible from another PoP for up to 60 seconds.
Two writes to the same key from two different PoPs race — last writer wins, and "last" depends on which write the central store sees last, not wall-clock order.
There's a hard write rate limit of 1 write per second per key. Hot keys will return 429.

This is the most common KV mistake: customers reach for it because it's fast, and then try to use it as a primary store for things like counters, rate limits, or transactional state. Those workloads need Durable Objects or D1. KV is for data that's read 1000x more often than it's written.
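
The staleness window is easier to explain with a toy model. The sketch below is a conceptual simplification, not how KV is implemented: CentralStore, PopCache, the PoP names, and the 60-second TTL are all assumptions chosen to mirror the description above, and time is passed in explicitly so the behaviour is deterministic.

```typescript
// Conceptual model only; class names and the 60-second TTL are assumptions.
class CentralStore {
  private data = new Map<string, string>();
  put(key: string, value: string): void {
    this.data.set(key, value);
  }
  get(key: string): string | undefined {
    return this.data.get(key);
  }
}

class PopCache {
  private cache = new Map<string, { value: string | undefined; expiresAt: number }>();
  private central: CentralStore;
  private ttlMs: number;

  constructor(central: CentralStore, ttlMs: number) {
    this.central = central;
    this.ttlMs = ttlMs;
  }

  get(key: string, nowMs: number): string | undefined {
    const hit = this.cache.get(key);
    if (hit && nowMs < hit.expiresAt) return hit.value; // may be stale
    const value = this.central.get(key); // miss or expired: go to the central store
    this.cache.set(key, { value, expiresAt: nowMs + this.ttlMs });
    return value;
  }
}

const central = new CentralStore();
const saoPaulo = new PopCache(central, 60_000);
const frankfurt = new PopCache(central, 60_000);

central.put("flag:dark-mode", "off");
frankfurt.get("flag:dark-mode", 0); // Frankfurt caches "off"
central.put("flag:dark-mode", "on"); // the new write lands centrally

const fresh = saoPaulo.get("flag:dark-mode", 1_000); // first read in this PoP
const stale = frankfurt.get("flag:dark-mode", 1_000); // served from the local cache
const later = frankfurt.get("flag:dark-mode", 61_000); // cache entry has expired

console.log({ fresh, stale, later });
```

Two PoPs disagree about the same key for the length of the cache window. That is harmless for a feature flag and unacceptable for a counter or a rate limit.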

The right mental model: "config + cache"

Configuration data — feature flags, A/B test buckets, routing rules, allowlists, tenant config.
Cached read-through — a slow upstream API or a heavy database query, cached at the edge.
Session data — JWT denylists, login sessions (read on every request, written rarely).
Static content metadata — page metadata, redirects, sitemaps.

What it's not

Not a primary database. Use D1 for relational, R2 for blobs, DO for strongly-consistent state.
Not a transactional store. There are no transactions, no compare-and-swap, no atomicity across keys.
Not a real-time queue. Writes propagate eventually; that's a feature, not a bug.

Metadata — the underappreciated feature

Every KV value can carry up to 1024 bytes of metadata that's returned alongside the value in a single round trip, which makes it a good place for small extras (TTL hints, version numbers, content type). list() returns metadata too, so you can paginate over a prefix and read attributes without fetching every value.

Characteristics that matter to customers

Read latency: <10ms once warm in a PoP, sub-ms hot.
Write propagation: Up to 60s globally. Don't use KV as a write-heavy primary store.
Per-key write rate: 1 write/second. 429 above that.
Value size: 25 MiB max. Metadata: 1024 bytes.
Storage cost: Cheap. Reads are billed per-million; writes/deletes/list are billed at a higher per-million rate.

Setup & basic ops

bash

wrangler kv namespace create CONFIG
# Add the returned id to wrangler.jsonc under kv_namespaces

typescript

// Write with a 5-minute TTL
await env.CONFIG.put("flag:dark-mode", JSON.stringify({ enabled: true }), {
  expirationTtl: 300,
});

// Read as JSON
const flag = await env.CONFIG.get<{ enabled: boolean }>("flag:dark-mode", "json");

// List with a prefix
const { keys } = await env.CONFIG.list({ prefix: "flag:" });

// Read with metadata (avoid double round-trip)
const { value, metadata } = await env.CONFIG.getWithMetadata<Session, SessionMeta>(
  `session:${sid}`,
  "json"
);

Pattern: feature flag with cached evaluation

typescript

app.get("/api/flags/:key", async (c) => {
  const key = c.req.param("key");
  // cacheTtl avoids a KV roundtrip on every request in this PoP
  const value = await c.env.CONFIG.get(`flag:${key}`, {
    type: "json",
    cacheTtl: 60,
  });
  return c.json({ key, value });
});

6. D1 — Serverless SQL

SQLite-compatible serverless database. Best for relational application data: users, accounts, posts, audit logs. 10 GB per database, 30-day Time Travel point-in-time recovery, optional read replicas.

Why SQLite, and why on the edge

Most cloud SQL offerings (RDS Postgres, Aurora, Cloud SQL) optimize for one big database serving many applications. D1 inverts that: it's optimized for many small databases, one per tenant or per application. The underlying engine is SQLite — the most-deployed database in the world, embedded in every iPhone, Android device, and browser. It's battle-tested, has world-class SQL semantics, and runs in-process (no network round trip between query and engine).

Cloudflare took SQLite, wrapped it in a managed service with replication, durability, point-in-time recovery, and edge access, and exposed it as a Worker binding. The result is a database that feels like Postgres for the developer but scales horizontally by sharding databases (per-tenant, per-region) instead of vertically.

The horizontal-scale philosophy

One D1 database is capped at 10 GB and ~1K writes/sec. That sounds limiting until you realize the design intent: you're supposed to run many of them. A SaaS platform with 10,000 tenants might run 10,000 D1 databases — one per tenant — each with its own isolated schema, performance, and data residency. Cloudflare bills you on rows read/written and storage, not per database.

This pattern (one DB per tenant) is genuinely hard on traditional cloud SQL because of connection limits, provisioning overhead, and cost per instance. On D1 it's the happy path.

Time Travel — point-in-time recovery built in

Every D1 database has a 30-day continuous backup. You can restore to any minute within that window with wrangler d1 time-travel restore. There's no setup, no extra cost, no separate snapshot service. For customers used to managing RDS automated snapshots and PITR windows, this alone is often the moment they get sold.

Read replication (paid plans)

Writes go to a single primary; with read replication enabled, D1 maintains read replicas in other regions. Queries issued through the Sessions API can be served by a nearby replica, falling back to the primary when needed. The Sessions API also ensures read-your-writes consistency within a session, which is critical for "user updates profile, immediately reloads page" flows.
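
Read-your-writes is usually implemented with some form of bookmark: the session remembers the latest write it made, and a replica may serve a read only once it has caught up to that point. The sketch below is a conceptual model of that idea, not the D1 API. Primary, Replica, Session, and the integer version used as a bookmark are all assumptions made for illustration.

```typescript
// Conceptual model of bookmark-based read-your-writes; not the D1 API.
class Primary {
  version = 0;
  rows = new Map<string, string>();

  write(key: string, value: string): number {
    this.rows.set(key, value);
    this.version += 1;
    return this.version; // acts as the bookmark for this write
  }
}

class Replica {
  version = 0;
  rows = new Map<string, string>();

  // Replication copies the primary's state at some later point in time.
  syncFrom(primary: Primary): void {
    this.rows = new Map(primary.rows);
    this.version = primary.version;
  }
}

class Session {
  private bookmark = 0;
  private primary: Primary;
  private replica: Replica;

  constructor(primary: Primary, replica: Replica) {
    this.primary = primary;
    this.replica = replica;
  }

  write(key: string, value: string): void {
    this.bookmark = this.primary.write(key, value);
  }

  read(key: string): { value: string | undefined; servedBy: "replica" | "primary" } {
    // Only use the replica if it has seen everything this session wrote.
    if (this.replica.version >= this.bookmark) {
      return { value: this.replica.rows.get(key), servedBy: "replica" };
    }
    return { value: this.primary.rows.get(key), servedBy: "primary" };
  }
}

const primary = new Primary();
const replica = new Replica();
const session = new Session(primary, replica);

session.write("name", "Ada");
const beforeSync = session.read("name"); // replica is behind the bookmark
replica.syncFrom(primary);
const afterSync = session.read("name"); // replica has caught up

console.log(beforeSync.servedBy, afterSync.servedBy);
```

In both reads the session sees its own write. The only thing that changes is which copy answers.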

What it's good for vs. not

Good for: relational application data, multi-tenant SaaS (one DB per tenant), audit logs, structured analytics with modest write volume, anything you'd reach for Postgres for.
Not for: very high write throughput on a single database (>1K w/s sustained), single-database datasets >10 GB, complex Postgres-specific extensions (PostGIS, pgvector — use Vectorize instead), connection-pooled access from outside Cloudflare (use Hyperdrive over an external Postgres for that).

SQL compatibility

D1 supports the full SQLite SQL dialect: CTEs, window functions, JSON functions, full-text search via FTS5, generated columns, partial indexes. It does not support stored procedures, triggers with side effects outside the DB, or Postgres-specific syntax. ORMs that work with SQLite (Drizzle, Kysely, Prisma) work with D1.

Schema + migrations

bash

wrangler d1 create app
wrangler d1 migrations create app init_schema
# edit ./migrations/0001_init_schema.sql, then apply it:
wrangler d1 migrations apply app --remote

sql — migrations/0001_init_schema.sql

CREATE TABLE users (
  id        INTEGER PRIMARY KEY AUTOINCREMENT,
  email     TEXT NOT NULL UNIQUE,
  name      TEXT NOT NULL,
  created_at INTEGER NOT NULL DEFAULT (unixepoch())
);
CREATE INDEX idx_users_email ON users(email);

CREATE TABLE posts (
  id       INTEGER PRIMARY KEY AUTOINCREMENT,
  user_id  INTEGER NOT NULL REFERENCES users(id),
  title    TEXT NOT NULL,
  body     TEXT NOT NULL,
  created_at INTEGER NOT NULL DEFAULT (unixepoch())
);

Querying — prepared statements only

typescript

// .first() — single row
const user = await env.DB
  .prepare("SELECT id, email, name FROM users WHERE email = ?")
  .bind(email)
  .first<{ id: number; email: string; name: string }>();

// .all() — many rows
const { results } = await env.DB
  .prepare("SELECT * FROM posts WHERE user_id = ? ORDER BY created_at DESC LIMIT 20")
  .bind(userId)
  .all<Post>();

// .run() — INSERT/UPDATE/DELETE
const { meta } = await env.DB
  .prepare("INSERT INTO posts (user_id, title, body) VALUES (?, ?, ?)")
  .bind(userId, title, body)
  .run();
console.log("inserted id:", meta.last_row_id);

// Batch — atomic transaction in one round trip
await env.DB.batch([
  env.DB.prepare("UPDATE users SET name = ? WHERE id = ?").bind(name, id),
  env.DB.prepare("INSERT INTO audit_log (user_id, action) VALUES (?, ?)").bind(id, "rename"),
]);

Always parameterize: Never interpolate user input into SQL strings. Use .bind(). The platform supports prepared statements for a reason.

7. R2 — Object Storage (Zero Egress)

S3-compatible object storage with zero egress fees. Strong consistency on writes and deletes. Use cases: user uploads, media libraries, backups, data lakes, static assets, S3 migration targets.

The "zero egress" thesis

Cloud object storage has historically had a brutal economic asymmetry: storing a TB is cheap (~$23/mo on S3 Standard), but reading it out to the internet costs $0.05–$0.09 per GB. For media-heavy businesses (video, images, downloads, ML training data, backups), egress can dwarf storage costs by 10x or more.

R2's pricing model removes egress fees entirely. You pay for storage and for operations (Class A — writes/lists, Class B — reads), but bytes leaving R2 to the internet, to your origin, or to a customer's browser are free. This is enabled by Cloudflare's network: outbound bandwidth is already paid for by the broader CDN business, so R2 doesn't need to recoup it.
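
A quick worked example helps on discovery calls. The sketch below uses only the illustrative figures quoted above (about $23 per TB-month of storage and $0.05 to $0.09 per GB of egress). The workload, 1 TB stored and read out in full five times a month, is an assumption; substitute the customer's real numbers.

```typescript
// Figures are the illustrative ones quoted in the text, not a current rate card.
const STORAGE_USD_PER_TB_MONTH = 23;
const EGRESS_USD_PER_GB_LOW = 0.05;
const EGRESS_USD_PER_GB_HIGH = 0.09;

function egressCostUsd(gbServed: number, usdPerGb: number): number {
  return gbServed * usdPerGb;
}

// Assumed workload: 1 TB stored, the full dataset served 5 times per month.
const storedTb = 1;
const gbServed = 5 * 1000; // decimal TB to GB

const storageUsd = storedTb * STORAGE_USD_PER_TB_MONTH;
const egressLowUsd = egressCostUsd(gbServed, EGRESS_USD_PER_GB_LOW);
const egressHighUsd = egressCostUsd(gbServed, EGRESS_USD_PER_GB_HIGH);

console.log(`storage: $${storageUsd.toFixed(2)} per month`);
console.log(`egress:  $${egressLowUsd.toFixed(2)} to $${egressHighUsd.toFixed(2)} per month`);
console.log(
  `egress is ${(egressLowUsd / storageUsd).toFixed(1)}x to ` +
    `${(egressHighUsd / storageUsd).toFixed(1)}x the storage bill`
);
```

With these assumptions, egress comes to roughly $250 to $450 a month against $23 of storage. On a zero-egress model the egress term drops out and only storage and operations remain.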

S3 compatibility — drop-in migration

R2 implements the S3 REST API. Most S3 SDKs and tools (AWS SDK, boto3, s3cmd, rclone, Terraform) work by changing the endpoint URL and credentials. The migration story for an S3 customer is therefore very gentle: point your existing code at R2, optionally use Cloudflare's Super Slurper tool to bulk-copy your existing buckets, and start saving on egress immediately.

For new code on Cloudflare, you'd typically use the native Workers binding (env.MY_BUCKET.put/get/list) instead of the S3 SDK — it's faster, has no auth overhead, and offers some R2-specific features (multipart with smaller minimum, conditional requests via ETags).

Consistency model

R2 is strongly consistent for reads, writes, and deletes. After put() resolves, every subsequent get() sees the new value. After delete(), every get() returns null through the Workers binding (a 404 over the S3 API). This matches the read-after-write guarantee S3 has offered since December 2020, so code migrated from S3 needs no new consistency workarounds.

Storage classes

Standard — low-latency, frequently accessed. The default.
Infrequent Access — cheaper storage, retrieval fees, 30-day minimum storage duration. Good for backups, archives, and "long tail" content.

You can set lifecycle rules to automatically transition objects between classes (e.g. "move to IA after 60 days").

Event notifications — making R2 reactive

R2 can emit events to a Cloudflare Queue when objects are created or deleted. This turns R2 into the trigger for an event-driven pipeline — exactly the same pattern as S3 + SQS + Lambda, but native to one platform. Common uses: thumbnail/transcoding jobs on upload, virus scanning, indexing into Vectorize, audit logging.

Public buckets & custom domains

You can expose an R2 bucket directly via a public URL or attach it to a custom domain. Combined with Cloudflare's CDN cache (free), this turns R2 into a high-performance static asset host with no egress and no separate CDN configuration, and you can still put Cache Rules, Transform Rules, or a Worker in front.

What it's good for

User uploads (images, video, documents).
Media libraries and CDN origins.
Database backups, log archives, data lakes.
ML training datasets and model artifacts.
Build artifacts and software downloads.
Static site assets when paired with Workers.

What it's not

Not a filesystem — no append, no in-place edit, no rename. Objects are immutable; "edit" means re-upload.
Not a database — no querying inside objects (use R2 Data Catalog + R2 SQL for parquet/iceberg analytics).
Not a block store — single-object reads pull whole objects (or ranges) over HTTP.

Bucket creation + Worker upload/download

bash

wrangler r2 bucket create uploads --location=enam

typescript

// Upload from a multipart form
app.post("/api/upload", async (c) => {
  const form = await c.req.formData();
  const file = form.get("file") as File | null;
  if (!file) return c.json({ error: "no file" }, 400);

  const key = `${crypto.randomUUID()}/${file.name}`;
  await c.env.FILES.put(key, file.stream(), {
    httpMetadata: { contentType: file.type },
    customMetadata: { uploadedBy: c.get("userId") },
  });

  return c.json({ key, size: file.size });
});

// Stream download
app.get("/api/files/:key{.+}", async (c) => {
  const obj = await c.env.FILES.get(c.req.param("key"));
  if (!obj) return c.notFound();

  const headers = new Headers();
  obj.writeHttpMetadata(headers);
  headers.set("etag", obj.httpEtag);
  return new Response(obj.body, { headers });
});

R2 → Queue event notifications

jsonc — wrangler.jsonc

{
  "r2_buckets": [{ "binding": "FILES", "bucket_name": "uploads" }],
  "queues": {
    "producers": [{ "binding": "PROCESS_QUEUE", "queue": "process-uploads" }]
  }
}
// Then: wrangler r2 bucket notification create uploads \
//        --queue process-uploads --event-type object-create

Migration play: For S3 customers, R2 + Super Slurper (built-in S3 migration) often pays for itself in egress savings within months. Always quantify their current egress bill on the discovery call.

8. Durable Objects — Stateful Coordination

A Durable Object is a globally-unique, single-threaded, stateful actor with co-located storage. Use them when many clients need to coordinate around one thing: a chatroom, a document, a user's session, a rate-limit counter.

The problem DOs solve

Stateless serverless functions are great for stateless work, but most real apps have shared state — a chat room every participant writes to, a document several users edit at once, a counter that increments atomically, a rate-limit window per API key. Traditional serverless punts this state to an external database, which means every interaction becomes a round trip to a region, and you fight race conditions with optimistic locking, transactions, or distributed locks.

Durable Objects collapse the compute and the state into one entity. For a given ID, there is exactly one DO instance running anywhere in the world at any moment. All requests for that ID are routed to that instance and processed in serial order. There are no race conditions because there's no concurrency. The state lives in memory and in co-located storage on the same machine.

The actor model, brought to the edge

Conceptually, a DO is an actor — a tiny stateful service identified by name, addressable from anywhere, that processes one message at a time. This is the same model as Erlang processes, Akka actors, or Orleans grains, but as a managed serverless primitive on Cloudflare's network.

You spawn a DO by name (idFromName("room:lobby")) or by a unique ID (newUniqueId()). The first time anyone references that ID, Cloudflare instantiates the object near where the first request originated. From then on, every request for that ID lands on that same instance — providing strong consistency for free.

Storage: SQLite inside every DO

Each modern DO comes with its own embedded SQLite database, accessible via synchronous APIs (no await needed for reads — they're literal microseconds). You can also use the simpler key-value API on top of the same SQLite engine. Storage is up to 10 GB per DO, durable, and replicated. There's also a 30-day point-in-time recovery option.

Because storage is co-located, a DO can read and write its own state in microseconds — no network hop. This is what makes them fast enough for real-time use cases that would be impossible with an external database.

WebSocket Hibernation

A plain Worker can accept a WebSocket, but a DO is the primitive that lets many connections share one coordinating instance. The Hibernation API lets a DO accept thousands of WebSockets and then go idle: Cloudflare evicts the in-memory state, but the connections stay open and "wake" the DO when a message arrives. You're billed for storage and active compute, not for sitting around waiting. This makes DOs uniquely suited for chat, multiplayer, collaborative editing, and IoT fan-in.

Alarms — durable scheduled wake-ups

A DO can schedule itself to wake up at a future time via setAlarm(). The alarm is durable (survives evictions, restarts, hibernation). This gives you per-entity scheduling: "remind this user in 24 hours," "expire this session at 5pm UTC," "retry this charge in 1 hour." There's a single alarm slot per DO; for multiple events use a queue pattern internally.
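
The "queue pattern internally" can be sketched as follows: keep a sorted list of pending events, point the single alarm at the earliest one, and re-arm after each firing. This is a plain in-memory model under stated assumptions. Scheduler, alarmAt, and onAlarm are hypothetical names, and a real DO would persist the list in its storage and call setAlarm() where this sketch assigns alarmAt.

```typescript
// Sketch: one alarm slot multiplexed over many scheduled events.
type Scheduled = { at: number; task: string };

class Scheduler {
  private pending: Scheduled[] = [];
  alarmAt: number | null = null; // stands in for the single alarm slot

  schedule(at: number, task: string): void {
    this.pending.push({ at, task });
    this.pending.sort((a, b) => a.at - b.at);
    this.resetAlarm();
  }

  // Called when the alarm fires: run everything that is due, then re-arm.
  onAlarm(now: number): string[] {
    const due = this.pending.filter((e) => e.at <= now).map((e) => e.task);
    this.pending = this.pending.filter((e) => e.at > now);
    this.resetAlarm();
    return due;
  }

  private resetAlarm(): void {
    const next = this.pending[0];
    this.alarmAt = next ? next.at : null;
  }
}

const scheduler = new Scheduler();
scheduler.schedule(3_000, "retry charge");
scheduler.schedule(1_000, "expire session");

const firstAlarm = scheduler.alarmAt; // earliest event wins the slot
const firstRun = scheduler.onAlarm(1_000);
const secondAlarm = scheduler.alarmAt; // re-armed for the next event
const secondRun = scheduler.onAlarm(3_000);

console.log({ firstAlarm, firstRun, secondAlarm, secondRun, idle: scheduler.alarmAt });
```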

Sharding above 1K req/s

The serial execution that makes DOs strongly consistent also caps a single instance at roughly 1,000 requests/second. For higher throughput, you shard: instead of one global counter, run 100 counter shards (idFromName("counter:" + Math.floor(Math.random() * 100))) and aggregate. This is a deliberate design trade — strong consistency per shard, eventual consistency across shards.
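
A minimal sketch of the sharding idea, under stated assumptions: the Map stands in for 100 separate Durable Objects, one per shard name, and pickShard uses the same naming expression as the text. In a real deployment each shard would be its own DO and total() would fan out to all of them.

```typescript
// Sketch: the Map stands in for per-shard Durable Objects.
const SHARD_COUNT = 100;

// Choose a shard at random; rand is injectable so the choice can be tested.
function pickShard(rand: () => number = Math.random): string {
  return "counter:" + Math.floor(rand() * SHARD_COUNT);
}

const shards = new Map<string, number>();

// Each increment touches exactly one shard, so no shard sees all the traffic.
function increment(rand: () => number = Math.random): void {
  const name = pickShard(rand);
  shards.set(name, (shards.get(name) ?? 0) + 1);
}

// Reading the global value means summing every shard.
function total(): number {
  let sum = 0;
  for (const value of shards.values()) sum += value;
  return sum;
}

for (let i = 0; i < 1000; i++) increment();
console.log(`total across ${shards.size} shards: ${total()}`);
```

The trade-off shows up in total(): writes scale with the number of shards, while an exact global read costs one call per shard and reflects only what each shard had seen when it answered.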

When to choose DOs

The mental test: "Is there a single thing that many clients touch at once, and does the order of those touches matter?" If yes, DO. Examples:

Chat rooms / channels
Collaborative documents (Google Docs-style)
Multiplayer game rooms
Per-user session state with strong consistency
Per-tenant rate limiters and quotas
Auctions / leaderboards / live events
Booking systems / inventory locks
IoT device shadows

Three things to know

One DO instance per ID. All requests for that ID land on the same isolate, in serial order. No race conditions.
Storage is co-located. SQLite (recommended) or KV — both run inside the DO with sub-ms reads.
Throughput ceiling: ~1K req/s per DO. Above that, shard with multiple IDs.

Counter — the "hello world" of DOs

typescript — src/index.ts

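import { DurableObject } from "cloudflare:workers";

interface Env {
  COUNTER: DurableObjectNamespace<Counter>;
}

export class Counter extends DurableObject<Env> {
  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
    // Create the table on first use; increment() assumes it exists.
    ctx.storage.sql.exec(
      "CREATE TABLE IF NOT EXISTS counters (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
    );
  }

  async increment(): Promise<number> {
    // Upsert the single row and return the new value in one statement.
    const row = this.ctx.storage.sql
      .exec<{ value: number }>(
        `INSERT INTO counters (id, value) VALUES (1, 1)
         ON CONFLICT(id) DO UPDATE SET value = value + 1
         RETURNING value`
      )
      .one();
    return row.value;
  }
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const id = env.COUNTER.idFromName("global");
    const stub = env.COUNTER.get(id);
    const count = await stub.increment(); // RPC: typed, no fetch needed
    return Response.json({ count });
  },
} satisfies ExportedHandler<Env>;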

jsonc — wrangler.jsonc

{
  "durable_objects": {
    "bindings": [{ "name": "COUNTER", "class_name": "Counter" }]
  },
  "migrations": [
    { "tag": "v1", "new_sqlite_classes": ["Counter"] }
  ]
}

Real-time chat room with WebSocket Hibernation

typescript

export class ChatRoom extends DurableObject<Env> {
  async fetch(request: Request): Promise<Response> {
    if (request.headers.get("Upgrade") !== "websocket") {
      return new Response("expected ws", { status: 426 });
    }
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);

    // Hibernation API — zero compute cost while idle
    this.ctx.acceptWebSocket(server);

    return new Response(null, { status: 101, webSocket: client });
  }

  webSocketMessage(ws: WebSocket, msg: string | ArrayBuffer) {
    // Broadcast to all connected clients
    for (const peer of this.ctx.getWebSockets()) {
      peer.send(msg);
    }
  }

  webSocketClose(ws: WebSocket, code: number, reason: string) {
    ws.close(code, reason);
  }
}

Customer signal: "Multiplayer," "collaborative editing," "presence," "leaderboard," "per-tenant rate limit," "auction," "booking system" are all DO-shaped problems. The mental key is "a thing many clients touch at once."

9. Queues — Async Processing

Push messages from a Worker, consume them in batches in another Worker. At-least-once delivery, configurable retries, dead-letter queues, delays up to 12 hours.

What problem queues solve

Some work shouldn't happen on the request path: sending email, generating thumbnails, running ML inference, calling a slow third-party API, fanning out a webhook to 100 subscribers. Doing it inline blocks the user, magnifies failure surface, and ties the work's lifetime to the request's lifetime.

A queue decouples the producer (your API endpoint, which just enqueues the work) from the consumer (a separate Worker that processes it later). The producer returns 202 immediately; the consumer takes its time, retries on failure, and dead-letters anything it can't handle.

Delivery guarantees — at-least-once

Cloudflare Queues guarantees that every message is delivered at least once to a consumer. In rare failure scenarios it may be delivered more than once. This means your consumer logic must be idempotent — processing the same message twice should produce the same result as processing it once. The standard pattern is to use a unique message ID (or a hash of the payload) and check against a "seen" record in D1 or KV before doing the side effect.
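The "seen record" check can be sketched as follows. This is a minimal illustration, not part of the Queues API: `SeenStore`, `MemoryStore`, and `processOnce` are hypothetical names, and the store interface assumes only a `get` and a `put`, which a KV namespace or a D1 table could back.

```typescript
// Hypothetical minimal store; a KV namespace or a D1 table could back it.
interface SeenStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// In-memory stand-in so the sketch runs anywhere.
class MemoryStore implements SeenStore {
  private seen = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.seen.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.seen.set(key, value);
  }
}

// Runs `effect` only if `messageId` has not been recorded yet.
// Resolves true when the effect ran, false when the message was a duplicate.
async function processOnce(
  store: SeenStore,
  messageId: string,
  effect: () => Promise<void>,
): Promise<boolean> {
  if ((await store.get(messageId)) !== null) return false; // duplicate delivery
  await effect();
  await store.put(messageId, new Date().toISOString()); // record only after success
  return true;
}
```

The check-then-write here is not atomic, so two concurrent deliveries of the same message could both pass the check. Where that matters, a unique constraint in D1 or a Durable Object keyed by message ID is a stricter option.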

Batching — the throughput lever

Consumers receive messages in batches, not one at a time. You configure two knobs: max_batch_size (e.g. 25 messages) and max_batch_timeout (e.g. 5 seconds). The consumer fires when either limit is hit. Batching dramatically improves throughput for downstream operations — one D1 batch insert beats 25 individual ones, one external API call with 25 items beats 25 single calls.
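As a sketch of the "one batch insert beats 25 individual ones" point, the helper below collapses a batch of rows into a single parameterized statement. The `email_log` table, its columns, and the `buildBulkInsert` name are assumptions made for illustration.

```typescript
type Row = { to: string; template: string };

// Collapses a batch of rows into one parameterized INSERT.
// Table and column names are hypothetical.
function buildBulkInsert(rows: Row[]): { sql: string; params: string[] } {
  if (rows.length === 0) throw new Error("empty batch");
  const placeholders: string[] = [];
  const params: string[] = [];
  for (const row of rows) {
    placeholders.push("(?, ?)");
    params.push(row.to, row.template);
  }
  return {
    sql: `INSERT INTO email_log (recipient, template) VALUES ${placeholders.join(", ")}`,
    params,
  };
}
```

In a consumer, the rows would come from `batch.messages.map((m) => m.body)` and the result would be passed to `env.DB.prepare(sql).bind(...params).run()`. D1 caps the number of bound parameters per statement, so batch size times column count has to stay under that cap.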

Retries and dead-letter queues

If a message fails (you call msg.retry() or your handler throws), Queues will redeliver it after a configurable delay, up to max_retries times. After that, the message goes to a dead-letter queue (DLQ), another queue you've designated to capture poison pills for inspection. DLQs are critical for production: without one, a message that exhausts its retries is dropped, and you lose the only copy you could have inspected or replayed.
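A retry delay that grows with each attempt could be computed like this. The base and cap values are illustrative defaults, not platform settings, and `backoffSeconds` is a hypothetical helper.

```typescript
// Exponential backoff: base * 2^(attempts - 1), capped at capSeconds.
// `attempts` is meant to mirror msg.attempts, which is 1 on the first delivery.
function backoffSeconds(attempts: number, baseSeconds = 30, capSeconds = 3600): number {
  const exponent = Math.max(0, attempts - 1);
  return Math.min(baseSeconds * Math.pow(2, exponent), capSeconds);
}
```

Inside a consumer this would be used as `msg.retry({ delaySeconds: backoffSeconds(msg.attempts) })`.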

Push vs. pull consumers

Push (Worker consumer) — Cloudflare invokes your Worker's queue() handler with each batch. The default and most common pattern.
Pull (HTTP consumer) — your external system polls the queue's HTTP API for messages. Useful when the consumer can't run on Workers (e.g. a legacy on-prem system).

Delays — scheduling work into the future

You can publish a message with a delay of up to 12 hours, or retry with a delay. This turns Queues into a lightweight scheduler for short-horizon work ("retry this in 5 minutes," "send this notification in 2 hours"). For longer-horizon or multi-step work, use Workflows.
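One way to decide between a queue delay and a Workflow is to compute the delay first and check it against the 12-hour ceiling. The sketch below assumes that ceiling; `queueDelayFor` is a hypothetical helper.

```typescript
const MAX_QUEUE_DELAY_SECONDS = 12 * 60 * 60; // the 12-hour ceiling described above

// Returns the number of seconds to pass as `delaySeconds`, or null when the
// target time is too far out for a queue delay.
function queueDelayFor(target: Date, now: Date): number | null {
  const seconds = Math.ceil((target.getTime() - now.getTime()) / 1000);
  if (seconds <= 0) return 0; // already due: send without delay
  if (seconds > MAX_QUEUE_DELAY_SECONDS) return null; // hand off to a Workflow instead
  return seconds;
}
```

A producer would then call `env.EMAIL_QUEUE.send(job, { delaySeconds })` when the result is a number, and start a Workflow instance when it is null.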

Common patterns

Async fan-out — one event produces N downstream actions (welcome email + analytics event + webhook + index update).
Buffering / smoothing — absorb spikes from a webhook and drain at a controlled rate.
Cross-system bridge — R2 event notifications → Queue → Worker → D1 / Vectorize.
Retry backbone — wrap any flaky external call in a queue with exponential backoff.

Setup

bash

wrangler queues create email-jobs
wrangler queues create email-jobs-dlq

jsonc — wrangler.jsonc

{
  "queues": {
    "producers": [
      { "binding": "EMAIL_QUEUE", "queue": "email-jobs" }
    ],
    "consumers": [
      {
        "queue": "email-jobs",
        "max_batch_size": 25,
        "max_batch_timeout": 5,
        "max_retries": 3,
        "dead_letter_queue": "email-jobs-dlq"
      }
    ]
  }
}

Producer + consumer in one Worker

typescript

type EmailJob = { to: string; template: string; vars: Record<string, string> };

// Producer (called from your API)
app.post("/api/signup", async (c) => {
  const { email } = await c.req.json();
  // ... create user in D1 ...
  await c.env.EMAIL_QUEUE.send({
    to: email,
    template: "welcome",
    vars: { name: email.split("@")[0] },
  } satisfies EmailJob);
  return c.json({ ok: true }, 202);
});

// Consumer
export default {
  fetch: app.fetch,
  async queue(batch: MessageBatch<EmailJob>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      try {
        await sendEmail(env, msg.body);
        msg.ack();
      } catch (err) {
        // Retry with backoff; after max_retries → DLQ
        msg.retry({ delaySeconds: Math.min(60 * msg.attempts, 600) });
      }
    }
  },
} satisfies ExportedHandler<Env, EmailJob>;

Critical gotcha An uncaught error inside queue() causes every message in the batch that was not explicitly acknowledged to be retried, not just the one that failed. Always use per-message try/catch and explicitly ack() or retry().

10. Workflows — Durable Multi-Step Jobs

When a job has multiple steps that can fail independently, takes longer than a single request, or needs to wait for an external event — Workflows. Steps are individually retried; successful steps are not re-run.

The problem domain

Some business processes don't fit in a single request or a single queue message. They're multi-step (call the payment processor → wait for confirmation → write to DB → send receipt → schedule reminder), long-running (3-day onboarding email sequence, 30-day trial expiration), or conditional on external events (wait for the user to confirm their email, wait for a webhook from Stripe).

Building this on raw queues and crons gets ugly fast. You end up reinventing checkpointing, retry state, idempotency keys, and timeouts in a database. AWS calls this category "Step Functions"; Temporal and Inngest exist as standalone vendors. Cloudflare Workflows is the same primitive, baked into the platform.

The durable execution model

A Workflow is a class with a run() method that calls step.do() for each unit of work. Each step's return value is persisted. If the Worker crashes, the machine is rebooted, or a step fails and retries, only the failed step re-runs — the successful ones replay from the persisted log.

This means you can write what looks like ordinary sequential code:

const user = await step.do("fetch user", () => db.get(id));
await step.sleep("wait 7 days", "7 days");
await step.do("send reminder", () => sendEmail(user));

…and the runtime guarantees that "fetch user" runs exactly once successfully, the sleep persists across redeploys and machine failures, and "send reminder" only runs after the sleep completes — even if your Worker code is redeployed three times in the meantime.
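The replay behavior can be pictured with a toy model. This is only an illustration of the idea (results stored by step name, completed steps skipped on a re-run), not how the Workflows runtime is implemented; `StepLog` and `runOnce` are hypothetical names.

```typescript
// Toy model of durable execution: results are stored by step name, so a
// re-run of the same function skips steps that already completed.
class StepLog {
  private results = new Map<string, unknown>();

  do<T>(name: string, fn: () => T): T {
    if (this.results.has(name)) return this.results.get(name) as T; // replay from log
    const value = fn(); // may throw; nothing is recorded on failure
    this.results.set(name, value); // checkpoint the successful result
    return value;
  }
}

// A two-step "workflow" whose second step can be forced to fail.
function runOnce(
  log: StepLog,
  calls: { fetch: number; send: number },
  failSend: boolean,
): void {
  const user = log.do("fetch user", () => {
    calls.fetch++;
    return { email: "a@example.com" };
  });
  log.do("send reminder", () => {
    calls.send++;
    if (failSend) throw new Error("mail provider down");
    return user.email;
  });
}
```

Running `runOnce` with a failing second step and then again with a healthy one executes "fetch user" once and "send reminder" twice, which is the behavior the step log is meant to show.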

Retries with backoff

Each step has independent retry configuration: number of attempts, backoff strategy (constant / linear / exponential), and timeout. A flaky third-party call can be configured to retry 5 times with exponential backoff while the rest of the workflow proceeds normally.

Sleep — for free

step.sleep("wait 3 days", "3 days") doesn't consume Worker compute time during the sleep. The first argument is the step name and the second is the duration. The runtime stores the wake-up time and returns the resources. When the time arrives, the workflow resumes from where it left off. This is what makes long-horizon flows (drip campaigns, trial expirations, 30-day SLAs) economically viable on serverless.

waitForEvent — pause for a webhook

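A workflow can pause with step.waitForEvent("name", { type, timeout }) until an external system sends a matching event to the workflow instance. The wait is bounded by the timeout rather than indefinite, so set the timeout explicitly for long waits and handle the case where it elapses. This is the building block for human-in-the-loop approvals, async webhooks (payment confirmations, document signing), and multi-system orchestration.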
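A human-approval gate might look like the sketch below. `StepLike` is a structural stand-in for the real `step` object so the example runs outside the Workers runtime. The event type name, the three-day timeout, and the assumption that the wait rejects once the timeout elapses are all illustrative and should be checked against the current Workflows docs.

```typescript
// Minimal structural stand-in for the `step` object; only the call used here.
interface StepLike {
  waitForEvent<T>(
    name: string,
    opts: { type: string; timeout?: string },
  ): Promise<{ payload: T }>;
}

type Approval = { approved: boolean };

// Waits for a human decision and treats a timeout as its own outcome.
async function awaitApproval(
  step: StepLike,
): Promise<"approved" | "rejected" | "timed_out"> {
  try {
    const event = await step.waitForEvent<Approval>("wait for manager approval", {
      type: "approval-decision", // hypothetical event type
      timeout: "3 days",
    });
    return event.payload.approved ? "approved" : "rejected";
  } catch (err) {
    // Assumption: the wait rejects when the timeout elapses.
    return "timed_out";
  }
}
```

A production version would inspect the error before treating it as a timeout, since this sketch maps every failure to the same outcome.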

Parallelism

Inside one workflow, Promise.all([step.do(a), step.do(b), step.do(c)]) runs steps concurrently. Each is still individually retried and persisted.

When to choose Workflows vs. alternatives

Need	Use
Single async task, fire-and-forget	Queues
Recurring scheduled task (every N min/hr/day)	Cron Triggers
Multi-step, long-running, durable, possibly waiting on events	Workflows
Real-time stateful coordination per entity	Durable Objects

Customer-facing examples

User onboarding email sequences spanning days/weeks
Trial-to-paid conversion flows with mid-trial nudges
Order fulfillment: charge → reserve inventory → ship → notify → request review
Document processing pipelines (OCR → translate → embed → index)
Async report generation that can take 20 minutes
Multi-step approval workflows with human gates

User onboarding workflow

typescript

import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from "cloudflare:workers";

type Env = { ONBOARDING: Workflow; DB: D1Database; EMAIL_QUEUE: Queue };
type Params = { userId: number };

export class OnboardingWorkflow extends WorkflowEntrypoint<Env, Params> {
  async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
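    const user = await step.do("fetch user", async () => {
      const row = await this.env.DB
        .prepare("SELECT id, email, name FROM users WHERE id = ?")
        .bind(event.payload.userId)
        .first<{ id: number; email: string; name: string }>();
      // first() resolves to null when no row matches; fail the step rather than continue.
      if (!row) throw new Error(`user ${event.payload.userId} not found`);
      return row;
    });

    await step.do(
      "send welcome email",
      // `delay` is the base wait between attempts; the value here is illustrative.
      { retries: { limit: 5, delay: "10 seconds", backoff: "exponential" } },
      async () => {
        await this.env.EMAIL_QUEUE.send({ to: user.email, template: "welcome" });
      }
    );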

    await step.sleep("wait 3 days", "3 days");

    await step.do("send tips email", async () => {
      await this.env.EMAIL_QUEUE.send({ to: user.email, template: "tips" });
    });

    await step.sleep("wait 7 days", "7 days");

    await step.do("send conversion offer", async () => {
      await this.env.EMAIL_QUEUE.send({ to: user.email, template: "offer" });
    });
  }
}

Triggering an instance from a Worker

typescript

app.post("/api/users", async (c) => {
  const userId = await createUser(c);
  const instance = await c.env.ONBOARDING.create({ params: { userId } });
  return c.json({ userId, workflowId: instance.id });
});

SE pitch Workflows replaces a typical "Step Functions + SQS + Lambda + DynamoDB checkpointing" stack with one primitive. The reduction in moving parts is the whole story.

11. Cron Triggers — Scheduled Workers

Schedule any Worker via cron expressions. UTC-only. Use the scheduled handler in addition to (or instead of) fetch.

What it is and isn't

A Cron Trigger is exactly what it sounds like: cron-syntax schedules attached to a Worker that fire the scheduled() handler at the specified times. Cloudflare runs the Worker globally — the trigger fires once per schedule, not once per data center.

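It is not a precise scheduler. Triggers fire around the scheduled time, with at-least-once semantics (rare duplicates possible during deploys), so handlers should tolerate a repeat run. Schedules are UTC only, with no timezone support. If you need timezone awareness, schedule in UTC and do the conversion in your handler. Cron granularity stops at one minute, so second-level precision needs a different mechanism.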
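Doing the timezone conversion in the handler could look like this: fire the trigger hourly in UTC and let the handler decide whether it is the right local hour. The sketch uses the standard `Intl.DateTimeFormat` API; `localHour` is a hypothetical helper and the zone name is only an example.

```typescript
// Returns the hour of day (0-23) for `at` in the given IANA timezone.
function localHour(at: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hourCycle: "h23",
  }).formatToParts(at);
  const hour = parts.find((p) => p.type === "hour");
  if (!hour) throw new Error("no hour part in formatted date");
  return Number(hour.value);
}
```

With an hourly cron, a handler could start with `if (localHour(new Date(event.scheduledTime), "America/New_York") !== 9) return;` so the job runs at 9am local time across daylight-saving changes.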

Why this is enough for most jobs

The vast majority of "cron jobs" customers run today are either:

Periodic data sync (every 5/15/60 min)
Nightly cleanup, archival, or backup
Health probes or external API polling
Trigger for a longer Workflow

None of these need second-precision. Cron Triggers replace what customers usually run on a dedicated EC2 instance, an ECS scheduled task, or a Kubernetes CronJob — without the operational overhead.

Combination with Workflows

The most powerful pattern is Cron → Workflow. Cron fires every hour and kicks off a workflow instance; the workflow handles the durable, multi-step work (retries, sleeps, external calls). Cron is the trigger, Workflow is the engine. This combo replaces a Step Functions / EventBridge / Lambda stack with two primitives.
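Because a trigger can occasionally fire twice, one option is to derive the workflow instance ID from the schedule slot so both firings map to the same ID. The sketch below only builds the ID. It assumes that creating an instance with an ID that already exists is rejected, and `instanceIdFor` and the `HOURLY_SYNC` binding are hypothetical names.

```typescript
// Builds a deterministic instance ID from the schedule slot, so a duplicate
// trigger for the same minute produces the same ID.
function instanceIdFor(prefix: string, scheduledTime: number): string {
  const iso = new Date(scheduledTime).toISOString(); // e.g. 2025-01-15T14:00:00.000Z
  const slot = iso.slice(0, 16).replace(/[:T]/g, "-"); // 2025-01-15-14-00
  return `${prefix}-${slot}`;
}

// Inside scheduled(), with a hypothetical HOURLY_SYNC workflow binding:
//   await env.HOURLY_SYNC.create({
//     id: instanceIdFor("hourly-sync", event.scheduledTime),
//   });
```

The handler would wrap the `create` call in a try/catch and treat a duplicate-ID failure as "already started."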

Green Compute (optional)

Cloudflare lets you opt in to "Green Compute" for crons. With it enabled, scheduled executions run only in data centers powered by renewable energy. For non-urgent batch work, this is a free sustainability story.

jsonc

{
  "triggers": {
    "crons": [
      "*/5 * * * *",      // every 5 min
      "0 2 * * *",        // 2am UTC daily
      "0 9 * * MON-FRI"   // weekdays 9am UTC
    ]
  }
}

typescript

export default {
  async scheduled(event: ScheduledController, env: Env, ctx: ExecutionContext) {
    console.log("cron fired:", event.cron, "at", new Date(event.scheduledTime));

    if (event.cron === "0 2 * * *") {
      // nightly backup
      ctx.waitUntil(exportToR2(env));
    }
    if (event.cron === "*/5 * * * *") {
      // health probe
      ctx.waitUntil(probeOrigin(env));
    }
  },
} satisfies ExportedHandler<Env>;

Test locally Run wrangler dev --test-scheduled, then curl "http://localhost:8787/__scheduled?cron=*/5+*+*+*+*".

12. Workers AI + Vectorize — Inference & RAG

Workers AI runs 50+ models on Cloudflare's GPU network, called via the env.AI binding. Vectorize is the vector database for embeddings. Together they're the RAG stack — and it's the most common AI demo customers ask for.

The "AI on the network" thesis

Most AI workloads today involve a Worker (or any backend) calling out to an external LLM provider — OpenAI, Anthropic, Google. Every request crosses the public internet, costs latency, and creates a vendor relationship plus a bill plus a data-residency conversation. Cloudflare's pitch is to bring inference into the network: the same place your user's request landed and your code is already running.

Workers AI runs Cloudflare-managed open-source and commercial models on a global GPU network. From a Worker, calling a model is a binding call — env.AI.run("@cf/meta/llama-3.1-8b-instruct", ...) — not an external HTTP request. No API keys to manage, no vendor account to set up, no egress.

Workers AI — what's available

The catalog covers most of the practical task surface:

Text generation (LLMs) — Llama 3.1 family (8B, 70B), Mistral, DeepSeek-Coder, plus a rotating roster of newer open-source models. Streaming and function calling supported on flagship models.
Embeddings — BGE family in three sizes (small/base/large), multilingual variants. The standard input to RAG.
Image generation — Stable Diffusion XL, Flux, DreamShaper.
Speech-to-text — Whisper.
Translation — M2M100 (100 languages).
Classification & vision — ResNet, DETR, sentiment models.

Pricing — the "neuron" model

Workers AI is billed in neurons, an abstract unit that approximates GPU work. Different models cost different numbers of neurons per inference: embedding a short text costs far fewer neurons than generating a long completion on a 70B LLM, and the per-model rates are published on the pricing page. The free tier includes 10K neurons/day. This abstracts away the GPU-type complexity that plagues other inference platforms.

Vectorize — the vector database

Vectorize stores high-dimensional vectors (embeddings) and answers nearest-neighbor queries. It's the database half of RAG: you embed your documents into vectors with an embedding model, store them in Vectorize with metadata, and at query time you embed the user's question and ask Vectorize for the top-k closest matches.

Capabilities to know:

10M vectors per index, dimensions up to 1536, three distance metrics (cosine, euclidean, dot-product).
Metadata filtering — attach JSON metadata to each vector and filter at query time (e.g. "only documents owned by tenant X").
Namespaces — strict logical isolation within an index. The standard multi-tenant pattern.
Index configuration is immutable — choose dimensions and metric carefully; you can't change them later.
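To make the nearest-neighbor query concrete, here is a brute-force sketch of what a top-k search with a metadata filter computes. It is plain TypeScript for illustration, not Vectorize's implementation; `Stored`, `cosine`, and `topK` are hypothetical names.

```typescript
type Stored = { id: string; values: number[]; metadata: { tenant: string } };

// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k restricted to one tenant. A vector index answers the same
// question without scanning every stored vector.
function topK(
  query: number[],
  vectors: Stored[],
  k: number,
  tenant: string,
): { id: string; score: number }[] {
  return vectors
    .filter((v) => v.metadata.tenant === tenant)
    .map((v) => ({ id: v.id, score: cosine(query, v.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The tenant filter runs before scoring, which mirrors why metadata filtering matters for multi-tenant isolation: vectors owned by other tenants never become candidates.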

RAG — what it is and why it matters

RAG (Retrieval-Augmented Generation) is the dominant pattern for "AI that knows about your data." Instead of training or fine-tuning a model on the customer's documents (expensive, slow, hard to update), you do this at request time:

The user asks a question.
Embed the question into a vector.
Search the vector DB for the top-k most similar document chunks.
Stuff those chunks into the LLM prompt as context.
The LLM answers grounded in that context, often with citations.

RAG works because LLMs are good at using information you give them, even if they weren't trained on it. The four supporting pieces — embedding model, vector DB, document store, LLM — typically come from four vendors. On Cloudflare, all four are bindings on the same Worker.
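The retrieval step works on document chunks, so documents are usually split before embedding. A minimal character-based splitter with overlap might look like this. The sizes are illustrative, `chunkText` is a hypothetical helper, and token-based splitting would track model input limits more closely.

```typescript
// Splits text into chunks of at most `size` characters. Each chunk overlaps
// the previous one by `overlap` characters so text near a boundary appears in
// two chunks.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const stride = size - overlap;
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

Each chunk would then be embedded and stored under its own vector ID, for example the document ID plus the chunk index, with the source key kept in metadata.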

AI Gateway — the production wrapper

Customers running AI in production usually want caching (don't pay twice for the same prompt), rate limiting (protect against runaway loops), retry logic, fallbacks across providers, analytics, and a unified log for compliance. AI Gateway is the Cloudflare product that sits in front of any AI provider (OpenAI, Anthropic, Bedrock, Workers AI itself) and gives you all of that as a config-driven proxy. Worth mentioning in any "we already use OpenAI" conversation.

Agents SDK — the next layer up

For stateful AI agents (multi-turn conversations, tool use, durable memory), Cloudflare has the Agents SDK, which builds on Durable Objects. Each agent is a DO with its own conversation state, tool-calling loop, and persistent memory. We have a dedicated skill for it (agents-sdk) — bring it up when customers ask about "AI agents" or "Copilots."

Direct LLM inference

typescript

app.post("/api/chat", async (c) => {
  const { question } = await c.req.json<{ question: string }>();
  const response = await c.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: "You are a concise technical assistant." },
      { role: "user", content: question },
    ],
  });
  return c.json(response);
});

Streaming response

typescript

app.post("/api/chat-stream", async (c) => {
  const { question } = await c.req.json<{ question: string }>();
  const stream = await c.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{ role: "user", content: question }],
    stream: true,
  });
  return new Response(stream, {
    headers: { "content-type": "text/event-stream" },
  });
});

Full RAG pipeline (Vectorize + Workers AI + R2)

bash

wrangler vectorize create kb --dimensions=768 --metric=cosine

typescript

// Step 1: Index a document
app.post("/api/kb/index", async (c) => {
  const { id, text } = await c.req.json<{ id: string; text: string }>();

  // Persist source in R2 for retrieval
  await c.env.FILES.put(`kb/${id}.txt`, text);

  // Embed
  const { data } = await c.env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [text] });

  // Upsert into Vectorize
  await c.env.VECTORIZE.upsert([
    { id, values: data[0], metadata: { key: `kb/${id}.txt` } },
  ]);
  return c.json({ ok: true });
});

// Step 2: Ask a question grounded in the KB
app.post("/api/kb/ask", async (c) => {
  const { question } = await c.req.json<{ question: string }>();

  const { data: qVec } = await c.env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [question],
  });

  const { matches } = await c.env.VECTORIZE.query(qVec[0], {
    topK: 4,
    returnMetadata: "all",
  });

  const docs = await Promise.all(
    matches.map(async (m) => {
      const obj = await c.env.FILES.get(m.metadata!.key as string);
      return obj ? await obj.text() : "";
    })
  );

  const answer = await c.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: "Answer ONLY from the provided context. If unknown, say so." },
      { role: "user", content: `Context:\n${docs.join("\n---\n")}\n\nQuestion: ${question}` },
    ],
  });

  return c.json({ answer, citations: matches.map((m) => m.id) });
});

Why this lands with customers The entire RAG stack (embeddings, vector DB, LLM, source storage) is one Worker, one deploy, one bill, one network hop. The typical alternative stitches together three or four separate vendors.

13. Pages — Full-stack JAMstack

Pages is the Git-driven front door for full-stack apps. Push to a branch, get a unique preview URL. Pages Functions are file-routed Workers that share the same bindings.

Where Pages fits

Workers is great when you start with code. Pages is great when you start with a frontend framework. The product is built around three ideas that frontend teams already know how to use: a Git repo, a build command, and a public output directory. Connect the repo, set the build command (npm run build), point at the output (./dist or ./.next), and Pages handles the rest — global deploys, preview URLs per branch, automatic HTTPS, custom domains.

Static + dynamic in one product

Pages serves your static assets from Cloudflare's CDN and runs your dynamic code as Pages Functions — the same Workers runtime, exposed via file-based routing. Drop a TypeScript file at /functions/api/users/[id].ts and it's served at /api/users/123. The function has access to all the same bindings (D1, KV, R2, Workers AI, Durable Objects) as a standalone Worker.
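The file-to-route convention can be illustrated with a toy matcher. It handles only single-segment [param] placeholders and is a sketch of the convention, not the router Pages actually uses; `compileRoute` and `matchRoute` are hypothetical names.

```typescript
// Turns a functions/ file path into a RegExp plus the list of parameter names.
function compileRoute(file: string): { regex: RegExp; names: string[] } {
  const names: string[] = [];
  const path =
    file
      .replace(/^\/?functions/, "") // drop the functions/ prefix
      .replace(/\.(ts|js)$/, "") // drop the file extension
      .replace(/\/index$/, "") || "/"; // index files map to their directory
  const pattern = path.replace(/\[(\w+)\]/g, (_match: string, name: string) => {
    names.push(name);
    return "([^/]+)"; // one path segment per placeholder
  });
  return { regex: new RegExp("^" + pattern + "/?$"), names };
}

// Returns the extracted params when `url` matches the file's route, else null.
function matchRoute(file: string, url: string): Record<string, string> | null {
  const { regex, names } = compileRoute(file);
  const m = regex.exec(url);
  if (!m) return null;
  const params: Record<string, string> = {};
  names.forEach((name, i) => {
    params[name] = m[i + 1];
  });
  return params;
}
```

For example, `matchRoute("/functions/api/users/[id].ts", "/api/users/123")` yields an object whose `id` is `"123"`, which corresponds to what a Pages Function receives as `params.id`.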

Framework support

Cloudflare maintains adapters for the major full-stack frameworks: Next.js (via @cloudflare/next-on-pages or the newer OpenNext adapter), SvelteKit, Astro, Nuxt, Remix, Qwik, Solid Start. The C3 CLI (npm create cloudflare@latest) scaffolds any of them with bindings already wired up.

Preview URLs — the developer-experience killer feature

Every push to any branch gets a unique URL that runs that branch's code on the same platform as production, with bindings you configure separately for the production and preview environments. PRs get auto-commented with a link. Reviewers, designers, and PMs can click and test before merge. This is the feature that makes Pages stick once a team starts using it.

Workers + Static Assets — the new direction

In 2024 Cloudflare introduced Static Assets on Workers, which lets a regular Worker serve a static asset directory directly without needing Pages at all. The current guidance for new full-stack projects is to start there — you get one product surface (Workers), one config file (wrangler.jsonc), and the same Git-driven deploys. Pages remains fully supported and is still the right answer for many existing teams; check current docs when scoping.

Quick capability reference

Git integration: Connect a GitHub/GitLab repo, every push deploys.
Preview URLs per branch/PR.
Pages Functions: Drop /functions/api/hello.ts and it's served at /api/hello.
Framework support: Next.js, SvelteKit, Astro, Nuxt, Remix, Qwik, Solid.
Bindings: Same as Workers — D1, KV, R2, Durable Objects, Workers AI, Queues, Vectorize.
Build environments: Production + an unlimited number of "preview" deployments.

File-routed function

typescript — functions/api/users/[id].ts

interface Env { DB: D1Database; }

export const onRequestGet: PagesFunction<Env> = async ({ params, env }) => {
  const user = await env.DB
    .prepare("SELECT id, name, email FROM users WHERE id = ?")
    .bind(params.id)
    .first();
  return user ? Response.json(user) : new Response("Not found", { status: 404 });
};

export const onRequestDelete: PagesFunction<Env> = async ({ params, env }) => {
  await env.DB.prepare("DELETE FROM users WHERE id = ?").bind(params.id).run();
  return new Response(null, { status: 204 });
};

Note for new builds Cloudflare's current guidance is to start new full-stack projects on Workers with Static Assets rather than Pages. Pages remains fully supported, especially for Git-driven workflows. Confirm in current docs when you're scoping.

14. End-to-End Reference App

This is the demo I'd build for a SaaS prospect. It exercises Workers, Hono, D1, KV, R2, Queues, Cron Triggers, and Workers AI with Vectorize in one project. Use it as a starting point.

What this demonstrates and why it lands

A single Worker, deployed with one wrangler deploy, that serves an HTTP API (Hono), authenticates users with sessions in KV, persists relational data in D1, accepts file uploads to R2, fans out background work via Queues, runs nightly cleanup via a Cron Trigger, and exposes a RAG chat endpoint via Workers AI + Vectorize. One repo, one bill, one deploy, one network hop per request.

For a customer who's currently running an EC2 fleet behind an ALB, RDS Postgres, ElastiCache Redis, S3, SQS, and Lambda for thumbnails, this is a one-page replacement of an entire architecture. The "what would we keep?" conversation is short.

Project layout

text

edge-saas/
├── wrangler.jsonc
├── migrations/
│   └── 0001_init.sql
└── src/
    ├── index.ts          (Hono app + queue handler)
    ├── routes/
    │   ├── auth.ts       (KV-backed sessions)
    │   ├── posts.ts      (D1 CRUD)
    │   ├── upload.ts     (R2 + queue trigger)
    │   └── ai.ts         (LLM + RAG endpoints)
    └── lib/
        ├── auth.ts
        └── thumbs.ts     (consumed by queue)

wrangler.jsonc

jsonc

{
  "name": "edge-saas",
  "main": "src/index.ts",
  "compatibility_date": "2026-01-01",
  "compatibility_flags": ["nodejs_compat"],
  "observability": { "enabled": true },

  "kv_namespaces": [{ "binding": "SESSIONS", "id": "..." }],
  "d1_databases": [
    { "binding": "DB", "database_name": "edge-saas", "database_id": "..." }
  ],
  "r2_buckets": [{ "binding": "FILES", "bucket_name": "edge-saas-uploads" }],
  "queues": {
    "producers": [{ "binding": "THUMBS", "queue": "thumbs" }],
    "consumers": [{ "queue": "thumbs", "max_batch_size": 10, "max_retries": 3 }]
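  },
  // Nightly schedule consumed by scheduled() in src/index.ts
  "triggers": { "crons": ["0 3 * * *"] },
  "ai": { "binding": "AI" },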
  "vectorize": [{ "binding": "VECTORIZE", "index_name": "kb" }]
}

src/index.ts

typescript

import { Hono } from "hono";
import { cors } from "hono/cors";
import auth from "./routes/auth";
import posts from "./routes/posts";
import upload from "./routes/upload";
import ai from "./routes/ai";
import { generateThumbnail } from "./lib/thumbs";

type Bindings = {
  DB: D1Database;
  SESSIONS: KVNamespace;
  FILES: R2Bucket;
  THUMBS: Queue<{ key: string }>;
  AI: Ai;
  VECTORIZE: VectorizeIndex;
};

const app = new Hono<{ Bindings: Bindings }>();
app.use("*", cors());
app.route("/auth", auth);
app.route("/api/posts", posts);
app.route("/api/upload", upload);
app.route("/api/ai", ai);

export default {
  fetch: app.fetch,
  async queue(batch: MessageBatch<{ key: string }>, env: Bindings) {
    for (const msg of batch.messages) {
      try {
        await generateThumbnail(env, msg.body.key);
        msg.ack();
      } catch (e) {
        msg.retry({ delaySeconds: 30 });
      }
    }
  },
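  async scheduled(event: ScheduledController, env: Bindings, ctx: ExecutionContext) {
    if (event.cron === "0 3 * * *") {
      // KV sessions expire on their own via expirationTtl. This cleanup assumes a
      // separate `sessions` table in D1 (not shown) with an epoch-ms expires_at column.
      ctx.waitUntil(env.DB.prepare("DELETE FROM sessions WHERE expires_at < ?")
        .bind(Date.now()).run());
    }
  },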
} satisfies ExportedHandler<Bindings>;

src/routes/auth.ts — KV-backed sessions

typescript

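import { Hono } from "hono";
import { setCookie, getCookie } from "hono/cookie";
// Assumed helper in src/lib/auth.ts: resolves true when the password matches the stored hash.
import { verify } from "../lib/auth";

// Only the bindings this router touches; mirrors the Bindings type in src/index.ts.
type Bindings = { DB: D1Database; SESSIONS: KVNamespace };

const auth = new Hono<{ Bindings: Bindings }>();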

auth.post("/login", async (c) => {
  const { email, password } = await c.req.json();
  const user = await c.env.DB.prepare(
    "SELECT id, email, password_hash FROM users WHERE email = ?"
  ).bind(email).first<{ id: number; email: string; password_hash: string }>();

  if (!user || !(await verify(password, user.password_hash))) {
    return c.json({ error: "invalid credentials" }, 401);
  }

  const sid = crypto.randomUUID();
  await c.env.SESSIONS.put(sid, JSON.stringify({ userId: user.id }), {
    expirationTtl: 60 * 60 * 24 * 7, // 7 days
  });
  setCookie(c, "sid", sid, { httpOnly: true, secure: true, sameSite: "Lax", maxAge: 604800 });
  return c.json({ ok: true });
});

auth.post("/logout", async (c) => {
  const sid = getCookie(c, "sid");
  if (sid) await c.env.SESSIONS.delete(sid);
  return c.json({ ok: true });
});

export default auth;

src/routes/upload.ts — R2 + Queue fan-out

typescript

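import { Hono } from "hono";

// Only the bindings this router touches; mirrors the Bindings type in src/index.ts.
type Bindings = { FILES: R2Bucket; THUMBS: Queue<{ key: string }> };

const upload = new Hono<{ Bindings: Bindings }>();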

upload.post("/", async (c) => {
  const form = await c.req.formData();
  const file = form.get("file") as File | null;
  if (!file) return c.json({ error: "no file" }, 400);

  const key = `${crypto.randomUUID()}/${file.name}`;
  // Pass the File (a Blob) directly so R2 knows the body length up front.
  await c.env.FILES.put(key, file, {
    httpMetadata: { contentType: file.type },
  });

  // Fire-and-forget thumbnail generation
  await c.env.THUMBS.send({ key });

  return c.json({ key });
});

export default upload;

15. Customer Patterns & Talk Tracks

Map a customer's words to the right pieces of the platform. These are the conversations you'll have most often.

Customer says…	Reach for	Why
"We need an API gateway / BFF"	Workers + Hono	Edge-fast, no infra, easy auth/rate-limiting
"We're moving off S3 to cut egress"	R2 + Super Slurper	Zero egress, S3-compatible API, native to Workers
"We have multi-tenant rate limits per customer"	Durable Objects	One DO per tenant, atomic counters
"We're building a chatbot over our docs"	Workers AI + Vectorize + R2	Full RAG in one Worker
"We need real-time presence/collab"	Durable Objects + WebSocket Hibernation	Stateful per-room, zero-cost idle
"Multi-step onboarding emails"	Workflows + Queues + Email Workers	Durable steps, retries, sleeps for days/weeks
"Postgres exists, can't migrate"	Hyperdrive	Pool + cache against existing DB
"Nightly ETL / report generation"	Cron Triggers + Workflows + R2	One platform, no orchestrator needed
"Per-user feature flags + rollouts"	KV + Workers + Flagship	<10ms reads at the edge
"We need preview URLs for every PR"	Pages (or Workers + Static Assets)	Git-driven, automatic per-branch
"Untrusted user code in our app"	Sandbox SDK / Workers for Platforms	Isolated execution per tenant

Talk tracks

The "no region" opener

"When you build on AWS, the first decision is which region. With Workers, that decision doesn't exist — your code is in 300+ cities the moment you deploy. The cost of that decision is what we remove."

The "egress" opener (for S3 / object storage)

"What's your monthly egress bill on object storage? R2 charges zero on egress. Most customers see ROI in 3–6 months even before factoring in performance."

The "stateful at the edge" pivot

"Most edge platforms can run code at the edge but make you call back to a region for state. Durable Objects co-locate your code AND your state on the same machine — single-threaded, strongly consistent, sub-millisecond."

The "RAG in one deploy" pitch

"Your AI roadmap probably has four vendors today: an LLM provider, a vector DB, an object store, and an API runtime. With us, those are all bindings on the same Worker. One bill, one deploy, one network hop."

16. Cheat Sheet & Discovery Questions

Limits to memorize

Resource	Limit (paid)
Worker CPU per request	30s default, configurable up to 5min
Worker request size	~100 MB
KV value	25 MiB
KV writes per key	1/sec
D1 database	10 GB per database
D1 row size	2 MB
R2 object	5 TB (multipart)
Durable Object throughput	~1K req/s per instance
Queue message	128 KB
Queue throughput	5K msg/sec per queue
Vectorize	10M vectors per index, 1536 dims max

Always verify Limits change. Confirm in developers.cloudflare.com before quoting numbers in a customer-facing doc or RFP.

Discovery questions to ask

Where do your end users live? (latency justification)
What's your monthly egress bill? (R2 hook)
How many regions do you currently deploy to? (operational tax)
Is any part of your stack stateful or real-time? (DO hook)
What's your AI roadmap? Are you running RAG today? (Workers AI + Vectorize)
How do you handle long-running jobs? (Workflows hook)
What's your CI/CD for preview environments? (Pages hook)
Do you have multi-tenant isolation requirements? (Workers for Platforms / Sandbox SDK)
What database are you on, and is it a constraint? (D1 / Hyperdrive)
Where do you store user-generated content today? (R2 + S3 migration)

Demo "starter kit" — what to keep in your back pocket

A Hono + D1 + KV CRUD app (15-min build).
A Durable Object chat room (10-min build, very visual).
A Vectorize + Workers AI RAG over a small PDF corpus (~30 min, lands every AI conversation).
An R2 upload + queue + thumbnail-generation flow (shows event-driven).
A Workflows onboarding sequence with sleeps (shows durability).

Where to go deeper
