Fundamentals, architectural patterns, runnable code, and customer talk tracks for the SaaS, ISV, and "build on the edge" conversation.
Most cloud platforms ship a region. Cloudflare ships a network. The Developer Platform is the set of primitives that lets customers run code, store data, and do AI inference inside that network — no region selection, no idle servers, no egress fees.
wrangler deploy publishes to 300+ cities at once. There is no "region."
Resources are exposed as bindings on env. No connection strings, no IAM gymnastics.
Web-standard APIs: fetch, Request, Response, URL, Headers, WebSocket, Streams. Code is portable.

Group the platform into three families. Customers always combine pieces from each.
Compute: Workers, Pages, Durable Objects, Workflows, Queues, Cron Triggers, Containers.
Storage and data: KV, D1, R2, Hyperdrive, DO Storage (SQLite), Queues, Vectorize, Secrets Store.
AI: Workers AI (50+ models), Vectorize, AI Gateway, AI Search, Agents SDK.
| Need | Pick | Why |
|---|---|---|
| High-read config / sessions / feature flags | KV | <10ms global reads, eventually consistent |
| Relational data, joins, transactions | D1 | SQLite per database, 10 GB, Time Travel recovery |
| Files, video, backups, S3 migration | R2 | S3 API, zero egress, strong consistency |
| Strong consistency on a single entity (room, user, document) | Durable Objects | Single-threaded, co-located storage, ~1K req/s |
| Async jobs / event fan-out | Queues | At-least-once delivery, batching, DLQ |
| Vector embeddings (RAG / search) | Vectorize | 10M vectors/index, cosine/euclidean/dot |
| Existing Postgres / MySQL | Hyperdrive | Connection pooling + caching to your DB |
| Need | Pick |
|---|---|
| HTTP API / edge logic / auth | Workers |
| Full-stack web app with Git deploys | Pages (or Workers Static Assets) |
| Real-time coordination (chat, presence, locks) | Durable Objects |
| Multi-step durable jobs (minutes → weeks) | Workflows |
| Async fan-out | Queues |
| Scheduled tasks | Cron Triggers |
| Long-running stateful processes / heavy deps | Containers |
A Worker is a JS/TS/Rust/Python module deployed globally that exports handlers. The most common is fetch — invoked on every HTTP request to your route.
A Worker is a piece of code that runs inside Cloudflare's edge network instead of on a server you rent or operate. When a user hits your domain, the request lands at the nearest Cloudflare data center (one of 300+ globally), and a Worker is invoked in that data center. There is no region, no autoscaling group, no load balancer to configure — the network is the runtime.
Under the hood, Workers run on V8 isolates, the same lightweight sandbox technology that powers each browser tab in Chrome. An isolate is not a container, not a VM, not a process: it's a small slice of a long-running V8 instance. Cloudflare can boot thousands of them per second on a single machine, which is why cold starts are measured in milliseconds rather than seconds, and why the platform doesn't bill you for idle time.
Every Worker request executes in three distinct phases:
The fetch(request, env, ctx) method is invoked. request is a standard Request object, env contains your bindings (more on this below), and ctx is the execution context.
The Response goes back to the user. Anything passed to ctx.waitUntil() keeps running after the response: perfect for logging, cache warming, queue sends, or analytics.

A binding is a typed reference to another Cloudflare resource (a KV namespace, a D1 database, an R2 bucket, an AI model, another Worker) that appears as a property on env. There are no connection strings, no API keys to manage, no SDKs to import; the platform wires it up at deploy time.
This matters because most cloud security incidents come from leaked credentials or misconfigured IAM. With bindings, the Worker only has access to what you explicitly grant in wrangler.jsonc, and that grant is the credential. Customers used to AWS will recognize this as "IAM done right."
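The ctx.waitUntil() behavior described earlier can be sketched without the Workers runtime. This is a minimal illustration, assuming a hypothetical Ctx interface that models only the waitUntil method of the real execution context; handle, pending, and auditLog are illustrative names, not platform APIs.

```typescript
// Hypothetical stand-in for the execution context: only waitUntil is modeled.
interface Ctx {
  waitUntil(promise: Promise<unknown>): void;
}

// Illustrative handler: compute the response first, defer the logging.
function handle(path: string, ctx: Ctx, auditLog: string[]): string {
  // The deferred work is registered but not awaited, so it cannot delay the response.
  ctx.waitUntil(
    Promise.resolve().then(() => {
      auditLog.push(`served ${path}`);
    }),
  );
  return `Hello from ${path}`;
}

// A toy runtime: collect the promises so they can be drained after the response.
const pending: Promise<unknown>[] = [];
const ctx: Ctx = {
  waitUntil: (p) => {
    pending.push(p);
  },
};
const auditLog: string[] = [];

const body = handle("/health", ctx, auditLog);
```

In this toy version the response string is available as soon as handle returns, while the log write sits in pending for the runtime to finish afterwards.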
Workers implement the Web Platform APIs — fetch, Request, Response, Headers, URL, WebSocket, Streams, crypto.subtle, TextEncoder. The same APIs you'd use in a browser. This is deliberate: code written for Workers is portable, and developers don't have to learn a proprietary runtime.
Node.js built-ins (fs, net, buffer, etc.) are not available by default, because Workers don't have a filesystem or persistent process. If a customer needs them (often for npm package compatibility), enable the nodejs_compat compatibility flag — it adds a polyfilled subset.
JavaScript and TypeScript are first-class. Python, Rust, and any language compilable to WebAssembly (Go, C, C++, .NET via Blazor) are also supported. Most production Workers are TypeScript because the binding type system is best-in-class there.
Workers are billed on requests + CPU time, not wall-clock time. A Worker that's awaiting a slow database call is not billing you for those milliseconds, only for the time V8 actually spent executing your code. This is a fundamentally different cost model from Lambda or Cloud Run, and it can be 5–20x cheaper for typical I/O-bound edge workloads, depending on how much of each request is spent waiting.
Handler arguments: request, env (bindings), ctx (execution context).
Use ctx.waitUntil(promise) to keep work alive after the response is sent (logging, cache writes, queue sends).
Node.js built-ins are off by default; enable the nodejs_compat flag if you need them.
Other handlers: scheduled (cron), queue (queue consumer), tail (log stream), email (Email Workers).
export interface Env {
MY_KV: KVNamespace;
DB: D1Database;
AI: Ai;
}
export default {
async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
const url = new URL(request.url);
if (url.pathname === "/health") {
return Response.json({ ok: true, ts: Date.now() });
}
return new Response("Hello from the edge", { status: 200 });
},
} satisfies ExportedHandler<Env>;
{
"$schema": "node_modules/wrangler/config-schema.json",
"name": "edge-api",
"main": "src/index.ts",
"compatibility_date": "2026-01-01",
"compatibility_flags": ["nodejs_compat"],
"observability": { "enabled": true },
"kv_namespaces": [
{ "binding": "MY_KV", "id": "abc123..." }
],
"d1_databases": [
{ "binding": "DB", "database_name": "app", "database_id": "..." }
],
"r2_buckets": [
{ "binding": "FILES", "bucket_name": "uploads" }
],
"ai": { "binding": "AI" }
}
npm create cloudflare@latest my-worker -- --type hello-world
cd my-worker
npx wrangler dev # local dev
npx wrangler dev --remote # use real bindings
npx wrangler deploy # publish globally
npx wrangler tail # stream live logs
npx wrangler secret put STRIPE_KEY # set encrypted secret
Hono is the de facto Workers-native web framework. It's tiny, type-safe, and feels like Express. Use it any time you have more than two routes.
The raw Workers API gives you a single fetch handler that receives a Request. For any non-trivial API, you'll quickly want routing, middleware, body parsing, validation, error handling, and response helpers. You can write all of that yourself, but every team ends up rebuilding the same primitives.
Hono is the framework Cloudflare itself recommends. It was built for edge runtimes from day one (not retrofitted from Node), so it has zero dependencies, sub-millisecond router latency, and a tiny bundle size that doesn't eat your Worker's startup budget.
Bindings are available on c.env with full TypeScript inference.
Use Hono almost always. The exceptions: a tiny single-purpose Worker (one route, no body parsing) or a pure proxy. For any SaaS backend, BFF, or public API, Hono pays for itself within an hour.
itty-router — even smaller, less ergonomic, fine for very small APIs. Hattip — adapter layer that lets you target multiple edge runtimes. For full-stack apps with frontend frameworks, you'd usually use the framework's own router (Next.js, SvelteKit, Astro, Remix) and reserve Hono for standalone APIs.
import { Hono } from "hono";
import { cors } from "hono/cors";
import { logger } from "hono/logger";
import { zValidator } from "@hono/zod-validator";
import { z } from "zod";
type Bindings = {
DB: D1Database;
MY_KV: KVNamespace;
};
const app = new Hono<{ Bindings: Bindings }>();
app.use("*", logger());
app.use("/api/*", cors({ origin: "*" }));
// Auth middleware
app.use("/api/admin/*", async (c, next) => {
  const auth = c.req.header("Authorization");
  // KV reads are async: await the token before comparing it
  const token = await c.env.MY_KV.get("admin-token");
  if (!token || auth !== `Bearer ${token}`) {
    return c.json({ error: "unauthorized" }, 401);
  }
  await next();
});
// Validated POST
const createUser = z.object({
email: z.string().email(),
name: z.string().min(1),
});
app.post("/api/users", zValidator("json", createUser), async (c) => {
const { email, name } = c.req.valid("json");
const result = await c.env.DB
.prepare("INSERT INTO users (email, name) VALUES (?, ?) RETURNING id")
.bind(email, name)
.first<{ id: number }>();
return c.json({ id: result?.id, email, name }, 201);
});
app.onError((err, c) => {
console.error(err);
return c.json({ error: err.message }, 500);
});
export default app;
Globally distributed, eventually consistent KV store. Read-optimized — values cached at the edge after first read. Best for config, sessions, feature flags, and read-heavy caches.
KV is best understood as a read-through cache layered on top of a globally replicated store. The authoritative copy of every key lives in a small number of central data centers. The first time a Worker reads a key in a given PoP, KV fetches it from the central store and caches it in that PoP's local memory. Subsequent reads from the same PoP are essentially free — sub-10ms, often sub-millisecond.
This architecture is why KV is so fast for reads and why writes take time to propagate: a write must invalidate or refresh that cached copy in every PoP that's seen the key. Up to 60 seconds globally is normal.
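The read path above can be modeled in a few lines. This is a toy sketch, not how KV is implemented: ToyKV, the PoP names, and the 60-second TTL used as the staleness window are all illustrative assumptions, and time is passed in explicitly so the behavior is deterministic.

```typescript
// Toy model of the read path described above: one central store, one cache per PoP.
type Entry = { value: string; cachedAt: number };

class ToyKV {
  private central = new Map<string, string>();
  private pops = new Map<string, Map<string, Entry>>();
  private ttlMs: number; // how long a PoP may keep serving its cached copy

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  put(key: string, value: string): void {
    // Writes land in the central store; PoP caches are not updated synchronously.
    this.central.set(key, value);
  }

  get(pop: string, key: string, now: number): string | undefined {
    const cache = this.pops.get(pop) ?? new Map<string, Entry>();
    this.pops.set(pop, cache);
    const hit = cache.get(key);
    if (hit && now - hit.cachedAt < this.ttlMs) return hit.value; // possibly stale
    const fresh = this.central.get(key);
    if (fresh !== undefined) cache.set(key, { value: fresh, cachedAt: now });
    return fresh;
  }
}

const kv = new ToyKV(60_000);
kv.put("flag", "off");
const first = kv.get("LHR", "flag", 0); // miss: fetched from the central store
kv.put("flag", "on");
const stale = kv.get("LHR", "flag", 30_000); // still served from the PoP cache
const other = kv.get("SJC", "flag", 30_000); // a cold PoP sees the new value
const later = kv.get("LHR", "flag", 60_000); // cache expired, refetched
```

The warm PoP keeps answering "off" until its cached copy ages out, which is the same window a customer sees as write propagation delay.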
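KV is eventually consistent. Concretely, after a write, reads in other locations can keep returning the previous value for up to 60 seconds.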
This is the most common KV mistake: customers reach for it because it's fast, and then try to use it as a primary store for things like counters, rate limits, or transactional state. Those workloads need Durable Objects or D1. KV is for data that's read 1000x more often than it's written.
Every KV value can carry up to 1024 bytes of metadata that's returned alongside the value in a single round trip. This is perfect for storing small extras (TTL hints, version numbers, content type) next to the value. list() returns metadata too, so you can paginate over a prefix and read attributes without fetching every value.
wrangler kv namespace create CONFIG
# Add the returned id to wrangler.jsonc under kv_namespaces
// Write with a 5-minute TTL
await env.CONFIG.put("flag:dark-mode", JSON.stringify({ enabled: true }), {
expirationTtl: 300,
});
// Read as JSON
const flag = await env.CONFIG.get<{ enabled: boolean }>("flag:dark-mode", "json");
// List with a prefix
const { keys } = await env.CONFIG.list({ prefix: "flag:" });
// Read with metadata (avoid double round-trip)
const { value, metadata } = await env.CONFIG.getWithMetadata<Session, SessionMeta>(
`session:${sid}`,
"json"
);
app.get("/api/flags/:key", async (c) => {
const key = c.req.param("key");
// cacheTtl avoids a KV roundtrip on every request in this PoP
const value = await c.env.CONFIG.get(`flag:${key}`, {
type: "json",
cacheTtl: 60,
});
return c.json({ key, value });
});
SQLite-compatible serverless database. Best for relational application data: users, accounts, posts, audit logs. 10 GB per database, 30-day Time Travel point-in-time recovery, optional read replicas.
Most cloud SQL offerings (RDS Postgres, Aurora, Cloud SQL) optimize for one big database serving many applications. D1 inverts that: it's optimized for many small databases, one per tenant or per application. The underlying engine is SQLite — the most-deployed database in the world, embedded in every iPhone, Android device, and browser. It's battle-tested, has world-class SQL semantics, and runs in-process (no network round trip between query and engine).
Cloudflare took SQLite, wrapped it in a managed service with replication, durability, point-in-time recovery, and edge access, and exposed it as a Worker binding. The result is a database that feels like Postgres for the developer but scales horizontally by sharding databases (per-tenant, per-region) instead of vertically.
One D1 database is capped at 10 GB and ~1K writes/sec. That sounds limiting until you realize the design intent: you're supposed to run many of them. A SaaS platform with 10,000 tenants might run 10,000 D1 databases — one per tenant — each with its own isolated schema, performance, and data residency. Cloudflare bills you on rows read/written and storage, not per database.
This pattern (one DB per tenant) is genuinely hard on traditional cloud SQL because of connection limits, provisioning overhead, and cost per instance. On D1 it's the happy path.
Every D1 database has a 30-day continuous backup. You can restore to any minute within that window with wrangler d1 time-travel restore. There's no setup, no extra cost, no separate snapshot service. For customers used to managing RDS automated snapshots and PITR windows, this alone is often the moment they get sold.
Writes go to a single primary; D1 can place read replicas in regions you specify. Workers automatically read from the nearest replica with automatic failover to the primary if needed. The Sessions API ensures read-your-writes consistency within a session — critical for "user updates profile, immediately reloads page" flows.
D1 supports the full SQLite SQL dialect: CTEs, window functions, JSON functions, full-text search via FTS5, generated columns, partial indexes. It does not support stored procedures, triggers with side effects outside the DB, or Postgres-specific syntax. ORMs that work with SQLite (Drizzle, Kysely, Prisma) work with D1.
wrangler d1 create app
wrangler d1 migrations create app init_schema
# edits ./migrations/0001_init_schema.sql
wrangler d1 migrations apply app --remote
CREATE TABLE users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
created_at INTEGER NOT NULL DEFAULT (unixepoch())
);
CREATE INDEX idx_users_email ON users(email);
CREATE TABLE posts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id INTEGER NOT NULL REFERENCES users(id),
title TEXT NOT NULL,
body TEXT NOT NULL,
created_at INTEGER NOT NULL DEFAULT (unixepoch())
);
// .first() — single row
const user = await env.DB
.prepare("SELECT id, email, name FROM users WHERE email = ?")
.bind(email)
.first<{ id: number; email: string; name: string }>();
// .all() — many rows
const { results } = await env.DB
.prepare("SELECT * FROM posts WHERE user_id = ? ORDER BY created_at DESC LIMIT 20")
.bind(userId)
.all<Post>();
// .run() — INSERT/UPDATE/DELETE
const { meta } = await env.DB
.prepare("INSERT INTO posts (user_id, title, body) VALUES (?, ?, ?)")
.bind(userId, title, body)
.run();
console.log("inserted id:", meta.last_row_id);
// Batch — atomic transaction in one round trip
await env.DB.batch([
env.DB.prepare("UPDATE users SET name = ? WHERE id = ?").bind(name, id),
env.DB.prepare("INSERT INTO audit_log (user_id, action) VALUES (?, ?)").bind(id, "rename"),
]);
Never build SQL by concatenating user input; always pass values through .bind(). The platform supports prepared statements for a reason.
S3-compatible object storage with zero egress fees. Strong consistency on writes and deletes. Use cases: user uploads, media libraries, backups, data lakes, static assets, S3 migration targets.
Cloud object storage has historically had a brutal economic asymmetry: storing a TB is cheap (~$23/mo on S3 Standard), but reading it out to the internet costs $0.05–$0.09 per GB. For media-heavy businesses (video, images, downloads, ML training data, backups), egress can dwarf storage costs by 10x or more.
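A quick worked calculation makes the asymmetry concrete. The rates below are the figures quoted in the paragraph above; the workload (1 TB stored, each byte downloaded five times a month) and the helper name egressCost are hypothetical, chosen only for illustration.

```typescript
// Figures taken from the paragraph above; real bills depend on region and tier.
const STORAGE_USD_PER_TB_MONTH = 23;
const EGRESS_USD_PER_GB_LOW = 0.05;
const EGRESS_USD_PER_GB_HIGH = 0.09;

// Illustrative helper: monthly egress cost for a given volume and rate.
function egressCost(gbOut: number, usdPerGb: number): number {
  return gbOut * usdPerGb;
}

// Hypothetical workload: 1 TB stored, each byte downloaded five times a month.
const gbOut = 5 * 1000;
const low = egressCost(gbOut, EGRESS_USD_PER_GB_LOW); // about 250 USD
const high = egressCost(gbOut, EGRESS_USD_PER_GB_HIGH); // about 450 USD
const ratioLow = low / STORAGE_USD_PER_TB_MONTH; // roughly 11x the storage bill
const ratioHigh = high / STORAGE_USD_PER_TB_MONTH; // roughly 20x the storage bill
```

Under these assumptions the egress line item is between about 11 and 20 times the storage line item, which is the gap the zero-egress model removes.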
R2's pricing model removes egress fees entirely. You pay for storage and for operations (Class A — writes/lists, Class B — reads), but bytes leaving R2 to the internet, to your origin, or to a customer's browser are free. This is enabled by Cloudflare's network: outbound bandwidth is already paid for by the broader CDN business, so R2 doesn't need to recoup it.
R2 implements the S3 REST API. Most S3 SDKs and tools (AWS SDK, boto3, s3cmd, rclone, Terraform) work by changing the endpoint URL and credentials. The migration story for an S3 customer is therefore very gentle: point your existing code at R2, optionally use Cloudflare's Super Slurper tool to bulk-copy your existing buckets, and start saving on egress immediately.
For new code on Cloudflare, you'd typically use the native Workers binding (env.MY_BUCKET.put/get/list) instead of the S3 SDK — it's faster, has no auth overhead, and offers some R2-specific features (multipart with smaller minimum, conditional requests via ETags).
R2 is strongly consistent for reads, writes, and deletes. After put() resolves, every subsequent get() sees the new value. After delete(), every get() returns null through the binding (a 404 over the S3 API). S3 has offered strong read-after-write consistency since December 2020, so a migration to R2 does not weaken the guarantees existing code relies on.
You can set lifecycle rules to automatically transition objects between storage classes (e.g. "move to Infrequent Access after 60 days").
R2 can emit events to a Cloudflare Queue when objects are created or deleted. This turns R2 into the trigger for an event-driven pipeline — exactly the same pattern as S3 + SQS + Lambda, but native to one platform. Common uses: thumbnail/transcoding jobs on upload, virus scanning, indexing into Vectorize, audit logging.
You can expose an R2 bucket directly via a public URL or attach it to a custom domain. Combined with Cloudflare's CDN cache (free), this turns R2 into a high-performance static asset host with no egress, no separate CDN configuration, and full Cache Rules / Workers Transform Rules in front.
wrangler r2 bucket create uploads --location=enam
// Upload from a multipart form
app.post("/api/upload", async (c) => {
const form = await c.req.formData();
const file = form.get("file") as File | null;
if (!file) return c.json({ error: "no file" }, 400);
const key = `${crypto.randomUUID()}/${file.name}`;
await c.env.FILES.put(key, file.stream(), {
httpMetadata: { contentType: file.type },
customMetadata: { uploadedBy: c.get("userId") },
});
return c.json({ key, size: file.size });
});
// Stream download
app.get("/api/files/:key{.+}", async (c) => {
const obj = await c.env.FILES.get(c.req.param("key"));
if (!obj) return c.notFound();
const headers = new Headers();
obj.writeHttpMetadata(headers);
headers.set("etag", obj.httpEtag);
return new Response(obj.body, { headers });
});
{
"r2_buckets": [{ "binding": "FILES", "bucket_name": "uploads" }],
"queues": {
"producers": [{ "binding": "PROCESS_QUEUE", "queue": "process-uploads" }]
}
}
// Then: wrangler r2 bucket notification create uploads \
// --queue process-uploads --event-type object-create
A Durable Object is a globally-unique, single-threaded, stateful actor with co-located storage. Use them when many clients need to coordinate around one thing: a chatroom, a document, a user's session, a rate-limit counter.
Stateless serverless functions are great for stateless work, but most real apps have shared state — a chat room every participant writes to, a document several users edit at once, a counter that increments atomically, a rate-limit window per API key. Traditional serverless punts this state to an external database, which means every interaction becomes a round trip to a region, and you fight race conditions with optimistic locking, transactions, or distributed locks.
Durable Objects collapse the compute and the state into one entity. For a given ID, there is exactly one DO instance running anywhere in the world at any moment. All requests for that ID are routed to that instance and processed in serial order. There are no race conditions because there's no concurrency. The state lives in memory and in co-located storage on the same machine.
Conceptually, a DO is an actor — a tiny stateful service identified by name, addressable from anywhere, that processes one message at a time. This is the same model as Erlang processes, Akka actors, or Orleans grains, but as a managed serverless primitive on Cloudflare's network.
You spawn a DO by name (idFromName("room:lobby")) or by a unique ID (newUniqueId()). The first time anyone references that ID, Cloudflare instantiates the object near where the first request originated. From then on, every request for that ID lands on that same instance — providing strong consistency for free.
Each modern DO comes with its own embedded SQLite database, accessible via synchronous APIs (no await needed for reads — they're literal microseconds). You can also use the simpler key-value API on top of the same SQLite engine. Storage is up to 10 GB per DO, durable, and replicated. There's also a 30-day point-in-time recovery option.
Because storage is co-located, a DO can read and write its own state in microseconds — no network hop. This is what makes them fast enough for real-time use cases that would be impossible with an external database.
DOs are the Cloudflare primitive designed to terminate and coordinate long-lived WebSocket connections. The Hibernation API lets a DO accept thousands of WebSockets and then go idle: Cloudflare evicts the in-memory state, but the connections stay open and "wake" the DO when a message arrives. You're billed for storage and active compute, not for sitting around waiting. This makes DOs uniquely suited for chat, multiplayer, collaborative editing, and IoT fan-in.
A DO can schedule itself to wake up at a future time via setAlarm(). The alarm is durable (survives evictions, restarts, hibernation). This gives you per-entity scheduling: "remind this user in 24 hours," "expire this session at 5pm UTC," "retry this charge in 1 hour." There's a single alarm slot per DO; for multiple events use a queue pattern internally.
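The "single alarm slot, many events" pattern mentioned above can be sketched in memory. This is an illustration under stated assumptions: AlarmQueue and its alarmAt field are hypothetical stand-ins, and a real Durable Object would persist the pending events in its storage and call setAlarm() where this sketch assigns alarmAt.

```typescript
// Illustrative only: models the "one alarm slot, many events" pattern in memory.
type Pending = { at: number; name: string };

class AlarmQueue {
  private events: Pending[] = [];
  // Stand-in for the single alarm slot; a real DO would call setAlarm() instead.
  alarmAt: number | null = null;

  schedule(at: number, name: string): void {
    this.events.push({ at, name });
    this.events.sort((a, b) => a.at - b.at);
    this.alarmAt = this.events[0].at; // always arm for the earliest event
  }

  // Called when the alarm fires: run everything that is due, then re-arm.
  fire(now: number): string[] {
    const due = this.events.filter((e) => e.at <= now).map((e) => e.name);
    this.events = this.events.filter((e) => e.at > now);
    this.alarmAt = this.events.length > 0 ? this.events[0].at : null;
    return due;
  }
}

const q = new AlarmQueue();
q.schedule(3_000, "expire-session");
q.schedule(1_000, "send-reminder");
const armedFor = q.alarmAt; // the earliest event wins the slot
const ran = q.fire(1_500); // only the reminder is due
const rearmedFor = q.alarmAt; // re-armed for the remaining event
```

The design choice is that the alarm always points at the earliest pending event, and each firing re-arms it for whatever is left.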
The serial execution that makes DOs strongly consistent also caps a single instance at roughly 1,000 requests/second. For higher throughput, you shard: instead of one global counter, run 100 counter shards (idFromName("counter:" + Math.floor(Math.random() * 100))) and aggregate. This is a deliberate design trade — strong consistency per shard, eventual consistency across shards.
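The sharding idea above can be sketched with plain functions. This is a minimal model, assuming an in-memory Map in place of 100 real Durable Objects; pickShard, increment, and total are illustrative names, and the random source is injectable only so the shard choice can be pinned down.

```typescript
// Illustrative helpers; the "counter:" prefix and shard count follow the text above.
const SHARDS = 100;

// Pick a shard name for a write. `rand` is injectable so the choice is testable.
function pickShard(rand: () => number = Math.random): string {
  return "counter:" + Math.floor(rand() * SHARDS);
}

// In-memory stand-in for 100 Durable Objects, each holding one partial count.
const shardCounts = new Map<string, number>();

function increment(rand?: () => number): void {
  const name = pickShard(rand);
  shardCounts.set(name, (shardCounts.get(name) ?? 0) + 1);
}

// Reads fan out to every shard and sum the partial counts.
function total(): number {
  let sum = 0;
  shardCounts.forEach((n) => {
    sum += n;
  });
  return sum;
}

increment(() => 0); // lands on counter:0
increment(() => 0.5); // lands on counter:50
increment(() => 0.999); // lands on counter:99
```

Writes spread across shards, so each shard stays under its throughput ceiling; the price is that total() is a fan-out read and only eventually reflects concurrent writes.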
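The mental test: "Is there a single thing that many clients touch at once, and does the order of those touches matter?" If yes, DO. Examples: a chat room, a collaboratively edited document, a user's session, a rate-limit counter per API key.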
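import { DurableObject } from "cloudflare:workers";
export class Counter extends DurableObject<Env> {
  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
    // Create the table once per object so increment() has something to upsert into
    ctx.storage.sql.exec(
      "CREATE TABLE IF NOT EXISTS counters (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
    );
  }
  async increment(): Promise<number> {
    // The row type goes on exec(); one() returns exactly one row or throws
    const row = this.ctx.storage.sql.exec<{ value: number }>(
      `INSERT INTO counters (id, value) VALUES (1, 1)
       ON CONFLICT(id) DO UPDATE SET value = value + 1
       RETURNING value`
    ).one();
    return row.value;
  }
}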
export default {
async fetch(req: Request, env: Env): Promise<Response> {
const id = env.COUNTER.idFromName("global");
const stub = env.COUNTER.get(id);
const count = await stub.increment(); // RPC — typed, no fetch needed
return Response.json({ count });
},
} satisfies ExportedHandler<Env>;
{
"durable_objects": {
"bindings": [{ "name": "COUNTER", "class_name": "Counter" }]
},
"migrations": [
{ "tag": "v1", "new_sqlite_classes": ["Counter"] }
]
}
export class ChatRoom extends DurableObject<Env> {
async fetch(request: Request): Promise<Response> {
if (request.headers.get("Upgrade") !== "websocket") {
return new Response("expected ws", { status: 426 });
}
const pair = new WebSocketPair();
const [client, server] = Object.values(pair);
// Hibernation API — zero compute cost while idle
this.ctx.acceptWebSocket(server);
return new Response(null, { status: 101, webSocket: client });
}
webSocketMessage(ws: WebSocket, msg: string) {
// Broadcast to all connected clients
for (const peer of this.ctx.getWebSockets()) {
peer.send(msg);
}
}
webSocketClose(ws: WebSocket, code: number, reason: string) {
ws.close(code, reason);
}
}
Push messages from a Worker, consume them in batches in another Worker. At-least-once delivery, configurable retries, dead-letter queues, delays up to 12 hours.
Some work shouldn't happen on the request path: sending email, generating thumbnails, running ML inference, calling a slow third-party API, fanning out a webhook to 100 subscribers. Doing it inline blocks the user, magnifies failure surface, and ties the work's lifetime to the request's lifetime.
A queue decouples the producer (your API endpoint, which just enqueues the work) from the consumer (a separate Worker that processes it later). The producer returns 202 immediately; the consumer takes its time, retries on failure, and dead-letters anything it can't handle.
Cloudflare Queues guarantees that every message is delivered at least once to a consumer. In rare failure scenarios it may be delivered more than once. This means your consumer logic must be idempotent — processing the same message twice should produce the same result as processing it once. The standard pattern is to use a unique message ID (or a hash of the payload) and check against a "seen" record in D1 or KV before doing the side effect.
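The "seen record" pattern above can be sketched as follows. This is a minimal in-memory illustration: the Set stands in for a table in D1 or a key in KV, and Message, sendEmail, and handleOnce are hypothetical names rather than Queues APIs.

```typescript
// Illustrative only: `seen` stands in for a "seen" table in D1 or a key in KV.
type Message = { id: string; body: { to: string } };

const seen = new Set<string>();
const sent: string[] = [];

// Hypothetical side effect; a real consumer would call an email provider here.
function sendEmail(to: string): void {
  sent.push(to);
}

// Process a message at most once per id, even if it is delivered repeatedly.
function handleOnce(msg: Message): "processed" | "duplicate" {
  if (seen.has(msg.id)) return "duplicate";
  sendEmail(msg.body.to);
  // Record the id only after the side effect succeeds, so a failure is retried.
  seen.add(msg.id);
  return "processed";
}

const m: Message = { id: "msg-1", body: { to: "a@example.com" } };
const firstResult = handleOnce(m);
const secondResult = handleOnce(m); // redelivery of the same message
```

Recording the id after the side effect means a crash between the two can still cause one duplicate; where that matters, the check and the write would need to share a transaction with the side effect.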
Consumers receive messages in batches, not one at a time. You configure two knobs: max_batch_size (e.g. 25 messages) and max_batch_timeout (e.g. 5 seconds). The consumer fires when either limit is hit. Batching dramatically improves throughput for downstream operations — one D1 batch insert beats 25 individual ones, one external API call with 25 items beats 25 single calls.
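The interaction of the two knobs can be shown with a small simulation. This is a toy model, not the Queues scheduler: toBatches is an illustrative function, and it assumes the timeout is measured from the first message in the open batch.

```typescript
// Toy model of the two knobs: a batch closes when it is full or when the
// timeout since its first message elapses. Names are illustrative.
function toBatches(arrivalsMs: number[], maxSize: number, timeoutMs: number): number[][] {
  const batches: number[][] = [];
  let current: number[] = [];
  for (const t of arrivalsMs) {
    // Timeout: the open batch is flushed before a late message joins it.
    if (current.length > 0 && t - current[0] >= timeoutMs) {
      batches.push(current);
      current = [];
    }
    current.push(t);
    // Size: a full batch is flushed immediately.
    if (current.length === maxSize) {
      batches.push(current);
      current = [];
    }
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Three quick messages fill a batch of 3; the fourth waits alone until the
// timeout closes its batch, and the late fifth message starts a new one.
const batches = toBatches([0, 10, 20, 30, 6_000], 3, 5_000);
```

With these inputs the simulation yields three batches: one closed by size and two containing a single message each.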
If a message fails (you call msg.retry() or your handler throws), Queues will redeliver it after a configurable delay, up to max_retries times. After that, the message goes to a dead-letter queue (DLQ), another queue you've designated to capture poison pills for inspection. DLQs are critical for production: without one, a message that exhausts its retries is deleted, so the failing payload is gone before anyone can inspect it.
Push consumer: Cloudflare invokes a Worker's queue() handler with each batch. The default and most common pattern.

You can publish a message with a delay of up to 12 hours, or retry with a delay. This turns Queues into a lightweight scheduler for short-horizon work ("retry this in 5 minutes," "send this notification in 2 hours"). For longer-horizon or multi-step work, use Workflows.
wrangler queues create email-jobs
wrangler queues create email-jobs-dlq
{
"queues": {
"producers": [
{ "binding": "EMAIL_QUEUE", "queue": "email-jobs" }
],
"consumers": [
{
"queue": "email-jobs",
"max_batch_size": 25,
"max_batch_timeout": 5,
"max_retries": 3,
"dead_letter_queue": "email-jobs-dlq"
}
]
}
}
type EmailJob = { to: string; template: string; vars: Record<string, string> };
// Producer (called from your API)
app.post("/api/signup", async (c) => {
const { email } = await c.req.json();
// ... create user in D1 ...
await c.env.EMAIL_QUEUE.send({
to: email,
template: "welcome",
vars: { name: email.split("@")[0] },
} satisfies EmailJob);
return c.json({ ok: true }, 202);
});
// Consumer
export default {
fetch: app.fetch,
async queue(batch: MessageBatch<EmailJob>, env: Env): Promise<void> {
for (const msg of batch.messages) {
try {
await sendEmail(env, msg.body);
msg.ack();
} catch (err) {
// Retry with backoff; after max_retries → DLQ
msg.retry({ delaySeconds: Math.min(60 * msg.attempts, 600) });
}
}
},
} satisfies ExportedHandler<Env, EmailJob>;
An uncaught exception in queue() retries the entire batch, not just the failed message. Always use per-message try/catch and explicitly ack() or retry().
When a job has multiple steps that can fail independently, takes longer than a single request, or needs to wait for an external event — Workflows. Steps are individually retried; successful steps are not re-run.
Some business processes don't fit in a single request or a single queue message. They're multi-step (call the payment processor → wait for confirmation → write to DB → send receipt → schedule reminder), long-running (3-day onboarding email sequence, 30-day trial expiration), or conditional on external events (wait for the user to confirm their email, wait for a webhook from Stripe).
Building this on raw queues and crons gets ugly fast. You end up reinventing checkpointing, retry state, idempotency keys, and timeouts in a database. AWS calls this category "Step Functions"; Temporal and Inngest exist as standalone vendors. Cloudflare Workflows is the same primitive, baked into the platform.
A Workflow is a class with a run() method that calls step.do() for each unit of work. Each step's return value is persisted. If the Worker crashes, the machine is rebooted, or a step fails and retries, only the failed step re-runs — the successful ones replay from the persisted log.
This means you can write what looks like ordinary sequential code:
const user = await step.do("fetch user", () => db.get(id));
await step.sleep("wait 7 days", "7 days");
await step.do("send reminder", () => sendEmail(user));
…and the runtime guarantees that "fetch user" runs exactly once successfully, the sleep persists across redeploys and machine failures, and "send reminder" only runs after the sleep completes — even if your Worker code is redeployed three times in the meantime.
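The replay guarantee can be illustrated with a toy step runner. This is not the Workflows runtime: ToyStepRunner is a hypothetical synchronous model in which a Map plays the role of the persisted step log, and a second runner over the same Map models a restart after a crash.

```typescript
// Toy model of step persistence; `log` plays the role of the persisted results.
class ToyStepRunner {
  calls: string[] = [];
  private log: Map<string, unknown>;

  constructor(log: Map<string, unknown>) {
    this.log = log;
  }

  do<T>(name: string, fn: () => T): T {
    // A step that already succeeded is replayed from the log, not re-executed.
    if (this.log.has(name)) return this.log.get(name) as T;
    this.calls.push(name);
    const result = fn(); // if this throws, nothing is recorded for the step
    this.log.set(name, result);
    return result;
  }
}

function run(step: ToyStepRunner, failReminder: boolean): string {
  const user = step.do("fetch user", () => ({ email: "a@example.com" }));
  return step.do("send reminder", () => {
    if (failReminder) throw new Error("mail provider down");
    return `sent to ${user.email}`;
  });
}

const persisted = new Map<string, unknown>();
const attempt1 = new ToyStepRunner(persisted);
let crashed = false;
try {
  run(attempt1, true);
} catch {
  crashed = true;
}

// A fresh runner over the same log models a restart after the crash.
const attempt2 = new ToyStepRunner(persisted);
const outcome = run(attempt2, false);
```

On the second attempt only "send reminder" executes; "fetch user" is answered from the log, which is the property that makes sequential-looking workflow code safe to resume.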
Each step has independent retry configuration: number of attempts, backoff strategy (constant / linear / exponential), and timeout. A flaky third-party call can be configured to retry 5 times with exponential backoff while the rest of the workflow proceeds normally.
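To make the three backoff strategies concrete, here is one plausible way to compute the delay before each retry. The formulas are illustrative assumptions (the source does not state the exact schedule Workflows uses), and retryDelayMs is a hypothetical helper.

```typescript
// Illustrative formulas only; the exact schedule Workflows uses may differ.
type Backoff = "constant" | "linear" | "exponential";

// Delay before retry number `attempt` (1-based), given a base delay in ms.
function retryDelayMs(strategy: Backoff, baseMs: number, attempt: number): number {
  switch (strategy) {
    case "constant":
      return baseMs;
    case "linear":
      return baseMs * attempt;
    case "exponential":
      return baseMs * 2 ** (attempt - 1);
  }
}

// Delays for five retries with a 1-second base.
const attempts = [1, 2, 3, 4, 5];
const constant = attempts.map((n) => retryDelayMs("constant", 1_000, n));
const linear = attempts.map((n) => retryDelayMs("linear", 1_000, n));
const exponential = attempts.map((n) => retryDelayMs("exponential", 1_000, n));
```

Under these formulas five exponential retries span 31 seconds in total, against 5 seconds for constant and 15 seconds for linear, which is why exponential is the usual choice for a flaky third party.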
step.sleep("wait 3 days", "3 days") doesn't consume Worker compute time during the sleep. The runtime stores the wake-up time and returns the resources. When the time arrives, the workflow resumes from where it left off. This is what makes long-horizon flows (drip campaigns, trial expirations, 30-day SLAs) economically viable on serverless.
A workflow can pause with step.waitForEvent("name", { type, timeout }) until an external system delivers a matching event to the workflow instance or the timeout elapses. This is the building block for human-in-the-loop approvals, async webhooks (payment confirmations, document signing), and multi-system orchestration.
Inside one workflow, Promise.all([step.do(a), step.do(b), step.do(c)]) runs steps concurrently. Each is still individually retried and persisted.
| Need | Use |
|---|---|
| Single async task, fire-and-forget | Queues |
| Recurring scheduled task (every N min/hr/day) | Cron Triggers |
| Multi-step, long-running, durable, possibly waiting on events | Workflows |
| Real-time stateful coordination per entity | Durable Objects |
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from "cloudflare:workers";
type Env = { ONBOARDING: Workflow; DB: D1Database; EMAIL_QUEUE: Queue };
type Params = { userId: number };
export class OnboardingWorkflow extends WorkflowEntrypoint<Env, Params> {
async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
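    const user = await step.do("fetch user", async () => {
      const row = await this.env.DB
        .prepare("SELECT id, email, name FROM users WHERE id = ?")
        .bind(event.payload.userId)
        .first<{ id: number; email: string; name: string }>();
      // Fail the step rather than continue without a user
      if (!row) throw new Error(`user ${event.payload.userId} not found`);
      return row;
    });
    await step.do(
      "send welcome email",
      // The delay value is illustrative; tune it for the downstream service
      { retries: { limit: 5, delay: "10 seconds", backoff: "exponential" } },
      async () => {
        await this.env.EMAIL_QUEUE.send({ to: user.email, template: "welcome" });
      }
    );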
await step.sleep("wait 3 days", "3 days");
await step.do("send tips email", async () => {
await this.env.EMAIL_QUEUE.send({ to: user.email, template: "tips" });
});
await step.sleep("wait 7 days", "7 days");
await step.do("send conversion offer", async () => {
await this.env.EMAIL_QUEUE.send({ to: user.email, template: "offer" });
});
}
}
app.post("/api/users", async (c) => {
const userId = await createUser(c);
const instance = await c.env.ONBOARDING.create({ params: { userId } });
return c.json({ userId, workflowId: instance.id });
});
Schedule any Worker via cron expressions. UTC-only. Use the scheduled handler in addition to (or instead of) fetch.
A Cron Trigger is exactly what it sounds like: cron-syntax schedules attached to a Worker that fire the scheduled() handler at the specified times. Cloudflare runs the Worker globally — the trigger fires once per schedule, not once per data center.
It is not a precise scheduler. Triggers fire around the scheduled time, with at-least-once semantics (rare duplicates possible during deploys). UTC only — no timezones. If you need second-level precision or timezone awareness, do the conversion in your handler.
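Because triggers are UTC-only, timezone-aware schedules are usually handled by firing hourly and gating inside the handler. The sketch below shows one way to do that conversion with the standard Intl API; localHour and shouldRun are hypothetical helpers, and the 09:00 target is an arbitrary example.

```typescript
// Returns the wall-clock hour (0-23) in `timeZone` for a UTC timestamp in ms.
// Illustrative helper; it relies only on the standard Intl API.
function localHour(utcMs: number, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "2-digit",
    hour12: false,
  }).formatToParts(new Date(utcMs));
  const hour = parts.find((p) => p.type === "hour");
  // Some engines render midnight as "24" in 24-hour mode; normalize to 0.
  return Number(hour ? hour.value : NaN) % 24;
}

// Hypothetical gate: a cron that fires hourly in UTC but should only act at
// 09:00 in the customer's timezone.
function shouldRun(scheduledTimeMs: number, timeZone: string): boolean {
  return localHour(scheduledTimeMs, timeZone) === 9;
}

// 14:00 UTC on 15 January is 09:00 in New York (UTC-5 in winter).
const fires = shouldRun(Date.UTC(2026, 0, 15, 14, 0, 0), "America/New_York");
// The same instant is 15:00 in Berlin, so the gate stays closed there.
const skips = shouldRun(Date.UTC(2026, 0, 15, 14, 0, 0), "Europe/Berlin");
```

In a scheduled() handler the first argument to shouldRun would be event.scheduledTime, and the gate also absorbs daylight-saving shifts without changing the cron expression.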
The vast majority of "cron jobs" customers run today are periodic housekeeping, such as a nightly export or a health probe every few minutes.
None of these need second-precision. Cron Triggers replace what customers usually run on a dedicated EC2 instance, an ECS scheduled task, or a Kubernetes CronJob, without the operational overhead.
The most powerful pattern is Cron → Workflow. Cron fires every hour and kicks off a workflow instance; the workflow handles the durable, multi-step work (retries, sleeps, external calls). Cron is the trigger, Workflow is the engine. This combo replaces a Step Functions / EventBridge / Lambda stack with two primitives.
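A minimal sketch of that hand-off, with hypothetical names: REPORTS is an assumed Workflow binding, and the interfaces are trimmed stand-ins for the runtime types so the snippet reads on its own. Deriving the instance ID from the scheduled hour means a duplicate cron firing asks for the same ID rather than a fresh one.

```typescript
// Trimmed stand-ins for the runtime types; a real Worker gets these from workers-types.
interface WorkflowBinding {
  create(options: { id?: string; params?: unknown }): Promise<{ id: string }>;
}
interface Env {
  REPORTS: WorkflowBinding; // hypothetical binding name
}

// One instance ID per scheduled hour, e.g. "report-2025-01-15T14".
function instanceId(scheduledTime: number): string {
  return `report-${new Date(scheduledTime).toISOString().slice(0, 13)}`;
}

// In a real Worker this object would be the default export.
const cronWorker = {
  async scheduled(event: { cron: string; scheduledTime: number }, env: Env): Promise<void> {
    // Assumption: creating an instance whose ID already exists is rejected,
    // so a duplicate firing for the same hour does not start a second run.
    await env.REPORTS.create({
      id: instanceId(event.scheduledTime),
      params: { windowStart: event.scheduledTime },
    });
  },
};
```

The cron handler stays a few lines long on purpose: everything that can fail, retry, or wait belongs in the workflow's steps.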
Cloudflare lets you opt in to "Green Compute" for crons: with it enabled, scheduled executions run only in data centers powered by renewable energy. For non-urgent batch work, this is a free sustainability story.
{
"triggers": {
"crons": [
"*/5 * * * *", // every 5 min
"0 2 * * *", // 2am UTC daily
"0 9 * * MON-FRI" // weekdays 9am UTC
]
}
}
export default {
async scheduled(event: ScheduledController, env: Env, ctx: ExecutionContext) {
console.log("cron fired:", event.cron, "at", new Date(event.scheduledTime));
if (event.cron === "0 2 * * *") {
// nightly backup
ctx.waitUntil(exportToR2(env));
}
if (event.cron === "*/5 * * * *") {
// health probe
ctx.waitUntil(probeOrigin(env));
}
},
} satisfies ExportedHandler<Env>;
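To test locally, start wrangler dev --test-scheduled, then run curl "http://localhost:8787/__scheduled?cron=*/5+*+*+*+*" to fire the scheduled handler with that cron expression.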
Workers AI runs 50+ models on Cloudflare's GPU network, called via the env.AI binding. Vectorize is the vector database for embeddings. Together they're the RAG stack — and it's the most common AI demo customers ask for.
Most AI workloads today involve a Worker (or any backend) calling out to an external LLM provider — OpenAI, Anthropic, Google. Every request crosses the public internet, costs latency, and creates a vendor relationship plus a bill plus a data-residency conversation. Cloudflare's pitch is to bring inference into the network: the same place your user's request landed and your code is already running.
Workers AI runs Cloudflare-managed open-source and commercial models on a global GPU network. From a Worker, calling a model is a binding call — env.AI.run("@cf/meta/llama-3.1-8b-instruct", ...) — not an external HTTP request. No API keys to manage, no vendor account to set up, no egress.
The catalog covers most of the practical task surface:
Workers AI is billed in neurons, an abstract unit that approximates GPU work. Different models cost different numbers of neurons per unit of work: a small embedding model costs far fewer neurons per token than a 70B LLM, so check the per-model rates on the pricing page when sizing a workload. The free tier includes 10K neurons/day. This abstracts away the GPU type complexity that plagues other inference platforms.
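Because billing is per neuron rather than per request, a rough daily estimate is simple arithmetic. The sketch below uses made-up placeholder rates, not real Workers AI prices; swap in the published per-model figures before quoting anything to a customer.

```typescript
// Hypothetical rates for illustration only; real per-model rates are on the pricing page.
interface NeuronRate {
  inputPerMillionTokens: number;
  outputPerMillionTokens: number;
}

function estimateNeurons(inputTokens: number, outputTokens: number, rate: NeuronRate): number {
  return (
    (inputTokens * rate.inputPerMillionTokens + outputTokens * rate.outputPerMillionTokens) /
    1_000_000
  );
}

const placeholderRate: NeuronRate = { inputPerMillionTokens: 25_000, outputPerMillionTokens: 200_000 };
// 1,000 requests per day, each with 500 input tokens and 200 output tokens.
const dailyNeurons = estimateNeurons(1_000 * 500, 1_000 * 200, placeholderRate);
const FREE_NEURONS_PER_DAY = 10_000; // free-tier allowance quoted above
console.log(dailyNeurons, dailyNeurons > FREE_NEURONS_PER_DAY); // 52500 true
```

With these placeholder numbers the workload would exceed the free allowance about five times over, which is the kind of sanity check worth running before a proof of concept.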
Vectorize stores high-dimensional vectors (embeddings) and answers nearest-neighbor queries. It's the database half of RAG: you embed your documents into vectors with an embedding model, store them in Vectorize with metadata, and at query time you embed the user's question and ask Vectorize for the top-k closest matches.
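To make "nearest-neighbor" concrete, here is a brute-force version of a top-k cosine query over three toy vectors. It only illustrates the idea; it is not how Vectorize is implemented, and real embeddings have hundreds of dimensions rather than three.

```typescript
type Vec = number[];

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredVector {
  id: string;
  values: Vec;
}

// Score every stored vector against the query and keep the k best.
function topK(query: Vec, items: StoredVector[], k: number): { id: string; score: number }[] {
  return items
    .map((item) => ({ id: item.id, score: cosine(query, item.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const toyIndex: StoredVector[] = [
  { id: "billing", values: [1, 0, 0] },
  { id: "refunds", values: [0.9, 0.1, 0] },
  { id: "hiking", values: [0, 0, 1] },
];
console.log(topK([1, 0, 0], toyIndex, 2).map((m) => m.id)); // ["billing", "refunds"]
```

The query vector points almost exactly at "billing" and "refunds" and is orthogonal to "hiking", so the unrelated document never reaches the LLM's context.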
Capabilities to know:
RAG (Retrieval-Augmented Generation) is the dominant pattern for "AI that knows about your data." Instead of training or fine-tuning a model on the customer's documents (expensive, slow, hard to update), you do this at request time:
RAG works because LLMs are good at using information you give them, even if they weren't trained on it. The four supporting pieces — embedding model, vector DB, document store, LLM — typically come from four vendors. On Cloudflare, all four are bindings on the same Worker.
Customers running AI in production usually want caching (don't pay twice for the same prompt), rate limiting (protect against runaway loops), retry logic, fallbacks across providers, analytics, and a unified log for compliance. AI Gateway is the Cloudflare product that sits in front of any AI provider (OpenAI, Anthropic, Bedrock, Workers AI itself) and gives you all of that as a config-driven proxy. Worth mentioning in any "we already use OpenAI" conversation.
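For Workers AI specifically, routing a call through a gateway is an extra options argument on the binding call. The sketch below assumes a gateway with the hypothetical ID "support-bot" already exists in the account; the interface is a trimmed stand-in for the real binding type.

```typescript
// Trimmed stand-in for the Workers AI binding type.
interface GatewayOptions {
  gateway: { id: string; skipCache?: boolean; cacheTtl?: number };
}
interface AiBinding {
  run(model: string, input: unknown, options?: GatewayOptions): Promise<unknown>;
}

// Cache identical prompts for cacheTtlSeconds instead of paying for them twice.
function gatewayOptions(id: string, cacheTtlSeconds: number): GatewayOptions {
  return { gateway: { id, skipCache: false, cacheTtl: cacheTtlSeconds } };
}

async function askViaGateway(ai: AiBinding, question: string): Promise<unknown> {
  return ai.run(
    "@cf/meta/llama-3.1-8b-instruct",
    { messages: [{ role: "user", content: question }] },
    gatewayOptions("support-bot", 3600) // hypothetical gateway ID, 1 hour cache
  );
}
```

In a handler this would be called as askViaGateway(c.env.AI, question); the model call itself is unchanged, which is why the gateway is an easy add-on in an existing project.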
For stateful AI agents (multi-turn conversations, tool use, durable memory), Cloudflare has the Agents SDK, which builds on Durable Objects. Each agent is a DO with its own conversation state, tool-calling loop, and persistent memory. We have a dedicated skill for it (agents-sdk) — bring it up when customers ask about "AI agents" or "Copilots."
app.post("/api/chat", async (c) => {
const { question } = await c.req.json<{ question: string }>();
const response = await c.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
messages: [
{ role: "system", content: "You are a concise technical assistant." },
{ role: "user", content: question },
],
});
return c.json(response);
});
app.post("/api/chat-stream", async (c) => {
const { question } = await c.req.json<{ question: string }>();
const stream = await c.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
messages: [{ role: "user", content: question }],
stream: true,
});
return new Response(stream, {
headers: { "content-type": "text/event-stream" },
});
});
wrangler vectorize create kb --dimensions=768 --metric=cosine
// Step 1: Index a document
app.post("/api/kb/index", async (c) => {
const { id, text } = await c.req.json<{ id: string; text: string }>();
// Persist source in R2 for retrieval
await c.env.FILES.put(`kb/${id}.txt`, text);
// Embed
const { data } = await c.env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [text] });
// Upsert into Vectorize
await c.env.VECTORIZE.upsert([
{ id, values: data[0], metadata: { key: `kb/${id}.txt` } },
]);
return c.json({ ok: true });
});
// Step 2: Ask a question grounded in the KB
app.post("/api/kb/ask", async (c) => {
const { question } = await c.req.json<{ question: string }>();
const { data: qVec } = await c.env.AI.run("@cf/baai/bge-base-en-v1.5", {
text: [question],
});
const { matches } = await c.env.VECTORIZE.query(qVec[0], {
topK: 4,
returnMetadata: "all",
});
const docs = await Promise.all(
matches.map(async (m) => {
const obj = await c.env.FILES.get(m.metadata!.key as string);
return obj ? await obj.text() : "";
})
);
const answer = await c.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
messages: [
{ role: "system", content: "Answer ONLY from the provided context. If unknown, say so." },
{ role: "user", content: `Context:\n${docs.join("\n---\n")}\n\nQuestion: ${question}` },
],
});
return c.json({ answer, citations: matches.map((m) => m.id) });
});
Pages is the Git-driven front door for full-stack apps. Push to a branch, get a unique preview URL. Pages Functions are file-routed Workers that share the same bindings.
Workers is great when you start with code. Pages is great when you start with a frontend framework. The product is built around three ideas that frontend teams already know how to use: a Git repo, a build command, and a public output directory. Connect the repo, set the build command (npm run build), point at the output (./dist or ./.next), and Pages handles the rest — global deploys, preview URLs per branch, automatic HTTPS, custom domains.
Pages serves your static assets from Cloudflare's CDN and runs your dynamic code as Pages Functions — the same Workers runtime, exposed via file-based routing. Drop a TypeScript file at /functions/api/users/[id].ts and it's served at /api/users/123. The function has access to all the same bindings (D1, KV, R2, Workers AI, Durable Objects) as a standalone Worker.
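The file-to-route convention can be summarized in one small function. This is an illustration of the mapping only, not the router Pages actually uses: single brackets become a named parameter, double brackets a multi-segment catch-all, and index files map to their directory.

```typescript
// Illustration of the naming convention; not Pages' actual router.
function routeFor(filePath: string): string {
  const segments = filePath
    .replace(/^\/?functions\//, "") // drop the functions/ root
    .replace(/\.(ts|js)$/, "") // drop the file extension
    .replace(/(^|\/)index$/, "") // index files map to their directory
    .split("/")
    .map((segment) => {
      const catchAll = segment.match(/^\[\[(.+)\]\]$/);
      if (catchAll) return `*${catchAll[1]}`;
      const param = segment.match(/^\[(.+)\]$/);
      return param ? `:${param[1]}` : segment;
    });
  return "/" + segments.join("/");
}

console.log(routeFor("/functions/api/users/[id].ts")); // "/api/users/:id"
console.log(routeFor("functions/files/[[path]].ts")); // "/files/*path"
```

Whatever the bracketed segment is called in the filename is the key the handler reads from params, so [id].ts yields params.id.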
Cloudflare maintains adapters for the major full-stack frameworks: Next.js (via @cloudflare/next-on-pages or the newer OpenNext adapter), SvelteKit, Astro, Nuxt, Remix, Qwik, Solid Start. The C3 CLI (npm create cloudflare@latest) scaffolds any of them with bindings already wired up.
Every push to any branch gets a unique preview URL that runs that branch's build on the same runtime as production, with bindings configured per environment (production or preview). PRs get auto-commented with a link. Reviewers, designers, and PMs can click and test before merge. This is the feature that makes Pages stick once a team starts using it.
In 2024 Cloudflare introduced Static Assets on Workers, which lets a regular Worker serve a static asset directory directly without needing Pages at all. The current guidance for new full-stack projects is to start there — you get one product surface (Workers), one config file (wrangler.jsonc), and the same Git-driven deploys. Pages remains fully supported and is still the right answer for many existing teams; check current docs when scoping.
Drop a file at /functions/api/hello.ts and it's served at /api/hello.
// functions/api/users/[id].ts
interface Env { DB: D1Database; }
export const onRequestGet: PagesFunction<Env> = async ({ params, env }) => {
const user = await env.DB
.prepare("SELECT id, name, email FROM users WHERE id = ?")
.bind(params.id)
.first();
return user ? Response.json(user) : new Response("Not found", { status: 404 });
};
export const onRequestDelete: PagesFunction<Env> = async ({ params, env }) => {
await env.DB.prepare("DELETE FROM users WHERE id = ?").bind(params.id).run();
return new Response(null, { status: 204 });
};
This is the demo I'd build for a SaaS prospect. It exercises Workers + Hono + D1 + KV + R2 + Queues + Workers AI in one project. Use it as a starting point.
A single Worker, deployed with one wrangler deploy, that serves an HTTP API (Hono), authenticates users with sessions in KV, persists relational data in D1, accepts file uploads to R2, fans out background work via Queues, runs nightly cleanup via a Cron Trigger, and exposes a RAG chat endpoint via Workers AI + Vectorize. One repo, one bill, one deploy, one network hop per request.
For a customer who's currently running an EC2 fleet behind an ALB, RDS Postgres, ElastiCache Redis, S3, SQS, and Lambda for thumbnails, this is a one-page replacement of an entire architecture. The "what would we keep?" conversation is short.
edge-saas/
├── wrangler.jsonc
├── migrations/
│ └── 0001_init.sql
└── src/
├── index.ts (Hono app + queue handler)
├── routes/
│ ├── auth.ts (KV-backed sessions)
│ ├── posts.ts (D1 CRUD)
│ ├── upload.ts (R2 + queue trigger)
│ └── ai.ts (LLM + RAG endpoints)
└── lib/
├── auth.ts
└── thumbs.ts (consumed by queue)
{
"name": "edge-saas",
"main": "src/index.ts",
"compatibility_date": "2026-01-01",
"compatibility_flags": ["nodejs_compat"],
"observability": { "enabled": true },
// Without this entry the scheduled() handler in src/index.ts never fires
"triggers": { "crons": ["0 3 * * *"] },
"kv_namespaces": [{ "binding": "SESSIONS", "id": "..." }],
"d1_databases": [
{ "binding": "DB", "database_name": "edge-saas", "database_id": "..." }
],
"r2_buckets": [{ "binding": "FILES", "bucket_name": "edge-saas-uploads" }],
"queues": {
"producers": [{ "binding": "THUMBS", "queue": "thumbs" }],
"consumers": [{ "queue": "thumbs", "max_batch_size": 10, "max_retries": 3 }]
},
"ai": { "binding": "AI" },
"vectorize": [{ "binding": "VECTORIZE", "index_name": "kb" }]
}
import { Hono } from "hono";
import { cors } from "hono/cors";
import auth from "./routes/auth";
import posts from "./routes/posts";
import upload from "./routes/upload";
import ai from "./routes/ai";
import { generateThumbnail } from "./lib/thumbs";
type Bindings = {
DB: D1Database;
SESSIONS: KVNamespace;
FILES: R2Bucket;
THUMBS: Queue<{ key: string }>;
AI: Ai;
VECTORIZE: VectorizeIndex;
};
const app = new Hono<{ Bindings: Bindings }>();
app.use("*", cors());
app.route("/auth", auth);
app.route("/api/posts", posts);
app.route("/api/upload", upload);
app.route("/api/ai", ai);
export default {
fetch: app.fetch,
async queue(batch: MessageBatch<{ key: string }>, env: Bindings) {
for (const msg of batch.messages) {
try {
await generateThumbnail(env, msg.body.key);
msg.ack();
} catch (e) {
msg.retry({ delaySeconds: 30 });
}
}
},
async scheduled(event: ScheduledController, env: Bindings, ctx: ExecutionContext) {
if (event.cron === "0 3 * * *") {
ctx.waitUntil(env.DB.prepare("DELETE FROM sessions WHERE expires_at < ?")
.bind(Date.now()).run());
}
},
} satisfies ExportedHandler<Bindings>;
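import { Hono } from "hono";
import { setCookie, getCookie } from "hono/cookie";
import { verify } from "../lib/auth"; // assumed export: checks a password against its stored hash
// Only the bindings this router touches
type Bindings = { DB: D1Database; SESSIONS: KVNamespace };
const auth = new Hono<{ Bindings: Bindings }>();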
auth.post("/login", async (c) => {
const { email, password } = await c.req.json();
const user = await c.env.DB.prepare(
"SELECT id, email, password_hash FROM users WHERE email = ?"
).bind(email).first<{ id: number; email: string; password_hash: string }>();
if (!user || !(await verify(password, user.password_hash))) {
return c.json({ error: "invalid credentials" }, 401);
}
const sid = crypto.randomUUID();
await c.env.SESSIONS.put(sid, JSON.stringify({ userId: user.id }), {
expirationTtl: 60 * 60 * 24 * 7, // 7 days
});
setCookie(c, "sid", sid, { httpOnly: true, secure: true, sameSite: "Lax", maxAge: 604800 });
return c.json({ ok: true });
});
auth.post("/logout", async (c) => {
const sid = getCookie(c, "sid");
if (sid) await c.env.SESSIONS.delete(sid);
return c.json({ ok: true });
});
export default auth;
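The login route writes the session; protected routes need the reverse lookup. A framework-free sketch of that lookup follows, assuming the same "sid" cookie and JSON session shape used by /login. KVLike is a trimmed stand-in for the KV binding, and in a Hono app getCookie would replace the hand-rolled cookie parsing.

```typescript
// Trimmed stand-in for the KV binding: only the call this sketch needs.
interface KVLike {
  get(key: string): Promise<string | null>;
}

// Pull one cookie value out of a Cookie request header.
function readCookie(header: string | null, name: string): string | undefined {
  if (!header) return undefined;
  for (const pair of header.split(";")) {
    const [key, ...rest] = pair.trim().split("=");
    if (key === name) return decodeURIComponent(rest.join("="));
  }
  return undefined;
}

// Resolve the session written by /login, or null when it is missing or expired.
async function getSession(
  kv: KVLike,
  cookieHeader: string | null
): Promise<{ userId: number } | null> {
  const sid = readCookie(cookieHeader, "sid");
  if (!sid) return null;
  const raw = await kv.get(sid); // KV returns null once the TTL has passed
  return raw ? (JSON.parse(raw) as { userId: number }) : null;
}
```

Because the KV entry carries its own TTL, an expired session simply reads back as null and the route can answer 401 without any cleanup logic.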
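import { Hono } from "hono";
// Only the bindings this router touches
type Bindings = { FILES: R2Bucket; THUMBS: Queue<{ key: string }> };
const upload = new Hono<{ Bindings: Bindings }>();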
upload.post("/", async (c) => {
const form = await c.req.formData();
const file = form.get("file") as File | null;
if (!file) return c.json({ error: "no file" }, 400);
const key = `${crypto.randomUUID()}/${file.name}`;
await c.env.FILES.put(key, file.stream(), {
httpMetadata: { contentType: file.type },
});
// Enqueue thumbnail generation; the queue consumer does the work asynchronously
await c.env.THUMBS.send({ key });
return c.json({ key });
});
export default upload;
Map a customer's words to the right pieces of the platform. These are the conversations you'll have most often.
| Customer says… | Reach for | Why |
|---|---|---|
| "We need an API gateway / BFF" | Workers + Hono | Edge-fast, no infra, easy auth/rate-limiting |
| "We're moving off S3 to cut egress" | R2 + Super Slurper | Zero egress, S3-compatible API, native to Workers |
| "We have multi-tenant rate limits per customer" | Durable Objects | One DO per tenant, atomic counters |
| "We're building a chatbot over our docs" | Workers AI + Vectorize + R2 | Full RAG in one Worker |
| "We need real-time presence/collab" | Durable Objects + WebSocket Hibernation | Stateful per-room, zero-cost idle |
| "Multi-step onboarding emails" | Workflows + Queues + Email Workers | Durable steps, retries, sleeps for days/weeks |
| "Postgres exists, can't migrate" | Hyperdrive | Pool + cache against existing DB |
| "Nightly ETL / report generation" | Cron Triggers + Workflows + R2 | One platform, no orchestrator needed |
| "Per-user feature flags + rollouts" | KV + Workers + Flagship | <10ms reads at the edge |
| "We need preview URLs for every PR" | Pages (or Workers + Static Assets) | Git-driven, automatic per-branch |
| "Untrusted user code in our app" | Sandbox SDK / Workers for Platforms | Isolated execution per tenant |
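The per-tenant rate limit row is the one customers most often ask to see in code. The class below is an in-memory sketch of the counter a single Durable Object per tenant could hold; it assumes a fixed-window policy and leaves out the Durable Object wrapper and persistence. The single-threaded execution of each object is what would make the increment safe without locks.

```typescript
// In-memory stand-in for the state one Durable Object per tenant would hold.
class FixedWindowLimiter {
  private windowStart = 0;
  private count = 0;
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request arriving at `now` (ms since epoch) is allowed.
  allow(now: number): boolean {
    if (now - this.windowStart >= this.windowMs) {
      // A new window begins: reset the counter.
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

const limiter = new FixedWindowLimiter(2, 1000);
console.log(limiter.allow(0), limiter.allow(10), limiter.allow(20), limiter.allow(1000));
// true true false true
```

A Worker would route each request to the object named after the tenant ID and return 429 when allow() comes back false.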
| Resource | Limit (paid) |
|---|---|
| Worker CPU per request | 30s default, configurable up to 5min |
| Worker request size | ~100 MB |
| KV value | 25 MiB |
| KV writes per key | 1/sec |
| D1 database | 10 GB per database |
| D1 row size | 2 MB |
| R2 object | 5 TB (multipart) |
| Durable Object throughput | ~1K req/s per instance |
| Queue message | 128 KB |
| Queue throughput | 5K msg/sec per queue |
| Vectorize | 10M vectors per index, 1536 dims max |