Blaize on Cloudflare — The cloud control plane for Pathfinder, Xplorer, and AI Services

The thesis

Your AI Services Platform brief reads like
Cloudflare's developer platform spec.

Read Blaize's own AI Services Platform page side-by-side with Cloudflare's developer platform docs. The vocabulary — programmable runtime, composable APIs, modular inference, application-level services, hybrid edge-to-core — is uncannily aligned.

From blaize.com — AI Services Platform

"A unified platform built on programmable silicon and a composable software stack packages multimodal inference, business logic, management, and orchestration into modular APIs — delivering the majority of required functionality out of the box... Application-level AI services package multimodal inference, business logic, scheduling, lifecycle management, and orchestration into modular APIs that deliver production-ready AI functionality."

— Blaize AI Services Platform page, June 2026

What we noticed in your stack

blaize.com is already on Cloudflare — server: cloudflare, cf-ray, __cf_bm cookie on every response. Your developer.blaize.com runs on AWS us-west-1 GitLab Pages. Mail flows through Proofpoint + Microsoft 365. Your TXT records confirm Anthropic in production. And your AI Services Platform pitch positions exactly the architecture that Cloudflare's developer platform is built for — not as a competitor to GSP silicon, but as the cloud-side control plane that turns programmable silicon into application-level AI APIs.

Value plays

Eight things Cloudflare changes for Blaize.

Ranked by impact-per-effort for the AI Services Platform launch specifically — composable APIs, per-tenant isolation, sovereign deployments, <90-day CSP onboarding.

01 — Flagship

Workers for Platforms = CSP-per-namespace isolation

The AI Services Platform pitch targets Cloud Service Providers, System Integrators, telcos. Each partner needs an isolated runtime — their own pricing logic, their own API gateway customizations, their own branded service catalog. Workers for Platforms dispatch namespaces give you one isolated worker per CSP partner. Direct fit for "<90 days to AI Service Launch" — namespaces deploy in minutes, not quarters.

Workers for Platforms Dispatch Namespaces

Multi-tenant CSP architecture, primitive-form

02 — Hybrid AI

AI Gateway for GSP + GPU + LLM orchestration

Your platform is explicitly hybrid — GSP silicon + GPU + third-party LLM (Anthropic confirmed in TXT). AI Gateway sits as the unified routing layer in front of all three: per-partner cost attribution, semantic cache on repeated inference patterns, fallback routing, full audit logs. Workload scheduler decides at runtime which compute target (GSP, GPU, cloud LLM) serves each query.

AI Gateway Workers AI Multi-provider

See calculator below ↓

03 — Sovereign AI

Regional Services for data residency by deployment

Your named deployments — South Asia smart city, Middle East sovereign AI, APAC hybrid operations — each require data residency guarantees. Cloudflare Regional Services pins data plane (encryption, key material, content inspection) to specific geographic regions. Each Blaize-powered sovereign deployment gets its own compliance perimeter without operating separate clouds per country.

Regional Services Data Localization

One platform, N sovereign perimeters

04 — Stateful inference

Durable Objects per camera / per sensor / per stream

Smart City Analytics, Industrial Monitoring, Security & Surveillance all need stateful streams — each camera, each sensor, each video feed has running state (calibration, anomaly baseline, last-N inferences, alert history). Durable Objects = one DO per stream — strongly consistent, geo-routed, hibernating when idle. Replaces a per-stream service tier in your AWS layer.

Durable Objects Workers Storage API

Native digital-twin runtime per asset

05 — Vision at edge

Workers AI for the <7W reality

Your edge silicon hits 16 TOPS at 7W. Workers AI complements that — for inference that doesn't yet fit Pathfinder/Xplorer (frontier vision models, Whisper-class voice, embedding generation), Workers AI runs at the same POP closest to the data. The handoff: GSP handles continuous on-device inference; Workers AI handles cloud-side enrichment. No competition — composition.

Workers AI Whisper Vision Models

Edge-to-cloud handoff, no battles

06 — Composable APIs

API Shield for the AI Services API surface

Vision | Video | Document | Multimodal | Speech | Moderation — six modular APIs, each with their own contracts. API Shield reads your OpenAPI spec, enforces schema at the edge, applies per-CSP-partner mTLS, blocks abuse before it reaches your AWS gateway. The "<90 days to launch" target depends on API surface security being a primitive, not a project.

API Shield mTLS Bot Management

Composable APIs need composable security

07 — Telemetry storage

R2 for the operational telemetry that compounds

From your platform page: "Each instance generates operational telemetry that feeds model optimization and workload routing — a platform that delivers more value at site 100 than it did at site 1." That self-enriching loop = a petabyte data lake over time. R2 (zero egress) replaces S3 for the telemetry archive. Every customer retraining pass, every model optimization batch, every auditor query — zero egress fees.

R2 Zero Egress S3-compatible API

Typical 40-60% storage TCO reduction

08 — FedRAMP path

Cloudflare for Government for defense + sovereign

Your Winmate partnership brings rugged edge AI to defense/industrial. Your South Asia + Middle East deployments are sovereign/governmental. Cloudflare for Government is FedRAMP Moderate authorized (FedRAMP High in process) — bridging your commercial AI Services Platform and your federal/sovereign deployments under one architectural family. Same primitives, FedRAMP-compatible.

Cloudflare for Government FedRAMP Moderate

Commercial → defense, one stack

Mapping

Blaize AI Services Platform layers → Cloudflare primitives.

Each layer of the Blaize AI Services Platform diagram maps to specific Cloudflare developer primitives. Not approximately — exactly.

Blaize platform layer	What it does	Cloudflare primitive
Marketing site (today)	`blaize.com` already on Cloudflare	`Cloudflare` ✅ (in production)
Application-Level AI Services (API)	Vision, Video, Document, Multimodal, Speech, Moderation APIs	`Workers` + `API Shield` reading OpenAPI
Application Use Cases	Smart City, Retail Intelligence, Industrial, Security	`Workers for Platforms` per vertical/customer
API Gateway (your stack)	Auth, rate limit, request routing, transformations	`Workers` + `API Shield` + `Cache`
AI Services Engine	Multimodal inference dispatch, model selection	`AI Gateway` + `Workers AI` + LLM routing
Integration & Runtime Enablement	Connect to sensors, VMS, ERP, third-party systems	`Workers` + `Queues` + `Workflows`
Programmable Silicon (GSP®)	Pathfinder + Xplorer hardware at the edge	No competition — composition. Workers AI for cloud-side complement.
Operational telemetry (self-enriching)	Every deployment improves model + routing decisions	`R2` (zero egress) + `Workers Analytics Engine`
CSP partner deployments	Cloud Service Providers turn infrastructure into AI services	`Workers for Platforms` dispatch namespaces
Sovereign / data residency	South Asia, Middle East, APAC, defense	`Regional Services` + `Cloudflare for Government`
Per-camera / per-sensor state	Calibration, anomaly baseline, alert history	`Durable Objects` (1 DO per asset)
Vision similarity / face recognition	Your first application-level AI service (per Q1 2026 launch)	`Vectorize` + `Workers AI Embeddings`

Quantify it

The AI Gateway math for Blaize CSP partners.

Drag the sliders. When CSP partners run similar AI services across their customer base, semantic caching scales with N. Face recognition, smart city analytics, retail intelligence — the inference queries are highly repetitive when normalized.

AI Gateway savings calculator

Annual inference cost — with and without semantic cache

Cache hits cost ~5% of a full inference call (embedding lookup + small response stitch). Adjust sliders for the AI Services Platform scale you're targeting.

CSP partners on AI Services Platform

Avg API calls per partner per day

250,000

Avg tokens per call (vision + text)

2,000

Cross-partner semantic cache hit rate

50%

Blended model cost per 1M tokens

$12

Total API calls / year 2.3B

Total tokens / year 4.6T

Cost without AI Gateway $54.8M

Cost with semantic cache $28.7M

            Annual savings
            $26.0M
          

Directional. AI Gateway also adds free observability, rate limiting, fallback routing, per-CSP-partner cost attribution, and request logging — none of which is priced into the chart above. The compounding effect: as Blaize adds CSP partners, cache-hit rate goes up, not down. The "2.4× streams per rack" claim compounds with the cache math.

Architecture

How a smart-city face recognition call flows on Cloudflare.

A camera in a South Asian smart-city deployment captures a face. The CSP partner's app calls Blaize AI Services Platform for a face-recognition decision. Following the full path.

GSP silicon does the first-pass at the camera

Pathfinder embedded in the camera runs continuous low-power inference — face detection, bounding box extraction, basic quality scoring. At 16 TOPS / 7W, the local silicon does the heavy lifting without sending raw video upstream. Privacy-preserving by architecture.

Blaize GSP® Pathfinder®

API call hits the nearest Cloudflare POP (BOM)

Only the embedding vector + metadata travels to api.blaize.com/v1/face-recognition. DNS resolves to the closest POP — Mumbai, not us-east-1. Round-trip drops from ~210ms to ~12ms.

Workers Smart Placement

API Shield + Bot Management enforce the contract

OpenAPI schema validation — required fields present, vector dimensions correct, mTLS cert from the CSP partner. Bot Management scores the request. The request is auth-validated and policy-validated at the edge before reaching origin. ~10ms.

API Shield Bot Management mTLS

Workers for Platforms routes to the CSP partner's namespace

Hostname → CSP partner lookup. The partner's worker — with their custom service catalog, pricing logic, branded API responses, regulatory policies — runs in an isolated runtime. Zero noisy-neighbor risk between this CSP and the Middle East sovereign deployment.

Workers for Platforms Dispatch Namespaces

Vectorize matches against the CSP's face database

The face embedding is queried against this CSP's isolated Vectorize index — millions of enrolled faces, per-partner isolation. Top-K matches return in sub-30ms. Per-region data residency enforced via Regional Services — the index stays in BOM, not us-east.

Vectorize Regional Services R2

Camera's Durable Object updates session state

The DO for this specific camera updates — last-seen timestamp, recognition outcome, alert history. If this camera has flagged 3+ matches in the last hour, escalate. Strongly consistent, geo-pinned to BOM, hibernates when the camera goes quiet.

Durable Objects Storage API

AI Gateway routes any LLM-class reasoning

If the workflow requires LLM reasoning (e.g., "summarize today's anomalies for this district"), AI Gateway routes to Claude or Workers AI based on cost+latency. Cache hit on repeated district summaries returns in 30ms. Per-CSP cost attribution recorded for invoicing.

AI Gateway Semantic Cache Workers AI

Full event archived to R2, telemetry compounds

Decision trace + embeddings + outcome written to R2 (zero egress when the CSP's customer audits, when the model retrains, when the platform improves). Operational telemetry feeds back into Blaize's "site 100 better than site 1" compounding loop. Total wall-clock: under 200ms end-to-end.

R2 Logpush Workers Analytics Engine

Graph-native silicon at the edge. Graph-native primitives in the cloud.

Your AI Services Platform brief reads like
Cloudflare's developer platform spec.

The silicon + the SDK

The cloud control plane

Eight things Cloudflare changes for Blaize.

Workers for Platforms = CSP-per-namespace isolation

AI Gateway for GSP + GPU + LLM orchestration

Regional Services for data residency by deployment

Durable Objects per camera / per sensor / per stream

Workers AI for the <7W reality

API Shield for the AI Services API surface

R2 for the operational telemetry that compounds

Cloudflare for Government for defense + sovereign

Blaize AI Services Platform layers → Cloudflare primitives.

The AI Gateway math for Blaize CSP partners.

Annual inference cost — with and without semantic cache

How a smart-city face recognition call flows on Cloudflare.

GSP silicon does the first-pass at the camera

API call hits the nearest Cloudflare POP (BOM)

API Shield + Bot Management enforce the contract

Workers for Platforms routes to the CSP partner's namespace

Vectorize matches against the CSP's face database

Camera's Durable Object updates session state

AI Gateway routes any LLM-class reasoning

Full event archived to R2, telemetry compounds

30 minutes. No slides. The control plane conversation.

Your AI Services Platform brief reads likeCloudflare's developer platform spec.

The silicon + the SDK

The cloud control plane

Eight things Cloudflare changes for Blaize.

Workers for Platforms = CSP-per-namespace isolation

AI Gateway for GSP + GPU + LLM orchestration

Regional Services for data residency by deployment

Durable Objects per camera / per sensor / per stream

Workers AI for the <7W reality

API Shield for the AI Services API surface

R2 for the operational telemetry that compounds

Cloudflare for Government for defense + sovereign

Blaize AI Services Platform layers → Cloudflare primitives.

The AI Gateway math for Blaize CSP partners.

Annual inference cost — with and without semantic cache

How a smart-city face recognition call flows on Cloudflare.

GSP silicon does the first-pass at the camera

API call hits the nearest Cloudflare POP (BOM)

API Shield + Bot Management enforce the contract

Workers for Platforms routes to the CSP partner's namespace

Vectorize matches against the CSP's face database

Camera's Durable Object updates session state

AI Gateway routes any LLM-class reasoning

Full event archived to R2, telemetry compounds

30 minutes. No slides. The control plane conversation.

Your AI Services Platform brief reads like
Cloudflare's developer platform spec.