For the Blaize team  ·  AI Services Platform concept brief

Graph-native silicon at the edge. Graph-native primitives in the cloud.

Blaize ships GSP® silicon, AI Studio, and the new AI Services Platform for cloud providers, system integrators, and government deployments worldwide. The control plane that turns infrastructure into application-level AI APIs — multimodal inference, business logic, orchestration, per-tenant isolation, sovereign data residency — is what Cloudflare's developer platform was built for. You're already on Cloudflare at the edge. Here's what the layer above looks like.

2.4× streams per rack vs. GPU-only
60% less power per rack
<90d CSP AI service launch target
330+ Cloudflare POPs

The thesis

Your AI Services Platform brief reads like Cloudflare's developer platform spec.

Read Blaize's own AI Services Platform page side-by-side with Cloudflare's developer platform docs. The vocabulary — programmable runtime, composable APIs, modular inference, application-level services, hybrid edge-to-core — is uncannily aligned.

From blaize.com — AI Services Platform
"A unified platform built on programmable silicon and a composable software stack packages multimodal inference, business logic, management, and orchestration into modular APIs — delivering the majority of required functionality out of the box... Application-level AI services package multimodal inference, business logic, scheduling, lifecycle management, and orchestration into modular APIs that deliver production-ready AI functionality."
— Blaize AI Services Platform page, June 2026
Blaize ships

The silicon + the SDK

Pathfinder®, Xplorer®, GSP®, Picasso®, AI Studio — graph-native compute at the edge

×
Cloudflare ships

The cloud control plane

Workers, AI Gateway, Workers for Platforms, R2, Vectorize — graph-native primitives in the cloud

What we noticed in your stack

blaize.com is already on Cloudflare. Every response carries the server: cloudflare header, a cf-ray ID, and the __cf_bm cookie. Your developer.blaize.com runs on AWS us-west-1 GitLab Pages. Mail flows through Proofpoint + Microsoft 365. Your TXT records indicate Anthropic is in use. And your AI Services Platform pitch describes exactly the architecture that Cloudflare's developer platform is built for: not a competitor to GSP silicon, but the cloud-side control plane that turns programmable silicon into application-level AI APIs.

Value plays

Eight things Cloudflare changes for Blaize.

Ranked by impact-per-effort for the AI Services Platform launch specifically — composable APIs, per-tenant isolation, sovereign deployments, <90-day CSP onboarding.

01 — Flagship

Workers for Platforms = CSP-per-namespace isolation

The AI Services Platform pitch targets Cloud Service Providers, System Integrators, and telcos. Each partner needs an isolated runtime: its own pricing logic, its own API gateway customizations, its own branded service catalog. Workers for Platforms dispatch namespaces give you one isolated Worker per CSP partner. That is a direct fit for "<90 days to AI Service Launch": namespaces deploy in minutes, not quarters.

Workers for Platforms · Dispatch Namespaces
Multi-tenant CSP architecture, primitive-form
02 — Hybrid AI

AI Gateway for GSP + GPU + LLM orchestration

Your platform is explicitly hybrid: GSP silicon + GPU + third-party LLM (Anthropic, per your TXT records). AI Gateway sits as the unified routing layer in front of all three, with per-partner cost attribution, semantic caching of repeated inference patterns, fallback routing, and full audit logs. A workload scheduler decides at runtime which compute target (GSP, GPU, or cloud LLM) serves each query.

AI Gateway · Workers AI · Multi-provider
See calculator below ↓
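A minimal sketch of the runtime routing decision described in this card, assuming each compute target reports a health flag, a per-call cost, and a median latency. The type names, field names, and numbers are hypothetical and are not part of the AI Gateway API; the sketch only shows how a cheapest-first plan with a fallback order could be derived.

```typescript
// Hypothetical compute targets for a hybrid deployment.
type ComputeTarget = "gsp" | "gpu" | "cloud-llm";

// Assumed per-target telemetry; the field names are illustrative only.
interface TargetStatus {
  target: ComputeTarget;
  healthy: boolean;
  costPerCallUsd: number;
  p50LatencyMs: number;
}

interface RoutingPolicy {
  maxLatencyMs: number; // latency bound for this class of query
}

// Returns the targets to try, in order: healthy targets that meet the
// latency bound, cheapest first. Index 0 is the primary target and the
// rest form the fallback chain. The input array is not mutated.
function planRoute(statuses: TargetStatus[], policy: RoutingPolicy): ComputeTarget[] {
  return statuses
    .filter((s) => s.healthy && s.p50LatencyMs <= policy.maxLatencyMs)
    .sort((a, b) => a.costPerCallUsd - b.costPerCallUsd)
    .map((s) => s.target);
}
```

An empty plan means no target qualifies, which is the point where a caller would queue the request or return an error rather than guess.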
03 — Sovereign AI

Regional Services for data residency by deployment

Your named deployments (South Asia smart city, Middle East sovereign AI, APAC hybrid operations) each require data residency guarantees. Cloudflare Regional Services pins the data plane (encryption, key material, content inspection) to specific geographic regions. Each Blaize-powered sovereign deployment gets its own compliance perimeter without operating separate clouds per country.

Regional Services · Data Localization
One platform, N sovereign perimeters
04 — Stateful inference

Durable Objects per camera / per sensor / per stream

Smart City Analytics, Industrial Monitoring, Security & Surveillance all need stateful streams — each camera, each sensor, each video feed has running state (calibration, anomaly baseline, last-N inferences, alert history). Durable Objects = one DO per stream — strongly consistent, geo-routed, hibernating when idle. Replaces a per-stream service tier in your AWS layer.

Durable Objects · Workers · Storage API
Native digital-twin runtime per asset
05 — Vision at edge

Workers AI for the <7W reality

Your edge silicon hits 16 TOPS at 7W. Workers AI complements that: for inference that doesn't yet fit Pathfinder/Xplorer (frontier vision models, Whisper-class voice, embedding generation), Workers AI runs at the POP closest to the data. The handoff: GSP handles continuous on-device inference; Workers AI handles cloud-side enrichment. Composition, not competition.

Workers AI · Whisper · Vision Models
Edge-to-cloud handoff, no battles
06 — Composable APIs

API Shield for the AI Services API surface

Vision | Video | Document | Multimodal | Speech | Moderation: six modular APIs, each with its own contract. API Shield reads your OpenAPI spec, enforces the schema at the edge, applies per-CSP-partner mTLS, and blocks abuse before it reaches your AWS gateway. The "<90 days to launch" target depends on API surface security being a primitive, not a project.

API Shield · mTLS · Bot Management
Composable APIs need composable security
07 — Telemetry storage

R2 for the operational telemetry that compounds

From your platform page: "Each instance generates operational telemetry that feeds model optimization and workload routing — a platform that delivers more value at site 100 than it did at site 1." That self-enriching loop = a petabyte data lake over time. R2 (zero egress) replaces S3 for the telemetry archive. Every customer retraining pass, every model optimization batch, every auditor query — zero egress fees.

R2 · Zero Egress · S3-compatible API
Typical 40-60% storage TCO reduction
08 — FedRAMP path

Cloudflare for Government for defense + sovereign

Your Winmate partnership brings rugged edge AI to defense/industrial. Your South Asia + Middle East deployments are sovereign/governmental. Cloudflare for Government is FedRAMP Moderate authorized (FedRAMP High in process) — bridging your commercial AI Services Platform and your federal/sovereign deployments under one architectural family. Same primitives, FedRAMP-compatible.

Cloudflare for Government · FedRAMP Moderate
Commercial → defense, one stack
Mapping

Blaize AI Services Platform layers → Cloudflare primitives.

Each layer of the Blaize AI Services Platform diagram maps to specific Cloudflare developer primitives. Not approximately — exactly.

Blaize platform layer | What it does | Cloudflare primitive
Marketing site (today) | blaize.com already on Cloudflare | Cloudflare ✅ (in production)
Application-Level AI Services (API) | Vision, Video, Document, Multimodal, Speech, Moderation APIs | Workers + API Shield reading OpenAPI
Application Use Cases | Smart City, Retail Intelligence, Industrial, Security | Workers for Platforms per vertical/customer
API Gateway (your stack) | Auth, rate limit, request routing, transformations | Workers + API Shield + Cache
AI Services Engine | Multimodal inference dispatch, model selection | AI Gateway + Workers AI + LLM routing
Integration & Runtime Enablement | Connect to sensors, VMS, ERP, third-party systems | Workers + Queues + Workflows
Programmable Silicon (GSP®) | Pathfinder + Xplorer hardware at the edge | Composition, not competition. Workers AI for cloud-side complement.
Operational telemetry (self-enriching) | Every deployment improves model + routing decisions | R2 (zero egress) + Workers Analytics Engine
CSP partner deployments | Cloud Service Providers turn infrastructure into AI services | Workers for Platforms dispatch namespaces
Sovereign / data residency | South Asia, Middle East, APAC, defense | Regional Services + Cloudflare for Government
Per-camera / per-sensor state | Calibration, anomaly baseline, alert history | Durable Objects (1 DO per asset)
Vision similarity / face recognition | Your first application-level AI service (per Q1 2026 launch) | Vectorize + Workers AI Embeddings
Quantify it

The AI Gateway math for Blaize CSP partners.

Drag the sliders. When CSP partners run similar AI services across their customer bases, the payoff from semantic caching grows with the number of partners. Face recognition, smart city analytics, retail intelligence: once normalized, the inference queries are highly repetitive.

AI Gateway savings calculator

Annual inference cost — with and without semantic cache

Cache hits cost ~5% of a full inference call (embedding lookup + small response stitch). Adjust sliders for the AI Services Platform scale you're targeting.

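CSP partners: 25
Calls per partner per day: 250,000
Tokens per call: 2,000
Cache hit rate: 50%
Price per 1M tokens: $12

Total API calls / year: 2.3B
Total tokens / year: 4.6T
Cost without AI Gateway: $54.8M
Cost with semantic cache: $28.7M
Annual savings: $26.0M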
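The arithmetic behind the calculator can be sketched as follows. The five input names are assumptions: they are a reading of the slider values (25, 250,000, 2,000, 50%, $12) that reproduces the totals shown, together with the stated rule that a cache hit costs about 5% of a full inference call. Treat it as a directional model, not AI Gateway pricing.

```typescript
// Assumed meaning of the five slider values; the labels are inferred.
interface CalculatorInputs {
  partners: number;              // CSP partners
  callsPerPartnerPerDay: number; // API calls per partner per day
  tokensPerCall: number;         // average tokens per call
  cacheHitRate: number;          // fraction between 0 and 1
  usdPerMillionTokens: number;   // blended price per 1M tokens
}

// From the text: a cache hit costs roughly 5% of a full inference call.
const CACHE_HIT_COST_FRACTION = 0.05;

function estimateAnnualCost(inputs: CalculatorInputs) {
  const callsPerYear = inputs.partners * inputs.callsPerPartnerPerDay * 365;
  const tokensPerYear = callsPerYear * inputs.tokensPerCall;
  const costWithoutCache = (tokensPerYear / 1e6) * inputs.usdPerMillionTokens;
  // Misses pay full price; hits pay the reduced fraction.
  const effectiveFraction =
    (1 - inputs.cacheHitRate) + inputs.cacheHitRate * CACHE_HIT_COST_FRACTION;
  const costWithCache = costWithoutCache * effectiveFraction;
  return {
    callsPerYear,
    tokensPerYear,
    costWithoutCache,
    costWithCache,
    savings: costWithoutCache - costWithCache,
  };
}
```

With the default values this gives about 2.28B calls and 4.56T tokens per year, $54.75M without caching, and roughly $28.74M with it, which matches the rounded figures in the calculator.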

These figures are directional. AI Gateway also adds free observability, rate limiting, fallback routing, per-CSP-partner cost attribution, and request logging, none of which is priced into the chart above. The compounding effect: as Blaize adds CSP partners, cache-hit rate goes up, not down. The "2.4× streams per rack" claim compounds with the cache math.

Architecture

How a smart-city face recognition call flows on Cloudflare.

A camera in a South Asian smart-city deployment captures a face. The CSP partner's app calls Blaize AI Services Platform for a face-recognition decision. Here is the full path, step by step.

1

GSP silicon does the first-pass at the camera

Pathfinder embedded in the camera runs continuous low-power inference — face detection, bounding box extraction, basic quality scoring. At 16 TOPS / 7W, the local silicon does the heavy lifting without sending raw video upstream. Privacy-preserving by architecture.

Blaize GSP® · Pathfinder®
2

API call hits the nearest Cloudflare POP (BOM)

Only the embedding vector + metadata travels to api.blaize.com/v1/face-recognition. Anycast routing lands the request at the closest POP: Mumbai, not us-east-1. Round-trip drops from ~210ms to ~12ms.

Workers · Smart Placement
3

API Shield + Bot Management enforce the contract

OpenAPI schema validation — required fields present, vector dimensions correct, mTLS cert from the CSP partner. Bot Management scores the request. The request is auth-validated and policy-validated at the edge before reaching origin. ~10ms.

API Shield · Bot Management · mTLS
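API Shield enforces the contract from an OpenAPI schema rather than from hand-written code, so the following is only a sketch of the equivalent checks in this step: required fields present and vector dimensions correct. The field names and the 512-dimension embedding size are assumptions for illustration.

```typescript
// Assumed embedding size; the real value depends on the face model in use.
const EXPECTED_DIMENSIONS = 512;

// Hypothetical required string fields of a face-recognition request body.
const REQUIRED_STRING_FIELDS = ["partnerId", "cameraId", "capturedAt"];

// Returns a list of contract violations; an empty list means the body passes.
function validateFaceMatchRequest(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be a JSON object"];
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];
  for (const field of REQUIRED_STRING_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value === "") {
      errors.push(field + " is required");
    }
  }
  const embedding = record["embedding"];
  if (!Array.isArray(embedding)) {
    errors.push("embedding must be an array");
  } else if (embedding.length !== EXPECTED_DIMENSIONS) {
    errors.push("embedding must have " + EXPECTED_DIMENSIONS + " dimensions");
  } else if (!embedding.every((v) => typeof v === "number" && Number.isFinite(v))) {
    errors.push("embedding values must be finite numbers");
  }
  return errors;
}
```

Rejecting a malformed body here, before any lookup runs, is what keeps bad requests from consuming origin or index capacity.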
4

Workers for Platforms routes to the CSP partner's namespace

Hostname → CSP partner lookup. The partner's worker — with their custom service catalog, pricing logic, branded API responses, regulatory policies — runs in an isolated runtime. Zero noisy-neighbor risk between this CSP and the Middle East sovereign deployment.

Workers for Platforms · Dispatch Namespaces
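The hostname-to-partner lookup in this step can be sketched as a pure function. The hostnames and Worker names below are hypothetical. In a real dispatch Worker, the returned name would be handed to the dispatch namespace binding, which fetches the partner's isolated Worker; that call is left out so the sketch runs on its own.

```typescript
// Hypothetical mapping from API hostname to the partner's Worker name.
const PARTNER_WORKER_BY_HOSTNAME = new Map<string, string>([
  ["api.partner-a.example", "partner-a"],
  ["api.partner-b.example", "partner-b"],
]);

// Resolves the partner Worker for a request URL, or null when the
// hostname is not registered to any partner.
function resolvePartnerWorker(
  requestUrl: string,
  table: Map<string, string>
): string | null {
  const hostname = new URL(requestUrl).hostname;
  const workerName = table.get(hostname);
  return workerName === undefined ? null : workerName;
}
```

Returning null for an unknown hostname lets the dispatcher answer with a 404 instead of falling through to another partner's code.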
5

Vectorize matches against the CSP's face database

The face embedding is queried against this CSP's isolated Vectorize index — millions of enrolled faces, per-partner isolation. Top-K matches return in sub-30ms. Per-region data residency enforced via Regional Services — the index stays in BOM, not us-east.

Vectorize · Regional Services · R2
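To make "top-K matches" concrete, here is a brute-force, in-memory sketch of what a similarity query computes. It is not how Vectorize is implemented or called: a managed vector index handles this at scale. Cosine similarity as the metric and the record shape are assumptions.

```typescript
// Hypothetical shape of an enrolled face record.
interface EnrolledFace {
  id: string;
  vector: number[];
}

interface FaceMatch {
  id: string;
  score: number; // cosine similarity; higher means closer
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every enrolled face against the query and keeps the k best.
function topKMatches(query: number[], enrolled: EnrolledFace[], k: number): FaceMatch[] {
  return enrolled
    .map((face) => ({ id: face.id, score: cosineSimilarity(query, face.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Per-partner isolation in this picture means each CSP gets its own `enrolled` set, so a query can never score against another partner's faces.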
6

Camera's Durable Object updates session state

The DO for this specific camera updates — last-seen timestamp, recognition outcome, alert history. If this camera has flagged 3+ matches in the last hour, escalate. Strongly consistent, geo-pinned to BOM, hibernates when the camera goes quiet.

Durable Objects · Storage API
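The per-camera state and the "3+ matches in the last hour" rule can be sketched as a small class. This is plain TypeScript that models the logic only: in a Durable Object the same state would live in the object's storage rather than in memory, and the class and method names here are hypothetical.

```typescript
const ESCALATION_WINDOW_MS = 60 * 60 * 1000; // one hour, from the text
const ESCALATION_THRESHOLD = 3;              // "3+ matches", from the text

// In-memory model of one camera's session state.
class CameraSession {
  private matchTimestamps: number[] = [];
  lastSeenMs = 0;

  // Records one inference result and reports whether to escalate.
  // Escalation stays true while 3 or more matches remain in the window.
  recordInference(nowMs: number, matched: boolean): { escalate: boolean; matchesInWindow: number } {
    this.lastSeenMs = nowMs;
    if (matched) this.matchTimestamps.push(nowMs);
    // Drop matches that have aged out of the sliding window.
    const cutoff = nowMs - ESCALATION_WINDOW_MS;
    this.matchTimestamps = this.matchTimestamps.filter((t) => t > cutoff);
    const matchesInWindow = this.matchTimestamps.length;
    return { escalate: matchesInWindow >= ESCALATION_THRESHOLD, matchesInWindow };
  }
}
```

One instance per camera mirrors the one-object-per-stream model: no cross-camera locking is needed because each camera's history is only ever touched by its own object.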
7

AI Gateway routes any LLM-class reasoning

If the workflow requires LLM reasoning (e.g., "summarize today's anomalies for this district"), AI Gateway routes to Claude or Workers AI based on cost and latency. A cache hit on repeated district summaries returns in 30ms. Per-CSP cost attribution is recorded for invoicing.

AI Gateway · Semantic Cache · Workers AI
8

Full event archived to R2, telemetry compounds

Decision trace + embeddings + outcome written to R2 (zero egress when the CSP's customer audits, when the model retrains, when the platform improves). Operational telemetry feeds back into Blaize's "site 100 better than site 1" compounding loop. Total wall-clock: under 200ms end-to-end.

R2 · Logpush · Workers Analytics Engine

30 minutes. No slides. The control plane conversation.

Blaize ships the silicon and the SDK. Cloudflare ships the cloud control plane that turns programmable silicon into application-level AI APIs. As the AI Services Platform launches into CSP, system integrator, and sovereign deployments, the cloud-side architecture decisions made now compound for the next decade. Worth comparing notes on how Workers for Platforms + AI Gateway + Vectorize map specifically against your CSP partner onboarding roadmap?

Book 30 min with Matt Holscher
Matt Holscher · Solutions Engineer · Cloudflare Developer Platform