Blaize ships GSP® silicon, AI Studio, and the new AI Services Platform for cloud providers, system integrators, and government deployments worldwide. The control plane that turns infrastructure into application-level AI APIs — multimodal inference, business logic, orchestration, per-tenant isolation, sovereign data residency — is what Cloudflare's developer platform was built for. You're already on Cloudflare at the edge. Here's what the layer above looks like.
Read Blaize's own AI Services Platform page side-by-side with Cloudflare's developer platform docs. The vocabulary — programmable runtime, composable APIs, modular inference, application-level services, hybrid edge-to-core — is uncannily aligned.
"A unified platform built on programmable silicon and a composable software stack packages multimodal inference, business logic, management, and orchestration into modular APIs — delivering the majority of required functionality out of the box... Application-level AI services package multimodal inference, business logic, scheduling, lifecycle management, and orchestration into modular APIs that deliver production-ready AI functionality."
blaize.com is already on Cloudflare — server: cloudflare, cf-ray, __cf_bm cookie on every response. Your developer.blaize.com runs on AWS us-west-1 GitLab Pages. Mail flows through Proofpoint + Microsoft 365.
Your TXT records confirm Anthropic in production. And your AI Services Platform pitch positions exactly the architecture that Cloudflare's developer platform is built for —
not as a competitor to GSP silicon, but as the cloud-side control plane that turns programmable silicon into application-level AI APIs.
Ranked by impact-per-effort for the AI Services Platform launch specifically — composable APIs, per-tenant isolation, sovereign deployments, <90-day CSP onboarding.
The AI Services Platform pitch targets Cloud Service Providers, System Integrators, telcos. Each partner needs an isolated runtime — their own pricing logic, their own API gateway customizations, their own branded service catalog. Workers for Platforms dispatch namespaces give you one isolated worker per CSP partner. Direct fit for "<90 days to AI Service Launch" — namespaces deploy in minutes, not quarters.
Your platform is explicitly hybrid — GSP silicon + GPU + third-party LLM (Anthropic confirmed in TXT). AI Gateway sits as the unified routing layer in front of all three: per-partner cost attribution, semantic cache on repeated inference patterns, fallback routing, full audit logs. Workload scheduler decides at runtime which compute target (GSP, GPU, cloud LLM) serves each query.
Your named deployments — South Asia smart city, Middle East sovereign AI, APAC hybrid operations — each require data residency guarantees. Cloudflare Regional Services pins data plane (encryption, key material, content inspection) to specific geographic regions. Each Blaize-powered sovereign deployment gets its own compliance perimeter without operating separate clouds per country.
Smart City Analytics, Industrial Monitoring, Security & Surveillance all need stateful streams — each camera, each sensor, each video feed has running state (calibration, anomaly baseline, last-N inferences, alert history). Durable Objects = one DO per stream — strongly consistent, geo-routed, hibernating when idle. Replaces a per-stream service tier in your AWS layer.
Your edge silicon hits 16 TOPS at 7W. Workers AI complements that — for inference that doesn't yet fit Pathfinder/Xplorer (frontier vision models, Whisper-class voice, embedding generation), Workers AI runs at the same POP closest to the data. The handoff: GSP handles continuous on-device inference; Workers AI handles cloud-side enrichment. No competition — composition.
Vision | Video | Document | Multimodal | Speech | Moderation — six modular APIs, each with their own contracts. API Shield reads your OpenAPI spec, enforces schema at the edge, applies per-CSP-partner mTLS, blocks abuse before it reaches your AWS gateway. The "<90 days to launch" target depends on API surface security being a primitive, not a project.
From your platform page: "Each instance generates operational telemetry that feeds model optimization and workload routing — a platform that delivers more value at site 100 than it did at site 1." That self-enriching loop = a petabyte data lake over time. R2 (zero egress) replaces S3 for the telemetry archive. Every customer retraining pass, every model optimization batch, every auditor query — zero egress fees.
Your Winmate partnership brings rugged edge AI to defense/industrial. Your South Asia + Middle East deployments are sovereign/governmental. Cloudflare for Government is FedRAMP Moderate authorized (FedRAMP High in process) — bridging your commercial AI Services Platform and your federal/sovereign deployments under one architectural family. Same primitives, FedRAMP-compatible.
Each layer of the Blaize AI Services Platform diagram maps to specific Cloudflare developer primitives. Not approximately — exactly.
| Blaize platform layer | What it does | Cloudflare primitive |
|---|---|---|
| Marketing site (today) | blaize.com already on Cloudflare |
Cloudflare ✅ (in production) |
| Application-Level AI Services (API) | Vision, Video, Document, Multimodal, Speech, Moderation APIs | Workers + API Shield reading OpenAPI |
| Application Use Cases | Smart City, Retail Intelligence, Industrial, Security | Workers for Platforms per vertical/customer |
| API Gateway (your stack) | Auth, rate limit, request routing, transformations | Workers + API Shield + Cache |
| AI Services Engine | Multimodal inference dispatch, model selection | AI Gateway + Workers AI + LLM routing |
| Integration & Runtime Enablement | Connect to sensors, VMS, ERP, third-party systems | Workers + Queues + Workflows |
| Programmable Silicon (GSP®) | Pathfinder + Xplorer hardware at the edge | No competition — composition. Workers AI for cloud-side complement. |
| Operational telemetry (self-enriching) | Every deployment improves model + routing decisions | R2 (zero egress) + Workers Analytics Engine |
| CSP partner deployments | Cloud Service Providers turn infrastructure into AI services | Workers for Platforms dispatch namespaces |
| Sovereign / data residency | South Asia, Middle East, APAC, defense | Regional Services + Cloudflare for Government |
| Per-camera / per-sensor state | Calibration, anomaly baseline, alert history | Durable Objects (1 DO per asset) |
| Vision similarity / face recognition | Your first application-level AI service (per Q1 2026 launch) | Vectorize + Workers AI Embeddings |
Drag the sliders. When CSP partners run similar AI services across their customer base, semantic caching scales with N. Face recognition, smart city analytics, retail intelligence — the inference queries are highly repetitive when normalized.
Cache hits cost ~5% of a full inference call (embedding lookup + small response stitch). Adjust sliders for the AI Services Platform scale you're targeting.
Directional. AI Gateway also adds free observability, rate limiting, fallback routing, per-CSP-partner cost attribution, and request logging — none of which is priced into the chart above. The compounding effect: as Blaize adds CSP partners, cache-hit rate goes up, not down. The "2.4× streams per rack" claim compounds with the cache math.
A camera in a South Asian smart-city deployment captures a face. The CSP partner's app calls Blaize AI Services Platform for a face-recognition decision. Following the full path.
Pathfinder embedded in the camera runs continuous low-power inference — face detection, bounding box extraction, basic quality scoring. At 16 TOPS / 7W, the local silicon does the heavy lifting without sending raw video upstream. Privacy-preserving by architecture.
Only the embedding vector + metadata travels to api.blaize.com/v1/face-recognition. DNS resolves to the closest POP — Mumbai, not us-east-1. Round-trip drops from ~210ms to ~12ms.
OpenAPI schema validation — required fields present, vector dimensions correct, mTLS cert from the CSP partner. Bot Management scores the request. The request is auth-validated and policy-validated at the edge before reaching origin. ~10ms.
Hostname → CSP partner lookup. The partner's worker — with their custom service catalog, pricing logic, branded API responses, regulatory policies — runs in an isolated runtime. Zero noisy-neighbor risk between this CSP and the Middle East sovereign deployment.
The face embedding is queried against this CSP's isolated Vectorize index — millions of enrolled faces, per-partner isolation. Top-K matches return in sub-30ms. Per-region data residency enforced via Regional Services — the index stays in BOM, not us-east.
The DO for this specific camera updates — last-seen timestamp, recognition outcome, alert history. If this camera has flagged 3+ matches in the last hour, escalate. Strongly consistent, geo-pinned to BOM, hibernates when the camera goes quiet.
If the workflow requires LLM reasoning (e.g., "summarize today's anomalies for this district"), AI Gateway routes to Claude or Workers AI based on cost+latency. Cache hit on repeated district summaries returns in 30ms. Per-CSP cost attribution recorded for invoicing.
Decision trace + embeddings + outcome written to R2 (zero egress when the CSP's customer audits, when the model retrains, when the platform improves). Operational telemetry feeds back into Blaize's "site 100 better than site 1" compounding loop. Total wall-clock: under 200ms end-to-end.
Blaize ships the silicon and the SDK. Cloudflare ships the cloud control plane that turns programmable silicon into application-level AI APIs. As the AI Services Platform launches into CSP, system integrator, and sovereign deployments, the cloud-side architecture decisions made now compound for the next decade. Worth comparing notes on how Workers for Platforms + AI Gateway + Vectorize map specifically against your CSP partner onboarding roadmap?
Book 30 min with Matt Holscher →