Gateway specification

The Pura gateway (api.pura.xyz) is an OpenAI-compatible HTTP endpoint that routes inference requests across providers using on-chain capacity weights. Point your client's baseURL at the gateway and authenticate with a Pura API key; the rest of your OpenAI integration stays the same.

Base URL

text

https://api.pura.xyz/v1

Authentication

All requests require an API key with the pura_ prefix, sent as a Bearer token in the Authorization header:

text

Authorization: Bearer pura_abc123...

Generate keys at pura.xyz/gateway.
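A small client-side check can catch a misconfigured key before a request leaves the application. The sketch below relies only on what is stated above (the pura_ prefix and the Bearer scheme); the function name puraAuthHeaders is hypothetical and not part of any Pura SDK.

```typescript
// Hypothetical helper: builds the headers for a gateway request.
// Only the `pura_` prefix and the Bearer scheme come from the spec above.
function puraAuthHeaders(apiKey: string): Record<string, string> {
  if (!apiKey.startsWith('pura_')) {
    throw new Error('Expected an API key starting with "pura_"')
  }
  return {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  }
}
```

The returned object can be passed as the headers option of a fetch call against the base URL.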

Endpoints

POST /v1/chat/completions

Standard OpenAI chat completion. Supports streaming when stream is set to true.

shell

curl https://api.pura.xyz/v1/chat/completions \
  -H "Authorization: Bearer pura_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": true
  }'

Request body follows the OpenAI API spec. The model field selects the downstream model. The gateway routes to whichever provider has spare capacity for that model.
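When stream is true, OpenAI-style endpoints reply with server-sent events whose data lines carry JSON chunks and end with a [DONE] sentinel. Assuming the gateway passes that format through unchanged (which "OpenAI-compatible" suggests but this page does not spell out), the text deltas can be collected as sketched below. The function name extractDeltas is hypothetical.

```typescript
// Extracts text deltas from buffered OpenAI-style SSE text.
// Assumes each event is a single complete `data: ...` line.
function extractDeltas(sseText: string): { text: string; done: boolean } {
  let text = ''
  let done = false
  for (const rawLine of sseText.split('\n')) {
    const line = rawLine.trim()
    if (!line.startsWith('data:')) continue // skip blank lines and comments
    const payload = line.slice('data:'.length).trim()
    if (payload === '[DONE]') {
      done = true
      break
    }
    const event = JSON.parse(payload) as {
      choices?: { delta?: { content?: string | null } }[]
    }
    text += event.choices?.[0]?.delta?.content ?? ''
  }
  return { text, done }
}
```

A network chunk can end in the middle of a line, so a real client should buffer incoming bytes and only hand complete lines to a parser like this one.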

Routing hints

Pass an optional routing object in the request body to influence provider selection:

json

{
  "messages": [{"role": "user", "content": "Explain backpressure routing"}],
  "routing": {
    "quality": "high",
    "prefer": "anthropic"
  }
}

| Field | Type | Description |
| --- | --- | --- |
| `quality` | `"low"` / `"balanced"` / `"high"` | Shifts the complexity tier up or down. `"high"` bumps cheap tasks to mid-tier models and mid tasks to premium. `"low"` does the reverse. Default: `"balanced"` (no shift). |
| `prefer` | `string` | Soft preference for a provider name (e.g. `"anthropic"`). Doubles that provider's capacity weight during selection. Not a hard lock; use `model` for that. |
| `maxCost` | `number` | Experimental. Max cost per 1K tokens in USD. Filters out providers above this rate. |
| `maxLatency` | `number` | Experimental. Max average latency in ms (5-minute window). Filters out providers above this threshold. |
| `excludeProviders` | `string[]` | Experimental. Provider names to exclude from routing. |

Experimental fields are accepted and honored. The response includes an X-Pura-Experimental header listing which experimental fields were used.
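For typed clients, the routing object can be written down as a TypeScript type. The sketch below transcribes the fields listed above; the names RoutingHints, ChatMessage, and buildChatBody are illustrative and are not exported by any Pura package as far as this page states.

```typescript
// Shape of the optional `routing` object, transcribed from the field list above.
type RoutingHints = {
  quality?: 'low' | 'balanced' | 'high'
  prefer?: string
  maxCost?: number // experimental: USD per 1K tokens
  maxLatency?: number // experimental: ms, 5-minute average
  excludeProviders?: string[] // experimental
}

type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Hypothetical helper: merges routing hints into an OpenAI-style request body.
function buildChatBody(
  messages: ChatMessage[],
  routing?: RoutingHints,
  model?: string,
): Record<string, unknown> {
  const body: Record<string, unknown> = { messages }
  if (model !== undefined) body['model'] = model
  if (routing !== undefined && Object.keys(routing).length > 0) {
    body['routing'] = routing
  }
  return body
}
```

Serializing the result with JSON.stringify gives a body in the same shape as the example request above.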

GET /v1/models

Returns gateway health and available providers.

json

{
  "status": "ok",
  "service": "pura-gateway",
  "version": "0.1.0",
  "chain": "base-sepolia",
  "timestamp": "2026-07-01T00:00:00.000Z"
}

Response headers

Every response includes routing metadata:

| Header | Description |
| --- | --- |
| `X-Pura-Provider` | Which provider handled the request (e.g. `openai`, `anthropic`) |
| `X-Pura-Request-Id` | Unique request identifier for on-chain receipt lookup |
| `X-Pura-Tier` | Complexity tier used for routing (`cheap`, `mid`, `premium`) |
| `X-Pura-Cost` | Estimated request cost in USD |
| `X-Pura-Budget-Remaining` | Daily budget remaining in USD |
| `X-Pura-Quality` | Quality bias applied, if any (`low`, `balanced`, `high`) |
| `X-Pura-Explored` | `true` when the router explored a non-preferred provider |
| `X-Pura-Experimental` | Comma-separated list of experimental routing fields used |
| `X-RateLimit-Remaining` | Requests remaining in the current rate limit window |

On rate limit (429), the response includes a Retry-After header with seconds until the window resets.
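The routing metadata can be pulled into a typed object on the client. This is a minimal sketch: it assumes the cost and budget headers carry plain decimal numbers and that Retry-After is an integer number of seconds, as described above. The type and function names are hypothetical.

```typescript
// Minimal view of a headers object; the Fetch API's `Headers` satisfies it.
type HeaderReader = { get(name: string): string | null }

type RoutingMeta = {
  provider: string | null
  requestId: string | null
  tier: string | null
  costUsd: number | null
  budgetRemainingUsd: number | null
  explored: boolean
  experimental: string[]
  rateLimitRemaining: number | null
}

// Parses a numeric header, returning null when absent or malformed.
function numberOrNull(value: string | null): number | null {
  if (value === null || value.trim() === '') return null
  const n = Number(value)
  return Number.isFinite(n) ? n : null
}

function readRoutingMeta(headers: HeaderReader): RoutingMeta {
  const experimental = headers.get('X-Pura-Experimental') ?? ''
  return {
    provider: headers.get('X-Pura-Provider'),
    requestId: headers.get('X-Pura-Request-Id'),
    tier: headers.get('X-Pura-Tier'),
    costUsd: numberOrNull(headers.get('X-Pura-Cost')),
    budgetRemainingUsd: numberOrNull(headers.get('X-Pura-Budget-Remaining')),
    explored: headers.get('X-Pura-Explored') === 'true',
    experimental: experimental
      .split(',')
      .map((s) => s.trim())
      .filter((s) => s.length > 0),
    rateLimitRemaining: numberOrNull(headers.get('X-RateLimit-Remaining')),
  }
}

// Seconds to wait after a 429, or null for any other status.
function retryDelaySeconds(status: number, headers: HeaderReader): number | null {
  if (status !== 429) return null
  return numberOrNull(headers.get('Retry-After'))
}
```

With fetch, both functions can be called as readRoutingMeta(response.headers) and retryDelaySeconds(response.status, response.headers).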

Rate limits

Each API key is limited to 30 requests per minute, measured over a sliding window.
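A client can stay under the quota by tracking its own request timestamps. The class below is a sketch of a client-side sliding-window limiter that uses the documented numbers as defaults; it is not the gateway's implementation, and the server's exact window accounting may differ.

```typescript
// Client-side sliding-window limiter mirroring the documented quota.
class SlidingWindowLimiter {
  private timestamps: number[] = []
  private readonly limit: number
  private readonly windowMs: number

  constructor(limit: number = 30, windowMs: number = 60000) {
    this.limit = limit
    this.windowMs = windowMs
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(nowMs: number): boolean {
    // Keep only the timestamps still inside the window.
    this.timestamps = this.timestamps.filter((t) => nowMs - t < this.windowMs)
    if (this.timestamps.length >= this.limit) return false
    this.timestamps.push(nowMs)
    return true
  }
}
```

Calling tryAcquire(Date.now()) before each request tells the caller whether to send immediately or wait.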

Routing

The gateway scores each request for complexity (cheap/mid/premium) and reads capacity weights from the on-chain CapacityRegistry. Provider selection combines three signals:

1. Complexity tier: maps to a preferred provider order (e.g. premium prefers Anthropic, cheap prefers Groq).
2. On-chain capacity: providers with more spare capacity (GDA pool units) rank higher within a tier.
3. Quality score: recent success rate and latency from the 1-hour metrics window multiply each provider's capacity weight.

The routing.quality hint shifts the complexity tier up or down. The routing.prefer hint doubles a named provider's weight. Experimental filters (maxCost, maxLatency, excludeProviders) trim the candidate set before selection.
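The interplay of the hints can be illustrated with a small model. This is a sketch, not the gateway's code: it assumes capacity weights and quality scores are already available as plain numbers, that the tier shift is a single clamped step, and that candidates are ranked by weight in descending order. How the tier's preferred provider order is blended with the weights is not specified here and is left out. All type and function names are hypothetical.

```typescript
type Tier = 'cheap' | 'mid' | 'premium'
type Quality = 'low' | 'balanced' | 'high'

type Candidate = {
  name: string
  capacity: number // spare capacity weight (assumed already read from chain)
  qualityScore: number // multiplier from recent success rate and latency
  costPer1k: number // USD per 1K tokens
  avgLatencyMs: number // average latency in ms
}

type Hints = {
  quality?: Quality
  prefer?: string
  maxCost?: number
  maxLatency?: number
  excludeProviders?: string[]
}

const TIERS: Tier[] = ['cheap', 'mid', 'premium']

// Shifts the tier one step up for "high" or down for "low", clamped at the ends.
function shiftTier(tier: Tier, quality: Quality = 'balanced'): Tier {
  const step = quality === 'high' ? 1 : quality === 'low' ? -1 : 0
  const index = Math.min(TIERS.length - 1, Math.max(0, TIERS.indexOf(tier) + step))
  return TIERS[index] ?? tier
}

// Applies the experimental filters, then ranks by capacity * quality score,
// doubling the weight of the preferred provider.
function rankProviders(candidates: Candidate[], hints: Hints = {}): string[] {
  const excluded = hints.excludeProviders ?? []
  return candidates
    .filter((c) => !excluded.includes(c.name))
    .filter((c) => hints.maxCost === undefined || c.costPer1k <= hints.maxCost)
    .filter((c) => hints.maxLatency === undefined || c.avgLatencyMs <= hints.maxLatency)
    .map((c) => ({
      name: c.name,
      weight: c.capacity * c.qualityScore * (c.name === hints.prefer ? 2 : 1),
    }))
    .sort((a, b) => b.weight - a.weight)
    .map((c) => c.name)
}
```

In this model prefer only reorders the list, so a preferred provider with little capacity can still lose to a much larger one, which matches the "soft preference" wording.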

Adaptive exploration sends ~5% of requests to a non-preferred provider to discover performance changes. The exploration rate doubles when any provider shows degraded performance (>20% error rate or >5s latency in the 5-minute window).
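The exploration rule can be written as a short function. The sketch below takes the stated thresholds literally (a 5% base rate, doubled when any provider exceeds a 20% error rate or 5 seconds of latency) and assumes the latency figure is an average over the window; the names are hypothetical and the gateway's actual rate is described only as approximate.

```typescript
// Per-provider stats over the 5-minute window (assumed shape).
type WindowStats = { errorRate: number; avgLatencyMs: number }

const BASE_EXPLORATION_RATE = 0.05

// A provider counts as degraded above 20% errors or 5 s latency.
function isDegraded(stats: WindowStats): boolean {
  return stats.errorRate > 0.2 || stats.avgLatencyMs > 5000
}

// Doubles the base rate when at least one provider is degraded.
function explorationRate(allStats: WindowStats[]): number {
  return allStats.some(isDegraded) ? BASE_EXPLORATION_RATE * 2 : BASE_EXPLORATION_RATE
}

// Decides whether this request explores; `rand` is injectable for testing.
function shouldExplore(allStats: WindowStats[], rand: () => number = Math.random): boolean {
  return rand() < explorationRate(allStats)
}
```

Requests that take the exploration path are the ones reported with X-Pura-Explored set to true.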

If the selected provider fails, the gateway falls back to the next available provider. Each completion is recorded on-chain via the CompletionLedger.
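Fallback amounts to walking an ordered provider list until one call succeeds. The helper below is a generic sketch of that loop; the provider order and the call function are supplied by the caller, on-chain recording is out of scope, and nothing here is taken from the gateway's source.

```typescript
// Tries providers in order and returns the first successful result.
async function callWithFallback<T>(
  providers: string[],
  call: (provider: string) => Promise<T>,
): Promise<{ provider: string; result: T }> {
  const errors: string[] = []
  for (const provider of providers) {
    try {
      return { provider, result: await call(provider) }
    } catch (err) {
      // Remember why this provider failed, then move on to the next one.
      errors.push(`${provider}: ${err instanceof Error ? err.message : String(err)}`)
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`)
}
```

The provider field of the result plays the same role as the X-Pura-Provider header: it names whichever provider finally served the request.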

SDK integration

typescript

import { route } from '@puraxyz/sdk'

const result = await route({
  apiKey: 'pura_your_key',
  messages: [{ role: 'user', content: 'What is BPE?' }],
})

console.log(result.content)
console.log(result.provider)     // e.g. "openai" or "anthropic"
console.log(result.requestId)    // on-chain receipt id

Or use the OpenAI SDK directly:

typescript

import OpenAI from 'openai'

const client = new OpenAI({ baseURL: 'https://api.pura.xyz/v1', apiKey: 'pura_your_key' })
const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'hello' }],
})

CORS

The gateway supports CORS for browser-based clients. Allowed origins, methods, and headers are configured to support standard fetch API usage.