The Pura gateway is an OpenAI-compatible API that routes across four LLM providers (OpenAI, Anthropic, Groq, Gemini). It picks the best model for your task and tracks per-request costs.
Today those costs are estimated gateway costs, not exact upstream provider invoices. The gateway estimates tokens, applies a static per-provider rate card, and exposes that number in headers and reports so you have one consistent figure to bill against.
The shortest useful path looks like this:
"stream": false if you want one JSON object.curl -X POST https://api.pura.xyz/api/keys \
-H "Content-Type: application/json" \
-d '{"label":"my-agent"}'Save the key from the response. It starts with pura_.
POST /api/chat streams Server-Sent Events by default. Plain curl will print each `data:` frame as it arrives. Use `-N` so curl does not buffer the stream.
```bash
curl -N https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```

You should see a sequence of `data: {...}` chunks followed by `data: [DONE]`. That is the normal success path for a streaming response.
Pura picks the model automatically. Simple questions go to Groq or Gemini (fastest, cheapest). Complex reasoning goes to Anthropic or OpenAI (highest quality).
If you want one completion object instead of streaming SSE, set `"stream": false`.
```bash
curl https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"stream":false}'
```

This mode is easier to script when you want to pipe the result into `jq` or another JSON consumer.
Because the gateway is OpenAI-compatible, the official SDKs work by pointing `baseURL` at `/v1`:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.pura.xyz/v1",
  apiKey: process.env.PURA_API_KEY,
});

const res = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Explain backpressure routing." }],
});
```

Every response includes routing metadata:
| Header | What it tells you |
|---|---|
| `X-Pura-Provider` | Which provider handled the request |
| `X-Pura-Model` | Specific model used |
| `X-Pura-Cost` | Estimated cost in USD |
| `X-Pura-Tier` | Complexity tier (cheap / mid / premium) |
| `X-Pura-Budget-Remaining` | Daily budget left |
| `X-Pura-Quality` | Quality bias applied (if `routing.quality` was set) |
| `X-Pura-Explored` | Whether the router explored a non-preferred provider |
If you charge your own customers today, use `X-Pura-Cost` and the report endpoint as your canonical usage number. It is the gateway's estimate of what that request cost to route, not a provider-native invoice line item.
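If you aggregate usage yourself, one approach is to read the cost header off each response and sum it per customer. A sketch using the header names from the table above; the data structure and parsing are illustrative, not a prescribed pattern:

```typescript
// Accumulate estimated gateway costs per customer from X-Pura-Cost headers.
const totals = new Map<string, number>();

function recordCost(customerId: string, headers: Headers): number {
  // X-Pura-Cost is the gateway's USD estimate for this request.
  const cost = parseFloat(headers.get("X-Pura-Cost") ?? "0");
  const next = (totals.get(customerId) ?? 0) + cost;
  totals.set(customerId, next);
  return next;
}

// Example with constructed headers:
const h = new Headers({ "X-Pura-Cost": "0.0004", "X-Pura-Provider": "groq" });
recordCost("cust_1", h);
recordCost("cust_1", h);
console.log(totals.get("cust_1")); // ~0.0008
```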
Each request gets scored on complexity and assigned a tier: cheap, mid, or premium.
On-chain capacity weights (GDA pool units on Base Sepolia) break ties between providers in the same tier. Quality scores from recent success rates and latency further weight the selection.
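As a mental model, a provider's final standing within a tier could be its capacity units scaled by a quality score, with the highest combined weight winning. This is a hedged sketch of the idea, not the gateway's actual formula:

```typescript
// Illustrative only: combine capacity units and quality score into one weight.
interface Candidate {
  provider: string;
  capacityUnits: number; // e.g. GDA pool units on Base Sepolia
  quality: number;       // 0..1, from recent success rate and latency
}

function pickProvider(candidates: Candidate[]): string {
  let best = candidates[0];
  for (const c of candidates) {
    if (c.capacityUnits * c.quality > best.capacityUnits * best.quality) {
      best = c;
    }
  }
  return best.provider;
}

const tier: Candidate[] = [
  { provider: "groq",   capacityUnits: 120, quality: 0.98 }, // weight 117.6
  { provider: "gemini", capacityUnits: 150, quality: 0.91 }, // weight 136.5
];
console.log(pickProvider(tier)); // "gemini"
```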
Pass a routing object to influence provider selection without forcing a specific model:
```bash
curl https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Analyze this code"}],"routing":{"quality":"high"}}'
```

`quality: "high"` bumps the tier up: a mid-complexity task gets routed to premium-tier models. `quality: "low"` does the reverse and pushes toward cheaper models. `prefer: "anthropic"` soft-boosts a provider's selection weight without locking to it.
You can also pin a specific model, and the gateway routes the request to the matching provider:

```bash
curl https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
```

Supported model prefixes: `gpt*` / `o*` → OpenAI, `claude*` → Anthropic, `llama*` / `mixtral*` / `gemma*` → Groq, `gemini*` → Gemini.
```bash
curl https://api.pura.xyz/api/report \
  -H "Authorization: Bearer $PURA_API_KEY"
```

Returns a JSON breakdown: total spend, per-model costs, request count, and average cost per request over the past 24 hours.
Those numbers use the same estimate model as `X-Pura-Cost`, so headers and reports stay aligned.
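If you post-process the report, the average should reconcile with total spend divided by request count. A sketch; the field names `totalSpend` and `requestCount` are assumptions, not the documented schema:

```typescript
// Recompute average cost per request from a report payload.
// NOTE: field names here are illustrative, not the documented schema.
interface Report {
  totalSpend: number;   // USD over the past 24 hours
  requestCount: number;
}

function avgCost(report: Report): number {
  return report.requestCount === 0 ? 0 : report.totalSpend / report.requestCount;
}

console.log(avgCost({ totalSpend: 0.5, requestCount: 100 })); // 0.005
```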
Each key has a daily spend cap (default $10). When the budget runs out, the gateway returns HTTP 402 with a `budget_exhausted` error code. The budget resets at midnight UTC.
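A client can detect the cap and compute how long to wait for the UTC reset. A sketch; the error body shape `{ error: { code } }` is an assumption, only the 402 status and `budget_exhausted` code come from the behavior described above:

```typescript
// Milliseconds until the next midnight UTC, when the daily budget resets.
function msUntilUtcReset(now: Date = new Date()): number {
  const reset = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return reset - now.getTime();
}

// Decide whether a response means "budget exhausted" (error body shape assumed).
function isBudgetExhausted(status: number, body: { error?: { code?: string } }): boolean {
  return status === 402 && body.error?.code === "budget_exhausted";
}

const wait = msUntilUtcReset(new Date("2025-01-01T18:00:00Z"));
console.log(wait / 3_600_000); // 6 (hours until midnight UTC)
```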
Pass your own provider API key to use your account directly:
```bash
curl https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "X-Provider-Key: sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
```

With BYOK, Pura still routes and tracks costs, but inference charges go to your provider account.
The first 5,000 requests are free. After that, the gateway returns HTTP 402 with a Lightning funding invoice. You can also create one directly:
```bash
# Create a funding invoice
curl -X POST https://api.pura.xyz/api/wallet/fund \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"amount": 10000}'

# Response fields include:
# - paymentRequest: raw BOLT11 string
# - invoiceUrl: hosted invoice page with QR code
# - statusUrl: authenticated status endpoint
```

Pay the BOLT11 invoice in your wallet, or open `invoiceUrl` on mobile and let the wallet handle the `lightning:` deeplink.
```bash
# Check invoice status
curl "https://api.pura.xyz/api/wallet/status?invoiceId=INV_ID" \
  -H "Authorization: Bearer $PURA_API_KEY"
```

Once the invoice settles, the gateway credits your sat balance and starts debiting request costs from that balance.
```bash
# Check balance
curl https://api.pura.xyz/api/wallet/balance \
  -H "Authorization: Bearer $PURA_API_KEY"
```

If you use OpenClaw, install the Pura skill instead of configuring the API manually. See OpenClaw integration.
Check real-time provider availability at pura.xyz/status or hit the API directly:

```bash
curl https://api.pura.xyz/api/status
```