The Pura gateway is an OpenAI-compatible API that routes across four LLM providers (OpenAI, Anthropic, Groq, Gemini). It picks the best model for your task and tracks per-request costs.
Today those costs are estimated gateway costs, not exact upstream provider invoices. The gateway estimates tokens, applies a static per-provider rate card, and exposes that number in headers and reports so you have one consistent figure to bill against.
The shortest useful path looks like this:
"stream": false if you want one JSON object.curl -X POST https://api.pura.xyz/api/keys \
-H "Content-Type: application/json" \
-d '{"label":"my-agent"}'Save the key from the response. It starts with pura_.
POST /api/chat streams Server-Sent Events by default. Plain curl will print each `data:` frame as it arrives. Use `-N` so curl does not buffer the stream.
```bash
curl -N https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```

You should see a sequence of `data: {...}` chunks followed by `data: [DONE]`. That is the normal success path for a streaming response.
Pura picks the model automatically. Simple questions go to Groq or Gemini (fastest, cheapest). Complex reasoning goes to Anthropic or OpenAI (highest quality).
If you want one completion object instead of streaming SSE, set `"stream": false`.
```bash
curl https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"stream":false}'
```

This mode is easier to script when you want to pipe the result into `jq` or another JSON consumer.
Because the gateway is OpenAI-compatible, the official SDKs work by pointing `baseURL` at `/v1`:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.pura.xyz/v1",
  apiKey: process.env.PURA_API_KEY,
});

const res = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Explain backpressure routing." }],
});
```

Every response includes routing metadata:
| Header | What it tells you |
|---|---|
| `X-Pura-Provider` | Which provider handled the request |
| `X-Pura-Model` | Specific model used |
| `X-Pura-Cost` | Estimated cost in USD |
| `X-Pura-Tier` | Complexity tier (cheap / mid / premium) |
| `X-Pura-Budget-Remaining` | Daily budget left |
| `X-Pura-Quality` | Quality bias applied (if `routing.quality` was set) |
| `X-Pura-Explored` | Whether the router explored a non-preferred provider |
If you charge your own customers today, use `X-Pura-Cost` and the report endpoint as your canonical usage number. It is the gateway's estimate of what that request cost to route, not a provider-native invoice line item.
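If you aggregate usage yourself, one approach is to read the cost header off each response and sum it per customer. A sketch using the header names from the table above; the data structure and parsing are illustrative, not a prescribed pattern:

```typescript
// Accumulate estimated gateway costs per customer from X-Pura-Cost headers.
const totals = new Map<string, number>();

function recordCost(customerId: string, headers: Headers): number {
  // X-Pura-Cost is the gateway's USD estimate for this request.
  const cost = parseFloat(headers.get("X-Pura-Cost") ?? "0");
  const next = (totals.get(customerId) ?? 0) + cost;
  totals.set(customerId, next);
  return next;
}

// Example with constructed headers:
const h = new Headers({ "X-Pura-Cost": "0.0004", "X-Pura-Provider": "groq" });
recordCost("cust_1", h);
recordCost("cust_1", h);
console.log(totals.get("cust_1")); // ~0.0008
```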
Each request gets scored on complexity and assigned a tier: cheap, mid, or premium.
On-chain capacity weights (GDA pool units on Base Sepolia) break ties between providers in the same tier. Quality scores from recent success rates and latency further weight the selection.
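As a mental model, a provider's final standing within a tier could be its capacity units scaled by a quality score, with the highest combined weight winning. This is a hedged sketch of the idea, not the gateway's actual formula:

```typescript
// Illustrative only: combine capacity units and quality score into one weight.
interface Candidate {
  provider: string;
  capacityUnits: number; // e.g. GDA pool units on Base Sepolia
  quality: number;       // 0..1, from recent success rate and latency
}

function pickProvider(candidates: Candidate[]): string {
  let best = candidates[0];
  for (const c of candidates) {
    if (c.capacityUnits * c.quality > best.capacityUnits * best.quality) {
      best = c;
    }
  }
  return best.provider;
}

const tier: Candidate[] = [
  { provider: "groq",   capacityUnits: 120, quality: 0.98 }, // weight 117.6
  { provider: "gemini", capacityUnits: 150, quality: 0.91 }, // weight 136.5
];
console.log(pickProvider(tier)); // "gemini"
```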
Pass a routing object to influence provider selection without forcing a specific model:
```bash
curl https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Analyze this code"}],"routing":{"quality":"high"}}'
```

`quality: "high"` bumps the tier up: a mid-complexity task gets routed to premium-tier models. `quality: "low"` does the reverse and pushes toward cheaper models. `prefer: "anthropic"` soft-boosts a provider's selection weight without locking to it.
You can also pin a specific model, and the gateway routes the request to the matching provider:

```bash
curl https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
```

Supported model prefixes: `gpt*` / `o*` → OpenAI, `claude*` → Anthropic, `llama*` / `mixtral*` / `gemma*` → Groq, `gemini*` → Gemini.
```bash
curl https://api.pura.xyz/api/report \
  -H "Authorization: Bearer $PURA_API_KEY"
```

Returns a JSON breakdown: total spend, per-model costs, request count, and average cost per request over the past 24 hours.
Those numbers use the same estimate model as `X-Pura-Cost`, so headers and reports stay aligned.
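If you post-process the report, the average should reconcile with total spend divided by request count. A sketch; the field names `totalSpend` and `requestCount` are assumptions, not the documented schema:

```typescript
// Recompute average cost per request from a report payload.
// NOTE: field names here are illustrative, not the documented schema.
interface Report {
  totalSpend: number;   // USD over the past 24 hours
  requestCount: number;
}

function avgCost(report: Report): number {
  return report.requestCount === 0 ? 0 : report.totalSpend / report.requestCount;
}

console.log(avgCost({ totalSpend: 0.5, requestCount: 100 })); // 0.005
```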
Each key has a daily spend cap (default $10). When the budget runs out, the gateway returns HTTP 402 with a `budget_exhausted` error code. The budget resets at midnight UTC.
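A client can detect the cap and compute how long to wait for the UTC reset. A sketch; the error body shape `{ error: { code } }` is an assumption, only the 402 status and `budget_exhausted` code come from the behavior described above:

```typescript
// Milliseconds until the next midnight UTC, when the daily budget resets.
function msUntilUtcReset(now: Date = new Date()): number {
  const reset = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return reset - now.getTime();
}

// Decide whether a response means "budget exhausted" (error body shape assumed).
function isBudgetExhausted(status: number, body: { error?: { code?: string } }): boolean {
  return status === 402 && body.error?.code === "budget_exhausted";
}

const wait = msUntilUtcReset(new Date("2025-01-01T18:00:00Z"));
console.log(wait / 3_600_000); // 6 (hours until midnight UTC)
```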
Pass your own provider API key to use your account directly:
```bash
curl https://api.pura.xyz/api/chat \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "X-Provider-Key: sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
```

With BYOK, Pura still routes and tracks costs, but inference charges go to your provider account.
The first 5,000 requests are free. After that, the gateway returns HTTP 402 with a Lightning funding invoice. You can also create one directly:
```bash
# Create a funding invoice
curl -X POST https://api.pura.xyz/api/wallet/fund \
  -H "Authorization: Bearer $PURA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"amount": 10000}'

# Response fields include:
# - paymentRequest: raw BOLT11 string
# - invoiceUrl: hosted invoice page with QR code
# - statusUrl: authenticated status endpoint
```

Pay the BOLT11 invoice in your wallet, or open `invoiceUrl` on mobile and let the wallet handle the `lightning:` deeplink.
```bash
# Check invoice status
curl "https://api.pura.xyz/api/wallet/status?invoiceId=INV_ID" \
  -H "Authorization: Bearer $PURA_API_KEY"
```

Once the invoice settles, the gateway credits your sat balance and starts debiting request costs from that balance.
```bash
# Check balance
curl https://api.pura.xyz/api/wallet/balance \
  -H "Authorization: Bearer $PURA_API_KEY"
```

If you use OpenClaw, install the Pura skill instead of configuring the API manually. See OpenClaw integration.
Check real-time provider availability at pura.xyz/status or hit the API directly:

```bash
curl https://api.pura.xyz/api/status
```