12 May 2026
What Is Pay As You Go? AI API Pricing Explained
A practical guide to pay-as-you-go AI API pricing: tokens, model choice, routing, credit packs, spend visibility, and how Select keeps usage transparent.
Pay-as-you-go (PAYG) is a pricing model where you pay only for the resources you consume, with no upfront fees or fixed monthly subscriptions. In the UK, the model proved it could scale in the mobile market decades ago, becoming a familiar way to buy variable services long before cloud and AI APIs made metered usage normal.
That matters because plenty of engineering teams are in the same place today with AI. They have a feature in production, an agent that runs unpredictably, a long-context workflow that spikes token use, and a monthly bill that doesn't line up neatly with seat counts or flat plans. A fixed subscription feels safe until half the capacity sits idle. Pure PAYG feels efficient until an agent loop burns through usage faster than expected.
For AI APIs, the core question isn't just what is pay as you go. It's how metered pricing behaves when prompts vary, outputs expand, models change, and routing decisions affect both cost and reliability. Select is built around that problem: one OpenAI-compatible endpoint, dollar credit packs, direct model choice or Smart Select routing, and usage visibility so teams can see what each request costs.
Table of Contents
- What is Pay As You Go and Why Does It Matter for AI
- How Pay As You Go Works for APIs and AI Models
- PAYG vs Subscription vs Prepaid Models
- Benefits and Drawbacks for Developer Teams
- Practical PAYG Scenarios and Code Examples
- Implementing Transparent PAYG with Select
- Conclusion: When to Choose Pay As You Go
What is Pay As You Go and Why Does It Matter for AI
A developer usually notices PAYG when a fixed plan starts feeling wrong. The team bought a monthly API subscription for comfort, but usage is irregular. Staging is quiet most days, batch jobs run in bursts, and a coding agent suddenly needs far more context than the original estimate allowed.
What is pay as you go? It's a usage-based model where the bill tracks actual consumption instead of a reserved monthly allowance. In AI, that usually means paying for prompt and response volume, rather than for a seat, a tier, or a bundle that may or may not match real demand.
Why engineers care about it
AI workloads are rarely flat. A support bot may stay quiet overnight and then spike during launches. A document analysis tool may process nothing for hours and then receive several long files in one go. An agent framework may make a short decision call in one step and a large planning call in the next.
That variability is exactly where PAYG makes sense. It matches billing to the shape of the workload.
Practical rule: PAYG fits best when usage is uneven, experimentation is frequent, or teams don't yet know the steady-state cost profile of a feature.
Why the model is already proven
PAYG isn't a new pricing experiment invented for AI startups. In the UK telecom market, it became mainstream decades ago. By 2000, 52% of UK mobile subscriptions were PAYG, rising to 65% by 2005, according to Ofcom's UK mobile market reporting. That history matters because it shows the model works at national scale when customers value flexibility and don't want long commitments.
The same logic carries into AI APIs. Developers want to try models, swap routing logic, and control spend without renegotiating a contract every time traffic changes.
What often gets missed
The marketing version of PAYG is simple. “Pay only for what you use.” The engineering version is more demanding. Teams need metering they can trust, good visibility into token consumption, and clear ways to stop a bug from turning into a billing surprise.
That's where AI-specific PAYG becomes interesting. The model isn't just about lower entry cost. It's about whether the platform gives enough operational control to make variable pricing safe.
How Pay As You Go Works for APIs and AI Models
A request hits your API. The gateway authenticates it, selects a model, sends the payload, records usage, and writes that usage into billing. In a PAYG system, every one of those steps affects what the team pays.
With AI APIs, the billable unit is usually tokens, not requests.

Tokens are the meter
In token-priced systems, billing is tied to the text sent to the model and the text returned from it. A useful rule of thumb is that 1 token is roughly 4 characters in many text metering contexts, as described in DealHub's PAYG overview.
A chat completion isn't one unit of work because a single request can include:
- System instructions that set behavior
- User content such as prompts, transcripts, or document chunks
- Tool results added by the agent between model calls
- Model output, which can be short, structured, or unexpectedly long
Each part adds to consumption. For agent workflows, the expensive part is often not the final answer. It is the accumulated context passed through repeated turns.
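A rough sketch of how that accumulation behaves is below. It tracks approximate input size as an agent keeps appending tool results to the same message array, then trims old turns once the history gets large. The `approxTokens` helper, the trimming threshold, and the assumption that the system message sits first are all illustrative, not part of any specific SDK.
// Hypothetical sketch: context growth across agent turns.
// approxTokens uses the ~4 characters per token rule of thumb described above.
type ChatMessage = { role: "system" | "user" | "assistant" | "tool"; content: string };

function approxTokens(text: string): number {
  return Math.max(1, Math.ceil(text.length / 4));
}

function contextTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + approxTokens(m.content), 0);
}

// Each turn re-sends the whole history, so input tokens grow even when the
// newest tool result is small. Trimming old turns keeps the meter bounded.
function trimHistory(messages: ChatMessage[], maxTokens = 8_000): ChatMessage[] {
  const trimmed = [...messages];
  while (contextTokens(trimmed) > maxTokens && trimmed.length > 2) {
    trimmed.splice(1, 1); // drop the oldest non-system message
  }
  return trimmed;
}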
What gets counted in practice
A typical AI billable request includes:
- Input tokens from the prompt payload
- Output tokens from the response
- Additional model calls triggered by retries, tool loops, guardrails, or fallback routing
That third category is where teams get surprised. A feature may look like one user action in the product, but under the hood it can trigger several model interactions. If the first model times out and traffic fails over to a second model, both routing logic and retry behavior can change the final cost.
A short classification call stays cheap because input and output are both small. A long-context summarization job costs more because the prompt carries a large document, and the response may still be substantial.
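A back-of-the-envelope comparison makes that difference concrete. The per-token rates below are placeholders purely for illustration; actual rates depend on the model and are listed on the provider's pricing page.
// Hypothetical rates, for illustration only (dollars per token).
const INPUT_RATE = 0.5 / 1_000_000;
const OUTPUT_RATE = 1.5 / 1_000_000;

const estimateCost = (inputTokens: number, outputTokens: number) =>
  inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;

// Short classification call: tiny prompt, tiny label.
console.log(estimateCost(300, 10));     // ~$0.00017

// Long-context summarisation: large document in, substantial summary out.
console.log(estimateCost(60_000, 800)); // ~$0.0312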
Token totals are useful. Token totals broken down by request step are what engineers need to control spend.
What makes the billing accurate
Accurate PAYG depends on real metering at the API layer. The gateway receives the request, checks the key, forwards traffic to the selected model, and emits usage events tied to that request. The metering system turns those events into billable records, usually with details such as model name, input tokens, output tokens, timestamp, and account ID.
That sounds straightforward until agent behavior gets involved. Tool calling, streaming, retries, cached context, and model fallback all make usage accounting harder. If the platform only exposes a monthly total, the finance team gets a bill, but the engineering team still cannot explain which feature or route generated it.
Reliable PAYG systems expose at least three things:
- Real-time or near-real-time usage data
- Per-model and per-request token counts
- Billing records that match the requests your code made
Without that, cost anomalies are hard to debug and harder to prevent.
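A minimal sketch of what that looks like at the call site is below, assuming the endpoint returns the standard OpenAI-style usage object on each completion. The field names follow the OpenAI SDK; the logging sink is just a placeholder for whatever the team actually uses.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.SELECT_API_KEY,
  baseURL: "https://api.select.ax/v1"
});

async function meteredCompletion(model: string, prompt: string, feature: string) {
  const response = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }]
  });

  // Emit one usage record per request so spend can be traced back to a feature.
  console.log({
    feature,
    model,
    inputTokens: response.usage?.prompt_tokens,
    outputTokens: response.usage?.completion_tokens,
    timestamp: new Date().toISOString()
  });

  return response.choices[0]?.message?.content ?? "";
}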
Why routing changes the equation
Modern AI applications rarely stick to one model. Teams route simple extraction to a cheaper model, reserve larger models for reasoning-heavy tasks, and keep a fallback path for reliability. Under PAYG, that routing logic becomes part of the pricing model.
A router that sends every request to the strongest model will usually increase quality and increase cost. A router that aggressively optimizes for price can reduce spend but raise latency, lower answer quality, or create more retries. Good PAYG operations require teams to evaluate all three together: token cost, task success rate, and reliability.
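As a rough sketch of that trade-off, the heuristic below routes small extraction-style prompts to a cheaper model and everything else to a stronger one. The threshold is an assumption and the model names follow the examples used elsewhere in this guide; real routing decisions usually also weigh latency and measured success rate.
// Illustrative heuristic router: not a substitute for evaluating
// token cost, task success rate, and reliability together.
function chooseModel(prompt: string, taskType: "extraction" | "reasoning"): string {
  const smallPrompt = prompt.length < 2_000;

  if (taskType === "extraction" && smallPrompt) {
    return "deepseek-v4-flash";   // cheaper path for simple, short tasks
  }
  return "kimi-k2.6-official";    // stronger model for reasoning-heavy work
}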
For AI teams, PAYG works well only when usage, routing, and billing are visible in the same system. Otherwise, variable pricing stays variable, but it does not stay controlled.
PAYG vs Subscription vs Prepaid Models
A lot of billing confusion comes from treating three different models as if they were interchangeable. They aren't. They solve different problems.
A subscription trades flexibility for predictability. A prepaid model introduces a budget boundary by making teams load credits in advance. PAYG tracks actual use and charges after consumption, which is usually the most operationally accurate but also the least naturally predictable.
Pricing Model Comparison
| Criterion | Pay-As-You-Go (PAYG) | Subscription | Prepaid |
|---|---|---|---|
| Cost structure | Usage is billed based on actual consumption | Fixed recurring fee for access or allowance | Credits are purchased upfront and spent down with usage |
| Budget predictability | Lower unless teams add controls | High if usage stays within plan assumptions | Higher than postpaid PAYG because spend is bounded by loaded balance |
| Scalability | Strong for bursty and experimental workloads | Good when usage is stable and known | Good for controlled rollout and capped experimentation |
| Risk of wasted spend | Low when traffic is irregular | Higher if purchased capacity sits idle | Medium, depending on whether credits are fully used |
| Ideal use case | Agents, prototyping, routing across models, uneven demand | Stable production workloads with predictable monthly patterns | Teams that want PAYG economics with tighter budget discipline |
Where PAYG is better
PAYG is a strong fit when workload intensity changes from day to day. That includes internal tools, background jobs, batch inference, and agent systems that make a variable number of calls per task. A team pays for the inference it used, not for the capacity it hoped to need.
That's especially useful early in a product's life. The team doesn't yet know which model will stick, how much context users will send, or whether the workload will stay bursty.
Where subscriptions still win
Subscriptions are often better when demand is steady and procurement prefers a known monthly line item. If the application runs continuously with similar request sizes, the predictability can outweigh the inefficiency of paying for unused headroom.
This is the part many PAYG explainers skip. Cost efficiency and budgeting efficiency are not the same thing.
The hidden cost of PAYG
The biggest trade-off in PAYG isn't price per request. It's forecasting. The appeal is obvious, but finance and engineering still need a way to estimate month-end spend before the invoice lands. That's why generic “pay only for what you use” messaging is incomplete. Stripe's discussion of pay-as-you-go pricing highlights that forecasting is the operational challenge, and notes that 58% of SaaS businesses had adopted usage-based pricing by 2023.
Teams that use PAYG well usually add process around it:
- Set spend limits: Put hard caps on projects, environments, or API keys.
- Track daily burn: Review usage often enough to catch drift before month end.
- Separate dev from prod: Don't let experiments distort production cost signals.
- Model worst-case prompts: Long-context and tool-heavy flows need explicit budget assumptions.
PAYG works best when billing data is treated like observability data. It needs dashboards, alerts, and ownership.
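A sketch of the first two items might look like the guard below, assuming the team already records per-key spend somewhere it can query. The `getSpendToday` lookup and the cap values are placeholders, not part of any billing API.
// Hypothetical guard: refuse new work once a key's daily budget is spent.
// getSpendToday stands in for whatever store tracks per-key usage.
declare function getSpendToday(apiKeyName: string): Promise<number>;

const DAILY_CAP_USD: Record<string, number> = {
  "prod-api-key": 50,
  "dev-api-key": 5
};

async function assertWithinBudget(apiKeyName: string): Promise<void> {
  const cap = DAILY_CAP_USD[apiKeyName] ?? 1;
  const spent = await getSpendToday(apiKeyName);

  if (spent >= cap) {
    throw new Error(`Daily cap reached for ${apiKeyName}: $${spent.toFixed(2)} of $${cap}`);
  }
  if (spent >= cap * 0.8) {
    console.warn(`${apiKeyName} is at ${Math.round((100 * spent) / cap)}% of its daily cap`);
  }
}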
Why prepaid sits in the middle
Prepaid is often the practical compromise. It keeps usage-based pricing but removes some of the anxiety of open-ended postpaid billing. Teams still pay according to consumption, but they do it against a finite credit balance.
For a founder, that can simplify cash control. For an engineering lead, it can prevent an unexpected jump from becoming an accounting issue.
Benefits and Drawbacks for Developer Teams
Developer teams usually like PAYG for one simple reason. It maps better to how software gets built. Workloads aren't neat. Environments are inconsistent. Agents don't all follow the same path, and experiments often matter more than steady utilisation.

Where PAYG helps
The biggest benefit is alignment. A team pays when code runs and inference happens. That's useful in at least three common situations:
- Bursty traffic: A product launch, internal batch job, or agent workflow can surge for a short period and then drop back.
- Model experimentation: Teams can test several models without committing to separate fixed plans.
- Early-stage shipping: Startups and internal platform teams can get into production without buying large blocks of capacity upfront.
For AI specifically, PAYG also makes model routing easier to justify. If a simple task can go to a cheaper model and a harder task to a more capable one, the cost difference shows up directly in usage.
Where teams get burned
The downside is that bad code now has a direct cost signature. Retry storms, oversized prompts, tool loops, and poor context trimming all become billing problems as well as technical ones.
That's why “runaway cost” is the right phrase. In token-heavy workflows, a mistake compounds fast. According to Chargebee's PAYG glossary discussion, Ofcom's 2024 telecoms billing reporting found 18% fewer disputes when PAYG models were paired with usage dashboards, and UK Cloud Index data cited there reports 25% spend optimisation for teams using PAYG with dynamic routing.
Operational advice: Don't enable agent autonomy without giving the team a live usage view and a way to shut work down.
What actually works in practice
The fixes are straightforward, but they need to be deliberate.
- Usage dashboards first: Teams need per-key, per-environment, or per-feature visibility. A total account balance isn't enough.
- Budget guards in code: Set maximum prompt sizes, output limits, and retry ceilings.
- Alert on spikes: A sudden jump in token use usually means a prompt bug, a loop, or a traffic anomaly.
- Throttle aggressively on failure: If a provider starts returning errors, clients should avoid multiplying spend through uncontrolled retries. Teams dealing with request pressure and retry handling can use guidance like this explanation of too many requests errors to design safer backoff paths.
A team doesn't need perfect forecasting to use PAYG well. It needs enough visibility to notice abnormal behaviour quickly, and enough control to stop it.
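A minimal sketch of the retry ceiling and backoff idea from the list above, usable around any OpenAI-compatible call; the attempt limit and delays are illustrative assumptions.
// Capped retries with exponential backoff so a failing provider
// doesn't multiply spend through uncontrolled repeat calls.
async function withRetries<T>(call: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;
      const delayMs = 500 * 2 ** (attempt - 1); // 500ms, 1s, 2s...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}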
Practical PAYG Scenarios and Code Examples
The easiest way to understand PAYG is to tie it to requests a team might ship. AI pricing gets clearer when each API call has a visible input, an expected output shape, and a bounded operational purpose.

Scenario one: bursty agent workflow
Consider a coding agent that reviews pull requests. Most of the day it does nothing. Then a batch of commits lands and it needs to classify files, inspect diffs, and produce comments. That pattern is a poor fit for a large fixed subscription, because idle time dominates.
Here's a TypeScript example using Select's OpenAI-compatible endpoint:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.SELECT_API_KEY,
  baseURL: "https://api.select.ax/v1"
});

async function reviewDiff(diff: string) {
  const response = await client.chat.completions.create({
    model: "smart-select",
    messages: [
      {
        role: "system",
        content: "You review code diffs and return concise, actionable comments."
      },
      {
        role: "user",
        content: `Review this git diff and flag correctness, security, and maintainability issues:\n\n${diff}`
      }
    ],
    max_tokens: 500
  });

  return response.choices[0]?.message?.content ?? "";
}
What matters in PAYG terms is the shape of the work:
- Short diffs stay cheap
- Large diffs consume more input tokens
- Verbose review prompts increase output tokens
- A loop across many changed files multiplies total usage
A practical team response is to set bounds before shipping:
function trimDiff(diff: string, maxChars = 12000) {
  return diff.length > maxChars ? diff.slice(0, maxChars) : diff;
}
This doesn't eliminate cost variation, but it stops one oversized payload from dominating the bill.
Scenario two: long-context summarisation
A second common case is document summarisation. The request volume may be low, but each call is large. That's where token-based PAYG becomes visible immediately.
curl https://api.select.ax/v1/chat/completions \
  -H "Authorization: Bearer $SELECT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.6-official",
    "messages": [
      {
        "role": "system",
        "content": "Summarise the document for an engineering manager. Keep technical risks and action items."
      },
      {
        "role": "user",
        "content": "PASTE_LONG_DOCUMENT_TEXT_HERE"
      }
    ],
    "max_tokens": 800
  }'
In this pattern, the team should think in parts:
- The system prompt has a fixed token cost
- The document body drives most of the input usage
- The summary size controls output usage
- Repeated retries can double or triple real spend if not handled carefully
Long-context work usually fails not because the base request is expensive, but because nobody constrained retries, chunking, or output length.
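One way to constrain that, sketched below, is to split oversized documents into bounded chunks before they ever reach the model, so each request has a predictable input size. The chunk size is an assumption; real chunking usually respects paragraph or section boundaries rather than raw character offsets.
// Illustrative chunker: bound each request's input instead of sending
// one enormous prompt whose cost is hard to predict.
function chunkDocument(text: string, maxChars = 24_000): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += maxChars) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}
// Each chunk gets its own bounded call; the partial summaries can then
// be combined in a final, much smaller request.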
A simple way to estimate before calling
Even without exact provider-specific pricing in code, teams can estimate usage risk by measuring prompt length before sending. The basic token approximation from earlier helps.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

system_prompt = "Summarise the document for an engineering manager."
document_text = open("report.txt", "r", encoding="utf-8").read()

estimated_input_tokens = approx_tokens(system_prompt) + approx_tokens(document_text)
print({"estimated_input_tokens": estimated_input_tokens})
That estimate isn't perfect, but it's useful enough to drive guardrails such as chunking, previewing, or refusing oversized payloads.
For teams wiring these flows into production, the main requirement is consistency. Keep request construction visible, cap outputs, and test with realistic documents instead of toy prompts. The implementation details for OpenAI-compatible endpoints, request formats, and model calls are easier to verify against the Select documentation, especially when building agent workflows that may later swap providers.
Implementing Transparent PAYG with Select
A practical PAYG setup for AI should solve two problems at once. It should keep billing tied to actual inference, and it should make spend legible enough that engineering and finance can both live with it.
Select takes that route by combining a curated model catalog, an OpenAI-compatible endpoint, and transparent pay-as-you-go pricing. Instead of forcing a team into a monthly subscription, Select uses dollar-based credit packs with no weekly limits and no expiry. Credits are consumed per request from the same balance whether a developer chooses deepseek-v4-flash, kimi-k2.6-official, a TEE model, or smart-select routing. That makes AI spend closer to controlled prepaid consumption than an open-ended postpaid bill.
Why that matters for engineering teams
For developers, the biggest advantage is implementation simplicity. Existing OpenAI-compatible code usually doesn't need a redesign. In many cases, the core change is just the endpoint and model choice.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.SELECT_API_KEY,
  baseURL: "https://api.select.ax/v1"
});

const response = await client.chat.completions.create({
  model: "smart-select",
  messages: [
    { role: "system", content: "Answer as a precise engineering assistant." },
    { role: "user", content: "Explain how to cap token usage in an agent loop." }
  ]
});

console.log(response.choices[0]?.message?.content);
That matters because pricing models only help if teams can adopt them without rewriting their stack.
Where transparent PAYG is strongest
Transparent PAYG is most useful when a team wants to:
- Switch between direct model selection and routing
- See pricing and availability clearly
- Control budget with finite credits instead of a rolling invoice
- Use one API shape across multiple open and agentic models
Select's Smart Select routing adds another layer. It can analyse a request and route it to a suitable model automatically, while teams that want tighter control can choose a specific model directly. The model catalog and current rates are easiest to review on the Select pricing page and in the dashboard after purchase.
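Switching between the two is just a change to the `model` field in the same request shape, reusing the `client` configured in the snippet above; the prompts here are placeholders.
// Direct model selection: pin a specific model for tighter control.
const direct = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Extract the invoice number from this text: ..." }]
});

// Routed selection: let Smart Select pick a suitable model per request.
const routed = await client.chat.completions.create({
  model: "smart-select",
  messages: [{ role: "user", content: "Plan a migration strategy for this schema change: ..." }]
});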
For teams that care about private or regulated workloads, constrained routing and TEE-enabled options also matter. Those decisions are usually less about raw price and more about operational fit, security posture, and confidence in where a request is processed.
Conclusion: When to Choose Pay As You Go
PAYG is a strong choice when usage is variable, prompts differ in size, and the team wants billing to follow real inference instead of a fixed allowance. It's especially useful for agent workflows, model experimentation, and long-context tasks that don't run at a steady rate.
A simple decision test works well:
- Choose PAYG when demand is bursty or uncertain
- Choose subscription when usage is stable and a fixed monthly cost matters more than precision
- Choose prepaid when the team wants PAYG mechanics with tighter budget control
What matters most isn't the label. It's whether the billing model matches how the system behaves. For AI APIs, that means token visibility, routing control, and safeguards against runaway usage. If those pieces are in place, pay as you go is usually the most honest way to price inference.
Teams building with open and agentic models can try Select for a practical PAYG setup: one OpenAI-compatible endpoint, curated model access, direct model selection or Smart Select routing, live usage visibility, and clear credit-based pricing without subscriptions.
