
3 May 2026

AI Agent Frameworks: Connect to an OpenAI-Compatible API

A practical guide to AI agent frameworks, OpenAI-compatible model APIs, direct model selection, Smart Select routing, long context, and token spend control.

A lot of agent projects stall at the same point. The framework works locally, tool calls fire, the prompt chain looks sensible, and the demo feels solid. Then production starts asking harder questions: which model should handle each step, what happens when latency spikes, how much context is being resent on every turn, and who notices when an agent starts looping through tools and burning budget.

That last mile matters more than most framework comparisons admit. AI agent frameworks don't run in a vacuum. They sit on top of model APIs, and the quality of that connection often decides whether an agent is usable in production or just impressive in a notebook. As adoption grows, the engineering problem changes. It isn't just about orchestration any more. It's about reliability, cost control, security, and the mechanics of getting requests to the right model endpoint.

Teams looking for practical deployment patterns can find more implementation notes on the Select blog.


Beyond the Hype: What AI Agent Frameworks Really Do

An agent framework isn't the agent's intelligence. It's the orchestration layer around the model. It manages prompts, session state, tool definitions, retries, memory, and the control flow that decides what happens after the model responds.

That's why the "best" framework usually depends on the shape of the job. A coding agent needs good tool invocation, file-aware context handling, and guardrails around repeated edits. A support workflow agent needs deterministic hand-offs, response constraints, and auditability. A research assistant needs retrieval and long-context discipline more than fancy multi-agent theatre.

Different frameworks optimise for different failure modes

Some frameworks help organise a single capable agent. Others are built for groups of specialised agents that coordinate on a task. Either way, the framework does four practical jobs:

  • State management: It keeps track of what the agent has already seen, decided, and executed.
  • Tool execution: It turns model output into calls to shell commands, databases, APIs, or internal services.
  • Control flow: It decides whether to continue, retry, pause for approval, or terminate.
  • Observability hooks: It gives engineers traces, logs, and enough metadata to debug behaviour that would otherwise look opaque.

Practical rule: If an agent can't be inspected step by step, it isn't production-ready. It's just harder to debug.
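Stripped of library specifics, those four jobs tend to collapse into a single loop. The sketch below is illustrative only: call_model, run_tool, and log_step are hypothetical stand-ins for whatever hooks the chosen framework actually provides.

MAX_STEPS = 10  # control flow: bound the loop so a stuck agent cannot run forever

def run_agent(task, tools):
    history = [{"role": "user", "content": task}]  # state management: everything seen so far

    for step in range(MAX_STEPS):
        reply = call_model(history, tools)   # hypothetical: one completion request
        log_step(step, reply)                # observability: trace every turn

        if reply.get("tool_call"):           # tool execution: turn output into an action
            result = run_tool(reply["tool_call"])  # hypothetical executor
            history.append({"role": "tool", "content": result})
            continue                         # control flow: carry new evidence into the next turn

        return reply["content"]              # the model produced a final answer

    raise RuntimeError("Agent hit the step limit without finishing")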

The framework layer matters, but it doesn't solve the harder production problem on its own. Once the agent leaves a local environment, API behaviour starts shaping outcomes. Model availability, context limits, streaming quality, tool-call formatting, and token pricing all feed back into the agent's reliability.

That is where many comparisons of AI agent frameworks stop too early. The orchestration logic can be elegant and still fail operationally if the model connection is brittle, expensive, or hard to swap.

The Standard Connection: An Agent Framework and an API


Under the hood, most modern agent stacks converge on the same pattern. The framework prepares a request, sends it to a model endpoint, receives tokens or tool calls back, and then decides what to do next. The names differ a bit between libraries, but the moving parts are familiar.

The useful thing for engineers is that this layer is usually configurable. That means the model provider isn't welded into the framework. In many cases, changing the endpoint is a configuration change, not a rewrite.

Implementation references for this pattern are easier to follow when the provider documents an OpenAI-style interface clearly. The Select documentation is one example of the kind of endpoint shape engineers typically look for.

What every framework eventually configures

Whether the code lives inside Hermes Agent, OpenCode, OpenClaw, LangChain, LlamaIndex, CrewAI, or AutoGen, these settings keep showing up:

  • baseURL
    This tells the client where to send requests. It may point at a direct model provider, a hosted gateway, or an inference router.

  • apiKey
    This authenticates the request. In production, it should be scoped, rotated, and stored outside source control.

  • model
    This picks the actual model, or a router alias that chooses one for the workload.

  • tool calling
    This gives the model a structured list of available actions. The framework then validates and executes those actions.

  • streaming
    This controls whether tokens arrive incrementally. For coding agents and terminal tools, streaming usually improves usability because operators can see reasoning and edits unfold in real time.

A generic OpenAI-compatible client pattern

A minimal TypeScript configuration usually looks like this:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.select.ax/v1",
  apiKey: process.env.SELECT_API_KEY,
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "system", content: "You are a coding assistant." },
    { role: "user", content: "Review this patch and suggest a safer refactor." }
  ],
  stream: true,
  tools: [
    {
      type: "function",
      function: {
        name: "read_file",
        description: "Read a file from the workspace",
        parameters: {
          type: "object",
          properties: {
            path: { type: "string" }
          },
          required: ["path"]
        }
      }
    }
  ]
});

The Python version follows the same pattern:

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.select.ax/v1",
    api_key=os.environ["SELECT_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Summarise the bug and propose a fix."},
    ],
    stream=True,
)
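With stream=True, the call returns chunks rather than one finished message, so the caller is expected to iterate over them. Continuing the snippet above, a minimal way to print tokens as they arrive:

# Iterate over the streamed chunks and print content deltas as they arrive.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()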

A lot of framework abstraction boils down to constructing and repeating this call safely. Once that's clear, the provider layer becomes easier to reason about. Engineers can inspect where latency comes from, where retries happen, and which model is being billed for the agent's "thinking".
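For tool calls, "repeating the call safely" means checking whether the model asked for an action, executing it locally, appending the result, and asking again. The sketch below assumes a non-streaming request against the same client, with an illustrative local read_file implementation; it is a pattern sketch, not any particular framework's internals.

import json

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def read_file(path: str) -> str:
    # Illustrative local implementation of the tool the model can request.
    with open(path) as f:
        return f.read()

messages = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "Review this patch and suggest a safer refactor."},
]

for _ in range(5):  # bound the loop so malformed tool calls cannot spin forever
    reply = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        tools=tools,
    )
    msg = reply.choices[0].message

    if not msg.tool_calls:
        print(msg.content)  # final answer, stop looping
        break

    messages.append(msg)  # keep the assistant's tool request in the transcript
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": read_file(args["path"]),
        })

The framework's job is to wrap this loop with validation, retries, and tracing, not to change its basic shape.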

A Practical Tour of AI Agent Frameworks


Not all AI agent frameworks are trying to solve the same problem. Some focus on developer control. Some focus on retrieval. Some assume a multi-agent architecture from the start. Choosing one is less about headline popularity and more about how much orchestration logic the workflow really needs.


LangChain is still common when a team wants modular building blocks. It works well when the application needs prompts, tools, retrieval, and custom logic stitched together in code. It can also become abstraction-heavy if the workflow is simple.

LangGraph is better suited to stateful, non-linear workflows where agents revisit steps, branch, and maintain persistent context. For complex or regulated workflows, that architecture can pay off because the control flow is explicit rather than hidden inside one long prompt.

CrewAI fits teams that want explicit role-based collaboration. Planner, researcher, reviewer, and executor patterns are easier to express when the framework already thinks in teams of agents. It suits workflows where responsibilities are easier to debug when split across named roles.

AutoGen is strong when the interaction model itself is conversational and multi-agent. Agents can exchange structured messages, coordinate over longer runs, and support more event-driven task decomposition. That architecture maps naturally to collaborative coding, experimentation, and task routing.

LlamaIndex tends to be a better fit when retrieval is the centre of the system rather than a side feature. If the hardest problem is grounding the agent in internal documents, data sources, or indexed corpora, retrieval-first design matters.

Hermes Agent, OpenCode, OpenClaw, and Claude Code-style tools sit closer to the operator. They are often used for coding workflows, local file manipulation, terminal actions, or developer copilots. Their practical value comes from how they wrap model calls, tool execution, and editing loops, not from abstract orchestration theory.

Good framework selection starts with workflow shape, not brand loyalty.

Conceptual Comparison of AI Agent Frameworks

  • LangChain
    Core concept: Modular LLM application components. Best for: Custom code-first agents and tool pipelines. Orchestration style: Chain and component composition.

  • LangGraph
    Core concept: Stateful graph execution. Best for: Cyclical, multi-step, persistent workflows. Orchestration style: Graph-based orchestration.

  • CrewAI
    Core concept: Role-based agent collaboration. Best for: Team-style agent workflows. Orchestration style: Delegated multi-agent coordination.

  • AutoGen
    Core concept: Conversational multi-agent systems. Best for: Event-driven collaboration and research-style agents. Orchestration style: Message-passing agents.

  • LlamaIndex
    Core concept: Retrieval-centred agent design. Best for: Data querying and grounded assistants. Orchestration style: RAG-led orchestration.

  • Hermes Agent / OpenCode / OpenClaw
    Core concept: Operator-facing coding workflows. Best for: Terminal agents and coding assistants. Orchestration style: Tool-driven interactive loops.

The pattern across all of them is consistent. The framework controls behaviour. The API connection controls the quality, price, and reliability of that behaviour.

Why OpenAI-Compatible Endpoints Are Your Best Friend


OpenAI compatibility isn't just a convenience feature. It's an architectural escape hatch. If a framework can talk to any endpoint that follows the same request and response shape, the agent logic becomes less dependent on one provider's SDK or one provider's operational quirks.

That matters because provider conditions change. Availability shifts. A model that looked sensible for testing may become too expensive for sustained agent loops. Another may handle tool calls more reliably. A third may perform better on long context. Compatibility keeps those choices open.

Compatibility reduces switching cost

A clean OpenAI-compatible layer simplifies at least three things:

  1. Provider changes
    Switching from one endpoint to another often means updating baseURL, apiKey, and model, while leaving the surrounding framework code intact.

  2. Framework changes
    If the model interface is standardised, moving from one orchestration layer to another is easier because the provider integration doesn't have to be redesigned from scratch.

  3. Operational fallback
    Teams can route around outages or performance issues without rebuilding their agent stack around a bespoke API.
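A small sketch of what that looks like in practice, assuming the endpoint, key, and model are read from environment variables (the variable names here are illustrative, not a convention any framework mandates):

import os
from openai import OpenAI

# Provider details live in configuration, not in the agent code.
# A provider change then means new values here, not a rewrite of the agent logic.
client = OpenAI(
    base_url=os.environ.get("AGENT_BASE_URL", "https://api.select.ax/v1"),
    api_key=os.environ["AGENT_API_KEY"],
)
MODEL = os.environ.get("AGENT_MODEL", "deepseek-v4-flash")

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Confirm the endpoint responds."}],
)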

The demand for that flexibility is already visible in real agent deployments. Teams want to know which model handled a request, why that model was chosen, what it cost, and whether a different route would have been faster or more reliable.

A router can simplify the provider layer

An inference router adds another useful abstraction. Instead of integrating with multiple model APIs directly, the framework talks to one OpenAI-compatible endpoint that exposes a curated model catalog. That gives teams one auth pattern, one request schema, and one place to reason about routing decisions.

This is especially useful for teams running coding agents or workflow agents across mixed tasks. One request may need fast low-cost inference. Another may need stronger reasoning or longer context. A router can absorb some of that complexity while keeping the framework code stable.

Standardising the API layer first usually buys more long-term flexibility than arguing about frameworks first.

Compatibility also makes testing more honest. Teams can compare direct model pinning against routed requests using the same framework and nearly the same code path. That makes trade-offs visible. It becomes easier to see whether a problem comes from prompts, orchestration logic, tool definitions, or the model/provider choice itself.

The Long-Context Challenge: Why Coding Agents Burn Tokens

Coding agents consume tokens in ways that surprise teams the first time they inspect a real trace. The expensive part often isn't the final answer. It's everything that happens before the answer becomes stable enough to trust.

Why coding sessions expand so fast

A coding agent rarely sends one prompt and stops. It reads files, inspects diffs, re-reads prior outputs, asks for tool results, receives tool results, revises a plan, and then sends another completion request with more accumulated context. Each turn can include large chunks of previous conversation plus tool outputs.

That creates compounding cost in a few common ways:

  • Workspace reads: Source files, configs, logs, test output, and stack traces all get stuffed into context.
  • Iterative refinement: The agent keeps prior reasoning and draft edits available so it can keep working coherently.
  • Tool chatter: Structured tool calls are efficient compared with free text, but the responses still add tokens.
  • Retry loops: A failed patch or malformed tool call often triggers another cycle with even more context attached.

Long-context models can help, but they don't make the problem disappear. They make it easier to send larger working sets. If the agent isn't selective, it becomes expensive at a larger scale.
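A rough back-of-the-envelope illustration of the compounding: if every turn adds a similar amount of new material and the full transcript is re-sent each time, input tokens grow roughly quadratically with the number of turns. The numbers below are illustrative only.

# Illustrative only: assume each turn adds ~4,000 tokens of new context
# (file reads, tool output, prior replies) and the whole transcript is re-sent.
new_tokens_per_turn = 4_000
turns = 20

total_input_tokens = sum(new_tokens_per_turn * turn for turn in range(1, turns + 1))
print(total_input_tokens)  # 840,000 input tokens for a single 20-turn session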

Cost bugs often start as control bugs

Runaway spend often comes from weak control logic rather than from one bad model choice. Agents that can retry indefinitely, re-open the same files repeatedly, or call broad tools without tight constraints will keep generating token-heavy loops.

That isn't only a budget issue. The same lack of guardrails that causes overspending can also produce unsafe actions. Tool access, approval gates, and audit logs matter because agent mistakes are not limited to text output once the system can edit files, call APIs, or run commands.

A practical control set usually includes:

  • Bounded iterations: Cap retries and self-repair attempts.
  • Scoped tools: Limit which files, commands, or services the agent may touch.
  • Context trimming: Re-send only what the next step needs.
  • Approval gates: Require confirmation before destructive edits or high-impact operations.

Expensive agents are often undisciplined agents.
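In code, these controls are usually small. The sketch below is purely illustrative: next_action, apply, trim_context, and the other hooks are hypothetical stand-ins for whatever the chosen framework exposes.

MAX_ITERATIONS = 6                                   # bounded iterations
ALLOWED_TOOLS = {"read_file", "run_tests"}           # scoped tools
ALLOWED_PATHS = ("src/", "tests/")                   # scoped workspace

def guarded_run(agent, task):
    for _ in range(MAX_ITERATIONS):
        action = agent.next_action(task)             # hypothetical framework hook

        if action.tool not in ALLOWED_TOOLS:
            raise PermissionError(f"tool not allowed: {action.tool}")
        if action.path and not action.path.startswith(ALLOWED_PATHS):
            raise PermissionError(f"path outside workspace scope: {action.path}")

        if action.is_destructive:                    # approval gate for high-impact edits
            if input(f"Apply '{action.summary}'? [y/N] ").lower() != "y":
                return "stopped: operator rejected the action"

        agent.apply(action)
        agent.trim_context()                         # context trimming: re-send only what the next step needs
        if agent.done():
            return agent.result()

    return "stopped: iteration limit reached"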

Practical Control: Direct Selection vs Smart Routing


Once the provider layer is configurable, there are two practical ways to control agent behaviour at inference time. The first is to pin a model deliberately. The second is to hand model choice to a router.

When to pin a model directly

Direct selection is the better default when the workload is predictable. If a coding assistant performs a bounded class of edits, or a workflow agent handles a stable internal process, pinning the model keeps performance easier to reason about.

A generic config looks like this:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.select.ax/v1",
  apiKey: process.env.SELECT_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "system", content: "You are a careful software engineering assistant." },
    { role: "user", content: "Refactor this function without changing behaviour." }
  ],
  stream: true
});

Pinning a model is useful when teams want:

  • Stable behaviour: Easier prompt tuning and regression testing.
  • Clear cost expectations: One model profile for one class of work.
  • Targeted evaluation: Cleaner comparisons during agent tuning.

When smart routing makes more sense

Smart routing is more useful when workloads vary a lot from one request to the next. A router can inspect the request and choose a model based on fit, availability, and workload. It shouldn't be treated as magic, and it shouldn't be assumed to always pick the cheapest option. The value is operational flexibility, not blind optimisation.

A generic alternative looks like this:

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.select.ax/v1",
    api_key=os.environ["SELECT_API_KEY"],
)

response = client.chat.completions.create(
    model="smart-select",
    messages=[
        {"role": "system", "content": "You are an agent that analyses code and proposes precise changes."},
        {"role": "user", "content": "Review these files and suggest the safest migration path."},
    ],
    stream=True,
)

Direct selection and smart routing solve different problems.

  • Pin a model when repeatability matters most.
  • Route dynamically when task complexity and provider conditions change frequently.
  • Test both paths for real agent traces, not toy prompts.

The primary benefit is keeping both options available behind the same endpoint so teams can move between them without rewriting the framework integration.
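One low-effort way to keep that honest is a replay harness that sends the same request to a pinned model and to the router alias, then compares latency and reported token usage. A minimal sketch, using the two model names from the examples above:

import os
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.select.ax/v1",
    api_key=os.environ["SELECT_API_KEY"],
)

prompt = [{"role": "user", "content": "Review these files and suggest the safest migration path."}]

# Replay the identical request against a pinned model and the router alias.
for model in ("deepseek-v4-flash", "smart-select"):
    start = time.monotonic()
    response = client.chat.completions.create(model=model, messages=prompt)
    elapsed = time.monotonic() - start
    usage = response.usage
    print(f"{model}: {elapsed:.2f}s, "
          f"{usage.prompt_tokens} prompt / {usage.completion_tokens} completion tokens")

In a real evaluation the prompts would come from recorded agent traces rather than a single hand-written message.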

Get Started Connecting Your First AI Agent

The quickest useful setup is usually the least ambitious one. Pick one framework already in use, connect it to one OpenAI-compatible endpoint, and run a narrow workflow with visible traces. A coding review task, internal documentation lookup, or patch explanation flow is enough to expose underlying issues.

Indie hackers usually need three things first: pay-as-you-go access, a straightforward model name, and no provider sprawl. Platform teams care more about usage visibility, shared conventions, and reducing random framework-specific integrations. Security-conscious teams need tighter controls around model access, routing policy, and approval before sensitive actions.

A simple rollout path looks like this:

  1. Start with a direct model choice for a known task.
  2. Inspect token usage and tool traces before increasing autonomy.
  3. Add routing only where workload variation justifies it.
  4. Limit tools and retries early so bad loops stay cheap and obvious.
  5. Expand gradually from a single task to a small internal workflow.

For teams that want a concrete implementation example rather than another framework comparison, the agent integration example from Select is the kind of reference that helps shorten setup time.

The best production habit is simple. Treat the model connection as part of the system design, not as a detail hidden behind the framework. That's where latency, spend, failover, and a lot of agent reliability are decided.


Realtime Comms Ltd offers Select, an agent-focused API with a curated model catalog, an OpenAI-compatible endpoint, direct model selection, Smart Select routing, usage visibility, and transparent pay-as-you-go pricing. The simplest way to evaluate it is to connect one existing agent with a small credit pack and test a real workflow before committing to anything larger.