OpenAI SDK compatible · 35 models · <100ms routing

Task-aware routing for LLM agents

Your agent can't use Claude for every step. ModelPilot classifies each request (planning, coding, summarizing) and routes to the right model.

Drop-in replacement: change baseURL, keep your OpenAI code.

35 models available · <100ms routing latency · 2 lines to integrate

How it works

Each request is classified by task type, then routed to a model optimized for that specific task. No manual model selection.

Task classification

Prompts are classified into task types: code generation, planning, summarization, extraction, creative writing, and more. Each type routes to specialized models.

Model matching

Code tasks → DeepSeek, Codestral. Writing → Claude. Reasoning → o1, GPT-4. Simple tasks → GPT-4o-mini, Haiku. Based on benchmark data, not vibes.
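
In spirit, routing reduces to classify-then-lookup. A toy sketch of the idea (keyword rules stand in for the real learned classifier, and the model IDs are illustrative, not ModelPilot's actual routing table):

typescript
// Toy illustration only: the real classifier is a learned model, not
// keyword matching, and these model IDs are examples.
type TaskType = "code" | "reasoning" | "writing" | "simple"

const MODEL_MAP: Record<TaskType, string> = {
  code: "deepseek-coder",     // code generation
  reasoning: "o1",            // planning, multi-step reasoning
  writing: "claude-sonnet",   // creative and long-form writing
  simple: "gpt-4o-mini",      // cheap, fast default
}

function classify(prompt: string): TaskType {
  if (/```|function |class |def /.test(prompt)) return "code"
  if (/\bplan\b|step by step/i.test(prompt)) return "reasoning"
  if (/story|essay|rewrite/i.test(prompt)) return "writing"
  return "simple"
}

const routeTo = (prompt: string) => MODEL_MAP[classify(prompt)]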

Automatic fallbacks

If a model fails or returns low-quality output, the request is automatically retried with a stronger model. Your agent loop doesn't break.

OpenAI SDK compatible

Same API as OpenAI. Works with LangChain, CrewAI, AutoGen, Vercel AI SDK, or any OpenAI client. Change two lines of config.

Integration in 2 minutes

1. Create a router

Configure optimization weights (cost, latency, quality) or use our defaults. Get a router ID and API key.

2. Change your baseURL

Point your OpenAI client to modelpilot.co/api/router/{routerId}. That's it.

3. Requests are classified and routed

Each request is analyzed, matched to the best model, and executed. Check the dashboard for routing decisions.

Works with any OpenAI client

If it uses the OpenAI SDK, it works with ModelPilot. No SDK changes, no new dependencies, no vendor lock-in.

LangChain, CrewAI, AutoGen, Vercel AI SDK
Streaming, function calling, tool use (streaming example below)
35 models from 5 providers
Automatic retries and fallbacks
Before — Standard OpenAI
typescript
import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.openai.com/v1"
})
After — ModelPilot (change 2 lines)
typescript
import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.MODELPILOT_API_KEY,
  baseURL: "https://modelpilot.co/api/router/{routerId}"
})
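
From there, every call is a plain OpenAI SDK call using the client configured above. A sketch of typical usage (the "auto" model placeholder and the router echoing its selection in the standard model response field are assumptions, so verify against the docs):

typescript
// Assumptions: the router accepts a placeholder model name and reports
// its actual selection in the standard `model` field of the response.
const res = await client.chat.completions.create({
  model: "auto",                                // hypothetical placeholder
  messages: [{ role: "user", content: "Summarize this changelog." }],
})
console.log(res.model)                          // which model actually ran

// Streaming goes through the same proxy, using the stock SDK:
const stream = await client.chat.completions.create({
  model: "auto",
  stream: true,
  messages: [{ role: "user", content: "Write release notes." }],
})
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "")
}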

Router presets

Configure optimization weights or use a preset. Each router balances cost, latency, and quality differently.

Quality-first

Prefer stronger models

Quality: Maximum · Cost: Standard · Speed: Optimized

Use case: Production apps where output quality matters more than cost

Balanced (Recommended)

Default weights

Quality: High · Cost: Optimized · Speed: Fast

Use case: General purpose agents, chatbots, most applications

Cost-optimized

Minimize spend

Quality: Acceptable · Cost: Minimized · Speed: Fast

Use case: High-volume batch processing, internal tools, dev/test
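
Conceptually, each preset is just a weight vector over the three objectives. A hypothetical sketch of what that configuration might look like (field names and values are made up for illustration; the real settings live in the dashboard):

typescript
// Hypothetical shape and values; the actual configuration lives in the
// ModelPilot dashboard when you create a router.
interface RouterWeights {
  quality: number  // 0..1, higher prefers stronger models
  cost: number     // 0..1, higher prefers cheaper models
  speed: number    // 0..1, higher prefers lower-latency models
}

const PRESETS: Record<string, RouterWeights> = {
  "quality-first":  { quality: 0.7, cost: 0.1, speed: 0.2 },
  "balanced":       { quality: 0.4, cost: 0.3, speed: 0.3 },
  "cost-optimized": { quality: 0.2, cost: 0.6, speed: 0.2 },
}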

Fallbacks & retries

When a model returns an error or low-quality output, ModelPilot automatically retries with a different model. Your agent loop keeps running.

1. Provider failover

If OpenAI is down, route to Anthropic. If rate limited, try another provider. Configurable fallback chains.

2. Error detection

Detects 4xx/5xx errors, timeouts, malformed responses, and rate limits. Triggers automatic retry logic.

3. Model escalation

If a cheap model fails, retry with a stronger one. GPT-5-mini fails? Try Claude Sonnet. Still failing? GPT-5.
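
ModelPilot runs this escalation server-side, so your agent stays a single create() call. For intuition, a client-side equivalent of the loop might look like this (the model chain and the quality gate are illustrative):

typescript
import OpenAI from "openai"

const client = new OpenAI({ apiKey: process.env.MODELPILOT_API_KEY })

// Illustrative chain; ModelPilot runs this logic for you, server-side.
const CHAIN = ["gpt-5-mini", "claude-sonnet", "gpt-5"]

async function withEscalation(prompt: string): Promise<string> {
  let lastErr: unknown
  for (const model of CHAIN) {
    try {
      const res = await client.chat.completions.create({
        model,
        messages: [{ role: "user", content: prompt }],
      })
      const text = res.choices[0]?.message.content
      if (text && text.trim()) return text    // crude quality gate
    } catch (err) {
      lastErr = err                           // 4xx/5xx, timeout, rate limit
    }
  }
  throw lastErr ?? new Error("all models in the chain failed")
}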

Handles edge cases: rate limits, timeouts, provider outages, malformed JSON.

Raw logs (unstructured traffic data) → Clustering engine (pattern recognition & analysis) → Custom router (domain-specific model selection)

Custom routers from your logs

Train a custom router on your historical request logs. The classifier learns which models work best for your specific prompts and use cases.

Upload CSV/JSON logs
K-means clustering on embeddings (sketched below)
Transfer learning from global router
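
The clustering step is plain k-means over request embeddings: group similar historical prompts, then map each cluster to the model that scored best on it. A minimal sketch of the algorithm (initialization and k are simplified; this is not ModelPilot's implementation):

typescript
// Minimal k-means over embedding vectors: group similar historical
// prompts so each cluster can be mapped to its best-performing model.
type Vec = number[]

const dist = (a: Vec, b: Vec) =>
  Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0))

const mean = (vs: Vec[]): Vec =>
  vs[0].map((_, i) => vs.reduce((s, v) => s + v[i], 0) / vs.length)

function kmeans(points: Vec[], k: number, iters = 20): number[] {
  let centroids = points.slice(0, k)          // naive init
  let labels: number[] = []
  for (let it = 0; it < iters; it++) {
    // assign each point to its nearest centroid
    labels = points.map(p =>
      centroids.reduce(
        (best, c, i) => (dist(p, c) < dist(p, centroids[best]) ? i : best),
        0,
      ),
    )
    // move each centroid to the mean of its members
    centroids = centroids.map((c, i) => {
      const members = points.filter((_, j) => labels[j] === i)
      return members.length ? mean(members) : c
    })
  }
  return labels                               // cluster id per request
}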

Static prompt optimization

Runtime optimizers add 2-3 seconds of latency per request. ModelPilot optimizes your prompt templates statically—zero added latency at runtime.

Zero-latency execution

Optimizations are applied to your templates at deploy time. No intermediate LLM calls in the hot path.

Model-specific formatting

Automatically formats prompts for the target model (e.g., XML tags for Claude, structured markers for GPT-4).

Static few-shot injection

We analyze your historical logs to find the best few-shot examples and bake them into your prompt templates.
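
A toy sketch of the deploy-time transform (the rules shown are simplified examples of model-specific formatting; the real optimizer also injects few-shot examples mined from your logs):

typescript
// Runs once at deploy time, never per request. Rules are simplified
// examples of model-specific formatting conventions.
function formatForModel(template: string, model: string): string {
  if (model.startsWith("claude")) {
    // Claude responds well to XML-tagged sections
    return `<instructions>\n${template}\n</instructions>`
  }
  if (model.startsWith("gpt")) {
    // GPT models respond well to structured markdown markers
    return `### Instructions\n${template}`
  }
  return template
}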

Latency Comparison
Standard optimization: 2,400ms (requires an LLM roundtrip to rewrite the prompt)
ModelPilot static: 0ms (pre-computed at deploy time)

Automatic formatting
<claude_formatting>
  Use XML tags for clear separation...
</claude_formatting>

Carbon tracking per request

Every API response includes estimated CO₂e based on model size, architecture (dense vs MoE), and provider region. Export reports for ESG compliance.

Per-request metrics

CO₂e estimates in response headers and dashboard analytics

Carbon-aware routing

Optionally weight routing decisions by environmental impact

ESG reports

Export monthly carbon reports for compliance documentation
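
In code, per-request metrics could be read straight off the raw HTTP response. A sketch assuming a header name like x-modelpilot-co2e-grams (hypothetical; check the methodology docs for the real name):

typescript
import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.MODELPILOT_API_KEY,
  baseURL: "https://modelpilot.co/api/router/{routerId}",
})

// Hypothetical header name; openai-node's .withResponse() exposes the
// raw HTTP response alongside the parsed completion.
const { data, response } = await client.chat.completions
  .create({
    model: "auto", // placeholder; routing decides
    messages: [{ role: "user", content: "Classify this support ticket." }],
  })
  .withResponse()

const co2e = response.headers.get("x-modelpilot-co2e-grams")
console.log(`model=${data.model}, estimated CO2e=${co2e}g`)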

View methodology

Start routing in 2 minutes

Create a router, change your baseURL, done. Free tier includes $5 in credits. No credit card required.

npm install modelpilot • OpenAI SDK compatible