Response Memories

Train your router on your own evaluation logs and data. Response Memories create personalized routing models that understand your specific use cases and optimize for your unique traffic patterns.

What is a Response Memory?

A Response Memory is a trained routing model specific to your application. When you upload your historical request/response logs, ModelPilot analyzes patterns in your data to learn:

Query Clusters
Which types of prompts your application handles (coding, support, research, creative, etc.) and their distinct characteristics.
Model Performance
Which models perform best for each cluster, based on your actual evaluation data rather than generic benchmarks.
Routing Rules
Optimized decision boundaries that route incoming requests to the best model in under 20 ms.

How It Works

1. Upload Your Data

Export your request logs as CSV or JSON. Each record should include the prompt, the model used, and ideally a quality score or outcome indicator.

Example training record in JSON (quality_score, latency_ms, and category are optional):

{
  "prompt": "Write a Python function to parse JSON",
  "model_used": "openai:gpt-4o",
  "quality_score": 0.95,
  "latency_ms": 1200,
  "category": "coding"
}

2. Automatic Clustering

ModelPilot embeds your prompts and clusters them into semantic groups. This identifies the distinct task types in your traffic, including ones you never explicitly labeled.

Example clusters discovered: SQL generation, code review, customer support, documentation writing, data analysis, creative brainstorming
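
As a rough illustration of the embed-then-cluster idea, the sketch below uses a toy bag-of-words embedding and a greedy cosine-similarity grouping. Both are stand-ins chosen so the example runs on the Python standard library alone; ModelPilot's actual embedding model and clustering algorithm are not described on this page. The names `embed`, `cosine`, and `cluster_prompts`, and the 0.4 threshold, are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def embed(prompt: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", prompt.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_prompts(prompts: list[str], threshold: float = 0.4) -> list[list[str]]:
    """Greedy grouping: a prompt joins the first cluster whose seed is similar enough."""
    clusters: list[tuple[Counter, list[str]]] = []
    for prompt in prompts:
        vec = embed(prompt)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(prompt)
                break
        else:
            # No existing cluster matched, so this prompt seeds a new one.
            clusters.append((vec, [prompt]))
    return [members for _, members in clusters]

prompts = [
    "Write a SQL query to count orders",
    "Write a SQL query to list customers",
    "Draft a reply to an angry customer email",
    "Draft a reply to the refund request email",
]
groups = cluster_prompts(prompts)
for i, members in enumerate(groups):
    print(f"cluster {i}: {members}")
```

A production system would use dense embeddings and a proper clustering algorithm; the point here is only that unlabeled prompts fall into groups by similarity.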

3. Model Performance Analysis

For each cluster, we analyze which models performed best based on your quality scores, latency requirements, and cost constraints.
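
One simple way to picture this step is to pick, for each cluster, the model with the highest mean quality score. The sketch below does only that: it ignores latency and cost, uses the optional `category` field as a stand-in for a discovered cluster, and is not ModelPilot's actual analysis. The function name `best_model_per_cluster` and the model ID `example:small-model` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def best_model_per_cluster(records: list[dict]) -> dict[str, str]:
    """Return the model with the highest mean quality_score in each cluster."""
    scores: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for rec in records:
        if "quality_score" in rec:  # unscored records cannot be ranked by quality
            scores[rec["category"]][rec["model_used"]].append(rec["quality_score"])
    return {
        cluster: max(by_model, key=lambda m: mean(by_model[m]))
        for cluster, by_model in scores.items()
    }

records = [
    {"category": "coding", "model_used": "openai:gpt-4o", "quality_score": 0.95},
    {"category": "coding", "model_used": "openai:gpt-4o", "quality_score": 0.85},
    {"category": "coding", "model_used": "example:small-model", "quality_score": 0.70},
    {"category": "support", "model_used": "openai:gpt-4o", "quality_score": 0.80},
    {"category": "support", "model_used": "example:small-model", "quality_score": 0.90},
]
routing_table = best_model_per_cluster(records)
print(routing_table)
```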

4. Runtime Artifact Generation

We compile a lightweight classifier that can route new requests in under 20ms. This artifact is deployed to edge nodes for minimal latency.
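
To show why routing from a precompiled artifact can be fast, the sketch below reduces the artifact to a small in-memory table and a keyword-overlap lookup. The real artifact format and classifier are not documented here, so `ARTIFACT`, `DEFAULT_MODEL`, `route`, and both `example:` model IDs are hypothetical.

```python
import re

# Hypothetical compiled artifact: keywords per cluster plus the chosen model.
ARTIFACT = {
    "coding": {"keywords": {"python", "function", "sql", "bug"}, "model": "openai:gpt-4o"},
    "support": {"keywords": {"refund", "order", "account"}, "model": "example:small-model"},
}
DEFAULT_MODEL = "example:default-model"  # assumed fallback when no cluster matches

def route(prompt: str) -> str:
    """Pick the cluster sharing the most keywords with the prompt; fall back if none match."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    best_cluster, best_overlap = None, 0
    for name, entry in ARTIFACT.items():
        overlap = len(words & entry["keywords"])
        if overlap > best_overlap:
            best_cluster, best_overlap = name, overlap
    return ARTIFACT[best_cluster]["model"] if best_cluster else DEFAULT_MODEL

print(route("Write a Python function to parse JSON"))
print(route("Where is my refund?"))
print(route("Tell me a joke"))
```

Because the decision needs no network call and no model inference, a lookup of this kind costs microseconds; a learned classifier would be heavier but follows the same pattern of deciding locally from precomputed data.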

Data Format

Required Fields
prompt

The input text sent to the model

model_used

The model ID that handled this request (e.g., "openai:gpt-4o")

Optional Fields (Recommended)
quality_score

A score from 0 to 1 indicating response quality (from human evaluation, automated scoring, or user feedback)

latency_ms

Response time in milliseconds

category

Your own task category label (helps validate clustering)

completion

The model's response (used for deeper analysis)
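
Before uploading, it may help to check each record against the fields listed above. The following is a minimal sketch of such a check, not an official validator: `validate_record` and its error messages are hypothetical, and the type expectations (strings for text fields, numbers for scores and latency) are inferred from the example record rather than from a published schema.

```python
import json

REQUIRED = {"prompt": str, "model_used": str}
OPTIONAL = {
    "quality_score": (int, float),
    "latency_ms": (int, float),
    "category": str,
    "completion": str,
}

def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    for field, kind in REQUIRED.items():
        # Required fields must be present, of the right type, and non-empty.
        if not isinstance(record.get(field), kind) or not record[field]:
            problems.append(f"missing or invalid required field: {field}")
    for field, kind in OPTIONAL.items():
        if field in record and not isinstance(record[field], kind):
            problems.append(f"wrong type for optional field: {field}")
    score = record.get("quality_score")
    if isinstance(score, (int, float)) and not 0 <= score <= 1:
        problems.append("quality_score must be between 0 and 1")
    return problems

good = json.loads(
    '{"prompt": "Write a Python function to parse JSON",'
    ' "model_used": "openai:gpt-4o", "quality_score": 0.95}'
)
bad = {"prompt": "Summarize this ticket", "quality_score": 1.7}
print(validate_record(good))
print(validate_record(bad))
```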

Limits & Quotas

Custom Brain Plan
$199/month
10 Response Memories
10 training runs included
$10 per additional training run
Up to 5,000 records per upload
Enterprise
Custom Pricing
100 Response Memories
Unlimited training runs
Bulk data ingestion APIs
Dedicated support
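
The Custom Brain plan numbers can be turned into a quick estimate. The sketch below assumes the 10 included training runs reset each month along with the $199 fee, which this page does not state explicitly. It also shows splitting a large export into uploads of at most 5,000 records; whether several uploads can feed a single training run is not stated here either. `monthly_cost` and `split_into_uploads` are hypothetical helper names.

```python
BASE_PRICE = 199                # USD per month, Custom Brain plan
INCLUDED_RUNS = 10              # training runs included (assumed to be per month)
EXTRA_RUN_PRICE = 10            # USD per additional training run
MAX_RECORDS_PER_UPLOAD = 5_000  # upload size limit on the Custom Brain plan

def monthly_cost(training_runs: int) -> int:
    """Base price plus a per-run charge for runs beyond the included quota."""
    extra_runs = max(0, training_runs - INCLUDED_RUNS)
    return BASE_PRICE + extra_runs * EXTRA_RUN_PRICE

def split_into_uploads(records: list, limit: int = MAX_RECORDS_PER_UPLOAD) -> list[list]:
    """Chunk records into consecutive batches no larger than the upload limit."""
    return [records[i:i + limit] for i in range(0, len(records), limit)]

print(monthly_cost(8))    # within the included quota
print(monthly_cost(14))   # 4 extra runs
print([len(batch) for batch in split_into_uploads(list(range(12_000)))])
```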

Privacy & Security

Data Isolation

Each Response Memory is completely isolated. Your data is never mixed with other customers' data.

Encryption

All training data is encrypted at rest (AES-256) and in transit (TLS 1.3).

Data Retention

Raw training data is deleted after processing. Only the compiled routing artifact is retained.

Deletion

Delete a Response Memory at any time. All associated data and artifacts are permanently removed within 24 hours.

FAQ

How much data do I need to train a Response Memory?

The system works with as few as 100 records, but we recommend at least 500 for meaningful clustering. Routing accuracy keeps improving with more data, and you'll see the best results with 2,000 or more records.

Can I update a Response Memory with new data?

Yes. You can retrain a Response Memory at any time by uploading new data. Each training run counts against your quota. We recommend retraining monthly, or whenever your traffic patterns change significantly.

What happens if I don't have quality scores?

The system can still cluster your data and use latency/cost as optimization targets. However, including quality scores (even rough estimates) significantly improves routing accuracy.

Can I use a Response Memory with multiple routers?

Yes. A single Response Memory can be assigned to multiple routers. This is useful for sharing the same domain knowledge across different environments (dev, staging, prod).

Ready to get started?

Create your first Response Memory and see how personalized routing can improve your application's performance.