Response Memories
Train your router on your own evaluation logs and data. Response Memories create personalized routing models that understand your specific use cases and optimize for your unique traffic patterns.
What is a Response Memory?
A Response Memory is a trained routing model specific to your application. When you upload your historical request/response logs, ModelPilot analyzes patterns in your data to learn which models handle each kind of request best.
How It Works
Upload Your Data
Export your request logs as CSV or JSON. Each record should include the prompt, the model used, and ideally a quality score or outcome indicator.
// Example training data format
{
  "prompt": "Write a Python function to parse JSON",
  "model_used": "openai:gpt-4o",
  "quality_score": 0.95,
  "latency_ms": 1200,
  "category": "coding" // optional
}
Automatic Clustering
ModelPilot embeds your prompts and clusters them into semantic groups. This identifies distinct "task types" in your traffic—even ones you didn't explicitly label.
Example clusters discovered: SQL generation, code review, customer support, documentation writing, data analysis, creative brainstorming
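To make the embed-then-cluster idea concrete, here is a minimal sketch using only the Python standard library. It is not ModelPilot's algorithm: the word-count `embed` function is a toy stand-in for a real embedding model, and the greedy grouping, the `cluster_prompts` name, and the `threshold` value are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(prompt: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", prompt.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_prompts(prompts: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy clustering: join the most similar existing cluster, else start a new one."""
    clusters: list[tuple[Counter, list[str]]] = []  # (summed vector, members)
    for prompt in prompts:
        vec = embed(prompt)
        best, best_sim = None, threshold
        for entry in clusters:
            sim = cosine(vec, entry[0])
            if sim >= best_sim:
                best, best_sim = entry, sim
        if best is None:
            clusters.append((vec.copy(), [prompt]))
        else:
            best[0].update(vec)  # fold the prompt into the cluster's summed vector
            best[1].append(prompt)
    return [members for _, members in clusters]
```

Calling `cluster_prompts` on a mix of SQL and code-review prompts groups them by shared vocabulary without any labels. A production system would likely swap in dense embeddings and a proper clustering algorithm, but the flow stays the same.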
Model Performance Analysis
For each cluster, we analyze which models performed best based on your quality scores, latency requirements, and cost constraints.
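One way to picture this step: group records by cluster and model, then pick the model with the best average quality score that still meets a latency limit. The sketch below assumes each record already carries a hypothetical `cluster` label from the previous step, and it leaves cost out for brevity. `best_model_per_cluster`, `max_latency_ms`, and the tie-breaking rule are illustrative assumptions, not the service's actual selection logic.

```python
from collections import defaultdict
from statistics import mean


def best_model_per_cluster(records, max_latency_ms=None):
    """Pick, for each cluster, the model with the highest mean quality score.

    Each record is a dict with "cluster", "model_used", "quality_score", and
    "latency_ms". Models whose mean latency exceeds max_latency_ms are skipped.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        grouped[rec["cluster"]][rec["model_used"]].append(rec)

    winners = {}
    for cluster, by_model in grouped.items():
        candidates = []
        for model, recs in by_model.items():
            avg_latency = mean(r["latency_ms"] for r in recs)
            if max_latency_ms is not None and avg_latency > max_latency_ms:
                continue  # too slow for this latency requirement
            avg_quality = mean(r["quality_score"] for r in recs)
            # Ties on quality are broken in favor of the lower mean latency.
            candidates.append((avg_quality, -avg_latency, model))
        if candidates:
            winners[cluster] = max(candidates)[2]
    return winners
```

With no latency limit the highest-scoring model wins each cluster; tightening `max_latency_ms` can shift a cluster to a faster model even if its scores are lower.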
Runtime Artifact Generation
We compile a lightweight classifier that can route new requests in under 20ms. This artifact is deployed to edge nodes for minimal latency.
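The real artifact format is not documented here, so the following is only a sketch of why a precompiled lookup can be fast. It assumes a hypothetical JSON artifact with one keyword set and one target model per cluster; the layout, the `load_artifact` and `route` helpers, and the `example:` model IDs are all invented for illustration.

```python
import json
import re

# Hypothetical artifact layout: one keyword set and one target model per cluster.
ARTIFACT_JSON = """
{
  "default_model": "example:fallback-model",
  "clusters": [
    {"name": "sql", "keywords": ["sql", "query", "table", "join"], "model": "example:small-model"},
    {"name": "code_review", "keywords": ["review", "bug", "refactor"], "model": "openai:gpt-4o"}
  ]
}
"""


def load_artifact(text):
    """Parse the artifact once at startup so each routing call only does set lookups."""
    artifact = json.loads(text)
    for cluster in artifact["clusters"]:
        cluster["keywords"] = frozenset(cluster["keywords"])
    return artifact


def route(artifact, prompt):
    """Return the model for the cluster sharing the most keywords with the prompt."""
    tokens = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    best = max(artifact["clusters"], key=lambda c: len(c["keywords"] & tokens), default=None)
    if best is None or not (best["keywords"] & tokens):
        return artifact["default_model"]  # no cluster matched the prompt
    return best["model"]
```

All the expensive work (clustering, model comparison) happens at training time; the per-request cost is one tokenization pass and a few set intersections.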
Data Format
prompt: The input text sent to the model
model_used: The model ID that handled this request (e.g., "openai:gpt-4o")
quality_score: A 0-1 score indicating response quality (from human eval, automated scoring, or user feedback)
latency_ms: Response time in milliseconds
category: Your own task category label (helps validate clustering)
completion: The model's response (used for deeper analysis)
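Before uploading, it can help to check records locally against the fields listed above. The sketch below is a minimal validator. It assumes `prompt` and `model_used` are required and everything else is optional, which matches the guidance that each record should include the prompt and the model used but is not confirmed as a hard rule. `validate_record` is a hypothetical helper, not part of any ModelPilot SDK.

```python
from typing import Any

# Assumption: prompt and model_used are required; the rest are optional.
REQUIRED_FIELDS = {"prompt": str, "model_used": str}
OPTIONAL_FIELDS = {
    "quality_score": (int, float),
    "latency_ms": (int, float),
    "category": str,
    "completion": str,
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one training record (empty if none)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing required field: {name}")
        elif not isinstance(record[name], expected) or not record[name].strip():
            problems.append(f"{name} must be a non-empty string")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in record and not isinstance(record[name], expected):
            problems.append(f"{name} has the wrong type")
    score = record.get("quality_score")
    if isinstance(score, (int, float)) and not 0 <= score <= 1:
        problems.append("quality_score must be between 0 and 1")
    latency = record.get("latency_ms")
    if isinstance(latency, (int, float)) and latency < 0:
        problems.append("latency_ms must not be negative")
    return problems
```

Running this over an export and fixing any reported problems first avoids spending a training run on malformed data.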
Limits & Quotas
Privacy & Security
Data Isolation
Each Response Memory is completely isolated. Your data is never mixed with other customers' data.
Encryption
All training data is encrypted at rest (AES-256) and in transit (TLS 1.3).
Data Retention
Raw training data is deleted after processing. Only the compiled routing artifact is retained.
Deletion
Delete a Response Memory at any time. All associated data and artifacts are permanently removed within 24 hours.
FAQ
We recommend at least 500 records for meaningful clustering, and you'll see better results with 2,000 or more. The system works with as few as 100 records, though routing accuracy improves with more data.
You can retrain a Response Memory at any time by uploading new data. Each training run counts against your quota. We recommend retraining monthly or whenever your traffic patterns change significantly.
If your logs have no quality scores, the system can still cluster your data and use latency and cost as optimization targets. However, including quality scores (even rough estimates) significantly improves routing accuracy.
A single Response Memory can be assigned to multiple routers. This is useful for sharing the same domain knowledge across different environments (dev, staging, prod).
Ready to get started?
Create your first Response Memory and see how personalized routing can improve your application's performance.