Cost Optimization

Reduce AI API costs while maintaining quality.

Agentlify's intelligent routing can significantly reduce your AI costs by automatically selecting cost-effective models that meet your quality requirements.

Quick Wins

Use Smart Router

Smart Router selects a model based on prompt complexity: simple requests go to cheaper models, and premium models are used only when a request needs them.

Dashboard configuration (cost-focused weights):

• Cost: 50% (highest priority)
• Quality: 30%
• Speed: 10%
• Carbon: 10%
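To make the trade-off concrete, the sketch below scores candidate models with these weights. It is a hypothetical illustration, not Agentlify's routing algorithm: the model names, the per-model metric values, and the `scoreModel`/`pickModel` helpers are invented. Each metric is assumed to be pre-normalized to the range 0 to 1 where higher is better, so a cheaper model has a higher `cost` score.

```javascript
// Hypothetical priority weights mirroring the dashboard settings above (sum to 1)
const weights = { cost: 0.5, quality: 0.3, speed: 0.1, carbon: 0.1 };

// Invented candidate models; every metric is normalized to 0..1, higher is better
// (e.g. cost: 1.0 means "cheapest", carbon: 1.0 means "lowest footprint")
const candidates = [
  { name: 'small-model', cost: 0.9, quality: 0.6, speed: 0.9, carbon: 0.8 },
  { name: 'premium-model', cost: 0.2, quality: 0.95, speed: 0.5, carbon: 0.4 },
];

// Weighted sum of the normalized metrics
function scoreModel(model, w) {
  return Object.keys(w).reduce((sum, key) => sum + w[key] * model[key], 0);
}

// Return the candidate with the highest weighted score
function pickModel(models, w) {
  return models.reduce((best, m) =>
    scoreModel(m, w) > scoreModel(best, w) ? m : best);
}

console.log(pickModel(candidates, weights).name); // small-model
```

With these numbers the cost-heavy weights favor `small-model` (score 0.80 vs 0.475). Shifting most of the weight to quality, for example `{ cost: 0.1, quality: 0.8, speed: 0.05, carbon: 0.05 }`, flips the choice to `premium-model`.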

Optimize Prompts

Shorter, more focused prompts reduce token usage and costs.

Before (expensive)

Please analyze the following text and provide a comprehensive summary including all key points, main arguments, supporting evidence, and conclusions. Additionally, please evaluate the tone, writing style, and intended audience...

After (optimized)

Summarize the key points and conclusions from this text:
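A rough way to see the difference is to estimate token counts. The sketch below uses the common rule of thumb of about four characters per token for English text. This is only an approximation, since real counts depend on the model's tokenizer, and `estimateTokens` is a hypothetical helper. The strings are the visible parts of the two prompts above.

```javascript
// Rough heuristic: ~4 characters per token for English text.
// Real token counts depend on the model's tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

const before = 'Please analyze the following text and provide a comprehensive summary ' +
  'including all key points, main arguments, supporting evidence, and conclusions. ' +
  'Additionally, please evaluate the tone, writing style, and intended audience...';
const after = 'Summarize the key points and conclusions from this text:';

const saved = estimateTokens(before) - estimateTokens(after);
console.log(`before ~${estimateTokens(before)} tokens, after ~${estimateTokens(after)} tokens`);
console.log(`~${saved} fewer input tokens per request`);
```

The saving applies to every request that uses the prompt, so it scales with request volume.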

Set max_tokens

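Limit output length to avoid paying for unneeded tokens. A response that reaches the limit is cut off, so choose a value large enough for the answer you need.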

javascript

// Assumes an initialized `client`
const completion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  max_tokens: 150 // cap the response at 150 output tokens
});
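Because `max_tokens` cuts the response off at the limit, it is worth checking whether a reply was truncated. The sketch below assumes an OpenAI-compatible response shape, where `choices[0].finish_reason` is `'length'` when generation stopped at the token limit; `wasTruncated` and `maxOutputCost` are hypothetical helpers, the response objects are mock data, and the $10 per 1M tokens price is a placeholder, not a real rate. It also shows that `max_tokens` puts an upper bound on output cost per request.

```javascript
// Assumes an OpenAI-compatible response: finish_reason is 'length'
// when generation stopped because max_tokens was reached.
function wasTruncated(completion) {
  return completion.choices?.[0]?.finish_reason === 'length';
}

// Upper bound on output cost for one request, given a price per 1M output tokens
function maxOutputCost(maxTokens, pricePerMillionTokens) {
  return (maxTokens * pricePerMillionTokens) / 1_000_000;
}

// Mock responses for illustration only
const complete = { choices: [{ finish_reason: 'stop', message: { content: '...' } }] };
const cutOff = { choices: [{ finish_reason: 'length', message: { content: '...' } }] };

console.log(wasTruncated(complete)); // false
console.log(wasTruncated(cutOff)); // true
console.log(maxOutputCost(150, 10)); // 0.0015 (placeholder price)
```

If truncation happens often, raising the limit or asking for a shorter answer in the prompt is usually better than silently accepting cut-off output.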

Advanced Strategies

Implement Caching

Cache responses so repeated prompts don't trigger a new API call. The example below matches identical prompts only.

javascript

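// Simple in-memory cache keyed by the exact prompt text (unbounded)
const cache = new Map();

async function cachedCompletion(prompt) {
  if (cache.has(prompt)) {
    console.log('Cache hit - $0 cost');
    return cache.get(prompt);
  }

  // Store the pending promise so concurrent identical requests share one API call
  const request = client.chat.completions.create({
    messages: [{ role: 'user', content: prompt }]
  });
  cache.set(prompt, request);

  try {
    return await request;
  } catch (err) {
    cache.delete(prompt); // don't keep failed requests in the cache
    throw err;
  }
}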
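A plain `Map` keyed on the raw prompt only matches exact text and grows without limit. The sketch below is a bounded variant: it normalizes whitespace and case so trivially different prompts share an entry, evicts the least recently used entry once `maxEntries` is reached, and expires entries after `ttlMs`. The `createCache` helper and its options are hypothetical, and `fetchFn` stands in for whatever function actually calls the API. Matching prompts that merely mean the same thing would require semantic (embedding-based) caching, which is not shown.

```javascript
// Hypothetical bounded cache: normalized keys, LRU eviction, TTL expiry.
function createCache({ maxEntries = 1000, ttlMs = 60 * 60 * 1000 } = {}) {
  const entries = new Map(); // Map iterates in insertion order

  // Trim, collapse whitespace, and lowercase so trivially different prompts share a key.
  // Lowercasing is only safe if case doesn't change the meaning of your prompts.
  const normalize = (prompt) => prompt.trim().replace(/\s+/g, ' ').toLowerCase();

  return async function cached(prompt, fetchFn) {
    const key = normalize(prompt);
    const hit = entries.get(key);
    if (hit && Date.now() - hit.storedAt < ttlMs) {
      // Re-insert so this key becomes the most recently used
      entries.delete(key);
      entries.set(key, hit);
      return hit.value;
    }
    entries.delete(key); // drop an expired entry, if any

    const value = await fetchFn(prompt); // errors propagate and are not cached
    if (entries.size >= maxEntries) {
      // Evict the least recently used entry (first key in iteration order)
      entries.delete(entries.keys().next().value);
    }
    entries.set(key, { value, storedAt: Date.now() });
    return value;
  };
}

// Usage with a stand-in fetch function that counts calls
let calls = 0;
const fakeFetch = async (prompt) => {
  calls += 1;
  return `answer to: ${prompt}`;
};

const cached = createCache({ maxEntries: 2, ttlMs: 60_000 });
(async () => {
  await cached('Explain  caching', fakeFetch);
  await cached('explain caching ', fakeFetch); // same normalized key
  console.log(calls); // 1
})();
```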

Batch Similar Requests

Process multiple items in one request so shared instructions are sent once rather than once per item.

Inefficient

javascript

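// 3 separate API calls, each resending the same instructions
// (analyzeText is your own helper that makes one request per text)
await analyzeText(text1);
await analyzeText(text2);
await analyzeText(text3);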

Optimized

javascript

// 1 API call with batched input
const completion = await client.chat.completions.create({
  messages: [{
    role: 'user',
    content: `Analyze these:
1. ${text1}
2. ${text2}
3. ${text3}`
  }]
});
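Batched answers are only useful if you can split them back apart. One approach, sketched below on the assumption that the model follows the instruction, is to ask for a JSON array with one result per input and check its length. `buildBatchPrompt` and `parseBatchResults` are hypothetical helpers, and the reply string is mock data. Models can return malformed or incomplete output, so the parser throws rather than guessing.

```javascript
// Build one prompt for several inputs and ask for a JSON array of results
function buildBatchPrompt(texts) {
  const items = texts.map((text, i) => `${i + 1}. ${text}`).join('\n');
  return 'Analyze each numbered text below. Respond with only a JSON array ' +
    'containing one short result string per text, in the same order.\n\n' + items;
}

// Parse the model's reply and check there is one result per input
function parseBatchResults(content, expectedCount) {
  const results = JSON.parse(content);
  if (!Array.isArray(results) || results.length !== expectedCount) {
    throw new Error(`Expected ${expectedCount} results, got ${JSON.stringify(results)}`);
  }
  return results;
}

const texts = ['Great product!', 'Shipping was slow.', 'Works as described.'];
console.log(buildBatchPrompt(texts));

// Mock reply for illustration; a real one would typically come from
// completion.choices[0].message.content
const reply = '["positive", "negative", "neutral"]';
console.log(parseBatchResults(reply, texts.length)); // [ 'positive', 'negative', 'neutral' ]
```

Batching suits short, independent items; very large batches may hit output limits or get less careful treatment of later items.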

Use Structured Output

Request only the fields you need in JSON format, which keeps responses short and easy to parse.

javascript

const completion = await client.chat.completions.create({
  messages: [
    {
      role: 'system',
      // OpenAI-style JSON mode expects the messages to mention JSON
      content: 'Return JSON with only these keys: title, summary'
    },
    { role: 'user', content: 'Analyze this article...' }
  ],
  response_format: { type: 'json_object' }
});

// Asking for only the fields you need keeps the output short
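JSON mode constrains the format but not necessarily the fields, so it is sensible to validate the parsed object before using it. The sketch below assumes the reply text is in `completion.choices[0].message.content` (the OpenAI-compatible layout); `parseSummary` is a hypothetical helper and the reply string is mock data.

```javascript
// Parse a JSON reply and check that the requested fields are present
function parseSummary(content) {
  let data;
  try {
    data = JSON.parse(content);
  } catch {
    throw new Error('Model reply was not valid JSON');
  }
  if (data === null || typeof data !== 'object') {
    throw new Error('Expected a JSON object');
  }
  for (const field of ['title', 'summary']) {
    if (typeof data[field] !== 'string') {
      throw new Error(`Missing or non-string field: ${field}`);
    }
  }
  // Keep only the fields that were requested
  return { title: data.title, summary: data.summary };
}

// Mock reply; in practice: completion.choices[0].message.content
const reply = '{"title": "Cost Optimization", "summary": "Route, trim, cache."}';
console.log(parseSummary(reply).title); // Cost Optimization
```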

Temperature Control

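Lower temperature makes output more consistent, so a cached response is a closer match to what a fresh call would return. Even at temperature 0, some providers may not return identical output every time.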

javascript

const completion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Classify sentiment: ...' }],
  temperature: 0 // more consistent output, so cached answers are safer to reuse
});
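If you cache responses, the cache key should cover every parameter that affects the output, not just the prompt, and you may choose to cache only low-temperature requests. A minimal sketch, assuming request objects shaped like the one above; `stableStringify`, `cacheKeyFor`, and `isCacheable` are hypothetical helpers, and using `temperature: 0` as the cut-off is a policy choice, not a rule.

```javascript
// Serialize a value with sorted object keys so equivalent requests produce the same key
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

// Policy choice: only reuse responses for temperature-0 requests
function isCacheable(params) {
  return params.temperature === 0;
}

// The key covers messages, temperature, max_tokens, response_format, etc.
function cacheKeyFor(params) {
  return stableStringify(params);
}

const a = { temperature: 0, messages: [{ role: 'user', content: 'Classify sentiment: ...' }] };
const b = { messages: [{ content: 'Classify sentiment: ...', role: 'user' }], temperature: 0 };

console.log(cacheKeyFor(a) === cacheKeyFor(b)); // true
console.log(isCacheable(a)); // true
console.log(isCacheable({ ...a, temperature: 0.7 })); // false
```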