# Streaming Responses

Stream responses in real time for a better user experience.
Streaming allows you to receive responses progressively as they are generated, rather than waiting for the complete response. This creates a more responsive user experience, especially for longer outputs.
- Faster perceived response time - Users see output immediately
- Better UX for long responses - Progress is visible
- Lower memory usage - Process chunks as they arrive
## Basic Streaming

### Enable Streaming

Set `stream: true` in your request:

```javascript
import ModelPilot from 'modelpilot';

const client = new ModelPilot({
  apiKey: process.env.MODELPILOT_API_KEY,
  routerId: process.env.MODELPILOT_ROUTER_ID,
});

async function streamResponse() {
  const stream = await client.chat.completions.create({
    messages: [
      { role: 'user', content: 'Write a short story about a robot' }
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }

  console.log('\n\nStream complete!');
}

streamResponse();
```

## Stream Chunk Format
### Chunk Structure

Each chunk follows this format:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1677858242,
  "model": "gpt-5",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Hello"  // Incremental content
      },
      "finish_reason": null
    }
  ]
}

// Last chunk includes finish_reason
{
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
```
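To make the chunk format concrete, here is a minimal sketch of a helper (hypothetical name `collectStream`) that accumulates `delta.content` into a full message and records the final `finish_reason`. It assumes only the chunk shape shown above and works with any stream returned by `client.chat.completions.create({ ..., stream: true })`.

```javascript
// Hypothetical helper: collect a streamed completion into a single string.
// Assumes the chunk shape documented above (delta.content, finish_reason).
async function collectStream(stream) {
  let fullText = '';
  let finishReason = null;

  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    fullText += choice?.delta?.content || '';   // incremental content, may be empty
    if (choice?.finish_reason) {
      finishReason = choice.finish_reason;      // e.g. 'stop' on the last chunk
    }
  }

  return { fullText, finishReason };
}
```

Passing the `stream` from the basic example above would return the complete story along with `finishReason: 'stop'`.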
## React Integration

### React Component

Display streaming responses in React:

```javascript
import { useState } from 'react';
import ModelPilot from 'modelpilot';

const client = new ModelPilot({
  apiKey: process.env.MODELPILOT_API_KEY,
  routerId: process.env.MODELPILOT_ROUTER_ID,
});

export default function ChatComponent() {
  const [response, setResponse] = useState('');
  const [isLoading, setIsLoading] = useState(false);

  async function handleSubmit(userMessage) {
    setIsLoading(true);
    setResponse('');

    try {
      const stream = await client.chat.completions.create({
        messages: [{ role: 'user', content: userMessage }],
        stream: true,
      });

      // Append each incremental delta to the rendered response
      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        setResponse(prev => prev + content);
      }
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setIsLoading(false);
    }
  }

  return (
    <div>
      <div className="response">
        {response}
        {isLoading && <span className="cursor">|</span>}
      </div>
    </div>
  );
}
```

## Server-Sent Events (SSE)
### Direct API Usage

If you're not using the SDK, handle the SSE stream manually:

```javascript
const response = await fetch('https://modelpilot.co/api/router/{routerId}/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${API_KEY}`,
  },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Accumulate decoded text and keep any partial line for the next read,
  // since an SSE event can be split across network chunks
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop();

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;

    const data = line.slice(6);
    if (data === '[DONE]') continue;

    const parsed = JSON.parse(data);
    const content = parsed.choices[0]?.delta?.content || '';
    console.log(content);
  }
}
```

## Error Handling
### Handling Stream Errors

Handle failures of the initial request separately from errors while processing individual chunks:

```javascript
async function streamWithErrorHandling() {
  try {
    const stream = await client.chat.completions.create({
      messages: [{ role: 'user', content: 'Hello!' }],
      stream: true,
    });

    for await (const chunk of stream) {
      try {
        const content = chunk.choices[0]?.delta?.content || '';
        process.stdout.write(content);
      } catch (chunkError) {
        console.error('Error processing chunk:', chunkError);
        // Continue processing other chunks
      }
    }
  } catch (error) {
    if (error.status === 429) {
      console.error('Rate limit exceeded');
    } else if (error.status === 503) {
      console.error('Service temporarily unavailable');
    } else {
      console.error('Streaming error:', error.message);
    }
  }
}
```

## Best Practices
### When to Use Streaming
- ✓ Long-form content generation
- ✓ Interactive chat applications
- ✓ Real-time code generation
- ✓ Stories, articles, or creative writing
### When NOT to Use Streaming
- ✗ JSON/structured output parsing (see the non-streaming example below)
- ✗ Batch processing
- ✗ Function calling responses
- ✗ Short responses (overhead not worth it)
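For cases like structured output, a plain non-streaming request is usually the better fit. A minimal sketch, assuming the same `client` as earlier and that a request without `stream: true` resolves to a standard chat-completion object (`choices[0].message.content`):

```javascript
// A minimal non-streaming request (assumes the response follows the usual
// chat-completion shape with choices[0].message.content).
async function getStructuredOutput() {
  const completion = await client.chat.completions.create({
    messages: [
      { role: 'user', content: 'Return a JSON object with name and age for a fictional user.' }
    ],
    // No `stream: true` here: wait for the full response so it can be parsed in one step
  });

  const text = completion.choices[0].message.content;
  return JSON.parse(text); // parsing only works once the whole payload has arrived
}
```

Since nothing can be parsed until the whole payload has arrived, streaming only adds bookkeeping here.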
### Performance Tips
- Buffer chunks for UI updates (e.g., every 50ms) to avoid excessive re-renders (see the sketch after this list)
- Implement timeout handling for long-running streams
- Handle connection drops gracefully with retry logic
- Use abort controllers to cancel streams when needed
- Monitor memory usage when accumulating large responses
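As an illustration of the buffering tip, the following sketch (hypothetical helper `streamWithBufferedUpdates`) accumulates deltas locally and flushes them to state roughly every 50 ms. The `setResponse` setter is borrowed from the React example above, and the interval length is an assumption you can tune.

```javascript
// Hypothetical buffering wrapper around the streaming loop from the React example.
// Deltas are accumulated locally and flushed to state every ~50ms, so the
// component re-renders a handful of times per second instead of once per chunk.
async function streamWithBufferedUpdates(stream, setResponse) {
  let pending = '';
  const flush = setInterval(() => {
    if (pending) {
      const text = pending;
      pending = '';
      setResponse(prev => prev + text);
    }
  }, 50);

  try {
    for await (const chunk of stream) {
      pending += chunk.choices[0]?.delta?.content || '';
    }
  } finally {
    clearInterval(flush);
    if (pending) setResponse(prev => prev + pending); // flush whatever is left
  }
}
```

The same pattern works with any state setter or DOM update function; only the flush interval changes.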
## Cancelling Streams

### Using AbortController

```javascript
const controller = new AbortController();

async function cancellableStream() {
  try {
    const stream = await client.chat.completions.create({
      messages: [{ role: 'user', content: 'Long response...' }],
      stream: true,
    }, {
      signal: controller.signal,
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      console.log(content);
    }
  } catch (error) {
    if (error.name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  }
}

// Cancel the stream
controller.abort();
```