Streaming Responses
Sythoria uses Server-Sent Events (SSE) to stream responses token by token as they're generated. No polling, no batching, no waiting for the full response — you see text appear the moment the model produces it.
How it works
You send a message
↓
Sythoria opens an SSE connection to the provider
↓
Tokens arrive incrementally and render immediately
↓
Connection closes when the response is complete
- You type a message and press
Enter - Sythoria opens a streaming connection to your chosen provider's API
- Each token is rendered in the chat as it arrives — typically within milliseconds of generation
- The stream ends when the model signals completion (
data: [DONE])
Benefits
| Benefit | What it means for you |
|---|---|
| Lower perceived latency | First tokens appear in milliseconds, not after the entire response finishes |
| Native API compatibility | Sythoria passes through the streaming protocol as-is, so behavior matches the provider's SDK |
| Cancellation support | Stop generation mid-stream with the stop button or Escape |
| No added overhead | Sythoria's direct connection streams responses immediately — no buffering or middleman servers |
Supported providers
All providers support streaming:
| Provider | Streaming | Protocol |
|---|---|---|
| OpenAI | Supported | SSE |
| Anthropic | Supported | SSE |
| Google Gemini | Supported | SSE |
| Ollama | Supported | SSE (NDJSON) |
| OpenRouter | Supported | SSE |
| NVIDIA NIM | Supported | SSE |
Stopping a stream
Click the Stop button that appears during generation, or press Escape. The partial text remains in your conversation so you can pick up where you left off.
Technical details
The Sythoria desktop application establishes direct connections to the provider to stream responses directly. There is no middleman server, buffering, or transformation — it's a direct API connection:
- No added latency beyond the provider's first-token time
- Full compatibility with provider-specific streaming features (e.g., Anthropic's content blocks, OpenAI's function calling)
- Automatic reconnection is handled by the application's connection handler
For developers: the streaming uses the standard OpenAI streaming format (data: {...} lines terminated by data: [DONE]). Anthropic's native format is translated to this standard format by Sythoria's internal client layer.