Ollama (Local)

Use Sythoria with models running locally through Ollama. No API key, no cloud dependency, no data leaving your machine — just fast, private AI on your own hardware.

Prerequisites

Ollama installed on your machine
At least one model pulled locally

Setup

Install Ollama from ollama.com (or via Homebrew: brew install ollama)
Pull a model:
```
ollama pull llama3.1
```
Start the Ollama server:
```
ollama serve
```
In Sythoria, open Settings from the sidebar
Select Ollama (Local) from the Provider Preset dropdown — the API base URL and default model fill in automatically
Leave the API Key field empty (Ollama doesn't require one)
The connection indicator should turn green once Ollama is detected

Ollama must be running while you chat. If you close the Ollama server, Sythoria will show a red connection indicator.

Default configuration

Setting	Value
API Base	`http://localhost:11434/v1/chat/completions`
Default Model	`llama3.1`
API Format	OpenAI-compatible
Streaming	Supported
API Key	None required

Popular models

Model	Size	Pull command	Best for
Llama 3.1	8B / 70B	`ollama pull llama3.1`	General-purpose, strong all-rounder
Mistral	7B	`ollama pull mistral`	Fast, efficient reasoning
Code Llama	7B / 13B / 34B	`ollama pull codellama`	Code generation and analysis
Phi-3	3.8B	`ollama pull phi3`	Lightweight, runs on minimal hardware
Gemma 2	2B / 9B / 27B	`ollama pull gemma2`	Google's open model, good for conversation
Qwen 2.5	7B / 14B / 32B / 72B	`ollama pull qwen2.5`	Strong multilingual and coding performance
DeepSeek R1	8B / 14B / 32B / 70B	`ollama pull deepseek-r1`	Reasoning and math

Browse all available models at ollama.com/library.

Model auto-detection

Sythoria queries Ollama's local API (/api/tags) to discover available models. Any model you've pulled appears in the dropdown automatically — no manual configuration needed.

Hardware recommendations

Local model performance depends entirely on your hardware. As a rough guide:

Model size	Minimum RAM	Recommended RAM	GPU
3B–7B	8 GB	16 GB	Optional (CPU works)
13B–14B	16 GB	32 GB	Recommended
32B–34B	32 GB	64 GB	Strongly recommended
70B+	64 GB	128 GB	Required for acceptable speed

Tip: Use quantized models (Q4_K_M) for better performance on consumer hardware. Ollama downloads quantized versions by default.

Running Ollama on a different port

If you've configured Ollama to listen on a non-default port, update the API Base URL in Sythoria's settings:

http://localhost:<your-port>/v1/chat/completions

Troubleshooting

Error	Cause	Fix
Connection refused	Ollama isn't running	Run `ollama serve` in a terminal, or start the Ollama app
Model not found	Model hasn't been pulled yet	Run `ollama pull <model-name>` to download it
Slow responses	Hardware too limited for the model size	Try a smaller model or a quantized variant
Out of memory	Model exceeds available RAM/VRAM	Switch to a smaller model (e.g., 7B instead of 70B)
Connection indicator red	Can't reach `localhost:11434`	Ensure Ollama is running and no firewall is blocking port 11434