Providers

Ollama (Local)

Use Sythoria with models running locally through Ollama. No API key, no cloud dependency, no data leaving your machine — just fast, private AI on your own hardware.

Prerequisites

  • Ollama installed on your machine
  • At least one model pulled locally

Setup

  1. Install Ollama from ollama.com (or via Homebrew: brew install ollama)

  2. Pull a model:

    ollama pull llama3.1
    
  3. Start the Ollama server:

    ollama serve
    
  4. In Sythoria, open Settings from the sidebar

  5. Select Ollama (Local) from the Provider Preset dropdown — the API base URL and default model fill in automatically

  6. Leave the API Key field empty (Ollama doesn't require one)

  7. The connection indicator should turn green once Ollama is detected

Ollama must be running while you chat. If you close the Ollama server, Sythoria will show a red connection indicator.

Default configuration

SettingValue
API Basehttp://localhost:11434/v1/chat/completions
Default Modelllama3.1
API FormatOpenAI-compatible
StreamingSupported
API KeyNone required

Popular models

ModelSizePull commandBest for
Llama 3.18B / 70Bollama pull llama3.1General-purpose, strong all-rounder
Mistral7Bollama pull mistralFast, efficient reasoning
Code Llama7B / 13B / 34Bollama pull codellamaCode generation and analysis
Phi-33.8Bollama pull phi3Lightweight, runs on minimal hardware
Gemma 22B / 9B / 27Bollama pull gemma2Google's open model, good for conversation
Qwen 2.57B / 14B / 32B / 72Bollama pull qwen2.5Strong multilingual and coding performance
DeepSeek R18B / 14B / 32B / 70Bollama pull deepseek-r1Reasoning and math

Browse all available models at ollama.com/library.

Model auto-detection

Sythoria queries Ollama's local API (/api/tags) to discover available models. Any model you've pulled appears in the dropdown automatically — no manual configuration needed.

Hardware recommendations

Local model performance depends entirely on your hardware. As a rough guide:

Model sizeMinimum RAMRecommended RAMGPU
3B–7B8 GB16 GBOptional (CPU works)
13B–14B16 GB32 GBRecommended
32B–34B32 GB64 GBStrongly recommended
70B+64 GB128 GBRequired for acceptable speed

Tip: Use quantized models (Q4_K_M) for better performance on consumer hardware. Ollama downloads quantized versions by default.

Running Ollama on a different port

If you've configured Ollama to listen on a non-default port, update the API Base URL in Sythoria's settings:

http://localhost:<your-port>/v1/chat/completions

Troubleshooting

ErrorCauseFix
Connection refusedOllama isn't runningRun ollama serve in a terminal, or start the Ollama app
Model not foundModel hasn't been pulled yetRun ollama pull <model-name> to download it
Slow responsesHardware too limited for the model sizeTry a smaller model or a quantized variant
Out of memoryModel exceeds available RAM/VRAMSwitch to a smaller model (e.g., 7B instead of 70B)
Connection indicator redCan't reach localhost:11434Ensure Ollama is running and no firewall is blocking port 11434
SythoriaDocs Navigation