
Server prerequisites

Recommended sizing to run Myeline with an interactive experience (perceived RAG response under 5 s).

Key takeaway

A single RAG query saturates 2-4 cores for 0.5-2 s during embedding, then 4-8 cores for 5-30 s during local synthesis. 8 cores is the practical floor; 4 cores cannot sustain interactive use.
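To see why CPU-only synthesis strains the interactive target, here is a minimal sketch that sums the per-stage timings quoted in this document (the stage names and helper are illustrative, not part of Myeline):

```python
# Per-query latency budget on CPU, using the (best, worst) stage
# timings from this document. Illustrative figures only.
STAGES_CPU = {
    "embedding (bge-m3)": (0.5, 2.0),
    "ChromaDB HNSW search": (0.1, 0.1),
    "local LLM synthesis (Ollama CPU)": (5.0, 30.0),
}

def budget(stages):
    """Return the (best-case, worst-case) end-to-end latency in seconds."""
    best = sum(lo for lo, _ in stages.values())
    worst = sum(hi for _, hi in stages.values())
    return best, worst

best, worst = budget(STAGES_CPU)
print(f"CPU pipeline: {best:.1f}-{worst:.1f} s per query")  # 5.6-32.1 s
```

Even the best case already exceeds the 5 s target, which is why the profiles below recommend a GPU (or the sovereign-hybrid API offload) beyond a handful of users.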

TL;DR by profile

| Profile | vCPU | RAM | Disk | GPU | Notes |
|---|---|---|---|---|---|
| Demo / 1 user | 8 | 16 GB | 50 GB SSD | none | Local CPU embedding |
| Sovereign ≤ 20 users | 16 | 32 GB | 200 GB NVMe | none | Mistral-Nemo on CPU = 15-40 s/query |
| Sovereign ≤ 200 users | 24 | 64 GB | 500 GB NVMe | RTX 4090 24 GB or L40S | GPU strongly recommended |
| Sovereign large / Llama 70B | 32 | 128 GB | 1 TB NVMe | 2× L40S 48 GB | Llama 3.1 70B Q4 or Mixtral 8×7B |
| Sovereign-hybrid ≤ 100 users | 8 | 16 GB | 100 GB SSD | none | Synthesis offloaded to BYOK API; embedding stays local |

Detailed consumption

CPU

| Load | Demand |
|---|---|
| Web + worker (idle) | 1-2 cores |
| Embedding (bge-m3 CPU) | 2-4 cores at 100 % for 0.5-2 s per query |
| ChromaDB HNSW search | 1-2 cores for ~100 ms |
| External API synthesis (sovereign-hybrid) | 0 (network-bound) |
| Ollama CPU LLM synthesis | 4-8 cores at 100 % for 5-30 s |
| MariaDB | 1 core (2+ at peak) |
| Cron (most jobs < 30 s) | bursts only |

Single-user reality: even on an otherwise idle server, a single request saturates ~4 cores while it runs (embedding + HNSW + synthesis chained).

Multi-user reality: with 8 cores you handle 1-2 active concurrent requests comfortably; beyond that they queue. For 4+ active concurrent users plan 16+ cores.
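The core arithmetic above can be sketched as a small helper (the baseline and per-request figures are assumptions drawn from the table, not measured values):

```python
# Hypothetical sizing helper: how many requests can run concurrently
# without queueing, given one request's peak core demand (embedding
# 2-4 cores, synthesis 4-8 cores, per the table above).

def concurrent_capacity(total_cores: int,
                        baseline_cores: int = 2,       # web + worker + MariaDB
                        peak_per_request: int = 6) -> int:  # mid-range burst
    """Cores left after the baseline, divided by one request's peak demand."""
    return max(0, (total_cores - baseline_cores) // peak_per_request)

print(concurrent_capacity(8))    # → 1 active request
print(concurrent_capacity(16))   # → 2
print(concurrent_capacity(32))   # → 5
```

With a lighter per-request peak (e.g. GPU-offloaded synthesis) the divisor shrinks and the same core count serves more concurrent users.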

Memory (sovereign — local LLM)

On top of the base services, add the resident footprint of the chosen Ollama model:

| Local model | Quant. | RAM resident | Notes |
|---|---|---|---|
| mistral-nemo (12 B) | Q4_K_M | ~7 GB | Default. Decent, CPU slow |
| mistral-nemo | Q8 | ~13 GB | Better quality |
| mixtral-8x7b | Q4 | ~26 GB | CPU ≥ 30 s/answer, GPU recommended |
| llama3.1:70b | Q4 | ~40 GB | Top-tier local, GPU mandatory |

With a GPU the models live in VRAM rather than system RAM; the figures above remain roughly valid for the overall memory budget, but latency improves 5-20× depending on the card.
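A rough way to turn the table into a total RAM budget is to add a base footprint for the other services to the chosen model's resident size. The base figures below are assumptions for illustration, not measured values:

```python
# Hypothetical RAM budget (sovereign profile), in GB: base services
# plus the resident size of the chosen Ollama model (table above).

BASE_GB = {"web + worker": 2, "MariaDB": 2, "ChromaDB": 2, "OS + headroom": 3}

MODEL_GB = {
    "mistral-nemo-q4": 7,
    "mistral-nemo-q8": 13,
    "mixtral-8x7b-q4": 26,
    "llama3.1-70b-q4": 40,
}

def ram_needed(model: str) -> int:
    """Base services plus the model's resident footprint."""
    return sum(BASE_GB.values()) + MODEL_GB[model]

for m in MODEL_GB:
    print(f"{m}: ~{ram_needed(m)} GB total")
```

This is why mistral-nemo Q4 fits the 16 GB demo tier while the 70B model needs the 128 GB profile with comfortable headroom.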

Disk

| Usage | Size | Notes |
|---|---|---|
| OS + container images | 5-10 GB | Python slim image ~600 MB; Ollama models dominate |
| Ollama models | bge-m3 600 MB · mistral-nemo Q4 7 GB | Stored in data/ollama/ |
| ChromaDB | ~10 KB / indexed chunk | 100k chunks ≈ 1 GB; grows linearly |
| MariaDB | 100 MB → 5 GB | Audit log + conversations dominate |
| Uploads | unbounded | Capped per plan (max_file_size) |
| Backups (30 d retention) | 2-5× live data | backup_databases cron |
| Logs | ~10 MB / day | Rotation via Podman / journald |

NVMe vs SATA SSD: NVMe delivers noticeably better p99 latency for ChromaDB (HNSW seeks) and MariaDB.
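The per-item figures above combine into a back-of-the-envelope growth estimate (the helper and its default backup factor are illustrative assumptions):

```python
# Rough disk growth from the per-item figures above: ~10 KB per indexed
# chunk (ChromaDB), ~10 MB of logs per day, backups at 2-5x live data.

def disk_estimate_gb(chunks: int, days: int, live_db_gb: float,
                     backup_factor: float = 3.0) -> float:
    """Sum ChromaDB, logs, backups, and live DB usage, in GB."""
    chroma_gb = chunks * 10 / 1024 / 1024   # 10 KB per chunk
    logs_gb = days * 10 / 1024              # 10 MB per day
    backups_gb = (live_db_gb + chroma_gb) * backup_factor
    return chroma_gb + logs_gb + backups_gb + live_db_gb

# 100k chunks, one year of logs, 2 GB of MariaDB data:
print(f"~{disk_estimate_gb(100_000, 365, 2.0):.1f} GB")  # → ~15.4 GB
```

Uploads are excluded because they are unbounded; cap them per plan (max_file_size) and monitor separately.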

Network

  • Outbound (sovereign-hybrid only) — AI provider + Brevo + GHCR: ~10 Mbps sustained, more during image pulls.
  • Inbound: 100 Mbps comfortable for a few dozen concurrent users.
  • Latency to AI provider (Mistral Paris/Frankfurt, Anthropic / OpenAI / Gemini US): aim for < 100 ms.
  • In pure sovereign: no outbound traffic (air-gap).

OS and runtime

  • Linux: Rocky / AlmaLinux 9, Debian 12, Ubuntu 22.04 LTS+
  • Containers: Podman 4.6+ (rootless recommended); Docker also works (docker-compose v2)
  • systemd: required for unit-managed Podman pods (sovereign)
  • Python 3.11+ if you run outside containers (official path: containers)
  • SELinux Enforcing: OK — the compose file labels volumes with :z
  • GPU (sovereign with GPU): NVIDIA drivers 535+, NVIDIA Container Toolkit installed; Ollama auto-detects