
Configuration & deployment

runic.rag reads its entire runtime configuration from environment variables, validated into a single RagSettings object. This page is the reference: every RUNIC_RAG_* setting and its default, how to switch LLM/embedding providers and graph backends, how the schema is created in development versus production, and the python -m runic.rag CLI.

Settings load through pydantic-settings. The env prefix is RUNIC_RAG_ and matching is case-insensitive; a local .env at the repo root is read automatically. Two variables are deliberately unprefixed (OPENAI_API_KEY and OLLAMA_BASE_URL) so that they line up with the conventions the OpenAI and Ollama clients already expect. Call load_settings() to get a validated RagSettings (it runs dotenv.load_dotenv() first), or pass RagSettings(...) explicitly to override fields in code.

python
from runic.rag import RagSettings, load_settings

settings = load_settings()                 # reads .env + environment
settings = RagSettings(top_k=20)           # or override fields directly
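
As a rough illustration of the naming rule only (this is not runic's loader, which delegates to pydantic-settings), the sketch below maps environment variables to field names with the standard library. The helper name `collect_rag_env` and the plain-dict return value are hypothetical, and no validation or type coercion is attempted.

```python
import os
from typing import Mapping

PREFIX = "RUNIC_RAG_"
# The two variables this page documents as deliberately unprefixed.
UNPREFIXED = {"OPENAI_API_KEY", "OLLAMA_BASE_URL"}


def collect_rag_env(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return {field_name: raw_value} for variables the settings would read.

    Hypothetical helper for illustration; values stay as raw strings.
    """
    fields: dict[str, str] = {}
    for name, value in environ.items():
        upper = name.upper()  # matching is case-insensitive
        if upper.startswith(PREFIX):
            fields[upper[len(PREFIX):].lower()] = value
        elif upper in UNPREFIXED:
            fields[upper.lower()] = value
    return fields
```

Calling `collect_rag_env()` with no arguments inspects the current process environment, which can help when a setting does not seem to take effect.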

Settings reference

Every field below maps to a RUNIC_RAG_<FIELD> variable (the two credential exceptions are noted). Defaults mirror .env.example exactly. Set 0 on the cost knobs to mean "unlimited".

LLM

| Variable | Meaning | Default |
| --- | --- | --- |
| RUNIC_RAG_LLM_PROVIDER | Chat provider for extraction + answer synthesis (openai or ollama) | openai |
| RUNIC_RAG_LLM_MODEL | Chat model id | gpt-5.4-nano |

Embeddings

| Variable | Meaning | Default |
| --- | --- | --- |
| RUNIC_RAG_EMBEDDING_PROVIDER | Embedding provider (openai or ollama) | openai |
| RUNIC_RAG_EMBEDDING_MODEL | Embedding model id | text-embedding-3-small |
| RUNIC_RAG_EMBEDDING_DIM | Embedding dimension (must match the model) | 1536 |
| RUNIC_RAG_EMBED_BATCH_SIZE | Max texts per embedding request at ingest (<=0 → one request) | 128 |
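
The batch-size rule can be pictured with a small sketch. It assumes texts are simply sliced in order into groups of at most the configured size, with a non-positive size meaning a single request; the function name `batched_texts` is hypothetical and this is not runic's ingest code.

```python
from typing import Iterator, Sequence


def batched_texts(texts: Sequence[str], batch_size: int = 128) -> Iterator[list[str]]:
    """Yield groups of at most batch_size texts; batch_size <= 0 means one group."""
    if not texts:
        return
    if batch_size <= 0:
        # Documented behaviour: a non-positive size sends everything at once.
        yield list(texts)
        return
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])
```

With the default of 128, 300 chunks would go out as three embedding requests.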

Credentials

| Variable | Meaning | Default |
| --- | --- | --- |
| OPENAI_API_KEY | OpenAI key (unprefixed); required when a provider is openai | None |
| RUNIC_RAG_OPENAI_BASE_URL | OpenAI-compatible base URL (leave unset for api.openai.com) | None |
| OLLAMA_BASE_URL | Ollama endpoint (unprefixed), e.g. http://localhost:11434/v1 | None |

Backend

| Variable | Meaning | Default |
| --- | --- | --- |
| RUNIC_RAG_BACKEND | Graph backend: falkordb, neo4j, memgraph, arcadedb, or age | falkordb |
| RUNIC_RAG_FALKORDB_HOST | FalkorDB host | localhost |
| RUNIC_RAG_FALKORDB_PORT | FalkorDB port | 6379 |
| RUNIC_RAG_FALKORDB_GRAPH | FalkorDB graph name | runic_rag |
| RUNIC_RAG_NEO4J_URI | Neo4j Bolt URI, e.g. bolt://localhost:7687 | None |
| RUNIC_RAG_NEO4J_USERNAME | Neo4j username | None |
| RUNIC_RAG_NEO4J_PASSWORD | Neo4j password | None |
| RUNIC_RAG_NEO4J_DATABASE | Neo4j database | neo4j |

INFO

memgraph, arcadedb, and age carry their own RUNIC_RAG_<BACKEND>_* connection variables (host/port/database/username/password, plus _GRAPH for AGE). See runic/rag/config.py for the full list; FalkorDB and Neo4j are the primary, natively-accelerated backends.

Chunking

| Variable | Meaning | Default |
| --- | --- | --- |
| RUNIC_RAG_CHUNK_SIZE | Target chunk size (characters) | 1200 |
| RUNIC_RAG_CHUNK_OVERLAP | Overlap between adjacent chunks | 200 |
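
To see how the two chunking values interact, here is a minimal sketch that assumes a plain character-level sliding window. The real chunker may well respect sentence or paragraph boundaries, so treat `chunk_text` as a hypothetical stand-in rather than the library's algorithm.

```python
def chunk_text(text: str, chunk_size: int = 1200, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

Under this assumption each new chunk starts 1000 characters after the previous one, so a 3000-character document yields three chunks.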

Resolution

| Variable | Meaning | Default |
| --- | --- | --- |
| RUNIC_RAG_RESOLVE_THRESHOLD | Cosine similarity above which two mentions auto-merge | 0.92 |
| RUNIC_RAG_TIEBREAK_LOW | Lower bound of the ambiguous band | 0.82 |
| RUNIC_RAG_TIEBREAK_HIGH | Upper bound of the ambiguous band | 0.92 |
| RUNIC_RAG_LLM_TIEBREAK | Use an LLM to break ties inside the ambiguous band | false |
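
One way to read the four resolution settings together is as a three-way decision on a similarity score. The sketch below is an interpretation, not the resolver's code: the function name, the returned labels, and the exact handling of scores that sit on a boundary are all assumptions.

```python
def classify_pair(
    similarity: float,
    resolve_threshold: float = 0.92,
    tiebreak_low: float = 0.82,
    tiebreak_high: float = 0.92,
    llm_tiebreak: bool = False,
) -> str:
    """Return "merge", "ask_llm", or "keep_separate" for a cosine similarity."""
    if similarity >= resolve_threshold:
        return "merge"  # confident match: auto-merge (boundary handling assumed)
    if llm_tiebreak and tiebreak_low <= similarity < tiebreak_high:
        return "ask_llm"  # ambiguous band, and the LLM tiebreak is enabled
    return "keep_separate"
```

With the defaults, a score of 0.87 keeps the two mentions separate; enabling the LLM tiebreak would route that same pair to the model instead.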

Concurrency

| Variable | Meaning | Default |
| --- | --- | --- |
| RUNIC_RAG_CONCURRENCY | Max concurrent LLM/embedding requests during ingest | 8 |
| RUNIC_RAG_REQUESTS_PER_MINUTE | Client-side rate limit (0 = no limit) | 0 |
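
A concurrency cap of this kind is commonly implemented with a semaphore. The sketch below shows that general pattern with asyncio; it is an assumption about the technique, not a description of how runic schedules requests, and `run_bounded` is a hypothetical name.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def run_bounded(
    jobs: Iterable[Callable[[], Awaitable[Any]]],
    concurrency: int = 8,
) -> list[Any]:
    """Run zero-argument coroutine factories with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(job: Callable[[], Awaitable[Any]]) -> Any:
        async with semaphore:  # waits here while `concurrency` jobs are running
            return await job()

    # gather preserves the input order of results.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Raising the cap increases throughput until the provider's own rate limits take over, which is where a requests-per-minute limit becomes the more useful knob.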

Cost / budget

| Variable | Meaning | Default |
| --- | --- | --- |
| RUNIC_RAG_MAX_LLM_CALLS | Hard cap on LLM calls per run (0 = unlimited) | 0 |
| RUNIC_RAG_MAX_TOKENS | Hard cap on tokens per run (0 = unlimited) | 0 |
| RUNIC_RAG_GLEANING_PASSES | Extra extraction passes per chunk to raise recall | 0 |
| RUNIC_RAG_CACHE_DIR | On-disk cache for LLM + embedding results | .cache/runic-rag |

Retrieval

| Variable | Meaning | Default |
| --- | --- | --- |
| RUNIC_RAG_MAX_HOPS | Graph-expansion depth for the local walk | 2 |
| RUNIC_RAG_TOP_K | Retrieval breadth (candidates fetched per source) | 10 |
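
To make "graph-expansion depth" concrete, the sketch below runs a hop-bounded breadth-first walk over a toy adjacency map. It assumes the local walk behaves like a plain BFS from the seed entities; the function name and the dictionary-based graph are illustrative only, and the real retrieval runs against the graph backend.

```python
from collections import deque


def expand(graph: dict[str, list[str]], seeds: list[str], max_hops: int = 2) -> dict[str, int]:
    """Return {node: hop_distance} for every node within max_hops of a seed."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # already at the depth limit; do not expand further
        for neighbour in graph.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance
```

On a chain a → b → c → d seeded at a, a depth of 2 reaches b and c but not d.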

The budget caps span the whole run — ingestion and the query path (extraction, high-level keyword extraction, synthesis) draw from the same max_llm_calls / max_tokens budget. Exceeding it raises BudgetExceededError.
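
The shared-budget behaviour can be sketched as a single counter that both ingestion and querying charge against. Everything here is a stand-in: the `Budget` class is hypothetical, and `BudgetExceededError` is defined locally to mirror the name used above rather than imported from runic.

```python
class BudgetExceededError(RuntimeError):
    """Local stand-in for the error named on this page."""


class Budget:
    """Tracks LLM calls and tokens for one run; a cap of 0 means unlimited."""

    def __init__(self, max_llm_calls: int = 0, max_tokens: int = 0) -> None:
        self.max_llm_calls = max_llm_calls
        self.max_tokens = max_tokens
        self.llm_calls = 0
        self.tokens = 0

    def charge(self, tokens: int) -> None:
        """Record one LLM call and its tokens; raise once a cap is exceeded."""
        self.llm_calls += 1
        self.tokens += tokens
        if 0 < self.max_llm_calls < self.llm_calls:
            raise BudgetExceededError(f"LLM calls exceeded {self.max_llm_calls}")
        if 0 < self.max_tokens < self.tokens:
            raise BudgetExceededError(f"tokens exceeded {self.max_tokens}")
```

Because one object serves the whole run, a large ingest can leave little headroom for the queries that follow it.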

TIP

Set RUNIC_RAG_CACHE_DIR to make re-ingestion cheap: re-running an unchanged document reuses cached LLM and embedding results instead of paying for them again.


Switching providers

By default both the LLM and the embedder talk to OpenAI. To run fully offline against Ollama, switch both providers — embeddings default to OpenAI independently of the chat model, so flipping only LLM_PROVIDER still sends every embedding to OpenAI.

bash
# Pull the models once:
ollama pull qwen2.5 && ollama pull nomic-embed-text

# Then point both providers at Ollama (.env or shell):
RUNIC_RAG_LLM_PROVIDER=ollama
RUNIC_RAG_LLM_MODEL=qwen2.5
RUNIC_RAG_EMBEDDING_PROVIDER=ollama
RUNIC_RAG_EMBEDDING_MODEL=nomic-embed-text
RUNIC_RAG_EMBEDDING_DIM=768
OLLAMA_BASE_URL=http://localhost:11434/v1
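
A small pre-flight check can catch the half-switched case described above. This is a hypothetical helper, not part of runic; it only compares the two provider names.

```python
def roles_still_on_openai(
    llm_provider: str = "openai",
    embedding_provider: str = "openai",
) -> list[str]:
    """List which roles ("llm", "embedding") would still call OpenAI."""
    roles = {"llm": llm_provider, "embedding": embedding_provider}
    return [role for role, provider in roles.items() if provider.lower() == "openai"]
```

An empty list means neither role is configured to contact OpenAI.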

WARNING

RUNIC_RAG_EMBEDDING_DIM must match the embedding model — 1536 for text-embedding-3-small, 768 for nomic-embed-text. A mismatch surfaces as a vector-dimension error at query time, and changing the dimension after data exists means re-creating the vector index and re-embedding. Pick the dimension before you bootstrap.
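
Since a mismatch only surfaces at query time, it can be worth validating the dimension up front. The sketch below is a hypothetical guard: it knows only the two model dimensions quoted above and can optionally compare against one sample vector you have already obtained from your embedder.

```python
from typing import Optional, Sequence

# Dimensions quoted on this page; extend for other models as needed.
KNOWN_DIMS = {"text-embedding-3-small": 1536, "nomic-embed-text": 768}


def check_embedding_dim(
    model: str,
    embedding_dim: int,
    sample_vector: Optional[Sequence[float]] = None,
) -> None:
    """Raise ValueError if embedding_dim cannot be right for this model."""
    if embedding_dim <= 0:
        raise ValueError("embedding_dim must be positive")
    expected = KNOWN_DIMS.get(model)
    if expected is not None and expected != embedding_dim:
        raise ValueError(f"{model} produces {expected} dims, not {embedding_dim}")
    if sample_vector is not None and len(sample_vector) != embedding_dim:
        raise ValueError(
            f"sample vector has {len(sample_vector)} dims, expected {embedding_dim}"
        )
```

Running a check like this before bootstrapping avoids having to re-create the vector index later.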


Switching backends

RUNIC_RAG_BACKEND selects the graph store. FalkorDB (the default) and Neo4j are the primary backends and use native vector + fulltext procedures; memgraph, arcadedb, and age are also valid and fall back to a portable brute-force vector/fulltext path that works everywhere. The driverless path builds the driver for you from settings.backend:

python
from runic.rag import GraphRAG, RagSettings

# FalkorDB on localhost — nothing else to wire.
rag = GraphRAG.with_defaults(settings=RagSettings())

# Neo4j — set the backend + connection in the environment or RagSettings.
rag = GraphRAG.with_defaults(
    settings=RagSettings(
        backend="neo4j",
        neo4j_uri="bolt://localhost:7687",
        neo4j_username="neo4j",
        neo4j_password="secret",
    )
)

When you need control over the connection — a custom pool, an existing handle, or driver kwargs the facade does not set for you — construct the driver yourself with create_driver from runic.ogm and pass it in:

python
from runic.ogm import create_driver
from runic.rag import GraphRAG, RagSettings

settings = RagSettings(falkordb_graph="my_app")
driver = create_driver(
    "falkordb",                 # or "neo4j", "memgraph", "arcadedb", "age"
    host=settings.falkordb_host,
    port=settings.falkordb_port,
    graph=settings.falkordb_graph,
)
rag = GraphRAG.with_defaults(driver, settings=settings)

See also

  • OGM Quickstart: create_driver and the driver/session model in full.

Schema lifecycle

The knowledge graph needs entity types and indexes — most importantly the vector index, whose dimension must equal embedding_dim. How you create them depends on the environment.

Development

Call bootstrap_schema() once at startup. It is idempotent (safe on every run) and creates the vector index with the real embedding_dim:

python
from runic.rag import GraphRAG, RagSettings

rag = GraphRAG.with_defaults(settings=RagSettings())
rag.bootstrap_schema()   # entity types + indexes, idempotent

WARNING

bootstrap_schema() raises ValueError if embedding_dim <= 0 — a vector index cannot be created without a positive dimension. This is a precondition: set RUNIC_RAG_EMBEDDING_DIM to your model's real dimension before bootstrapping.

Production

For a tracked, replayable schema use runic.migrate: introspect the live graph into a baseline migration, then evolve it with hand-written revisions and gate CI on drift.

bash
runic baseline    # introspect the graph -> a root migration
runic check       # CI gate: fail if models and graph have drifted

The generated baseline needs one manual fix. SchemaManager.sync_schema (and the introspection behind runic baseline) cannot know the embedding dimension, so it records vector indexes with a placeholder dimension of 0. bootstrap_schema() sidesteps this by always using the real embedding_dim; on the migration path you must edit the generated root revision yourself, changing the vector dimension from 0 to the real value (e.g. 1536) before applying it. See schema for how index hints map to DDL across backends.

See also

  • Migration: versioned, replayable schema changes for production.
  • Schema management: IndexManager / SchemaManager, the vector-dimension placeholder, and per-backend DDL.
  • Migration Quickstart: runic baseline, revisions, and the runic check CI gate end to end.

CLI

python -m runic.rag is a thin wrapper over the facade verbs. It reads settings from the environment (including a local .env), wires the default adapter stack via GraphRAG.with_defaults(settings=load_settings()), and forwards to bootstrap_schema(), ingest_document() / ingest_text(), and query().

bash
python -m runic.rag bootstrap
# Schema bootstrapped.

python -m runic.rag ingest docs/whitepaper.pdf
# Ingested docs/whitepaper.pdf: 42 chunks, 88 entities, 61 relations, 130 mentions.

python -m runic.rag query "Who founded the company and when?" --mode hybrid
# <synthesized answer text>
#
# Citations:
# - [chunk-3] docs/whitepaper.pdf

ingest auto-detects .txt, .md, and .pdf and routes through the document adapters. --mode is one of auto (default), local, or hybrid — any other value raises ValueError.

WARNING

ingest --source S bypasses PDF/MD parsing and loads the file as plain text under the source tag S. Format auto-detection only happens on the plain ingest <path> (no --source) path, which calls ingest_document(). Pass a PDF without --source to get it parsed.
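
If you drive the CLI from a script, it can help to build the argument list in one place and reject a bad mode before spawning a process. The helpers below are hypothetical and use only the verbs and the --mode flag shown on this page.

```python
import sys

MODES = ("auto", "local", "hybrid")


def rag_command(verb: str, *args: str) -> list[str]:
    """Build an argv list for `python -m runic.rag <verb> ...`."""
    return [sys.executable, "-m", "runic.rag", verb, *args]


def query_command(question: str, mode: str = "auto") -> list[str]:
    """Build the argv for a query, rejecting modes the CLI would refuse."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return rag_command("query", question, "--mode", mode)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; `rag_command("ingest", "docs/whitepaper.pdf")` deliberately omits --source so that the PDF is parsed.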


A .env example

Copy .env.example to .env and fill in your secrets. A minimal OpenAI + FalkorDB setup, with the Ollama local-mode block commented out:

bash
# ── LLM + embeddings (OpenAI defaults) ──────────────────────────────
RUNIC_RAG_LLM_PROVIDER=openai
RUNIC_RAG_LLM_MODEL=gpt-5.4-nano
RUNIC_RAG_EMBEDDING_PROVIDER=openai
RUNIC_RAG_EMBEDDING_MODEL=text-embedding-3-small
RUNIC_RAG_EMBEDDING_DIM=1536

# ── Credentials (unprefixed; required when provider=openai) ──────────
OPENAI_API_KEY=sk-replace-me

# ── Graph backend (FalkorDB default) ────────────────────────────────
RUNIC_RAG_BACKEND=falkordb
RUNIC_RAG_FALKORDB_HOST=localhost
RUNIC_RAG_FALKORDB_PORT=6379
RUNIC_RAG_FALKORDB_GRAPH=runic_rag

# ── Cost control (0 = unlimited) ────────────────────────────────────
RUNIC_RAG_MAX_LLM_CALLS=0
RUNIC_RAG_CACHE_DIR=.cache/runic-rag

# ── Local mode: switch BOTH providers to Ollama (uncomment) ──────────
# RUNIC_RAG_LLM_PROVIDER=ollama
# RUNIC_RAG_LLM_MODEL=qwen2.5
# RUNIC_RAG_EMBEDDING_PROVIDER=ollama
# RUNIC_RAG_EMBEDDING_MODEL=nomic-embed-text
# RUNIC_RAG_EMBEDDING_DIM=768
# OLLAMA_BASE_URL=http://localhost:11434/v1

Next steps

See also

  • Quickstart: the smallest end-to-end ingest-and-ask loop.
  • Retrieval & answers: the local / hybrid / auto modes and how answers are grounded.
  • Evaluating quality: measure faithfulness and relevancy as you tune these knobs.
  • API Reference: RagSettings, GraphRAG, and the value objects in full.
  • Migration: productionize the schema with versioned migrations.
