Small context windows plus no persistent memory make multi-step, long-horizon tasks hard to solve. For those who have built serious local setups: how do you give your model persistent memory? Vector DBs? RAG? Fine-tuned adapters? Some kind of external state management loop? Or a custom “memory module” you wrote yourself? I’m looking for practical approaches that let a local model remember past steps, keep working on long tasks, and behave more like an agent with continuity.
https://github.com/robertolupi/augmented-awareness/blob/main...
I use it mostly non-interactively, to summarize my past diary entries and to create a Message Of The Day (MOTD) shown when I launch a terminal.
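For anyone wondering what that pattern looks like, here is a minimal sketch (not the code from the repo above; the paths, model tag, and one-file-per-day layout are placeholder assumptions, and it assumes an Ollama server on the default port):

    #!/usr/bin/env python3
    """Summarize recent diary entries into a MOTD with a local Ollama model."""
    import json
    import pathlib
    import urllib.request

    DIARY_DIR = pathlib.Path.home() / "diary"  # placeholder: wherever entries live
    MOTD_FILE = pathlib.Path.home() / ".motd"  # printed by the shell at launch
    MODEL = "llama3.1"                         # any model tag Ollama is serving

    # Take the last week of plain-text entries (assumes one file per day).
    entries = sorted(DIARY_DIR.glob("*.md"))[-7:]
    diary_text = "\n\n".join(p.read_text() for p in entries)

    prompt = ("Summarize these diary entries into a short, encouraging "
              "message of the day, three lines at most:\n\n" + diary_text)

    # Ollama's local HTTP API; stream=False returns a single JSON object.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": MODEL, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        MOTD_FILE.write_text(json.load(resp)["response"])

Run it from cron or a login hook, then have your shell rc print the file at launch.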
Most local LLM setups break down because people try to use the model as both the reasoning engine and the memory store. That doesn’t scale. What works in production is a layered approach: external long-term memory (vector DB + metadata), short-term working state, aggressive summarization, and strict retrieval and evaluation loops.
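To make “layered” concrete, here is a toy sketch of the short-term/long-term split. It is pure Python; summarize() and retrieve() are deliberately dumb stand-ins for a real model call and a real vector search, and the character budget stands in for a token budget:

    from dataclasses import dataclass, field

    def summarize(text: str) -> str:
        """Stub: in practice, ask the model for a compressed digest."""
        return text[:500]

    def retrieve(store: list[str], query: str, k: int) -> list[str]:
        """Stub: in practice, embed the query and run a vector search."""
        words = query.lower().split()
        return [s for s in store if any(w in s.lower() for w in words)][:k]

    @dataclass
    class LayeredMemory:
        budget_chars: int = 8_000                           # stand-in for a token budget
        working: list[str] = field(default_factory=list)    # short-term state
        long_term: list[str] = field(default_factory=list)  # stand-in for a vector DB

        def remember(self, step: str) -> None:
            self.working.append(step)
            # Aggressive summarization: once over budget, archive the raw
            # steps and keep only a digest in working state.
            if sum(len(s) for s in self.working) > self.budget_chars:
                self.long_term.extend(self.working)
                self.working = [summarize("\n".join(self.working))]

        def context_for(self, query: str) -> str:
            # Retrieval loop: working state plus top-k long-term hits,
            # never the whole history.
            return "\n".join(self.working + retrieve(self.long_term, query, k=4))

The compaction trigger is the key point: raw steps get archived, only a digest stays in working state, so the context window never fills up with stale detail.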
That’s what we built at https://www.ailog.fr. We provide a production-ready RAG stack with persistent memory, retrieval controls, grounding checks, and evaluation tooling so models can handle long-horizon, multi-step tasks without blowing up the context window. It works with local or hosted models and keeps memory editable, auditable, and observable over time.
You can still build this yourself with Ollama, Chroma/Qdrant, and a custom orchestrator, but if you want something already wired, tested, and scalable, that’s the niche we’re filling.
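If you do roll your own, the wiring is roughly this. A sketch, not our product code: it assumes the chromadb and ollama Python packages are installed, an Ollama server is running, and the model tag and collection name are placeholders:

    import chromadb
    import ollama

    MODEL = "llama3.1"  # placeholder model tag

    client = chromadb.PersistentClient(path="./agent_memory")
    episodes = client.get_or_create_collection("episodes")

    def remember(step_id: str, text: str, task: str) -> None:
        # Chroma embeds documents with its default embedding function.
        episodes.add(ids=[step_id], documents=[text], metadatas=[{"task": task}])

    def recall(query: str, k: int = 4) -> list[str]:
        n = min(k, episodes.count())  # don't ask for more results than exist
        if n == 0:
            return []
        res = episodes.query(query_texts=[query], n_results=n)
        return res["documents"][0]

    def step(task: str, step_id: str, instruction: str) -> str:
        # Ground the model in retrieved past steps, not the raw history.
        context = "\n".join(recall(instruction))
        reply = ollama.chat(model=MODEL, messages=[
            {"role": "system", "content": "Relevant past steps:\n" + context},
            {"role": "user", "content": instruction},
        ])
        answer = reply["message"]["content"]
        remember(step_id, instruction + "\n-> " + answer, task)
        return answer

The orchestrator on top (task decomposition, evaluation loops, memory editing) is where most of the real work lives.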
Happy to answer questions or share architecture details if useful.