We Open-Sourced Engram: A Brain-Inspired Context Database for AI Agents

Today we're open-sourcing Engram, a context database for AI agents. It's available now on GitHub under the MIT license.

We built Engram because we kept running into the same problem across every AI agent project we worked on: context management is broken, and everyone is reinventing the same broken wheel. Every framework rolls its own summarization, its own prompt stuffing, its own way of forgetting things that matter. The result is agents that drift from their goals, lose critical details, and can't share what they've learned with anything else.

Engram is our answer: a standalone, open-source context database that any AI agent can connect to, regardless of which LLM it uses or which framework it runs on. Think of it as Postgres for agent memory. Applications don't build their own storage engines — they connect to a database. We think agents should do the same with context.

The problem we kept hitting

We've been building AI engineering products since 2019, but AI agents have drastically shifted the paradigms we were used to. Over the past two years, as we shipped more agent-based systems, we identified three failure modes that kept showing up no matter which framework or model we used.

The first is context decay. Long-running agents gradually drift from their original intent as context gets summarized and compressed over multiple turns. Key details — the specific constraint a user mentioned in turn three, the failed approach that shouldn't be retried — get lost in the summarization. The agent keeps running, but it's slowly forgetting why it's running.

The second is context isolation. If you're using Claude for one task and GPT for another, the knowledge each accumulates is trapped. Claude's memory can't talk to GPT's memory. LangGraph's state can't flow into CrewAI's state. Your agents are smart individually but collectively amnesiac. Even within a single platform, different agent sessions can't easily share what they've learned.

The third is context-as-text. Current systems store context as raw text or text summaries. But useful knowledge isn't a wall of text — it's structured. It has relationships, salience, provenance. Knowing that "PaddleOCR is faster than Textract" is useful. Knowing that this fact was validated in three production runs, contradicts an earlier assumption, and relates to a pending architecture decision is far more useful. Text summaries throw away all of that structure.

There's also a fourth, more subtle problem: context doesn't learn. Your agent might retrieve the same unhelpful piece of context fifty times, and the system has no mechanism to notice that it's wasting tokens on something that never helps. There's no feedback loop between retrieval and outcomes.

What Engram actually is

Engram is a server that sits between your AI agents and a knowledge graph. Agents send raw text in and get structured, relevant context back out. Under the hood, it stores knowledge as a concept graph of atomic knowledge "bullets" — discrete, individually trackable insights rather than blocks of text — connected by typed, weighted relationships.

There are really only two core operations. You commit raw text and execution feedback, and the server extracts structured knowledge from it. You materialize context, and the server assembles the right knowledge for the right agent at the right time, rendered in the format that model prefers. Everything else — the extraction, the deduplication, the learning, the forgetting — happens inside the server.
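
To make that contract concrete, here's a toy in-memory model of the two operations. This is not the Engram server — the real Reflector is an LLM and retrieval is graph-based — but it shows the shape of the commit/materialize loop:

```python
# Illustrative in-memory sketch of Engram's two core operations.
# The real server runs an LLM Reflector; here "extraction" is faked by
# splitting sentences, purely to show the commit/materialize contract.

class ToyEngram:
    def __init__(self):
        self.bullets: list[str] = []   # stand-in for the concept graph

    def commit(self, raw_text: str) -> int:
        """Ingest raw text; extract one bullet per sentence (a stand-in
        for the Reflector). Returns how many bullets were added."""
        new = [s.strip() for s in raw_text.split(".") if s.strip()]
        self.bullets.extend(new)
        return len(new)

    def materialize(self, query: str, budget: int = 2) -> list[str]:
        """Assemble the most relevant bullets for a query, up to a
        bullet budget (relevance here is naive word overlap)."""
        words = set(query.lower().split())
        scored = sorted(
            self.bullets,
            key=lambda b: len(words & set(b.lower().split())),
            reverse=True,
        )
        return scored[:budget]

db = ToyEngram()
db.commit("PaddleOCR is faster than Textract. The user wants JSON output.")
ctx = db.materialize("which OCR engine is faster", budget=1)
print(ctx)  # the OCR bullet ranks first
```

Everything interesting — extraction quality, deduplication, ranking — lives behind those two calls, which is exactly why the database framing works.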

Engram is model-agnostic and framework-agnostic by design. It works with Claude, GPT, Gemini, DeepSeek, or local models. It integrates with LangGraph, CrewAI, AG2, the OpenAI Agents SDK, or bare API calls. Your context belongs to you, not to your LLM provider. Start a project in Claude, continue in GPT, hand off to a local model — the context persists and transfers because it lives in Engram, not in any one platform's memory.

Inspired by how the brain actually works

The name "engram" comes from neuroscience — it's the term for a physical trace of memory in the brain. We chose it because Engram's architecture is directly inspired by how human memory actually works, not by how filing cabinets work.

Your brain doesn't store memories as verbatim transcripts. It stores atomic associations, connects them through weighted relationships, strengthens pathways that prove useful, lets unused memories fade, and builds abstract schemas from repeated experience. When you recall something, the act of remembering actually changes the memory — a process called reconsolidation. And during sleep, your brain reorganizes, deduplicates, and abstracts what you've learned.

Engram implements all of these mechanisms. The ingestion pipeline acts as the hippocampus, encoding new experiences into discrete memory traces. The concept graph is the neocortex — long-term storage with associative relationships. A salience scorer acts as the amygdala, tagging importance and surprise. The consolidation engine is analogous to sleep — a periodic background process that reorganizes, abstracts patterns into schemas, strengthens frequently co-recalled pathways, and prunes what's no longer relevant. And the reconsolidation loop means that every time context is recalled and used, the system learns from the outcome, boosting bullets that helped and decaying ones that didn't.

This isn't just a cute metaphor. These mechanisms solve real engineering problems. Active forgetting keeps context lean and retrieval quality high. Schema formation compresses recurring patterns so you get better coverage in fewer tokens. Reconsolidation creates a reinforcement learning loop where the concept graph genuinely gets smarter with use. Every other agent memory product we've seen is building a filing cabinet. We're trying to build something that learns.

Key design decisions

A few architectural choices are worth calling out because they represent strong opinions we formed through building real agent systems.

Bullets, not paragraphs. The fundamental unit of storage in Engram is a "bullet" — a single, atomic, actionable insight, typically one to two sentences. Each bullet is independently retrievable, updatable, and deletable. It has a type (fact, decision, strategy, warning, procedure, exception, principle), a salience score, confidence, and full usage statistics tracking how many times it's been recalled and whether it helped. This granularity is what makes everything else possible: precise retrieval, targeted updates, meaningful learning signals, and efficient token budget allocation.
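
A rough sketch of a bullet record in Python — field names here are illustrative, not Engram's actual schema:

```python
from dataclasses import dataclass

# Sketch of the bullet data model described above. Field names are
# illustrative assumptions, not Engram's real schema.

BULLET_TYPES = {"fact", "decision", "strategy", "warning",
                "procedure", "exception", "principle"}

@dataclass
class Bullet:
    text: str                  # one atomic, actionable insight
    type: str                  # one of BULLET_TYPES
    salience: float = 0.5      # base importance score
    confidence: float = 0.5
    recalls: int = 0           # times included in materialized context
    hits: int = 0              # recalls that contributed to a success

    def record_recall(self, helped: bool) -> None:
        self.recalls += 1
        if helped:
            self.hits += 1

b = Bullet("PaddleOCR is faster than Textract", type="fact", salience=0.8)
b.record_recall(helped=True)
```

Because each bullet carries its own usage statistics, the learning loop described later has something concrete to update.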

Delta updates, never rewrites. Every mutation to the concept graph is expressed as an atomic delta operation within a delta batch. The full context is never regenerated wholesale. This is a hard constraint, and it's the single most important design decision in the system. When you regenerate context from scratch on every update, you get context collapse — subtle but devastating drift as the regeneration process drops, rewords, or deprioritizes information. Delta operations prevent this entirely, and as a bonus, they give you a full audit trail of every change that's ever been made to the graph.

A canonical Reflector model. When an agent commits raw text, Engram doesn't let the agent extract its own bullets. Instead, the server runs a single, canonical LLM (the "Reflector") that processes all raw input from all agents. This is configured at the server level, not per-agent. The reason: if each agent ran its own extraction, context built by Claude would have a different personality — different granularity, emphasis, and terminology — than context built by GPT. A canonical Reflector eliminates cross-model variance and guarantees consistent knowledge quality regardless of who wrote the data.

Raw input preservation. Every commit stores the original, unprocessed text in an activity ledger alongside the extracted bullets. Think of bullets as compiled output and raw inputs as source code. This means context can be "recompiled" from scratch when a better Reflector model becomes available. A context created today with Haiku can be re-extracted next year with a much better model, pulling out insights the original model missed. Your context gets smarter not just through use, but through model upgrades.

Intent anchors. Every context has an immutable intent anchor — the objective, success criteria, and constraints for the project. This anchor is always included in materialized context and cannot be edited after creation. It's the mechanism that prevents drift over long-running sessions. No matter how many agents contribute, no matter how much the concept graph evolves, the system always knows what it's trying to accomplish.
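
The immutability guarantee is easy to picture as a frozen record — field names here are our illustration, not the actual schema:

```python
from dataclasses import dataclass, FrozenInstanceError

# Sketch of an immutable intent anchor: objective, success criteria,
# and constraints are fixed at context creation. Illustrative fields.

@dataclass(frozen=True)
class IntentAnchor:
    objective: str
    success_criteria: tuple[str, ...]
    constraints: tuple[str, ...]

anchor = IntentAnchor(
    objective="Migrate the OCR pipeline to PaddleOCR",
    success_criteria=("parity on the accuracy benchmark",),
    constraints=("no new cloud dependencies",),
)

try:
    anchor.objective = "something else"   # any edit after creation fails
except FrozenInstanceError:
    pass
```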

Bounded, not infinite. Engram doesn't let context grow without limits. Unbounded growth degrades retrieval quality — it's the same reason the brain actively forgets. Engram uses a three-tier data lifecycle (active, archived, purged) with capacity management that progressively consolidates, compresses, and archives as a context approaches its limits. Schema compression is the primary mechanism: fifteen individual bullets about a recurring pattern get compressed into one schema plus a few exceptions — an 80% reduction in bullet count while preserving the retrievable knowledge.
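
The arithmetic behind that claim, assuming two exceptions survive compression:

```python
# Back-of-envelope check of the schema-compression figure: fifteen
# bullets about one recurring pattern collapse into one schema plus a
# few exceptions. With two exceptions kept, that's an 80% reduction.

bullets_before = 15
bullets_after = 1 + 2            # one schema + two exception bullets

reduction = 1 - bullets_after / bullets_before
print(f"{reduction:.0%}")        # 15 -> 3 bullets
```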

How the learning loop works

The part we're most excited about is the reconsolidation loop — Engram's mechanism for learning from outcomes.

When an agent calls materialize, it gets back a context payload and a materialization ID — essentially a receipt that records exactly which bullets were included. The agent does its work. Then, when the agent commits results, it passes back that materialization ID along with structured execution feedback: did the task succeed or fail? What tools were called? What metrics were observed?

The reconsolidation engine then connects the dots. If the task succeeded, the bullets that were recalled get a boost to their hit count and salience. If it failed, they get a miss. Over time, this creates a reinforcement signal: bullets are ranked not just by raw salience, but by "effective salience" — a score that incorporates their track record of actually being useful. Context that consistently proves helpful floats to the top. Context that gets retrieved but never contributes gradually fades.
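
Here's one way effective salience might be computed — the smoothed hit-rate formula below is our illustration, not Engram's exact scoring:

```python
# Sketch of "effective salience": base salience adjusted by a bullet's
# track record. The exact formula is an assumption (a Laplace-smoothed
# hit rate); Engram's real scoring may differ.

def effective_salience(salience: float, hits: int, misses: int) -> float:
    # (hits + 1) / (hits + misses + 2) is the smoothed probability that
    # recalling this bullet helps; an unused bullet scores a neutral 0.5.
    track_record = (hits + 1) / (hits + misses + 2)
    return salience * track_record

helpful = effective_salience(0.6, hits=8, misses=2)       # proven useful
dead_weight = effective_salience(0.9, hits=0, misses=10)  # never helps

# A modest bullet with a strong record outranks a high-salience
# bullet that keeps getting retrieved without contributing.
assert helpful > dead_weight
```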

This is a genuinely different approach from anything else in the agent memory space. Most systems treat memory as static storage — write it, read it, maybe summarize it. Engram treats memory as a living system that improves through use.

Multi-agent by default

Multi-agent coordination isn't an edge case for us — it's a core use case. In our production systems, we regularly have an OCR agent, an architecture agent, and a QA agent all working on the same project context simultaneously. Engram is designed for this from the ground up.

The concurrency model follows a simple principle: parallelize computation, serialize application. The expensive work — running the Reflector, computing embeddings, doing deduplication — happens concurrently across agents. The cheap work — applying delta operations to the graph — is serialized per context via an advisory lock. This means throughput is limited only by computation, not by lock contention.
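
A simplified sketch of that split, using one asyncio lock per context — the names and structure are illustrative, not Engram's internals:

```python
import asyncio
from collections import defaultdict

# Sketch of "parallelize computation, serialize application": expensive
# work (Reflector, embeddings) runs concurrently across agents; applying
# deltas is serialized per context via one lock per context ID.

locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
graph: dict[str, list[str]] = defaultdict(list)

async def commit(context_id: str, raw: str) -> None:
    # Expensive, lock-free phase: many agents can run this at once.
    extracted = await asyncio.to_thread(lambda: raw.upper())  # stand-in
    # Cheap, serialized phase: only delta application takes the lock.
    async with locks[context_id]:
        graph[context_id].append(extracted)

async def main() -> None:
    # Five "agents" committing to the same context concurrently.
    await asyncio.gather(*(commit("ctx-1", f"note {i}") for i in range(5)))

asyncio.run(main())
```

Because the lock is held only for the append, throughput is bounded by the expensive phase, not by contention — the same property the advisory lock gives the real server.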

Conflicts are resolved by design. If two agents add contradictory information, both observations are kept with a CONTRADICTS edge between them — because both observations are valid data points. Agents can subscribe to context changes via server-sent events, so when one agent records an architecture decision that constrains another's work, the second agent gets notified and can re-materialize to pick up the new constraint.

Standing on the shoulders of ACE

We want to acknowledge a major intellectual debt. Engram's ingestion pipeline — the Reflector/Curator architecture, the bullet-point storage model, the delta operation mechanism, and the execution feedback integration — draws heavily from the Agentic Context Engineering (ACE) framework, a research paper out of Stanford and SambaNova Systems.

ACE demonstrated that separating "analyze what happened" (Reflector) from "decide what to store" (Curator) produces significantly better results than a combined step, that itemized bullet-point context dramatically outperforms free-form text summaries, and that execution feedback signals — structured success/failure data from tool calls and task outcomes — produce better reflections than parsing raw conversation text. These are foundational insights that we've incorporated directly into Engram's design.

Where Engram goes beyond ACE is in making this a database rather than a framework — model-agnostic, multi-agent, cross-platform — and in adding the brain-inspired mechanisms (reconsolidation, schema formation, active forgetting, consolidation) that make the stored context a living, learning system rather than a static playbook.

What you can do with it today

The v0.1 release includes the core system: the bullet data model with usage tracking and lifecycle management, the Reflector-to-Curator ingestion pipeline, materialization with effective salience ranking, the reconsolidation loop, basic consolidation (forgetting curve, semantic dedup, archival, purge), per-context advisory locking for multi-agent safety, intent anchors, delta operations with full audit history, and raw input preservation in the activity ledger.

On the integration side, we ship with a Python SDK, a REST API via FastAPI, an MCP server for Claude, and OpenAI function calling tool definitions. Storage backends include SQLite for local development (zero setup) and PostgreSQL with pgvector for production. Docker Compose gets you running in one command.

Getting started takes two commands: pip install engram-contextdb to install, then engram to start the server. Full documentation, examples, and a LangGraph integration walkthrough are in the README.

Engram Cloud: use it right now

If you don't want to self-host, or if you just want to start using shared AI memory today, we also built engram.so — a managed cloud app on top of the open-source engine. It takes about two minutes to set up, and it's free to start.

The cloud version lets you connect Engram to the AI tools you already use, without running a server or writing any code. For Claude, you add Engram as a Cloud Connector — go to Feature Preview, enable Connectors, and add your Engram API key. For ChatGPT, you can install the Engram Custom GPT from the GPT Store, or set up a Custom GPT with Engram Actions. Once connected, your AI conversations get persistent, shared memory across both platforms.

This means you can tell Claude something in the morning and ask ChatGPT about it at night. You can use Claude for coding and ChatGPT for brainstorming, and both have access to the same project context. If you're on a team, your teammates' agents can read and write to the same shared memory — so the knowledge your team builds up with AI doesn't stay trapped in individual chat histories.

The cloud app also gives you a dashboard where you can see exactly what Engram remembers, edit or delete individual memories, and browse the concept graph. Under the hood, it's running the same open-source engine — the same Reflector, Curator, consolidation, and reconsolidation loop. The cloud just adds managed hosting, the connector integrations, and the UI.

Where we're going

The v0.1 you see today is the foundation. On the roadmap: PostgreSQL with Apache AGE for proper graph queries, full schema induction (LLM-powered pattern abstraction from repeated experiences), the re-extraction API for recompiling context with better models, event subscriptions for real-time multi-agent coordination, native LangGraph and CrewAI integrations, a TypeScript SDK, and eventually a Rust core engine for high-throughput production deployments.

We're also expanding Engram Cloud for teams, with tiered storage limits, SSO, data residency options, and compliance features. But the open-source version will always be fully functional with no artificial caps.

Try it

Engram is MIT-licensed and available now. The repo is at github.com/softmaxdata/engram. Clone it, pip install it, run the server, and connect your agents. We'd love feedback — especially from people building multi-agent systems, cross-platform workflows, or long-running agent processes where context decay is a real problem.

If you're building something interesting with it, we want to hear about it. And if you want to contribute, the architecture design document in the repo lays out the full vision — there's a lot of surface area for contributions, from storage backends to framework integrations to the consolidation engine.

We think agent memory is one of the most important unsolved problems in AI engineering. We built Engram because we needed it. We're open-sourcing it because we think everyone does.