Every AI coding session starts as a blank slate. Decisions don’t carry over. Debugging context evaporates. Conventions get re-explained.

I was losing 10-15 minutes per session just rebuilding context.

So I studied 11 memory systems. Impressive engineering - but they share the same flaw:

Good advice and bad advice sit side-by-side forever.

They store memory. They don’t learn.


The Insight

Memory systems don’t improve because they don’t know what worked.

I built Memory Layer around one bet: retrieval quality comes from feedback, not from storing more text or building fancier graphs.

The obvious follow-up: who provides feedback?

Manual curation doesn’t scale.

Tasks do.

When work ships, the memories used during that task get promoted. When work stalls, the memories that led there stop surfacing. No thumbs-up buttons. No “rate this response.” Just outcomes.

Good memory rises, bad memory sinks


Results

After 6 weeks of real use:

  • Retrieval precision: 70% to 90%
  • Debugging familiar issues: 15 min to 3 min
  • Context re-explanation: ~50% reduction

The Series

  1. Why most AI memory systems don’t learn The landscape, the gap and why tasks turned out to be the missing signal.

  2. Building memory that learns 5-signal retrieval, asymmetric scoring, SQLite tradeoffs, edge cases.

  3. One memory store, every AI agent MCP, REST, SDK, Claude Code plugin, Beads and Claude Tasks integration.


Memory Layer is open source: github.com/runtimenoteslabs/memory-layer

pip install git+https://github.com/runtimenoteslabs/memory-layer.git
mem add "Always validate JWT server-side" -c convention
mem search "authentication"

One store. Every agent. Learns from what ships.