2024 · Full Stack & AI Engineer · Production · 6 months

Deepen

A second brain that doesn't just store what you know — it surfaces what you've forgotten you understood.

01 — Origin

The question that started everything

Why do we forget the most important connections between our own ideas?

I'd been journaling, bookmarking, note-taking for years. Thousands of entries across Notion, Obsidian, scattered markdown files. But when I needed to recall why I'd changed my mind about something six months ago — the exact thread of reasoning — it was gone. Buried under layers of newer, louder thoughts.

The tools were great at storing. Terrible at remembering.

I wanted something different: a system that doesn't wait for you to search. One that watches what you're thinking about right now and quietly says, "you wrote something related to this 4 months ago."

Obsession

Context is not just data — it's the timing of retrieval.

02 — Constraints

The walls I had to design around

Latency: RAG pipelines over local vector stores were too slow for real-time thought. If the system takes 3 seconds to surface a connection, the thought is already gone.
Privacy: People's inner thoughts are the most private data that exists. Cloud-only wasn't an option. Local-first was a requirement, not a feature.
Context windows: LLM context windows were the bottleneck. You can't feed 10,000 notes into a prompt. You need surgical retrieval.
Cold start: A second brain with zero memories is useless. The system needed to feel valuable from day one, not after months of journaling.
03 — Decisions

The bets I placed

Embeddings at the edge. I sacrificed perfect semantic accuracy for speed. Moved the vector embedding pipeline to run locally using lightweight models, with optional cloud sync for heavier processing. The tradeoff: slightly less nuanced matches, but retrieval in under 200ms.
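The local-first path with an optional cloud escape hatch can be sketched roughly like this. This is a minimal illustration, not the actual implementation: `embedWithFallback`, `localEmbed`, and `cloudEmbed` are assumed names, and the 200ms budget mirrors the retrieval target above.

```typescript
// Illustrative sketch: prefer a fast local embedding model, and fall
// back to a heavier cloud model only when one is provided and the
// local path misses the latency budget or fails outright.
type Embedder = (text: string) => Promise<number[]>;

async function embedWithFallback(
  text: string,
  localEmbed: Embedder,
  cloudEmbed?: Embedder,
  timeoutMs = 200
): Promise<number[]> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const budget = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("local embed timed out")), timeoutMs);
  });
  try {
    // Race the local model against the latency budget.
    return await Promise.race([localEmbed(text), budget]);
  } catch (err) {
    if (cloudEmbed) return cloudEmbed(text); // optional heavier path
    throw err;
  } finally {
    clearTimeout(timer); // prevent a dangling rejection after the race
  }
}
```

The key design choice is that the cloud path is opt-in: if no `cloudEmbed` is supplied, a slow or failing local model surfaces an error instead of silently shipping private text off-device.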

Chunking strategy. Early versions used fixed-size text chunks for the vector store. They were too small — the LLM hallucinated context that didn't exist because it was working with sentence fragments instead of complete thoughts. I switched to semantic chunking: splitting on paragraph boundaries and topic shifts instead of character counts.

chunking.ts
typescript
// Semantic chunking for better retrieval context:
// split on paragraph boundaries, then embed each chunk.
async function chunkDocument(text: string) {
  const paragraphs = text.split("\n\n").filter(p => p.trim());

  // The map callback must itself be async to await the embedding
  // call; Promise.all resolves the chunks in parallel.
  return Promise.all(
    paragraphs.map(async p => ({
      content: p,
      embedding: await generateEmbedding(p),
      metadata: {
        timestamp: Date.now(),
        source: "journal"
      }
    }))
  );
}
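The "topic shift" half of the strategy can be approximated by comparing adjacent paragraph embeddings: merge neighbors that stay similar, split where similarity drops. A hedged sketch, assuming any paragraph-level `embed` function and an illustrative threshold:

```typescript
// Sketch: group adjacent paragraphs into one chunk until their
// embeddings diverge, treating low similarity as a topic shift.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function semanticChunks(
  text: string,
  embed: (t: string) => Promise<number[]>,
  threshold = 0.7 // illustrative; tuned per embedding model
): Promise<string[]> {
  const paragraphs = text.split(/\n\n+/).filter(p => p.trim());
  if (paragraphs.length === 0) return [];

  const vectors = await Promise.all(paragraphs.map(embed));
  const chunks: string[] = [paragraphs[0]];

  for (let i = 1; i < paragraphs.length; i++) {
    // High similarity to the previous paragraph: same topic, merge.
    if (cosine(vectors[i - 1], vectors[i]) >= threshold) {
      chunks[chunks.length - 1] += "\n\n" + paragraphs[i];
    } else {
      chunks.push(paragraphs[i]);
    }
  }
  return chunks;
}
```

Chunks built this way carry complete thoughts rather than sentence fragments, which is exactly what the fixed-size approach got wrong.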

Progressive disclosure. Instead of dumping all related notes at once, the system reveals connections gradually. First, a subtle indicator that related notes exist. Then, on hover, a preview. Then, the full context. This respects the user's flow instead of interrupting it.
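The three levels described above reduce to a tiny state machine. A minimal sketch (the level names are illustrative, not the product's actual states):

```typescript
// Progressive disclosure: indicator -> preview -> full context.
// Each deliberate interaction escalates one level; it never skips.
type Disclosure = "indicator" | "preview" | "full";

function nextLevel(current: Disclosure): Disclosure {
  switch (current) {
    case "indicator": return "preview"; // hover reveals a preview
    case "preview":   return "full";    // click expands full context
    case "full":      return "full";    // nothing further to reveal
  }
}
```

Encoding the levels as an explicit type makes it impossible for any UI component to jump straight from "related notes exist" to a full-screen interruption.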

Key Decision

Sacrificed perfect accuracy for speed. In a thought tool, a fast approximate answer beats a slow perfect one every time.

04 — The Failure

What broke (and taught me the most)

The first version hallucinated.

Not in the dramatic, obvious way. In the subtle, dangerous way. The system would surface a "related note" and present it with such confidence that users assumed the connection was real — even when the chunks were too fragmented to carry the original meaning.

One user told me: "It reminded me of something I never actually wrote."

That sentence haunted me. A second brain that creates false memories is worse than no second brain at all.

The fix wasn't technical. It was philosophical. I added confidence indicators — the system now shows how it found the connection (which words matched, what the similarity score was). Transparency over magic.
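A confidence indicator needs structured evidence, not just a score. One way to sketch the "show your work" payload, with all names assumed for illustration:

```typescript
// Sketch: alongside each surfaced note, expose the raw similarity
// score and the overlapping terms so users can judge the connection
// themselves instead of trusting an opaque match.
interface Connection {
  noteId: string;
  similarity: number;     // raw cosine score, shown as-is
  matchedTerms: string[]; // words the query and note actually share
}

function explainMatch(
  query: string,
  note: { id: string; text: string },
  similarity: number
): Connection {
  // Compare content words (4+ chars) to skip "the", "and", etc.
  const queryTerms = new Set(query.toLowerCase().match(/\b\w{4,}\b/g) ?? []);
  const noteTerms = note.text.toLowerCase().match(/\b\w{4,}\b/g) ?? [];
  const matchedTerms = [...new Set(noteTerms.filter(t => queryTerms.has(t)))];
  return { noteId: note.id, similarity, matchedTerms };
}
```

An empty `matchedTerms` list with a middling score is itself a useful signal: it tells the user the match is purely semantic and worth a skeptical look.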

05 — Architecture

How it actually works

The system has three layers:

Capture layer. Markdown-native editor with real-time save. Every keystroke updates a local SQLite database. No cloud dependency for basic writing.

Intelligence layer. On save, text is chunked semantically and embedded using a local lightweight model. Embeddings are stored in a vector index (HNSW). A background worker continuously re-indexes as the knowledge base grows.

Surfacing layer. While writing, the system queries the vector index against the current paragraph. Matches above a threshold trigger subtle UI indicators. The user can expand them or ignore them. No pop-ups, no interruptions.
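The surfacing loop above can be sketched as a debounced query against the index. This is an assumption-laden illustration: `search` stands in for the HNSW index lookup, and the threshold and debounce values are placeholders.

```typescript
// Sketch: debounce keystrokes, query the index with the current
// paragraph, and only surface matches above a confidence threshold.
type Hit = { noteId: string; score: number };

function makeSurfacer(
  search: (paragraph: string) => Promise<Hit[]>, // e.g. HNSW kNN query
  onHits: (hits: Hit[]) => void,
  threshold = 0.75,
  debounceMs = 300
) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (paragraph: string) => {
    clearTimeout(timer); // restart the clock on every keystroke
    timer = setTimeout(async () => {
      const hits = await search(paragraph);
      // Weak matches stay silent; only strong ones get an indicator.
      onHits(hits.filter(h => h.score >= threshold));
    }, debounceMs);
  };
}
```

Debouncing keeps the index query off the keystroke hot path, so the writing layer never waits on retrieval.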

Deepen Interface
Context-aware writing interface with subtle suggestions
Storage: Local SQLite + vector index (HNSW)
Embeddings: Lightweight local model with optional cloud fallback
Retrieval: under 200ms for 10k+ notes
Frontend: React + TypeScript + Framer Motion
06 — Learnings

What I'd tell myself before starting

Speed is a feature: In a thought tool, latency kills flow. 200ms feels instant. 2 seconds feels broken. The difference between those two numbers determined the entire architecture.
Hallucination is a trust problem: When an AI tool confidently shows wrong information, users don't blame the tool — they blame themselves for misremembering. That's a dangerous failure mode.
Transparency beats magic: Showing users WHY the system found a connection (similarity scores, matching phrases) built more trust than hiding the mechanics behind a polished UI.
Progressive disclosure respects flow: Don't dump information. Whisper that it exists, then let the user choose to look. A second brain should support thinking, not replace it.
07 — Future

Where this is going

The current version works for individual use. But the interesting question is: what happens when second brains talk to each other?

Imagine a team where everyone has their own knowledge graph, and the system can find connections across people's thinking — without exposing private notes. Federated knowledge retrieval.

That's the next bet.

  • Multi-user knowledge graphs with privacy-preserving retrieval
  • CRDT-based sync for offline-first collaboration
  • Visual debugging tools for the embedding pipeline
  • Voice-to-thought capture for mobile
Nathanim, Full Stack & AI Engineer

A bored developer is a dangerous developer.