Pi Agent: Tomb Librarian mode #14

Open
opened 2026-06-05 04:20:55 +00:00 by aaron · 9 comments
Owner

Goal

Extend Pi Agent (Flask, port 5501, /home/az/agent/server.py) to serve and search tomb files. Pi becomes the librarian — always-on, answers questions about the vault.

Context

  • Pi Agent already runs as pi-agent.service on Pi (192.168.0.237:5501)
  • qwen3.5:0.8b via local Ollama, Flask server with ! system commands
  • Tomb will live at /home/az/tomb/ on Pi (see #13)
  • Tomb is an Obsidian vault with markdown files, scripts, SQLite DBs

Scope

1. Tomb Reader Commands

New ! commands for the agent:

| Command | What it does |
|---------|--------------|
| `!tomb <filename>` | Read a tomb file and return contents (e.g. `!tomb services/Ollama.md`) |
| `!tomb-search <query>` | Full-text search across all tomb markdown files (grep-based, fast) |
| `!tomb-index` | Return the tomb-index.md manifest |
| `!tomb-ls <path>` | List files in a tomb subdirectory (e.g. `!tomb-ls services/`) |
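
As a rough illustration of how the four commands could be told apart before any handler runs, here is a minimal sketch. The function name `parse_tomb_command` and the `TOMB_COMMANDS` tuple are hypothetical; nothing here is taken from the existing server.py.

```python
from typing import Optional, Tuple

# Command names copied from the table above; the tuple itself is hypothetical.
TOMB_COMMANDS = ("tomb", "tomb-search", "tomb-index", "tomb-ls")


def parse_tomb_command(message: str) -> Optional[Tuple[str, str]]:
    """Split '!tomb-search ollama gpu' into ('tomb-search', 'ollama gpu').

    Returns None when the message is not a tomb command, so the caller can
    fall through to the other ! commands or to the LLM.
    """
    text = message.strip()
    if not text.startswith("!"):
        return None
    # partition() leaves arg empty for argument-less commands like !tomb-index.
    name, _, arg = text[1:].partition(" ")
    if name not in TOMB_COMMANDS:
        return None
    return name, arg.strip()
```

Matching on the exact command name keeps `!tomb` from swallowing `!tomb-search`, which a plain `startswith("!tomb")` check would do.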

2. API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tomb/read` | GET | Read a tomb file by path (`?path=services/Ollama.md`) |
| `/api/tomb/search` | GET | FTS across tomb (`?q=ollama+gpu`) |
| `/api/tomb/index` | GET | Return tomb manifest |
| `/api/tomb/ls` | GET | List directory (`?path=services`) |

3. LLM-Aware Responses

When a user asks a question that could be answered by tomb (e.g. "what port does Ollama run on?"), the agent should:

  1. Search tomb for relevant files
  2. Inject matching content into the LLM context
  3. Answer with source citations ([services/Ollama.md])

This is the key "librarian" behavior — not just file serving, but answering questions using tomb as a knowledge base.

4. Tomb Path Config

  • TOMB_DIR env var in the systemd service (default: /home/az/tomb)
  • Read at startup, validate the directory exists
  • Reject paths that escape TOMB_DIR (path traversal protection)
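
One way the startup check and the traversal guard could look, using only the standard library. This is a sketch under assumptions: the helper names `validate_tomb_dir` and `resolve_tomb_path` are hypothetical, and the real server may structure this differently.

```python
import os
from pathlib import Path

# TOMB_DIR is read from the environment, with the default named in the issue.
TOMB_DIR = Path(os.environ.get("TOMB_DIR", "/home/az/tomb"))


def validate_tomb_dir(root: Path = TOMB_DIR) -> Path:
    """Fail fast at startup if the configured directory is missing."""
    if not root.is_dir():
        raise RuntimeError(f"TOMB_DIR does not exist: {root}")
    return root.resolve()


def resolve_tomb_path(user_path: str, root: Path = TOMB_DIR) -> Path:
    """Return an absolute path inside root, or raise ValueError."""
    root = root.resolve()
    # resolve() collapses '..' segments and follows symlinks; joining an
    # absolute user_path replaces root entirely, which the check below catches.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes tomb: {user_path!r}")
    return candidate
```

Comparing resolved paths, rather than scanning the input string for `..`, also covers absolute paths and symlinks that point outside the vault.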

5. Search Implementation

  • Use Python re + file walking for text search (no dependencies)
  • Search filename + first 200 chars of content for quick hits
  • Return top 5 results with snippet context
  • Optional: use sqlite3 FTS5 if tomb gets a search index
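
A minimal sketch of the `re` plus file-walking approach, for illustration only. The function name `search_tomb`, the ranking by hit count, the extra weight for path matches, and the 200-character snippet width are all assumptions; this version also scans whole files rather than only the first 200 characters.

```python
import os
import re
from typing import Dict, List


def search_tomb(root: str, query: str, limit: int = 5, width: int = 200) -> List[Dict[str, str]]:
    """Rank markdown files under root by how often the query terms occur."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return []
    pattern = re.compile("|".join(terms), re.IGNORECASE)
    scored = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".md"):
                continue
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            try:
                with open(full, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it rather than fail the search
            # Assumed weighting: one match in the path counts as five body hits.
            score = len(pattern.findall(text)) + 5 * len(pattern.findall(rel))
            if score == 0:
                continue
            first = pattern.search(text)
            start = max(first.start() - width // 2, 0) if first else 0
            # Collapse whitespace so the snippet stays on one line.
            snippet = " ".join(text[start:start + width].split())
            scored.append((score, rel, snippet))
    # Highest score first; ties broken by path for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [{"path": rel, "snippet": snip} for _score, rel, snip in scored[:limit]]
```

Query terms are passed through `re.escape`, so user input such as `c++` or `(gpu` is matched literally instead of being compiled as a regular expression.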

Files to Modify

| File | Change |
|------|--------|
| `/home/az/agent/server.py` | Add tomb commands, API endpoints, LLM context injection |
| `/etc/systemd/system/pi-agent.service` | Add `TOMB_DIR=/home/az/tomb` env var |
| `tomb/services/Pi-Agent.md` | Document new librarian commands and endpoints |

Constraints

  • Must work with qwen3.5:0.8b (small model, limited context) — keep injected context concise
  • No new pip dependencies — use stdlib only (re, os, sqlite3)
  • Path traversal protection on all file reads
  • Tomb files are markdown — return raw, let the client render

Success Criteria

  • !tomb services/Ollama.md returns file contents
  • !tomb-search ollama gpu returns relevant results with snippets
  • API endpoints work: /api/tomb/read, /api/tomb/search, /api/tomb/index, /api/tomb/ls
  • Agent auto-searches tomb when questions match vault content
  • Path traversal blocked (e.g. !tomb ../../etc/passwd)
  • Works within qwen3.5:0.8b context limits
  • pi-agent.service updated with TOMB_DIR

Claude — Clarifying Questions

Good, I've read through the codebase. Here are my clarifying questions:


1. How should the auto-search trigger work with a 0.8b model?

The issue says the agent should auto-search tomb "when a question could be answered by tomb." But qwen3.5:0.8b is very small — asking it to decide whether to search adds latency and may produce unreliable results. There are a few options:

  • Always search — every non-! message hits tomb, append top results to context (simplest, but adds latency to every query)
  • Keyword matching — Python-side heuristic checks if the message contains tomb-relevant terms before searching
  • LLM decides — add a "should I search tomb?" step to the LLM call (adds a full round-trip)
  • Opt-in only — no auto-search; users explicitly use !tomb-search when they want vault content

Which approach do you have in mind?

2. What's the hard limit on injected context size?

The current config is num_predict: 512 with 8-message history. If auto-search injects tomb content into the system/user message, it could easily consume most of the context window. Should I enforce a hard character/byte limit on injected snippets (e.g., top 3 results × 200 chars each)? Or do you want the full file content injected when a match is found?

3. Does tomb-index.md exist yet, or should it be generated?

The issue references !tomb-index returning "the tomb-index.md manifest," but there's no tomb-index.md in /home/az/tomb/ currently. Is this file expected to be created by issue #13 (the tomb sync), or should the librarian feature auto-generate it by walking the directory?

4. Should tomb commands work from the streaming endpoint too?

Currently /api/stream doesn't handle ! commands at all — they're only processed in /api/chat. The issue doesn't mention streaming. Should I keep tomb commands as non-streaming (returning JSON immediately like existing ! commands), or also wire them into the SSE stream?


Please answer these questions so I can create an implementation plan.

Author
Owner

approved


Claude — Implementation Started

Branch: claude/issue-14-41

Reading issue and codebase...

Author
Owner

approved

Author
Owner

approved

Author
Owner

approved

Author
Owner

approved

Author
Owner

Answer

Always search — every non-! message searches tomb and injects top 3 results into context. The 0.8b model is too small to reliably decide when to search, so don't ask it. Just always append tomb context.

Keep it fast:

  • Search is just grep -ril across .md files — milliseconds
  • Only inject top 3 results, max 500 chars each
  • If no results found, skip injection entirely (zero overhead)
  • Total added context: ~1500 chars max — well within 0.8b's window

This means every chat response gets tomb context for free, and the model just naturally uses it when relevant.

Author
Owner

Answers to All 4 Questions

1. Auto-search trigger: Always search

Always search: every non-! message hits tomb, and the top results are appended to the context.

The 0.8b model is too small to reliably decide when to search, so don't ask it. A Python-side keyword heuristic is fragile and will miss things. An extra LLM round-trip doubles latency on a slow model.

The search is just walking .md files with re.search — it's milliseconds on Pi. No perceptible latency. If no results found, inject nothing (zero overhead). Simple and reliable.

2. Hard limit: top 3 results, 500 chars each, 1500 total

  • Max 3 results per query
  • Max 500 characters per snippet (first 500 chars of matching content)
  • Total injected: ~1500 chars max — well within 0.8b's context alongside 8-message history
  • Return the file path as a citation tag: [services/Ollama.md]
  • Never inject full file content into LLM context — too large. Full content is only for !tomb <file> (direct read, returned as the response, not injected into LLM)
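
The limits above could be enforced in one small formatting step before the LLM call. The sketch below is an assumption-laden illustration: `build_tomb_context`, `inject_context`, the result dictionary shape (`path`, `snippet`), and the prompt wording are all hypothetical.

```python
from typing import Dict, List

MAX_RESULTS = 3          # top 3 results, per the limits above
MAX_SNIPPET_CHARS = 500  # 500 characters per snippet


def build_tomb_context(results: List[Dict[str, str]]) -> str:
    """Format search hits as a compact block, one citation-tagged line each."""
    lines = []
    for hit in results[:MAX_RESULTS]:
        snippet = hit["snippet"][:MAX_SNIPPET_CHARS]
        lines.append(f"[{hit['path']}] {snippet}")
    return "\n".join(lines)  # empty string when there are no results


def inject_context(user_message: str, results: List[Dict[str, str]]) -> str:
    """Prepend tomb context to the user message, or return it unchanged."""
    context = build_tomb_context(results)
    if not context:
        return user_message  # no hits: skip injection entirely
    # The instruction wording here is a placeholder, not a tested prompt.
    return f"Vault notes (cite the [path] you use):\n{context}\n\nQuestion: {user_message}"
```

Truncating per snippet rather than on the joined string guarantees that each of the three results keeps its citation tag, even when the first file is long.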

3. tomb-index.md: Generate it dynamically

Don't rely on a static file. The librarian should auto-generate the index by walking the directory tree at startup (or on first !tomb-index call, cached thereafter). Walk TOMB_DIR, list dirs and files with one-line descriptions parsed from each .md file's first # heading or first line. Cache it — refresh when !tomb-index --refresh is called or on server restart.

Issue #13 is syncing the actual tomb from Rog, which already has a tomb-index.md, so once #13 lands that file will exist on Pi. But the librarian should still generate its own index dynamically as a fallback, not depend on a hand-maintained file.
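
A sketch of the dynamic index with caching, assuming a flat "path: description" output format. The names `describe`, `tomb_index`, and `_index_cache` are hypothetical, and this version lists only .md files rather than every directory and file.

```python
import os
from typing import Dict

# Module-level cache keyed by root; a server restart clears it.
_index_cache: Dict[str, str] = {}


def describe(path: str) -> str:
    """Return the first '# ' heading, else the first non-empty line."""
    first_line = ""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            stripped = line.strip()
            if stripped.startswith("# "):
                return stripped[2:].strip()
            if stripped and not first_line:
                first_line = stripped
    return first_line


def tomb_index(root: str, refresh: bool = False) -> str:
    """Build the index on first use and reuse it until refresh is requested."""
    if refresh or root not in _index_cache:
        entries = []
        for dirpath, dirs, files in os.walk(root):
            dirs.sort()  # sorting in place gives os.walk a stable traversal order
            for name in sorted(files):
                if name.endswith(".md"):
                    full = os.path.join(dirpath, name)
                    rel = os.path.relpath(full, root).replace(os.sep, "/")
                    entries.append(f"{rel}: {describe(full)}")
        _index_cache[root] = "\n".join(entries)
    return _index_cache[root]
```

A handler for `!tomb-index --refresh` would then call `tomb_index(root, refresh=True)`, while the plain command reuses the cached string.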

4. Streaming: Keep tomb commands non-streaming

Keep !tomb* commands as non-streaming only — they return structured JSON data immediately (like existing ! commands). They don't need streaming.

Auto-search context injection happens before the LLM call, so it would work the same way for either endpoint, whether or not the response streams.

For now, wire auto-search into /api/chat only. /api/stream can get it later if needed; keep the scope small.

Reference
aaron/claude-ci-demo#14