Pi Agent: Tomb Librarian mode

aaron commented

2026-06-05 04:20:55 +00:00

Owner

Goal

Extend Pi Agent (Flask, port 5501, /home/az/agent/server.py) to serve and search tomb files. Pi becomes the librarian — always-on, answers questions about the vault.

Context

Pi Agent already runs as pi-agent.service on Pi (192.168.0.237:5501)
qwen3.5:0.8b via local Ollama, Flask server with ! system commands
Tomb will live at /home/az/tomb/ on Pi (see #13)
Tomb is an Obsidian vault with markdown files, scripts, SQLite DBs

Scope

1. Tomb Reader Commands

New ! commands for the agent:

Command	What it does
`!tomb <filename>`	Read a tomb file and return contents (e.g. `!tomb services/Ollama.md`)
`!tomb-search <query>`	Full-text search across all tomb markdown files (grep-based, fast)
`!tomb-index`	Return the tomb-index.md manifest
`!tomb-ls <path>`	List files in a tomb subdirectory (e.g. `!tomb-ls services/`)

2. API Endpoints

Endpoint	Method	Description
`/api/tomb/read`	GET	Read a tomb file by path (`?path=services/Ollama.md`)
`/api/tomb/search`	GET	FTS across tomb (`?q=ollama+gpu`)
`/api/tomb/index`	GET	Return tomb manifest
`/api/tomb/ls`	GET	List directory (`?path=services`)

3. LLM-Aware Responses

When a user asks a question that could be answered by tomb (e.g. "what port does Ollama run on?"), the agent should:

Search tomb for relevant files
Inject matching content into the LLM context
Answer with source citations ([services/Ollama.md])

This is the key "librarian" behavior — not just file serving, but answering questions using tomb as a knowledge base.

4. Tomb Path Config

TOMB_DIR env var in the systemd service (default: /home/az/tomb)
Read at startup, validate the directory exists
Reject paths that escape TOMB_DIR (path traversal protection)

5. Search Implementation

Use Python re + file walking for text search (no dependencies)
Search filename + first 200 chars of content for quick hits
Return top 5 results with snippet context
Optional: use sqlite3 FTS5 if tomb gets a search index

Files to Modify

File	Change
`/home/az/agent/server.py`	Add tomb commands, API endpoints, LLM context injection
`/etc/systemd/system/pi-agent.service`	Add `TOMB_DIR=/home/az/tomb` env var
`tomb/services/Pi-Agent.md`	Document new librarian commands and endpoints

Constraints

Must work with qwen3.5:0.8b (small model, limited context) — keep injected context concise
No new pip dependencies — use stdlib only (re, os, sqlite3)
Path traversal protection on all file reads
Tomb files are markdown — return raw, let the client render

Success Criteria

!tomb services/Ollama.md returns file contents
!tomb-search ollama gpu returns relevant results with snippets
API endpoints work: /api/tomb/read, /api/tomb/search, /api/tomb/index, /api/tomb/ls
Agent auto-searches tomb when questions match vault content
Path traversal blocked (e.g. !tomb ../../etc/passwd)
Works within qwen3.5:0.8b context limits
pi-agent.service updated with TOMB_DIR

## Goal Extend Pi Agent (Flask, port 5501, `/home/az/agent/server.py`) to serve and search tomb files. Pi becomes the librarian — always-on, answers questions about the vault. ## Context - Pi Agent already runs as `pi-agent.service` on Pi (192.168.0.237:5501) - qwen3.5:0.8b via local Ollama, Flask server with `!` system commands - Tomb will live at `/home/az/tomb/` on Pi (see #13) - Tomb is an Obsidian vault with markdown files, scripts, SQLite DBs ## Scope ### 1. Tomb Reader Commands New `!` commands for the agent: | Command | What it does | |---------|-------------| | `!tomb <filename>` | Read a tomb file and return contents (e.g. `!tomb services/Ollama.md`) | | `!tomb-search <query>` | Full-text search across all tomb markdown files (grep-based, fast) | | `!tomb-index` | Return the tomb-index.md manifest | | `!tomb-ls <path>` | List files in a tomb subdirectory (e.g. `!tomb-ls services/`) | ### 2. API Endpoints | Endpoint | Method | Description | |----------|--------|-------------| | `/api/tomb/read` | GET | Read a tomb file by path (`?path=services/Ollama.md`) | | `/api/tomb/search` | GET | FTS across tomb (`?q=ollama+gpu`) | | `/api/tomb/index` | GET | Return tomb manifest | | `/api/tomb/ls` | GET | List directory (`?path=services`) | ### 3. LLM-Aware Responses When a user asks a question that could be answered by tomb (e.g. "what port does Ollama run on?"), the agent should: 1. Search tomb for relevant files 2. Inject matching content into the LLM context 3. Answer with source citations (`[services/Ollama.md]`) This is the key "librarian" behavior — not just file serving, but **answering questions using tomb as a knowledge base**. ### 4. Tomb Path Config - `TOMB_DIR` env var in the systemd service (default: `/home/az/tomb`) - Read at startup, validate the directory exists - Reject paths that escape TOMB_DIR (path traversal protection) ### 5. Search Implementation - Use Python `re` + file walking for text search (no dependencies) - Search filename + first 200 chars of content for quick hits - Return top 5 results with snippet context - Optional: use `sqlite3` FTS5 if tomb gets a search index ## Files to Modify | File | Change | |------|--------| | `/home/az/agent/server.py` | Add tomb commands, API endpoints, LLM context injection | | `/etc/systemd/system/pi-agent.service` | Add `TOMB_DIR=/home/az/tomb` env var | | `tomb/services/Pi-Agent.md` | Document new librarian commands and endpoints | ## Constraints - Must work with qwen3.5:0.8b (small model, limited context) — keep injected context concise - No new pip dependencies — use stdlib only (re, os, sqlite3) - Path traversal protection on all file reads - Tomb files are markdown — return raw, let the client render ## Success Criteria - [ ] `!tomb services/Ollama.md` returns file contents - [ ] `!tomb-search ollama gpu` returns relevant results with snippets - [ ] API endpoints work: `/api/tomb/read`, `/api/tomb/search`, `/api/tomb/index`, `/api/tomb/ls` - [ ] Agent auto-searches tomb when questions match vault content - [ ] Path traversal blocked (e.g. `!tomb ../../etc/passwd`) - [ ] Works within qwen3.5:0.8b context limits - [ ] `pi-agent.service` updated with TOMB_DIR

forgejo-actions commented

2026-06-05 04:22:01 +00:00

Claude — Clarifying Questions

Good, I've read through the codebase. Here are my clarifying questions:

1. How should the auto-search trigger work with a 0.8b model?

The issue says the agent should auto-search tomb "when a question could be answered by tomb." But qwen3.5:0.8b is very small — asking it to decide whether to search adds latency and may produce unreliable results. There are a few options:

Always search — every non-! message hits tomb, append top results to context (simplest, but adds latency to every query)
Keyword matching — Python-side heuristic checks if the message contains tomb-relevant terms before searching
LLM decides — add a "should I search tomb?" step to the LLM call (adds a full round-trip)
Opt-in only — no auto-search; users explicitly use !tomb-search when they want vault content

Which approach do you have in mind?

2. What's the hard limit on injected context size?

The current config is num_predict: 512 with 8-message history. If auto-search injects tomb content into the system/user message, it could easily consume most of the context window. Should I enforce a hard character/byte limit on injected snippets (e.g., top 3 results × 200 chars each)? Or do you want the full file content injected when a match is found?

3. Does tomb-index.md exist yet, or should it be generated?

The issue references !tomb-index returning "the tomb-index.md manifest," but there's no tomb-index.md in /home/az/tomb/ currently. Is this file expected to be created by issue #13 (the tomb sync), or should the librarian feature auto-generate it by walking the directory?

4. Should tomb commands work from the streaming endpoint too?

Currently /api/stream doesn't handle ! commands at all — they're only processed in /api/chat. The issue doesn't mention streaming. Should I keep tomb commands as non-streaming (returning JSON immediately like existing ! commands), or also wire them into the SSE stream?

Please answer these questions so I can create an implementation plan.

## Claude — Clarifying Questions Good, I've read through the codebase. Here are my clarifying questions: --- **1. How should the auto-search trigger work with a 0.8b model?** The issue says the agent should auto-search tomb "when a question could be answered by tomb." But `qwen3.5:0.8b` is very small — asking it to decide *whether* to search adds latency and may produce unreliable results. There are a few options: - **Always search** — every non-`!` message hits tomb, append top results to context (simplest, but adds latency to every query) - **Keyword matching** — Python-side heuristic checks if the message contains tomb-relevant terms before searching - **LLM decides** — add a "should I search tomb?" step to the LLM call (adds a full round-trip) - **Opt-in only** — no auto-search; users explicitly use `!tomb-search` when they want vault content Which approach do you have in mind? **2. What's the hard limit on injected context size?** The current config is `num_predict: 512` with 8-message history. If auto-search injects tomb content into the system/user message, it could easily consume most of the context window. Should I enforce a hard character/byte limit on injected snippets (e.g., top 3 results × 200 chars each)? Or do you want the full file content injected when a match is found? **3. Does `tomb-index.md` exist yet, or should it be generated?** The issue references `!tomb-index` returning "the tomb-index.md manifest," but there's no `tomb-index.md` in `/home/az/tomb/` currently. Is this file expected to be created by issue #13 (the tomb sync), or should the librarian feature auto-generate it by walking the directory? **4. Should tomb commands work from the streaming endpoint too?** Currently `/api/stream` doesn't handle `!` commands at all — they're only processed in `/api/chat`. The issue doesn't mention streaming. Should I keep tomb commands as non-streaming (returning JSON immediately like existing `!` commands), or also wire them into the SSE stream? --- *Please answer these questions so I can create an implementation plan.*

forgejo-actions added the

awaiting-clarification

label

2026-06-05 04:22:02 +00:00

aaron commented

2026-06-05 04:58:34 +00:00

Author

Owner

approved

aaron added

awaiting-approval

and removed

awaiting-clarification

labels

2026-06-05 04:58:39 +00:00

forgejo-actions commented

2026-06-05 05:13:07 +00:00

Claude — Implementation Started \nBranch: `claude/issue-14-41`\n\nReading issue and codebase...

## Claude — Implementation Started \nBranch: `claude/issue-14-41`\n\nReading issue and codebase...

aaron removed the

awaiting-approval

label

2026-06-05 05:30:54 +00:00

aaron commented

2026-06-05 05:31:03 +00:00

Author

Owner

approved

aaron commented

2026-06-05 05:31:30 +00:00

Author

Owner

approved

aaron commented

2026-06-05 05:31:38 +00:00

Author

Owner

approved

aaron commented

2026-06-05 05:31:59 +00:00

Author

Owner

approved

aaron commented

2026-06-05 05:34:51 +00:00

Author

Owner

Answer

Always search — every non-! message searches tomb and injects top 3 results into context. The 0.8b model is too small to reliably decide when to search, so don't ask it. Just always append tomb context.

Keep it fast:

Search is just grep -ril across .md files — milliseconds
Only inject top 3 results, max 500 chars each
If no results found, skip injection entirely (zero overhead)
Total added context: ~1500 chars max — well within 0.8b's window

This means every chat response gets tomb context for free, and the model just naturally uses it when relevant.

## Answer **Always search** — every non-`!` message searches tomb and injects top 3 results into context. The 0.8b model is too small to reliably decide when to search, so don't ask it. Just always append tomb context. Keep it fast: - Search is just `grep -ril` across `.md` files — milliseconds - Only inject top 3 results, max 500 chars each - If no results found, skip injection entirely (zero overhead) - Total added context: ~1500 chars max — well within 0.8b's window This means every chat response gets tomb context for free, and the model just naturally uses it when relevant.

aaron commented

2026-06-05 05:36:41 +00:00

Author

Owner

Answers to All 4 Questions

1. Auto-search trigger: Always search

Always search — every non-! message hits tomb, append top results to context.

The 0.8b model is too small to reliably decide when to search, so don't ask it. A Python-side keyword heuristic is fragile and will miss things. An extra LLM round-trip doubles latency on a slow model.

The search is just walking .md files with re.search — it's milliseconds on Pi. No perceptible latency. If no results found, inject nothing (zero overhead). Simple and reliable.

2. Hard limit: top 3 results, 500 chars each, 1500 total

Max 3 results per query
Max 500 characters per snippet (first 500 chars of matching content)
Total injected: ~1500 chars max — well within 0.8b's context alongside 8-message history
Return the file path as a citation tag: [services/Ollama.md]
Never inject full file content into LLM context — too large. Full content is only for !tomb <file> (direct read, returned as the response, not injected into LLM)

3. tomb-index.md: Generate it dynamically

Don't rely on a static file. The librarian should auto-generate the index by walking the directory tree at startup (or on first !tomb-index call, cached thereafter). Walk TOMB_DIR, list dirs and files with one-line descriptions parsed from each .md file's first # heading or first line. Cache it — refresh when !tomb-index --refresh is called or on server restart.

Issue #13 is syncing the actual tomb from Rog which already has a tomb-index.md — so once #13 lands, that file will exist on Pi. But the librarian should still generate its own index dynamically as a fallback, not depend on a hand-maintained file.

4. Streaming: Keep tomb commands non-streaming

Keep !tomb* commands as non-streaming only — they return structured JSON data immediately (like existing ! commands). They don't need streaming.

The auto-search context injection works the same way for both endpoints — it happens before the LLM call, so it's transparent to whether the response streams or not.

Only wire auto-search into /api/chat. /api/stream can get it later if needed — keep scope small for now.

## Answers to All 4 Questions ### 1. Auto-search trigger: Always search **Always search** — every non-`!` message hits tomb, append top results to context. The 0.8b model is too small to reliably decide when to search, so don't ask it. A Python-side keyword heuristic is fragile and will miss things. An extra LLM round-trip doubles latency on a slow model. The search is just walking `.md` files with `re.search` — it's milliseconds on Pi. No perceptible latency. If no results found, inject nothing (zero overhead). Simple and reliable. ### 2. Hard limit: top 3 results, 500 chars each, 1500 total - Max **3 results** per query - Max **500 characters per snippet** (first 500 chars of matching content) - **Total injected: ~1500 chars max** — well within 0.8b's context alongside 8-message history - Return the file path as a citation tag: `[services/Ollama.md]` - **Never inject full file content** into LLM context — too large. Full content is only for `!tomb <file>` (direct read, returned as the response, not injected into LLM) ### 3. tomb-index.md: Generate it dynamically Don't rely on a static file. The librarian should auto-generate the index by walking the directory tree at startup (or on first `!tomb-index` call, cached thereafter). Walk `TOMB_DIR`, list dirs and files with one-line descriptions parsed from each `.md` file's first `# heading` or first line. Cache it — refresh when `!tomb-index --refresh` is called or on server restart. Issue #13 is syncing the actual tomb from Rog which already has a `tomb-index.md` — so once #13 lands, that file will exist on Pi. But the librarian should still generate its own index dynamically as a fallback, not depend on a hand-maintained file. ### 4. Streaming: Keep tomb commands non-streaming Keep `!tomb*` commands as **non-streaming only** — they return structured JSON data immediately (like existing `!` commands). They don't need streaming. The auto-search context injection works the same way for both endpoints — it happens before the LLM call, so it's transparent to whether the response streams or not. Only wire auto-search into `/api/chat`. `/api/stream` can get it later if needed — keep scope small for now.

Rows
Columns

Pi Agent: Tomb Librarian mode #14

Goal

Context

Scope

1. Tomb Reader Commands

2. API Endpoints

3. LLM-Aware Responses

4. Tomb Path Config

5. Search Implementation

Files to Modify

Constraints

Success Criteria

Claude — Clarifying Questions

Claude — Implementation Started \nBranch: claude/issue-14-41\n\nReading issue and codebase...

Answer

Answers to All 4 Questions

1. Auto-search trigger: Always search

2. Hard limit: top 3 results, 500 chars each, 1500 total

3. tomb-index.md: Generate it dynamically

4. Streaming: Keep tomb commands non-streaming

Claude — Implementation Started \nBranch: `claude/issue-14-41`\n\nReading issue and codebase...