Pi Agent: Tomb Librarian mode #14
Labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
aaron/claude-ci-demo#14
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Goal
Extend Pi Agent (Flask, port 5501,
/home/az/agent/server.py) to serve and search tomb files. Pi becomes the librarian — always-on, answers questions about the vault.Context
pi-agent.serviceon Pi (192.168.0.237:5501)!system commands/home/az/tomb/on Pi (see #13)Scope
1. Tomb Reader Commands
New
!commands for the agent:!tomb <filename>!tomb services/Ollama.md)!tomb-search <query>!tomb-index!tomb-ls <path>!tomb-ls services/)2. API Endpoints
/api/tomb/read?path=services/Ollama.md)/api/tomb/search?q=ollama+gpu)/api/tomb/index/api/tomb/ls?path=services)3. LLM-Aware Responses
When a user asks a question that could be answered by tomb (e.g. "what port does Ollama run on?"), the agent should:
[services/Ollama.md])This is the key "librarian" behavior — not just file serving, but answering questions using tomb as a knowledge base.
4. Tomb Path Config
TOMB_DIRenv var in the systemd service (default:/home/az/tomb)5. Search Implementation
re+ file walking for text search (no dependencies)sqlite3FTS5 if tomb gets a search indexFiles to Modify
/home/az/agent/server.py/etc/systemd/system/pi-agent.serviceTOMB_DIR=/home/az/tombenv vartomb/services/Pi-Agent.mdConstraints
Success Criteria
!tomb services/Ollama.mdreturns file contents!tomb-search ollama gpureturns relevant results with snippets/api/tomb/read,/api/tomb/search,/api/tomb/index,/api/tomb/ls!tomb ../../etc/passwd)pi-agent.serviceupdated with TOMB_DIRClaude — Clarifying Questions
Good, I've read through the codebase. Here are my clarifying questions:
1. How should the auto-search trigger work with a 0.8b model?
The issue says the agent should auto-search tomb "when a question could be answered by tomb." But
qwen3.5:0.8bis very small — asking it to decide whether to search adds latency and may produce unreliable results. There are a few options:!message hits tomb, append top results to context (simplest, but adds latency to every query)!tomb-searchwhen they want vault contentWhich approach do you have in mind?
2. What's the hard limit on injected context size?
The current config is
num_predict: 512with 8-message history. If auto-search injects tomb content into the system/user message, it could easily consume most of the context window. Should I enforce a hard character/byte limit on injected snippets (e.g., top 3 results × 200 chars each)? Or do you want the full file content injected when a match is found?3. Does
tomb-index.mdexist yet, or should it be generated?The issue references
!tomb-indexreturning "the tomb-index.md manifest," but there's notomb-index.mdin/home/az/tomb/currently. Is this file expected to be created by issue #13 (the tomb sync), or should the librarian feature auto-generate it by walking the directory?4. Should tomb commands work from the streaming endpoint too?
Currently
/api/streamdoesn't handle!commands at all — they're only processed in/api/chat. The issue doesn't mention streaming. Should I keep tomb commands as non-streaming (returning JSON immediately like existing!commands), or also wire them into the SSE stream?Please answer these questions so I can create an implementation plan.
approved
Claude — Implementation Started \nBranch:
claude/issue-14-41\n\nReading issue and codebase...approved
approved
approved
approved
Answer
Always search — every non-
!message searches tomb and injects top 3 results into context. The 0.8b model is too small to reliably decide when to search, so don't ask it. Just always append tomb context.Keep it fast:
grep -rilacross.mdfiles — millisecondsThis means every chat response gets tomb context for free, and the model just naturally uses it when relevant.
Answers to All 4 Questions
1. Auto-search trigger: Always search
Always search — every non-
!message hits tomb, append top results to context.The 0.8b model is too small to reliably decide when to search, so don't ask it. A Python-side keyword heuristic is fragile and will miss things. An extra LLM round-trip doubles latency on a slow model.
The search is just walking
.mdfiles withre.search— it's milliseconds on Pi. No perceptible latency. If no results found, inject nothing (zero overhead). Simple and reliable.2. Hard limit: top 3 results, 500 chars each, 1500 total
[services/Ollama.md]!tomb <file>(direct read, returned as the response, not injected into LLM)3. tomb-index.md: Generate it dynamically
Don't rely on a static file. The librarian should auto-generate the index by walking the directory tree at startup (or on first
!tomb-indexcall, cached thereafter). WalkTOMB_DIR, list dirs and files with one-line descriptions parsed from each.mdfile's first# headingor first line. Cache it — refresh when!tomb-index --refreshis called or on server restart.Issue #13 is syncing the actual tomb from Rog which already has a
tomb-index.md— so once #13 lands, that file will exist on Pi. But the librarian should still generate its own index dynamically as a fallback, not depend on a hand-maintained file.4. Streaming: Keep tomb commands non-streaming
Keep
!tomb*commands as non-streaming only — they return structured JSON data immediately (like existing!commands). They don't need streaming.The auto-search context injection works the same way for both endpoints — it happens before the LLM call, so it's transparent to whether the response streams or not.
Only wire auto-search into
/api/chat./api/streamcan get it later if needed — keep scope small for now.