Install
Copy this and paste it into Claude Code, Cursor, or any AI assistant:
I want to install the "clawmem" skill in my project. Please run this command in my terminal:

```shell
# Install skill into your project
mkdir -p .claude/skills/ClawMem && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/ClawMem/SKILL.md "https://raw.githubusercontent.com/yoloshii/ClawMem/main/SKILL.md"
```

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.
Description
ClawMem agent reference — detailed operational guidance for the on-device hybrid memory system. Use when: setting up collections/indexing/embedding, troubleshooting retrieval, tuning query optimization (4 levers), understanding pipeline behavior, managing memory lifecycle (pin/snooze/forget), building graphs, or any ClawMem operation beyond basic tool routing.
Architecture
Two tiers: hooks handle automatic context flow (surfacing, extraction, compaction survival). MCP tools handle explicit recall, write, and lifecycle operations.
Inference Services
Three llama-server instances for neural inference. The `bin/clawmem` wrapper defaults to `localhost:8088/8089/8090`.

Default (QMD native combo, any GPU or in-process):

| Service | Port | Model | VRAM | Protocol |
|---|---|---|---|---|
| Embedding | 8088 | EmbeddingGemma-300M-Q8_0 | ~400MB | /v1/embeddings |
| LLM | 8089 | qmd-query-expansion-1.7B-q4_k_m | ~2.2GB | /v1/chat/completions |
| Reranker | 8090 | qwen3-reranker-0.6B-Q8_0 | ~1.3GB | /v1/rerank |

All three models auto-download via node-llama-cpp if no server is running (Metal on Apple Silicon, Vulkan where available, CPU as last resort). Fast with GPU acceleration (Metal/Vulkan); significantly slower on CPU-only.

SOTA upgrade (12GB+ GPU): zembed-1-Q4_K_M (embedding, 2560d, ~4.4GB) + zerank-2-Q4_K_M (reranker, ~3.3GB). Total ~10GB with LLM. Distillation-paired via zELO. `-ub` must match `-b` for both. CC-BY-NC-4.0 (non-commercial only).

Remote option: set CLAWMEM_EMBED_URL, CLAWMEM_LLM_URL, and CLAWMEM_RERANK_URL to the remote host. Set CLAWMEM_NO_LOCAL_MODELS=true to prevent fallback downloads.

Cloud embedding: set CLAWMEM_EMBED_API_KEY + CLAWMEM_EMBED_URL + CLAWMEM_EMBED_MODEL for cloud providers. Supported: Jina AI (jina-embeddings-v5-text-small, 1024d), OpenAI, Voyage, Cohere. Batch embedding, TPM-aware pacing, and provider-specific params are auto-detected.
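As a sketch, pointing ClawMem at a remote inference host might look like the following. The variable names come from the remote-option list above; the hostname is a placeholder, not anything named in this document:

```shell
# Point all three services at a remote llama-server host (placeholder hostname)
export CLAWMEM_EMBED_URL="http://gpu-box.local:8088"
export CLAWMEM_LLM_URL="http://gpu-box.local:8089"
export CLAWMEM_RERANK_URL="http://gpu-box.local:8090"

# Fail fast instead of auto-downloading local models when a server is unreachable
export CLAWMEM_NO_LOCAL_MODELS=true
```

The same URL variables apply for the cloud-embedding path, with CLAWMEM_EMBED_API_KEY and CLAWMEM_EMBED_MODEL added for the chosen provider.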
Embedding (--embeddings flag required)
```shell
llama-server -m embeddinggemma-300M-Q8_0.gguf \
  --embeddings --port 8088 --host 0.0.0.0 -ngl 99 -c 2048 --batch-size 2048
```
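Once the server is up, the /v1/embeddings endpoint listed in the table above can be sanity-checked with a plain OpenAI-style request (the input string is arbitrary; this curl call is a suggestion, not taken from the document):

```shell
# Smoke test: expect a JSON response containing an "embedding" array
curl -s http://localhost:8088/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "hello clawmem"}' | head -c 200
```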
LLM (auto-downloads via node-llama-cpp if no server)
```shell
llama-server -m qmd-query-expansion-1.7B-q4_k_m.gguf \
  --port 8089 --host 0.0.0.0 -ngl 99 -c 4096 --batch-size 512
```
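The document gives no launch line for the reranker on port 8090. By analogy with the two commands above it would look roughly like this; the `--reranking` flag follows llama.cpp's server convention for enabling the rerank endpoint, and the exact model filename is an assumption:

```shell
# Assumed reranker launch (port 8090); --reranking enables llama.cpp's /v1/rerank endpoint
llama-server -m qwen3-reranker-0.6B-Q8_0.gguf \
  --reranking --port 8090 --host 0.0.0.0 -ngl 99 -c 2048
```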