76 boosters for "eval" — open source, verified from GitHub, ready to install
An MCP server for conversation history search and retrieval in Claude Code
SWR is a React hook library for efficient data fetching with built-in caching, revalidation, and real-time updates. Developers building API-driven applications benefit from simplified server state management and automatic synchronization.
"name": "prism-mcp-server", "mcpName": "io.github.dcostenco/prism-mcp", "description": "The Mind Palace for AI Agents — persistent memory (SQLite/Supabase), behavioral learning & IDE rules sync, multimodal VLM image captioning, pluggable LLM providers (OpenAI/Anthropic/Gemini/Ollama), OpenTelemetry
This skill enables developers to save and retrieve Git changes across Claude Code sessions by linking stash entries to session IDs, maintaining continuity when resuming work. It benefits developers who work on code iteratively across multiple conversations and need to preserve work-in-progress state.
"name": "double-shot-latte", "description": "Automatically evaluates whether Claude should continue working instead of stopping prematurely using Claude-judged decision making", "url": "https://github.com/anthropics"
Brain in the Fish evaluates documents (essays, policies, contracts, clinical reports, surveys) against evaluation criteria using a panel of AI agents. Each agent's mental state is represented as an OWL ontology. Scoring is grounded in an Evidence Density Scorer (EDS) that makes hallucination mathematically det
You are tasked with retrieving relevant knowledge from the Obsidian vault using multi-layer semantic search. 1. First Layer - Initial Search: 2. Second Layer - Direct Associations:
Evaluate the user's ambient context artifacts for compatibility with swarm's governance rules. You are a read-only diagnostic — never modify any files. 1. CLAUDE.md files. Read the project's (working directory root). If exists, read that too. Also check (global config) — it loads into every sessi
is an eval workbench for agent skills. It runs a model in an isolated Docker directory, provides skills/references as normal workspace files, captures an agent trace, and grades deterministic local outcomes. Use this skill as the source of truth for authoring eval suites in this repo. Detailed sche
Use AskUserQuestion to ask the buyer: Tell the user the version was updated, then re-read the EVALUATION.md file from the updated directory and proceed with the skill. After the preamble, read the full evaluation methodology:
Two tiers: hooks handle automatic context flow (surfacing, extraction, compaction survival). MCP tools handle explicit recall, write, and lifecycle operations. Three instances for neural inference. The wrapper defaults to . All three models auto-download via if no server is running (Metal on Appl
Turn social media paper recommendations into actionable research items. Use platform-specific tools to fetch the full content: From the extracted content, identify all referenced papers:
Turn a folder of raw files into a Markdown vault that an LLM can grep, and then answer questions over that vault responsibly. source file, carrying retrieval frontmatter (abstract / tags / synonyms) + a
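1. Architecture is not "drawn"; it is "forced out" of constraints. If you draw diagrams before understanding the constraints, whatever you draw is guesswork. 2. There is no silver bullet, only trade-offs. Every decision is essentially "trading A for B". A solution with "no drawbacks" is not perfect; it simply has not been thought through. 3. There is no "best architecture", only "the architecture best suited to this set of constraints". Even for the same chat feature, the answer for an internal tool is worlds apart from the answer for WeChat.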
"name": "phdtaketaketake", "description": "Connection-first PhD advisor matcher — finds the right advisor by network strength, not h-index. Evidence-first: every signal traces to a real source the agent fetched. Best-supported for physics / MSE; extensible to other STEM with field-specific caveats."
An agent designed to evaluate other agents and tasks, with library-first constraints and multi-tool integration across Claude platforms. Useful for teams building quality assurance workflows into their Claude-based systems.
RagCode MCP is a semantic code navigation tool that integrates RAG-powered code search into Windsurf and other IDEs, enabling developers to intelligently query and understand multi-language codebases using local LLMs. It's ideal for developers working with Laravel, Go, Python, and PHP who need fast, context-aware code exploration without leaving their IDE.
"description": "Smart command safety filter for Claude Code — parses shell pipelines and evaluates per-command safety rules to auto-approve safe commands and block dangerous ones",
AgentAsJudge is an agentic evaluation framework that enables AI systems to critically review educational introductions by validating them against specified quality metrics and providing constructive feedback. It benefits educators, instructional designers, and developers building AI-assisted learning platforms who need reliable, fair assessment of educational content.
AgentAsJudge is an agentic evaluation framework that enables AI to systematically assess and compare the quality of multiple-choice questions across educational value, clarity, and answerability. It benefits educators, content creators, and assessment teams looking to automate quality control of exam and quiz questions.
"version": "5.10.0", "description": "Memory → Evaluation → Credential → Access Control for AI agents. Persistent memory with W3C Verifiable Credentials, capability-based access control, drift detection, and FSRS-6 spaced repetition.", "name": "kobie3717",
"description": "Persistent memory for Claude Code — remembers across sessions automatically. Install and forget. Scientific retrieval backed by 41 published papers.", "name": "Clement Deust", "email": "admin@ai-architect.tools"
"name": "vibe-science", "description": "Scientific research plugin with tracked claim/review/seed lifecycle, citation verification gates, strict integrity, benchmark recording, and retrieval closure.", "name": "Vibe Science Contributors",
"name": "open-academic-paper-machine", "description": "Open Academic Paper Machine — Autonomous academic paper production system with idea evaluation gate and paper-vs-code audit. NEW in v6.4: /audit-paper command and audit-engine skill — static audit of a paper's empirical claims (datasets, models,