76 boosters for "eval" — open source, verified from GitHub, ready to install
skill-creator enables users to build, refine, and evaluate AI skills through an iterative workflow with testing and performance benchmarking. Developers and AI engineers benefit from streamlined skill development and optimization.
Tool Evaluator is an AI agent that systematically assesses and recommends business tools and software platforms to help teams optimize productivity and technology ROI. Teams evaluating new tools, CTOs building tech stacks, and productivity managers benefit from its comparative analysis and adoption guidance.
A systematic paid media auditor agent that evaluates Google Ads, Microsoft Ads, and Meta accounts across 200+ checkpoints to identify inefficiencies, optimization gaps, and cost-saving opportunities. Ideal for marketing teams, agencies, and businesses seeking data-driven insights into ad spend performance.
Corporate Training Designer is an AI agent that helps enterprises design and optimize training programs through needs analysis, instructional design, and effectiveness evaluation. HR leaders, L&D professionals, and training managers use it to create behavior-change-focused curricula and leadership development initiatives.
Test Results Analyzer is an AI agent that transforms raw test data into actionable quality insights through comprehensive metrics analysis and strategic reporting. QA engineers, test managers, and development teams use it to accelerate test result evaluation and drive continuous improvement.
Promptfoo is an LLM evaluation and testing toolkit that helps developers systematically test, benchmark, and validate prompt performance across different models and scenarios. It's essential for teams building LLM applications who need rigorous quality assurance and prompt optimization.
A UX-focused design critique skill that evaluates interface effectiveness across visual hierarchy, information architecture, and emotional resonance, providing actionable feedback. Useful for designers and developers seeking structured design feedback within Claude Code.
Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup is required; results are automatically saved to the Hugging Face Hub. Helper scripts use PEP 723 inline dependencies.
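For readers unfamiliar with PEP 723 inline dependencies, the sketch below shows what such a metadata block looks like and one way to locate it in a script. The regular expression is the one given in the PEP's reference implementation; the sample script, its `requests` dependency, and the helper name `extract_script_metadata` are made up for illustration and are not taken from this skill's helper scripts.

```python
import re
from typing import Optional

# Pattern from the PEP 723 reference implementation.
PEP723_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical helper script; the dependency list is illustrative only.
SAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "requests",
# ]
# ///
import requests
'''


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` metadata block, or None."""
    for match in PEP723_BLOCK.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Drop the leading "# " (or a bare "#") from each comment line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None


toml_text = extract_script_metadata(SAMPLE_SCRIPT)
```

On Python 3.11 or later, `tomllib.loads(toml_text)` turns the extracted text into a dictionary with `requires-python` and `dependencies` keys.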
Tiebreakers when the request is ambiguous: "embedding model" / "vector search" / "similarity" → [SentenceTransformer]. "rerank" / "ranker" / "two-stage" → [CrossEncoder]. "SPLADE" / "sparse" / "inverted index" → [SparseEncoder]. If still unclear, ask. Override only if the user specifies otherwise: T
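The tiebreakers above amount to a keyword lookup. Here is a minimal sketch, assuming case-insensitive substring matching where the first matching group wins. The function name `pick_model_class` and the `None` return value standing in for "ask the user" are hypothetical and not part of the skill.

```python
from typing import Optional

# Keyword groups copied from the tiebreaker list. The matching strategy
# (case-insensitive substring search, first match wins) is an assumption.
TIEBREAKERS = [
    (("embedding model", "vector search", "similarity"), "SentenceTransformer"),
    (("rerank", "ranker", "two-stage"), "CrossEncoder"),
    (("splade", "sparse", "inverted index"), "SparseEncoder"),
]


def pick_model_class(request: str) -> Optional[str]:
    """Return the model class name for an ambiguous request, or None to ask."""
    text = request.lower()
    for keywords, model_class in TIEBREAKERS:
        if any(keyword in text for keyword in keywords):
            return model_class
    return None  # still unclear: ask the user
```

For example, `pick_model_class("I need a reranker")` selects `CrossEncoder`, while a request with none of the listed keywords returns `None` so the caller can ask for clarification.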
This skill is for running evaluations against models on the Hugging Face Hub on local hardware. If the user wants to run the same eval remotely on Hugging Face Jobs, hand off to the relevant skill and pass it one of the local scripts in this skill.
"name": "huggingface-skills", "description": "Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub", "name": "Hugging Face"
This skill automates the process of adding, extracting, and managing evaluation results in Hugging Face model cards, supporting multiple data sources including Artificial Analysis API and custom evaluations with vLLM/lighteval. It's valuable for ML practitioners and model maintainers who need to track and display model performance metrics.
"name": "claude-obsidian", "description": "Claude + Obsidian knowledge companion. Sets up a persistent, compounding wiki vault (Karpathy's LLM Wiki pattern). v1.7 \"Compound Vault\" + v1.8 methodology modes close 5 of 5 priority gaps from the May 2026 compass artifact. Ships: substrate alignment wit
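Below is a description of the target you need to generate test cases for, namely the tool descriptions from a remote MCP server. You can call the following tools. First, enumerate as comprehensively as possible all test dimensions for the current threats, then design tests for each dimension of the test target, generating at least 3 test cases per dimension.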
Local GraphRAG knowledge base backed by SQLite + MNN embeddings. Fully compatible with the Android OfflineAI RAG database format. On first use, about 400 MB is auto-downloaded.
Comprehensive quality audit system for Claude Code agents, skills, and commands. Provides quantitative scoring, comparative analysis, and production readiness grading based on industry best practices. The 16-criteria framework is derived from: 1. Claude Code Best Practices (Ultimate Guide line 4921:
A system prompt that guides LLMs to analyze Factorio game implementations and generate detailed natural language plans for achieving objectives. Useful for developers creating AI-driven game planning systems or educational tools.
Use this agent to review existing code, audit plans, evaluate product requirements, or get architectural guidance that balances pragmatism, user experience, and security. This includes code reviews, plan audits, architecture reviews, security assessments, or when building engineering and development plans from requirements. Use proactively after significant code changes or before merging.
Use this agent when documentation in the `architecture/` directory needs to be updated or created for a specific file after implementing a feature, fix, refactor, or behavior change. Launch one instance of this agent per file that needs updating. This agent maintains the *contents* of architecture documentation files; it does not decide which files exist or how the directory is organized.

Examples:

- Example 1:
  Context: A developer just finished implementing OPA policy evaluation in the sandbox system.
  user: "I just finished implementing the OPA engine in crates/openshell-sandbox/src/opa.rs. Update architecture/sandbox.md to reflect the new policy evaluation flow."
  assistant: "I'll launch the arch-doc-writer agent to update the sandbox architecture documentation with the new OPA policy evaluation details."
  <uses Task tool to launch arch-doc-writer with instructions to update architecture/sandbox.md>

- Example 2:
  Context: A refactor changed how the HTTP CONNECT proxy handles allowlists.
  user: "The proxy allowlist logic was refactored. Please update architecture/proxy.md."
  assistant: "Let me use the arch-doc-writer agent to synchronize the proxy documentation with the refactored allowlist logic."
  <uses Task tool to launch arch-doc-writer with instructions to update architecture/proxy.md>

- Example 3:
  Context: After implementing a new CLI command, the assistant proactively updates docs.
  user: "Add a --rego-policy flag to the CLI."
  assistant: "Here is the implementation of the --rego-policy flag."
  <implementation complete>
  assistant: "Now let me launch the arch-doc-writer agent to update the CLI architecture documentation with the new flag."
  <uses Task tool to launch arch-doc-writer with instructions to update architecture/cli.md>

- Example 4:
  Context: A user wants high-level overview documentation for a non-engineering audience.
  user: "Update architecture/overview.md with a non-engineer-friendly explanation of the sandbox system."
  assistant: "I'll launch the arch-doc-writer agent to create an accessible overview of the sandbox system for non-technical readers."
  <uses Task tool to launch arch-doc-writer with audience=non-engineer directive>

- Example 5:
  Context: Multiple files need updating after a large feature lands.
  user: "I just landed the network namespace isolation feature. Update architecture/sandbox.md and architecture/networking.md."
  assistant: "I'll launch two arch-doc-writer agents, one for each file, to update the documentation in parallel."
  <uses Task tool to launch arch-doc-writer for architecture/sandbox.md>
  <uses Task tool to launch arch-doc-writer for architecture/networking.md>
"name": "research-companion", "description": "Strategic research thinking agents — idea evaluation, project triage, and structured brainstorming inspired by Carlini's research methodology", "name": "Andre Huang",
This skill provides a philosophical framework and analytical methods for evaluating whether end users can "know" what value they can achieve through a product. It guides analysis from a value discovery perspective, rather than providing checklists. End users adopt products when they know what value
Acts as a Skill Architect: conducts a smart interview to extract the workflow from the user's head, generates a complete AI Skill, then tests and improves it continuously until it reaches production quality. The user does NOT need to know what a skill is.
"name": "orchestrator-supaconductor", "description": "Conductor v3 — Multi-agent orchestration with Evaluate-Loop, parallel execution, Board of Directors, and bundled SupaConductor skills for Claude Code", "orchestrator-supaconductor",
Build reusable skill packages, not long prompts. Mode rules: Operating Modes, QA Ladder, Resource Boundary Spec, Method. 1. Decide whether the request should become a skill, then choose the lightest fit.