76 boosters for "eval" — open source, verified from GitHub, ready to install
A development guide for extending PatientHub with new patient simulation agents, enabling researchers and developers to implement custom behavioral models for healthcare simulations.
"name": "cre-skills", "description": "112 institutional-grade CRE skills covering ~97% of commercial real estate workflow steps. Deal screening, underwriting, structuring, due diligence, capital markets, market research, asset management, leasing, investor relations, development, disposition, sourci
The architect agent automates system design for new projects and major refactoring efforts, helping teams create scalable architectures with documented trade-offs. Ideal for engineering teams starting greenfield projects or evaluating architectural changes.
"name": "io-skills", "name": "OpenMatter-Network", "description": "Modular skills for Industrial-Organizational psychologists: personnel selection validation and the evaluation/audit of AI-based assessment, derived from authoritative professional standards.",
ArmBench-LLM is a system prompt framework for evaluating large language models on Armenian language tasks through structured multiple-choice questions. It's designed for developers and AI researchers who need standardized benchmarking tools across popular coding assistants and chat platforms.
ArmBench-LLM is a system prompt for benchmarking large language models using Armenian character-to-numeric matching tasks. It's designed for developers evaluating LLM performance across multiple coding platforms.
Tool-evaluator is an agent that rapidly assesses development tools, frameworks, and services through structured benchmarking and comparative analysis to support informed technology adoption decisions. It benefits engineering teams and tech leads evaluating new solutions aligned with studio goals.
A debate judge agent that objectively evaluates arguments using zero-sum scoring across Toulmin structure, evidence strength, and logical rigor. Ideal for researchers, educators, and developers building computational debate systems.
"name": "rag-skills", "description": "Agent skills for RAG (Retrieval Augmented Generation): chunking strategies (sliding window, semantic, hierarchical), retrieval strategies (HyDE, CRAG, Self-RAG, Graph RAG, adaptive, multi-pass), vector database setup (Qdrant), data type handling (code, multimoda
skill-auditor is an expert reviewer that evaluates SKILL.md files against Claude Code Skills best practices, helping developers ensure their skills meet structural and effectiveness standards. It's essential for skill creators and maintainers who want to validate compliance before publishing.
PrismBench enables developers to create specialized LLM agents through YAML configuration for systematic evaluation of model capabilities using Monte Carlo Tree Search. Useful for ML engineers, researchers, and teams building production LLM systems who need comprehensive benchmarking and evaluation frameworks.
PrismBench enables developers to create specialized LLM agents through YAML configuration for comprehensive benchmarking and evaluation of language model capabilities. Teams building AI evaluation systems and ML testing pipelines benefit from its systematic Monte Carlo Tree Search approach and containerized deployment.
This skill enables developers to create cryptographically signed, immutable constitutions for AI tool-use governance in OpenClaw, with Ed25519 signing, GitTruth attestation, and policy evaluation artifacts. It's designed for teams implementing constitutional governance frameworks for AI agents.
SkillGuard is a security reviewer for Claude/Cursor Skills that detects prompt injection, tool injection, data exfiltration, and unsafe automation risks. It's essential for developers and organizations installing or developing AI skills to ensure safe, policy-compliant code execution.
A Chief Technology Officer agent that guides enterprise technology strategy decisions, including investment evaluation, technical vision setting, and architectural planning. Ideal for organizations needing structured CTO-level guidance on technology roadmaps and innovation initiatives.
A Cursor-specific ruleset that enforces Python development standards (uv for package management, Pydantic v2) to ensure consistent tooling practices across AI-assisted coding workflows.
A specialized Copilot prompt that configures an AI agent as an expert assistant for building and deploying ASP.NET Core web APIs and Blazor WebAssembly apps to Google Cloud, with integrated validation and iterative problem-solving capabilities.
SDET is a skill that enables AI assistants to design and build comprehensive test automation infrastructure, including end-to-end tests, coverage analysis, and testing strategy. It benefits developers who need robust automated testing frameworks and QA engineers seeking to identify and close testing gaps.
Luna is a specialized UI/UX agent that helps developers design, review, and improve user interfaces through expert guidance on components, accessibility, responsive layouts, and user interaction patterns. It's ideal for developers building React applications who want professional feedback on their UI code and design decisions.
TheiaChat CLI Copilot Instructions is a comprehensive governance framework for an AI-assisted TypeScript terminal tool, designed to enforce enterprise-grade code quality, security, and architectural rigor through mandatory PRD-first workflows and triple-check verification gates. Enterprise teams and AI coding agents working on TypeScript CLI projects benefit from its structured approach to secure, validated feature development.
A production-grade system prompt for building a security-hardened RAG (Retrieval-Augmented Generation) document Q&A platform with JWT auth, multi-tenant isolation, and Gemini integration. Ideal for teams building enterprise-ready AI assistants that prioritize security and observability.
Spec-Judge evaluates and selects the best versions of requirement, design, and task specification documents based on comprehensive criteria like completeness, clarity, feasibility, and innovation. It helps development teams streamline spec development workflows by providing systematic evaluation and comparison of document versions.
Tara is a Design QA Agent that automates visual regression testing, cross-browser validation, accessibility audits, and responsive design evaluation for design teams. It benefits developers and QA professionals who need systematic, reusable testing across projects.
Enables cryptographically signed constitutional governance for AI tool use, allowing teams to define immutable policy layers (constitution + signature) beneath mutable identity guidance. Ideal for organizations deploying autonomous agents that require tamper-evident audit trails and policy enforcement.