4 boosters for "llm-evaluation" — open source, verified from GitHub, ready to install
Promptfoo is an LLM evaluation and testing toolkit that helps developers systematically test, benchmark, and validate prompt performance across different models and scenarios. It's essential for teams building LLM applications who need rigorous quality assurance and prompt optimization.
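For context, a typical Promptfoo run is driven by a promptfooconfig.yaml that lists prompt templates, providers, and test cases with assertions. The sketch below is a minimal, hedged example of that setup: it writes such a config from Python and invokes the Promptfoo CLI. The provider/model ID, prompt, and assertion values are placeholders, and it assumes Node.js, the pyyaml package, and a provider API key (e.g. OPENAI_API_KEY) are available.

```python
import subprocess
import yaml  # pip install pyyaml

# Minimal Promptfoo-style test suite: one prompt template, one provider,
# one test case with two assertions. Provider/model ID is a placeholder.
config = {
    "prompts": ["Summarize in one sentence: {{article}}"],
    "providers": ["openai:gpt-4o-mini"],
    "tests": [
        {
            "vars": {"article": "Promptfoo lets teams test prompts across models."},
            "assert": [
                {"type": "icontains", "value": "prompt"},
                {"type": "llm-rubric", "value": "Is a single, accurate sentence."},
            ],
        }
    ],
}

with open("promptfooconfig.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Run the evaluation through the Promptfoo CLI (requires Node.js and API keys).
subprocess.run(["npx", "promptfoo@latest", "eval", "-c", "promptfooconfig.yaml"], check=True)
```

Results can then be browsed in the local web UI with `npx promptfoo@latest view`.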
Generate a markdown changelog from GitHub PRs for sprint review meetings. The changelog author and date are filled in automatically, starting by detecting the current user; both can be overridden if the user explicitly provides a different author or date.
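As a rough illustration of that workflow (not this booster's actual implementation), the sketch below uses the GitHub CLI to detect the current user, list recently merged PRs, and print a minimal markdown changelog. The PR limit and output layout are arbitrary choices, and it assumes an authenticated `gh` CLI inside a repository checkout.

```python
import json
import subprocess

def gh_json(args):
    """Run a GitHub CLI command and parse its JSON output."""
    out = subprocess.run(["gh", *args], capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

# Step 1: detect the current user (unless an author was explicitly provided).
author = gh_json(["api", "user"])["login"]

# Step 2: fetch recently merged PRs for the current repository.
prs = gh_json(["pr", "list", "--state", "merged", "--limit", "20",
               "--json", "number,title,url"])

# Step 3: render a minimal markdown changelog for the sprint review.
lines = [f"## Sprint changelog (prepared by {author})", ""]
for pr in prs:
    lines.append(f"- {pr['title']} ([#{pr['number']}]({pr['url']}))")
print("\n".join(lines))
```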
PrismBench enables developers to create specialized LLM agents through YAML configuration and evaluate language model capabilities systematically using a Monte Carlo Tree Search approach, with containerized deployment. It suits ML engineers, researchers, and teams building production LLM systems or AI evaluation pipelines who need a comprehensive benchmarking and evaluation framework.
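PrismBench's own agent schemas and APIs aren't reproduced here; the sketch below is only a conceptual illustration of the tree-search idea, using a UCB1 selection rule (the selection step at the core of MCTS) to steer a fixed evaluation budget toward task areas where a hypothetical model fails most often. The task names, difficulty levels, and evaluate_model stub are all invented for illustration.

```python
import math
import random

# Conceptual sketch only (not PrismBench's API): a UCB1 selection loop that
# spends more of a fixed evaluation budget on task/difficulty branches where
# the model fails most often, with an exploration bonus for rarely tried ones.

def evaluate_model(task: str, difficulty: int) -> float:
    """Hypothetical stand-in for one benchmark run: 1.0 = pass, 0.0 = fail."""
    return random.random()

class Branch:
    def __init__(self, task, difficulty):
        self.task, self.difficulty = task, difficulty
        self.visits, self.total = 0, 0.0

def ucb(branch, step, c=1.4):
    if branch.visits == 0:
        return float("inf")  # try every branch at least once
    failure_rate = 1.0 - branch.total / branch.visits
    explore = c * math.sqrt(math.log(step) / branch.visits)
    return failure_rate + explore  # favor weak spots, keep exploring

branches = [Branch(t, d) for t in ("loops", "recursion", "graphs") for d in (1, 2, 3)]

for step in range(1, 201):  # fixed evaluation budget of 200 runs
    chosen = max(branches, key=lambda b: ucb(b, step))  # selection
    chosen.visits += 1
    chosen.total += evaluate_model(chosen.task, chosen.difficulty)  # simulate, back up

for b in sorted(branches, key=lambda b: b.total / max(b.visits, 1)):
    rate = b.total / max(b.visits, 1)
    print(f"{b.task} (difficulty {b.difficulty}): pass rate {rate:.2f} over {b.visits} runs")
```

A full MCTS would also expand and back values up through a deeper tree of follow-up challenges; this flat version only shows how the selection pressure concentrates evaluations on a model's weak areas.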