ImAiFox
Open Source First · AI-Graded A–F · Updated Hourly · 4 Booster Types
© 2026 ImAiFox · Your AI Superpowers, Curated.

Explore AI Boosters

5 boosters for "benchmarking" — AI-graded, open source, ready to install

409 Skills · 428 Agents · 500 MCP Servers · 1031 Prompts
Prompt
B

Toolathlon — System Prompt

A system prompt for benchmarking AI agents on realistic task execution, specifically designed to simulate a user managing job applications through Notion. Best suited for evaluating agentic capabilities across multiple platforms (Claude, ChatGPT, Cursor, Windsurf).

by hkust-nlp
allrules
23125
CC · CD · Cu · WS
Prompt
D

ArmBench-LLM — System Prompt

ArmBench-LLM is a system prompt for benchmarking large language models using Armenian character-to-numeric matching tasks. It's designed for developers evaluating LLM performance across multiple coding platforms.

by Metricam
allrules
6
CC · CD · Cu · WS
Prompt
D

ArmBench-LLM — System Prompt

ArmBench-LLM is a system prompt framework for evaluating large language models on Armenian language tasks through structured multiple-choice questions. It's designed for developers and AI researchers who need standardized benchmarking tools across popular coding assistants and chat platforms.

by Metricam
allrules
6
CC · CD · Cu · WS
Agent
C

Custom Agents

PrismBench enables developers to create specialized LLM agents through YAML configuration for systematic evaluation of model capabilities using Monte Carlo Tree Search. Useful for ML engineers, researchers, and teams building production LLM systems who need comprehensive benchmarking and evaluation frameworks.

by PrismBench
automated-testing · benchmarking
3
CC · CD
Prompt
D

AlgoClash-Where-Code-Collides — System Prompt

AlgoClash is a competitive platform where developers build and deploy autonomous AI trading agents that battle in simulated stock markets with live leaderboards and backtesting. It's useful for ML/AI engineers interested in algorithmic trading, agent design, and competitive benchmarking.

by Jaiminp007
allrules
2
CC · CD · Cu · WS