AI SummaryA system prompt for benchmarking AI agents on realistic task execution, specifically designed to simulate a user managing job applications through Notion. Best suited for evaluating agentic capabilities across multiple platforms (Claude, ChatGPT, Cursor, Windsurf).
Description
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Install
# Download system prompt curl -o SYSTEM_PROMPT.md "https://raw.githubusercontent.com/hkust-nlp/Toolathlon/main/tasks/finalpool/notion-find-job/docs/user_system_prompt.md"
Quality Score
B
Good
76/100
Standard Compliance72
Documentation Quality65
Usefulness58
Maintenance Signal100
Community Signal100
Scored Yesterday
Trust & Transparency
No License Detected
Review source code before installing
Verified Open Source
Hosted on GitHub — publicly auditable
Actively Maintained
Last commit 3d ago
231 stars — Growing Community
25 forks
My Fox Den
Community Rating
Works With
Claude Code
claude_desktop
Cursor
Windsurf
ChatGPT