This skill automates adding, extracting, and managing evaluation results in Hugging Face model cards, with support for multiple data sources, including the Artificial Analysis API and custom evaluations run with vLLM/lighteval. It is valuable for ML practitioners and model maintainers who need to track and display model performance metrics.
Install
Copy this and paste it into Claude Code, Cursor, or any AI assistant:
I want to install the "hugging-face-evaluation" skill in my project. Please run this command in my terminal:

# Install the skill into the correct directory (13 files)
mkdir -p .claude/skills/hugging-face-evaluation/{examples,scripts}
for f in SKILL.md requirements.txt \
  examples/USAGE_EXAMPLES.md examples/artificial_analysis_to_hub.py \
  examples/example_readme_tables.md examples/metric_mapping.json \
  scripts/evaluation_manager.py scripts/inspect_eval_uv.py \
  scripts/inspect_vllm_uv.py scripts/lighteval_vllm_uv.py \
  scripts/run_eval_job.py scripts/run_vllm_eval_job.py \
  scripts/test_extraction.py; do
  curl --retry 3 --retry-delay 2 --retry-all-errors \
    -o ".claude/skills/hugging-face-evaluation/$f" \
    "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-evaluation/$f"
done

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.
Description
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
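The model-index metadata block has a small, fixed shape: a model name plus a list of results, each pairing a task and a dataset with one or more metrics. A minimal sketch of that structure (the model ID and score below are placeholders, not real results):

```python
def model_index_entry(model_name, task_type, dataset_name, dataset_type,
                      metric_type, metric_value):
    """Build one `model-index` entry as used in model-card YAML front matter."""
    return {
        "name": model_name,
        "results": [
            {
                "task": {"type": task_type},
                "dataset": {"name": dataset_name, "type": dataset_type},
                "metrics": [{"type": metric_type, "value": metric_value}],
            }
        ],
    }

# Placeholder model and score, for illustration only.
entry = model_index_entry(
    "my-org/my-model", "text-generation",
    "MMLU (5-shot)", "cais/mmlu", "accuracy", 65.2,
)
```

Serialized to YAML under the `model-index` key, an entry like this is what the Hub renders as the model's evaluation results.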
Overview
This skill provides tools to add structured evaluation results to Hugging Face model cards. It supports multiple methods for adding evaluation data:
• Extracting existing evaluation tables from README content
• Importing benchmark scores from Artificial Analysis
• Running custom model evaluations with vLLM or accelerate backends (lighteval/inspect-ai)
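For the README-extraction path, the core operation is turning a markdown pipe table into structured rows. A stdlib-only sketch (the table content is made up for illustration; the skill's own scripts rely on markdown-it-py instead):

```python
def parse_pipe_table(lines):
    """Parse a markdown pipe table into (headers, rows). No escaped-pipe handling."""
    cells = lambda line: [c.strip() for c in line.strip().strip("|").split("|")]
    headers = cells(lines[0])
    rows = [cells(line) for line in lines[2:]]  # skip the |---|---| separator row
    return headers, rows

readme_table = [
    "| Benchmark | Score |",
    "|-----------|-------|",
    "| MMLU      | 65.2  |",
    "| GSM8K     | 48.7  |",
]
headers, rows = parse_pipe_table(readme_table)
```

Each row can then be mapped onto a model-index metric entry.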
Features
• vLLM Backend: High-performance GPU inference (5-10x faster than standard HF methods)
• lighteval Framework: Hugging Face's evaluation library with Open LLM Leaderboard tasks
• inspect-ai Framework: the UK AI Safety Institute's evaluation library
• Standalone or Jobs: Run locally or submit to HF Jobs infrastructure
Usage Instructions
The skill's operations are implemented as Python scripts in scripts/.
Prerequisites
• Preferred: use uv run (the PEP 723 header auto-installs dependencies)
• Or install manually: pip install huggingface-hub markdown-it-py python-dotenv pyyaml requests
• Set the HF_TOKEN environment variable to a token with write access
• For Artificial Analysis: set the AA_API_KEY environment variable
• A .env file is loaded automatically if python-dotenv is installed
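The uv run path works because each script carries a PEP 723 inline-metadata header, and .env loading degrades gracefully when python-dotenv is absent. A minimal sketch of that pattern (the dependency list here is illustrative, not the skill's exact requirements):

```python
# /// script
# requires-python = ">=3.9"
# dependencies = ["huggingface-hub", "python-dotenv"]
# ///
# `uv run script.py` reads the header above and installs the deps automatically.
import os

def load_token(name="HF_TOKEN"):
    """Return a token from the environment, loading a .env file first if possible."""
    try:
        from dotenv import load_dotenv  # optional: python-dotenv
        load_dotenv()
    except ImportError:
        pass  # no python-dotenv installed; plain environment variables still work
    return os.environ.get(name)
```

Run under plain python, the header is just comments and the script still works as long as the imports it actually exercises are available.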