How do I install hugging-face-evaluation?

hugging-face-evaluation is a Skill hosted on GitHub at https://github.com/huggingface/skills. Visit the ImAiFox page at https://imaifox.com/boosters/huggingface-skills-hugging-face-evaluation for the AI-ready install prompt you can copy directly into Claude Code, Cursor, or Windsurf.

How popular is hugging-face-evaluation?

The huggingface/skills repository that hosts hugging-face-evaluation has about 8,500 GitHub stars and 502 forks. Its most recent commit was about four months ago.

Is hugging-face-evaluation free?

Yes — hugging-face-evaluation is open source and free to use under the Apache-2.0 license. The source code is publicly available on GitHub at https://github.com/huggingface/skills.

Skill

hugging-face-evaluation

Name: hugging-face-evaluation
Author: huggingface

AI Summary

This skill automates adding, extracting, and managing evaluation results in Hugging Face model cards. It supports multiple data sources, including the Artificial Analysis API and custom evaluations run with vLLM/lighteval. It is aimed at ML practitioners and model maintainers who need to track and display model performance metrics.

Install

Copy this and paste it into Claude Code, Cursor, or any AI assistant:

I want to install the "hugging-face-evaluation" skill in my project.

Please run this command in my terminal:
# Install skill into the correct directory (13 files)
base="https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-evaluation"
dest=".claude/skills/hugging-face-evaluation"
mkdir -p "$dest/examples" "$dest/scripts"
for f in SKILL.md requirements.txt \
  examples/USAGE_EXAMPLES.md examples/artificial_analysis_to_hub.py \
  examples/example_readme_tables.md examples/metric_mapping.json \
  scripts/evaluation_manager.py scripts/inspect_eval_uv.py scripts/inspect_vllm_uv.py \
  scripts/lighteval_vllm_uv.py scripts/run_eval_job.py scripts/run_vllm_eval_job.py \
  scripts/test_extraction.py; do
  curl --retry 3 --retry-delay 2 --retry-all-errors -o "$dest/$f" "$base/$f" || break
done

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.

Tags: api, llm

Description

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from the Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
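To make the model-index format mentioned above more concrete, here is a minimal sketch that builds one model-index entry as a plain Python dict and prints it as JSON. It uses only the standard library and is not the skill's own code: the helper name build_model_index, the model and dataset identifiers, and the metric value are all hypothetical placeholders. In a real model card the same structure lives in the YAML front matter of README.md.

```python
import json


def build_model_index(model_name, task_type, dataset_name, dataset_type, metrics):
    """Return a dict shaped like the `model-index` model card metadata.

    `metrics` maps a metric type (e.g. "accuracy") to its numeric value.
    This helper is hypothetical and only illustrates the structure.
    """
    return {
        "model-index": [
            {
                "name": model_name,
                "results": [
                    {
                        "task": {"type": task_type},
                        "dataset": {"name": dataset_name, "type": dataset_type},
                        "metrics": [
                            {"type": metric_type, "value": value}
                            for metric_type, value in metrics.items()
                        ],
                    }
                ],
            }
        ]
    }


# Placeholder values for illustration only; these are not real benchmark results.
metadata = build_model_index(
    model_name="my-org/my-model",
    task_type="text-generation",
    dataset_name="Example Benchmark",
    dataset_type="my-org/example-benchmark",
    metrics={"accuracy": 0.5},
)
print(json.dumps(metadata, indent=2))
```

Each result groups one task, one dataset, and a list of metrics, so several benchmarks become several entries under results.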

Overview

This skill provides tools to add structured evaluation results to Hugging Face model cards. It supports multiple methods for adding evaluation data:
• Extracting existing evaluation tables from README content
• Importing benchmark scores from Artificial Analysis
• Running custom model evaluations with vLLM or accelerate backends (lighteval/inspect-ai)
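The first method, extracting evaluation tables from README content, can be sketched as follows. This is a simplified stand-in that uses plain string handling on pipe-delimited Markdown tables; it is not the skill's own parser (the prerequisites suggest the scripts rely on markdown-it-py). The function name parse_markdown_table and the sample README text, including its scores, are hypothetical.

```python
def is_separator(cells):
    """True for a Markdown header separator row such as |-----|:---:|."""
    return all(cell and set(cell) <= set("-:") for cell in cells)


def parse_markdown_table(text):
    """Parse the first pipe-delimited Markdown table in `text` into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            # Drop the outer pipes, then split the remaining cells.
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
        elif rows:
            break  # the first table has ended
    if len(rows) < 2:
        return []
    header = rows[0]
    body = [row for row in rows[1:] if not is_separator(row)]
    return [dict(zip(header, row)) for row in body]


# Made-up README content for illustration only.
readme = """
# Example model

| Benchmark | Score |
|-----------|-------|
| Example A | 12.5  |
| Example B | 40.0  |

Some trailing text.
"""

results = parse_markdown_table(readme)
print(results)
```

Cell values stay as strings here; converting them to numbers and mapping column names to metric types would be a separate step.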

Features

• vLLM Backend: High-performance GPU inference (5-10x faster than standard HF methods)
• lighteval Framework: Hugging Face's evaluation library with Open LLM Leaderboard tasks
• inspect-ai Framework: UK AI Safety Institute's evaluation library
• Standalone or Jobs: Run locally or submit to HF Jobs infrastructure

Usage Instructions

The skill includes Python scripts in scripts/ to perform operations.

Prerequisites

• Preferred: use uv run (the PEP 723 header auto-installs dependencies)
• Or install manually: pip install huggingface-hub markdown-it-py python-dotenv pyyaml requests
• Set the HF_TOKEN environment variable to a token with write access
• For Artificial Analysis: set the AA_API_KEY environment variable
• .env is loaded automatically if python-dotenv is installed
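As an illustration of these prerequisites, here is a minimal sketch of a script that carries a PEP 723 inline metadata header (the comment block that uv run reads to install dependencies) and checks for the required environment variables before doing any work. It is not one of the skill's scripts: the helper missing_env_vars is hypothetical, and the dependency list is just a subset of the packages named above.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "huggingface-hub",
#     "python-dotenv",
# ]
# ///
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


try:
    # Optional: load a local .env file when python-dotenv is available.
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass

# HF_TOKEN is always needed; AA_API_KEY only for Artificial Analysis imports.
missing = missing_env_vars(["HF_TOKEN"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Run with plain python, the header is just a comment; run with uv run, the listed dependencies are resolved into a temporary environment first.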

Health Signals

Maintenance: Committed 4mo ago (Stale)
Adoption: 1K+ stars on GitHub (8.5k ★, Popular)
Docs: README + description (Well-documented)

GitHub Signals

Stars: 8.5k
Forks: 502
Issues: 21
Updated: 4mo ago

Apache-2.0 License

Works With

Claude Code

Related Skills

• Openclaw MCP Server (MCP Server)
• using-superpowers (Skill)
• dispatching-parallel-agents (Skill)
• subagent-driven-development (Skill)