Skill

hugging-face-evaluation

by huggingface

AI Summary

This skill automates the process of adding, extracting, and managing evaluation results in Hugging Face model cards, supporting multiple data sources including Artificial Analysis API and custom evaluations with vLLM/lighteval. It's valuable for ML practitioners and model maintainers who need to track and display model performance metrics.

Install

# Add to your project root as SKILL.md
curl -o SKILL.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-evaluation/SKILL.md"

Description

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from the Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Results are written in the model-index metadata format.
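To make the model-index format concrete, here is a minimal Python sketch of the structure that ends up in a model card's metadata. The helper name build_model_index, the model ID, the dataset identifiers, and the score are all hypothetical placeholders chosen for illustration; this is not the skill's own code, only the general shape of the metadata.

```python
import json


def build_model_index(model_name, task_type, dataset_name, dataset_type, metrics):
    """Return a dict shaped like the model-index block of a model card.

    `metrics` is a list of (metric_type, display_name, value) tuples.
    """
    return {
        "model-index": [
            {
                "name": model_name,
                "results": [
                    {
                        "task": {"type": task_type},
                        "dataset": {"name": dataset_name, "type": dataset_type},
                        "metrics": [
                            {"type": m_type, "name": m_name, "value": value}
                            for (m_type, m_name, value) in metrics
                        ],
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    # All values below are placeholders, not real benchmark results.
    card = build_model_index(
        model_name="my-org/my-model",
        task_type="text-generation",
        dataset_name="Example Benchmark",
        dataset_type="my-org/example-benchmark",
        metrics=[("accuracy", "Accuracy", 0.5)],
    )
    print(json.dumps(card, indent=2))
```

In a real model card this structure is stored as YAML front matter at the top of README.md; JSON is used here only to keep the sketch dependency-free.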

Overview

This skill provides tools to add structured evaluation results to Hugging Face model cards. It supports multiple methods for adding evaluation data:

• Extracting existing evaluation tables from README content
• Importing benchmark scores from Artificial Analysis
• Running custom model evaluations with vLLM or accelerate backends (lighteval/inspect-ai)
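The first method, extracting a table from README content, can be sketched with the standard library alone. The skill lists markdown-it-py among its dependencies, so its own parsing most likely differs; the function name parse_markdown_table and the sample scores below are assumptions made for illustration.

```python
def parse_markdown_table(text):
    """Parse the first pipe-delimited table in `text` into a list of row dicts.

    Returns an empty list when no well-formed table is found.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        is_table_line = len(line) >= 2 and line.startswith("|") and line.endswith("|")
        if not is_table_line:
            if rows:
                break  # the first table has ended
            continue
        # Drop the outer pipes, then split the remaining text into cells.
        rows.append([cell.strip() for cell in line[1:-1].split("|")])

    if len(rows) < 2:
        return []

    header, separator, body = rows[0], rows[1], rows[2:]
    # The second row must be the |---|---| separator (colons allowed for alignment).
    if not all(cell and set(cell) <= set("-:") for cell in separator):
        return []
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    # Placeholder README excerpt; the scores are not real results.
    readme = """
    ## Evaluation

    | Benchmark | Score |
    |-----------|-------|
    | Task A    | 0.50  |
    | Task B    | 0.25  |
    """
    for row in parse_markdown_table(readme):
        print(row)
```

Each returned row is a plain dict keyed by column header, which makes it straightforward to map a benchmark name and score into metadata entries afterwards.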

Features

• vLLM Backend: High-performance GPU inference (5-10x faster than standard HF methods)
• lighteval Framework: HuggingFace's evaluation library with Open LLM Leaderboard tasks
• inspect-ai Framework: UK AI Safety Institute's evaluation library
• Standalone or Jobs: Run locally or submit to HF Jobs infrastructure

Usage Instructions

The skill includes Python scripts in scripts/ to perform operations.

Prerequisites

• Preferred: use uv run (the PEP 723 header in each script auto-installs dependencies)
• Or install manually: pip install huggingface-hub markdown-it-py python-dotenv pyyaml requests
• Set the HF_TOKEN environment variable to a token with write access
• For Artificial Analysis: set the AA_API_KEY environment variable
• A .env file is loaded automatically if python-dotenv is installed
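As a rough illustration of these prerequisites, the sketch below shows what a PEP 723 inline metadata header looks like and how a script might check for the required environment variables before doing any work. The helper missing_env_vars and the exact dependency list in the header are assumptions; the skill's own scripts may declare different dependencies and handle missing tokens differently.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "huggingface-hub",
#     "python-dotenv",
# ]
# ///
import os
import sys


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Load a local .env file only if python-dotenv happens to be installed.
    try:
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass

    # AA_API_KEY is only needed for the Artificial Analysis import path.
    missing = missing_env_vars(["HF_TOKEN", "AA_API_KEY"])
    if missing:
        sys.exit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks ready.")
```

When a file with such a header is started with uv run, uv reads the comment block and installs the listed dependencies into a temporary environment before executing the script.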

Quality Score

B

Good

75/100

Standard Compliance: 45
Documentation Quality: 72
Usefulness: 78
Maintenance Signal: 100
Community Signal: 100
Scored Today

GitHub Signals

Stars: 7.5k
Forks: 438
Issues: 19
Updated: Yesterday

Trust & Transparency

Open Source — Apache-2.0

Source code publicly auditable

Works With

Claude Code