Skill

hugging-face-datasets

by huggingface

AI Summary

This skill enables AI assistants to create, configure, and manage datasets on Hugging Face Hub with SQL-based querying and transformation capabilities. It's valuable for developers building data workflows and ML projects that require programmatic dataset management.

Install

Copy this and paste it into Claude Code, Cursor, or any AI assistant:

I want to install the "hugging-face-datasets" skill in my project.

Please run this command in my terminal:
# Install skill into the correct directory (12 files)
mkdir -p .claude/skills/hugging-face-datasets && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/SKILL.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/SKILL.md" && \
mkdir -p .claude/skills/hugging-face-datasets/examples && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/examples/diverse_training_examples.json "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/examples/diverse_training_examples.json" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/examples/system_prompt_template.txt "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/examples/system_prompt_template.txt" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/examples/training_examples.json "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/examples/training_examples.json" && \
mkdir -p .claude/skills/hugging-face-datasets/scripts && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/scripts/dataset_manager.py "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/scripts/dataset_manager.py" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/scripts/sql_manager.py "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/scripts/sql_manager.py" && \
mkdir -p .claude/skills/hugging-face-datasets/templates && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/templates/chat.json "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/templates/chat.json" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/templates/classification.json "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/templates/classification.json" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/templates/completion.json "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/templates/completion.json" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/templates/custom.json "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/templates/custom.json" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/templates/qa.json "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/templates/qa.json" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/hugging-face-datasets/templates/tabular.json "https://raw.githubusercontent.com/huggingface/skills/main/skills/hugging-face-datasets/templates/tabular.json"

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.

Description

Create and manage datasets on the Hugging Face Hub. Supports initializing repos, defining configs and system prompts, streaming row updates, and SQL-based dataset querying and transformation. Designed to work alongside the HF MCP server for comprehensive dataset workflows.

Overview

This skill provides tools to manage datasets on the Hugging Face Hub with a focus on creation, configuration, content management, and SQL-based data manipulation. It is designed to complement the existing Hugging Face MCP server by providing dataset editing and querying capabilities.
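
To make "SQL-based data manipulation" concrete, here is a minimal sketch of querying and transforming rows with SQL. It is not the skill's own code: it uses Python's standard-library sqlite3 as a stand-in engine, and the sample rows, the `data` table name, and the `query_rows` helper are all hypothetical. The engine and interface used by scripts/sql_manager.py may differ.

```python
import sqlite3

# Hypothetical rows standing in for a dataset split (not taken from the skill).
rows = [
    {"id": 1, "label": "positive", "text": "Great library"},
    {"id": 2, "label": "negative", "text": "Confusing docs"},
    {"id": 3, "label": "positive", "text": "Works well"},
]


def query_rows(rows, sql):
    """Load rows into an in-memory table named `data` and run one SQL query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.row_factory = sqlite3.Row  # lets each result row convert to a dict
        conn.execute("CREATE TABLE data (id INTEGER, label TEXT, text TEXT)")
        conn.executemany("INSERT INTO data VALUES (:id, :label, :text)", rows)
        return [dict(r) for r in conn.execute(sql)]
    finally:
        conn.close()


# Querying: count rows per label.
label_counts = query_rows(
    rows, "SELECT label, COUNT(*) AS n FROM data GROUP BY label ORDER BY label"
)

# Transformation: filter to one label and rewrite a column.
positives = query_rows(
    rows,
    "SELECT id, UPPER(text) AS text FROM data WHERE label = 'positive' ORDER BY id",
)
```

The same pattern (load rows, express the filter or reshaping step as a SELECT, read back dictionaries) carries over to other SQL engines.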

Scripts auto-install requirements when run with: uv run scripts/script_name.py

• uv (Python package manager)
• Getting Started: See "Usage Instructions" below for PEP 723 usage

4. Quality Assurance Features

• JSON Validation: Ensures data integrity during uploads
• Batch Processing: Efficient handling of large datasets
• Error Recovery: Graceful handling of upload failures and conflicts
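
The three features above can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's implementation: `parse_valid_rows`, `batched`, and `upload_with_retry` are hypothetical helper names, the `upload` callable is a placeholder for whatever actually sends rows to the Hub, and treating `OSError` as the retryable failure is an assumption.

```python
import json
import time


def parse_valid_rows(lines):
    """JSON validation: keep lines that parse to JSON objects, report the rest."""
    valid, errors = [], []
    for number, line in enumerate(lines, start=1):
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {number}: {exc.msg}")
            continue
        if not isinstance(row, dict):
            errors.append(f"line {number}: expected a JSON object")
            continue
        valid.append(row)
    return valid, errors


def batched(rows, size):
    """Batch processing: yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def upload_with_retry(batch, upload, attempts=3, delay=0.0):
    """Error recovery: retry a failing upload, re-raising after the last attempt.

    `upload` is a hypothetical callable; OSError as the retryable error is an
    assumption made for this sketch.
    """
    for attempt in range(1, attempts + 1):
        try:
            upload(batch)
            return attempt  # number of attempts it took
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff
```

Validating before batching means a malformed line is reported once with its line number instead of failing an entire batch mid-upload.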

Usage Instructions

The skill includes two Python scripts that use PEP 723 inline dependency management:

> **All paths are relative to the directory containing this SKILL.md file.**
> Scripts are run with: uv run scripts/script_name.py [arguments]

• scripts/dataset_manager.py - Dataset creation and management
• scripts/sql_manager.py - SQL-based dataset querying and transformation

Health Signals

Maintenance: Committed 1mo ago (Active)
Adoption: 1K+ stars on GitHub (8.5k ★ · Popular)
Docs: README + description (Well-documented)

GitHub Signals

Stars: 8.5k
Forks: 502
Issues: 21
Updated: 1mo ago
Apache-2.0 License

Works With

Claude Code