Skill

huggingface-vision-trainer

by huggingface

AI Summary

Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup is required, and results are saved automatically to the Hugging Face Hub. Helper scripts declare their dependencies inline using PEP 723 metadata.

Install

Copy this and paste it into Claude Code, Cursor, or any AI assistant:

I want to install the "huggingface-vision-trainer" skill in my project.

Please run this command in my terminal:
# Install skill into your project (12 files)
base="https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-vision-trainer"
dest=".claude/skills/huggingface-vision-trainer"
for f in \
  SKILL.md \
  references/finetune_sam2_trainer.md \
  references/hub_saving.md \
  references/image_classification_training_notebook.md \
  references/object_detection_training_notebook.md \
  references/reliability_principles.md \
  references/timm_trainer.md \
  scripts/dataset_inspector.py \
  scripts/estimate_cost.py \
  scripts/image_classification_training.py \
  scripts/object_detection_training.py \
  scripts/sam_segmentation_training.py
do
  # Create the target directory, then download the file; stop at the first failure.
  mkdir -p "$dest/$(dirname "$f")" \
    && curl --retry 3 --retry-delay 2 --retry-all-errors -o "$dest/$f" "$base/$f" \
    || break
done

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.

Description

Trains and fine-tunes vision models for object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3 — plus any Transformers classifier), and SAM/SAM2 segmentation using Hugging Face Transformers on Hugging Face Jobs cloud GPUs. Covers COCO-format dataset preparation, Albumentations augmentation, mAP/mAR evaluation, accuracy metrics, SAM segmentation with bbox/point prompts, DiceCE loss, hardware selection, cost estimation, Trackio monitoring, and Hub persistence. Use when users mention training object detection, image classification, SAM, SAM2, segmentation, image matting, DETR, D-FINE, RT-DETR, ViT, timm, MobileNet, ResNet, bounding box models, or fine-tuning vision models on Hugging Face Jobs.

Prerequisites Checklist

Before starting any training job, verify:

Dataset Requirements — Object Detection

• Dataset must exist on Hub
• Annotations must use the objects column with bbox, category (and optionally area) sub-fields
• Bboxes can be in xywh (COCO) or xyxy (Pascal VOC) format; the format is auto-detected and converted
• Categories can be integers or strings; strings are auto-remapped to integer IDs
• image_id column is optional; it is generated automatically if missing
• ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)
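
To make the expected annotation layout concrete, here is a minimal sketch of checking one objects cell, converting xywh boxes to xyxy, and remapping string categories to integer IDs. The function names (validate_objects, xywh_to_xyxy, remap_categories) are hypothetical and are not taken from the skill's scripts. The sketch also does not attempt the format auto-detection mentioned above, since the source does not describe how that works.

```python
from typing import Dict, List, Sequence, Union


def xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def remap_categories(categories: Sequence[Union[int, str]]) -> Dict[str, int]:
    """Build a string -> integer ID mapping (sorted so the IDs are reproducible)."""
    names = sorted({str(c) for c in categories})
    return {name: idx for idx, name in enumerate(names)}


def validate_objects(objects: dict) -> None:
    """Raise ValueError if an `objects` cell lacks bbox/category or they disagree."""
    missing = {"bbox", "category"} - set(objects)
    if missing:
        raise ValueError(f"objects is missing sub-fields: {sorted(missing)}")
    if len(objects["bbox"]) != len(objects["category"]):
        raise ValueError("bbox and category must have the same length")
    for box in objects["bbox"]:
        if len(box) != 4:
            raise ValueError(f"each bbox needs 4 values, got {box!r}")


# Hypothetical row content, assumed here to be in xywh (COCO) format.
example = {"bbox": [[10, 20, 30, 40]], "category": ["cat"]}
validate_objects(example)
label2id = remap_categories(example["category"])
boxes_xyxy = [xywh_to_xyxy(b) for b in example["bbox"]]
```

A check like this is cheap to run on a few rows locally before any GPU time is spent.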

Dataset Requirements — Image Classification

• Dataset must exist on Hub
• Must have an image column (PIL images) and a label column (integer class IDs or strings)
• The label column can be ClassLabel type (with names) or plain integers/strings; strings are auto-remapped
• Common column names auto-detected: label, labels, class, fine_label
• ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)
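
The label-column rules above can be sketched in plain Python. This is only an illustration: find_label_column and encode_labels are hypothetical names, the lookup order among the four candidate column names is an assumption, and sorted-name ID assignment is one reasonable choice rather than the skill's documented behaviour.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple, Union

# Column names listed in the requirements; the priority order is an assumption.
CANDIDATE_LABEL_COLUMNS = ("label", "labels", "class", "fine_label")


def find_label_column(column_names: Iterable[str]) -> Optional[str]:
    """Return the first known label column present in the dataset, else None."""
    available = set(column_names)
    for name in CANDIDATE_LABEL_COLUMNS:
        if name in available:
            return name
    return None


def encode_labels(
    labels: Sequence[Union[int, str]],
) -> Tuple[List[int], Dict[str, int]]:
    """Return integer labels plus the string -> ID mapping (empty if already ints)."""
    if all(isinstance(v, int) for v in labels):
        return list(labels), {}
    label2id = {name: i for i, name in enumerate(sorted({str(v) for v in labels}))}
    return [label2id[str(v)] for v in labels], label2id


column = find_label_column(["image", "fine_label"])
ids, label2id = encode_labels(["dog", "cat", "dog"])
```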

Dataset Requirements — SAM/SAM2 Segmentation

• Dataset must exist on Hub
• Must have an image column (PIL images) and a mask column (binary ground-truth segmentation mask)
• Must have a prompt, in one of these forms:
  • A prompt column with JSON containing {"bbox": [x0,y0,x1,y1]} or {"point": [x,y]}
  • OR a dedicated bbox column with [x0,y0,x1,y1] values
  • OR a dedicated point column with [x,y] or [[x,y],...] values
• Bboxes should be in xyxy format (absolute pixel coordinates)
• Example dataset: merve/MicroMat-mini (image matting with bbox prompts)
• ALWAYS validate unknown datasets before GPU training (see Dataset Validation section)
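
The three accepted prompt layouts can be normalised into one shape before training. The sketch below shows one way to do that. parse_prompt is a hypothetical helper, not the skill's own code, and the precedence (prompt column first, then bbox, then point) is an assumption made for the example.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_prompt(row: Dict[str, Any]) -> Tuple[str, List[List[float]]]:
    """Return ("bbox", [[x0, y0, x1, y1]]) or ("point", [[x, y], ...]) for one row."""
    if row.get("prompt") is not None:
        prompt = row["prompt"]
        if isinstance(prompt, str):
            prompt = json.loads(prompt)  # the prompt column holds JSON text
        if "bbox" in prompt:
            kind, value = "bbox", prompt["bbox"]
        elif "point" in prompt:
            kind, value = "point", prompt["point"]
        else:
            raise ValueError("prompt JSON needs a 'bbox' or 'point' key")
    elif row.get("bbox") is not None:
        kind, value = "bbox", row["bbox"]
    elif row.get("point") is not None:
        kind, value = "point", row["point"]
    else:
        raise ValueError("row has no prompt, bbox, or point column")

    if not value:
        raise ValueError(f"empty {kind} prompt")
    # Accept both a single flat prompt and a list of prompts.
    if not isinstance(value[0], (list, tuple)):
        value = [value]
    value = [[float(c) for c in item] for item in value]

    expected_len = 4 if kind == "bbox" else 2
    for item in value:
        if len(item) != expected_len:
            raise ValueError(f"{kind} needs {expected_len} values, got {item}")
        if kind == "bbox" and (item[2] <= item[0] or item[3] <= item[1]):
            raise ValueError(f"bbox is not valid xyxy: {item}")
    return kind, value


kind, boxes = parse_prompt({"prompt": '{"bbox": [1, 2, 3, 4]}'})
```

Rejecting boxes where x1 <= x0 or y1 <= y0 catches the common mistake of passing xywh boxes where xyxy is expected, although it cannot catch every such case.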

Health Signals

Maintenance: Committed today (Active)
Adoption: 1K+ stars on GitHub (10.0k ★, Popular)
Docs: README + description (Well-documented)

GitHub Signals

Stars: 10.0k
Forks: 610
Issues: 26
Updated: Today
Apache-2.0 License

Works With

Claude Code