Install
Copy this and paste it into Claude Code, Cursor, or any AI assistant:
I want to install the "huggingface-zerogpu" skill in my project. Please run this command in my terminal:

# Install skill into your project (5 files)
mkdir -p .claude/skills/huggingface-zerogpu &&
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-zerogpu/SKILL.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-zerogpu/SKILL.md" &&
mkdir -p .claude/skills/huggingface-zerogpu/references &&
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-zerogpu/references/concurrency.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-zerogpu/references/concurrency.md" &&
mkdir -p .claude/skills/huggingface-zerogpu/references &&
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-zerogpu/references/cuda-and-deps.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-zerogpu/references/cuda-and-deps.md" &&
mkdir -p .claude/skills/huggingface-zerogpu/references &&
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-zerogpu/references/how-quota-works.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-zerogpu/references/how-quota-works.md" &&
mkdir -p .claude/skills/huggingface-zerogpu/references &&
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-zerogpu/references/how-zerogpu-works.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-zerogpu/references/how-zerogpu-works.md"

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.
Description
AI demos and GPU compute with Gradio Spaces and Hugging Face Spaces ZeroGPU. Use when writing or reviewing code that uses `@spaces.GPU`, configuring `python_version` or `requirements.txt` for a ZeroGPU Space, or handling ZeroGPU-specific code constraints — pickle-based process isolation, `gr.State` semantics across the worker boundary, no `torch.compile` (use AoTI instead), CUDA wheel-only builds (no `nvcc` at build or runtime), large vs xlarge sizing, and dynamic duration callables. Make sure to use this skill whenever the user mentions ZeroGPU, `@spaces.GPU`, or the `spaces` Python package, or hits ZeroGPU-specific code errors like `PicklingError` across the worker boundary, `illegal duration`, or `flash-attn` wheel-build failures — even when the user does not explicitly ask for ZeroGPU coding guidance. Trigger on `import spaces` or `@spaces.GPU` in code.
Hugging Face ZeroGPU
Rules and patterns for ML demos on Hugging Face Spaces with ZeroGPU hardware. Covers @spaces.GPU, duration and quota tuning, process isolation, the CUDA availability model, concurrency safety, and CUDA build constraints.
Scope
This skill is for Gradio SDK Spaces using ZeroGPU hardware. Docker and Static Spaces cannot schedule onto ZeroGPU, and Streamlit apps now run as Docker Spaces — so this skill applies only to Gradio. For general Gradio coding (components, layouts, event listeners), see the huggingface-gradio skill in this repo. The authoritative ZeroGPU docs live at https://huggingface.co/docs/hub/spaces-zerogpu — refer to them for the current backing GPU, runtime version lists, and tier thresholds, all of which change over time.
Reference Files
| Reference | When to read |
|-----------|--------------|
| `references/concurrency.md` | Always read alongside SKILL.md when writing ZeroGPU code; handlers run in parallel by default |
| `references/how-zerogpu-works.md` | When reasoning about cold-starts, worker reuse, why module-scope warmup does not carry to requests, or why returning CUDA tensors hangs |
| `references/how-quota-works.md` | When choosing duration values, debugging `illegal duration` vs quota exceeded errors, or explaining why default 60s blocks short tasks |
| `references/cuda-and-deps.md` | When installing CUDA-dependent packages (e.g. flash-attn), pinning torch side-cars, or reading wheel filename tags |
Hardware
ZeroGPU exposes two GPU sizes that map to a fraction of the backing card:

| size | Slice of backing GPU | Quota cost |
|--------|----------------------|------------|
| large (default) | Half | 1x |
| xlarge | Full | 2x |

Default large gives half a physical GPU, so memory bandwidth and compute are significantly lower than the full card's specs. Use xlarge only when the workload genuinely needs the extra memory or compute.

> Backing GPU changes without notice. ZeroGPU has already migrated across GPU generations several times; older write-ups may name A100 or H200, but those are outdated. For the current backing GPU and exact per-size VRAM, always check the ZeroGPU docs before sizing workloads.
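
To make the size trade-off concrete, here is a minimal sketch of the quota arithmetic implied by the table. The 1x and 2x multipliers come from the table; the helper name `quota_cost`, the "large-equivalent seconds" unit, and the assumption that quota scales linearly with the requested duration are illustrative only and not taken from the `spaces` package. The decorator call in the comment assumes `@spaces.GPU` accepts `duration` and `size` keywords; confirm both against the ZeroGPU docs before relying on them.

```python
# Hypothetical helper: multipliers are copied from the size table above;
# the function name and the linear-scaling assumption are illustrative.
QUOTA_MULTIPLIER = {"large": 1, "xlarge": 2}


def quota_cost(duration_s: float, size: str = "large") -> float:
    """Return the quota a call would consume, in large-equivalent seconds."""
    if size not in QUOTA_MULTIPLIER:
        raise ValueError(f"unknown ZeroGPU size: {size!r}")
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return duration_s * QUOTA_MULTIPLIER[size]


# In a real Space the size would be requested on the decorator, e.g.
#   @spaces.GPU(duration=60, size="xlarge")
# It is left as a comment so this sketch runs without the `spaces` package.
print(quota_cost(60))            # 60
print(quota_cost(60, "xlarge"))  # 120
```

Under these assumptions, a 60-second request on xlarge consumes as much quota as a 120-second request on large, which is why xlarge is worth reserving for workloads that do not fit in half a card.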