Skip to content
Skill

huggingface-local-models

by huggingface

AI Summary

Search the Hugging Face Hub for llama.cpp-compatible GGUF repos, choose the right quant, and launch the model with or . 1. Search the Hub with . 3. Prefer the exact HF local-app snippet and quant recommendation when it is visible.

Install

Copy this and paste it into Claude Code, Cursor, or any AI assistant:

I want to install the "huggingface-local-models" skill in my project.

Please run this command in my terminal:
# Install skill into your project (4 files)
mkdir -p .claude/skills/huggingface-local-models && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-local-models/SKILL.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-local-models/SKILL.md" && mkdir -p .claude/skills/huggingface-local-models/references && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-local-models/references/hardware.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-local-models/references/hardware.md" && mkdir -p .claude/skills/huggingface-local-models/references && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-local-models/references/hub-discovery.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-local-models/references/hub-discovery.md" && mkdir -p .claude/skills/huggingface-local-models/references && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-local-models/references/quantization.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-local-models/references/quantization.md"

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.

Description

Use to select models to run locally with llama.cpp and GGUF on CPU, Mac Metal, CUDA, or ROCm. Covers finding GGUFs, quant selection, running servers, exact GGUF file lookup, conversion, and OpenAI-compatible local serving.

Hugging Face Local Models

Search the Hugging Face Hub for llama.cpp-compatible GGUF repos, choose the right quant, and launch the model with llama-cli or llama-server.

Default Workflow

• Search the Hub with apps=llama.cpp. • Open https://huggingface.co/<repo>?local-app=llama.cpp. • Prefer the exact HF local-app snippet and quant recommendation when it is visible. • Confirm exact .gguf filenames with https://huggingface.co/api/models/<repo>/tree/main?recursive=true. • Launch with llama-cli -hf <repo>:<QUANT> or llama-server -hf <repo>:<QUANT>. • Fall back to --hf-repo plus --hf-file when the repo uses custom file naming. • Convert from Transformers weights only if the repo does not already expose GGUF files.

Install llama.cpp

`bash brew install llama.cpp winget install llama.cpp ` `bash git clone https://github.com/ggml-org/llama.cpp cd llama.cpp make `

Authenticate for gated repos

`bash hf auth login `

Discussion

0/2000
Loading comments...

Health Signals

MaintenanceCommitted Yesterday
Active
Adoption1K+ stars on GitHub
10.7k ★ · Popular
DocsREADME + description
Well-documented

GitHub Signals

Stars10.7k
Forks703
Issues28
UpdatedYesterday
View on GitHub
Apache-2.0 License

My Fox Den

Community Rating

Sign in to rate this booster

Works With

Claude Code