AI SummarySearch the Hugging Face Hub for llama.cpp-compatible GGUF repos, choose the right quant, and launch the model with or . 1. Search the Hub with . 3. Prefer the exact HF local-app snippet and quant recommendation when it is visible.
Install
Copy this and paste it into Claude Code, Cursor, or any AI assistant:
I want to install the "huggingface-local-models" skill in my project. Please run this command in my terminal: # Install skill into your project (4 files) mkdir -p .claude/skills/huggingface-local-models && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-local-models/SKILL.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-local-models/SKILL.md" && mkdir -p .claude/skills/huggingface-local-models/references && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-local-models/references/hardware.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-local-models/references/hardware.md" && mkdir -p .claude/skills/huggingface-local-models/references && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-local-models/references/hub-discovery.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-local-models/references/hub-discovery.md" && mkdir -p .claude/skills/huggingface-local-models/references && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/huggingface-local-models/references/quantization.md "https://raw.githubusercontent.com/huggingface/skills/main/skills/huggingface-local-models/references/quantization.md" Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.
Description
Use to select models to run locally with llama.cpp and GGUF on CPU, Mac Metal, CUDA, or ROCm. Covers finding GGUFs, quant selection, running servers, exact GGUF file lookup, conversion, and OpenAI-compatible local serving.
Hugging Face Local Models
Search the Hugging Face Hub for llama.cpp-compatible GGUF repos, choose the right quant, and launch the model with llama-cli or llama-server.
Default Workflow
• Search the Hub with apps=llama.cpp. • Open https://huggingface.co/<repo>?local-app=llama.cpp. • Prefer the exact HF local-app snippet and quant recommendation when it is visible. • Confirm exact .gguf filenames with https://huggingface.co/api/models/<repo>/tree/main?recursive=true. • Launch with llama-cli -hf <repo>:<QUANT> or llama-server -hf <repo>:<QUANT>. • Fall back to --hf-repo plus --hf-file when the repo uses custom file naming. • Convert from Transformers weights only if the repo does not already expose GGUF files.
Install llama.cpp
`bash brew install llama.cpp winget install llama.cpp ` `bash git clone https://github.com/ggml-org/llama.cpp cd llama.cpp make `
Authenticate for gated repos
`bash hf auth login `
Discussion
Health Signals
My Fox Den
Community Rating
Sign in to rate this booster