Skill

debugging-dags

by astronomer

AI Summary

A specialized diagnostic tool for data engineers to systematically investigate Airflow DAG failures, identify root causes, and implement prevention strategies. Ideal for complex pipeline debugging scenarios requiring deep analysis beyond basic log inspection.

Install

Copy this and paste it into Claude Code, Cursor, or any AI assistant:

I want to install the "debugging-dags" skill in my project.

Please run this command in my terminal:
# Install skill into your project
mkdir -p .claude/skills/debugging-dags && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/debugging-dags/SKILL.md "https://raw.githubusercontent.com/astronomer/agents/main/skills/debugging-dags/SKILL.md"

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.

Description

Comprehensive DAG failure diagnosis and root cause analysis. Use for complex debugging requests that require deep investigation, such as "diagnose and fix the pipeline", "full root cause analysis", or "why is this failing and how do we prevent it". For simple debugging ("why did dag fail", "show logs"), the airflow entrypoint skill handles the request directly. This skill provides structured investigation and prevention recommendations.

DAG Diagnosis

You are a data engineer debugging a failed Airflow DAG. Follow this systematic approach to identify the root cause and provide actionable remediation.

Running the CLI

Run all af commands using uvx (no installation required):

```bash
uvx --from astro-airflow-mcp af <command>
```

Throughout this document, `af` is shorthand for `uvx --from astro-airflow-mcp af`.
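If you prefer to type `af` directly, a small shell function can provide the shorthand. This is a local convenience sketch, not something the skill installs for you:

```shell
# Optional local shorthand: forward `af <command>` to the full uvx
# invocation used throughout this document. A convenience sketch only.
af() {
  uvx --from astro-airflow-mcp af "$@"
}
```

With the function defined, `af health` behaves identically to the full `uvx --from astro-airflow-mcp af health` command.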

Step 1: Identify the Failure

If a specific DAG was mentioned:

- Run `af runs diagnose <dag_id> <dag_run_id>` (if a run_id is provided)
- If no run_id was specified, run `af dags stats` to find recent failures

If no DAG was specified:

- Run `af health` to find recent failures across all DAGs
- Check for import errors with `af dags errors`
- Show DAGs with recent failures
- Ask which DAG to investigate further
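The branching above can be sketched as a small script. `DAG_ID` and `RUN_ID` are hypothetical placeholders, and `AF` echoes commands rather than running them, so the flow can be traced without a live Airflow deployment:

```shell
# Step 1 triage flow (sketch). AF echoes instead of executing; swap in
# `uvx --from astro-airflow-mcp af` for real use.
AF="echo af"            # stand-in for the real CLI
DAG_ID="daily_sales"    # hypothetical; set to "" if no DAG was mentioned
RUN_ID=""               # hypothetical; e.g. a scheduled run id if provided

if [ -n "$DAG_ID" ] && [ -n "$RUN_ID" ]; then
  $AF runs diagnose "$DAG_ID" "$RUN_ID"   # run_id known: diagnose directly
elif [ -n "$DAG_ID" ]; then
  $AF dags stats                          # find this DAG's recent failures
else
  $AF health                              # failures across all DAGs
  $AF dags errors                         # check for import errors
fi
```

With `DAG_ID` set and `RUN_ID` empty, the script takes the middle branch and prints `af dags stats`.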

Step 2: Get the Error Details

Once you have identified a failed task:

- Get task logs using `af tasks logs <dag_id> <dag_run_id> <task_id>`
- Look for the actual exception: scroll past the Airflow boilerplate to find the real error
- Categorize the failure type:
  - Data issue: missing data, schema change, null values, constraint violation
  - Code issue: bug, syntax error, import failure, type error
  - Infrastructure issue: connection timeout, resource exhaustion, permission denied
  - Dependency issue: upstream failure, external API down, rate limiting
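The four categories can be turned into a rough triage helper. The string patterns below are heuristic guesses for illustration, not an official Airflow or af classification:

```shell
# Heuristic failure categorizer (a sketch): maps an exception line pulled
# from task logs to one of the four categories. Patterns are illustrative.
categorize() {
  case "$1" in
    *IntegrityError*|*schema*|*NoneType*)       echo "data issue" ;;
    *SyntaxError*|*ImportError*|*TypeError*)    echo "code issue" ;;
    *timeout*|*"Permission denied"*|*OOM*)      echo "infrastructure issue" ;;
    *upstream_failed*|*"rate limit"*|*429*)     echo "dependency issue" ;;
    *)                                          echo "unknown" ;;
  esac
}

categorize "ImportError: No module named 'pandas'"   # -> code issue
```

Running `categorize` on the first exception line from the `af tasks logs` output gives a starting category; the surrounding log still needs reading before proposing a fix.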


Health Signals

- Maintenance: committed 1mo ago · Active
- Adoption: 282 ★ on GitHub · Growing
- Docs: README + description · Well-documented

GitHub Signals

- Stars: 282
- Forks: 30
- Issues: 20
- Updated: 1mo ago
- License: Apache-2.0


Works With

Claude Code