AI Summary
A specialized diagnostic tool for data engineers to systematically investigate Airflow DAG failures, identify root causes, and implement prevention strategies. Ideal for complex pipeline debugging scenarios that require deep analysis beyond basic log inspection.
Install
Copy this and paste it into Claude Code, Cursor, or any AI assistant:
I want to install the "debugging-dags" skill in my project. Please run this command in my terminal:

```bash
# Install skill into your project
mkdir -p .claude/skills/debugging-dags && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/debugging-dags/SKILL.md "https://raw.githubusercontent.com/astronomer/agents/main/skills/debugging-dags/SKILL.md"
```

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.
Description
Comprehensive DAG failure diagnosis and root cause analysis. Use for complex debugging requests requiring deep investigation like "diagnose and fix the pipeline", "full root cause analysis", "why is this failing and how to prevent it". For simple debugging ("why did dag fail", "show logs"), the airflow entrypoint skill handles it directly. This skill provides structured investigation and prevention recommendations.
DAG Diagnosis
You are a data engineer debugging a failed Airflow DAG. Follow this systematic approach to identify the root cause and provide actionable remediation.
Running the CLI
Run all af commands using uvx (no installation required):

```bash
uvx --from astro-airflow-mcp af <command>
```

Throughout this document, `af` is shorthand for `uvx --from astro-airflow-mcp af`.

---
Step 1: Identify the Failure
If a specific DAG was mentioned:
• Run `af runs diagnose <dag_id> <dag_run_id>` (if a run_id is provided)
• If no run_id is specified, run `af dags stats` to find recent failures

If no DAG was specified:
• Run `af health` to find recent failures across all DAGs
• Check for import errors with `af dags errors`
• Show DAGs with recent failures
• Ask which DAG to investigate further
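The triage order above can be sketched as a small decision function. This is a hypothetical helper for illustration only (the function name and structure are not part of the skill); the command strings are the ones listed in Step 1:

```python
def first_triage_command(dag_id=None, run_id=None):
    """Return the `af` CLI invocation to start the investigation with.

    Illustrative sketch of the Step 1 decision logic, not a real API.
    """
    if dag_id and run_id:
        # Both known: diagnose the specific run directly
        return f"af runs diagnose {dag_id} {run_id}"
    if dag_id:
        # DAG known but no run specified: look at recent failure stats
        return "af dags stats"
    # Nothing specified: check deployment-wide health first
    return "af health"

print(first_triage_command("etl_daily", "scheduled__2024-01-01"))
```

The returned string is what you would paste after `uvx --from astro-airflow-mcp`; in an agent loop you would run it and feed the output back into the next step.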
Step 2: Get the Error Details
Once you have identified a failed task:
• Get task logs using `af tasks logs <dag_id> <dag_run_id> <task_id>`
• Look for the actual exception: scroll past the Airflow boilerplate to find the real error
• Categorize the failure type:
  • Data issue: missing data, schema change, null values, constraint violation
  • Code issue: bug, syntax error, import failure, type error
  • Infrastructure issue: connection timeout, resource exhaustion, permission denied
  • Dependency issue: upstream failure, external API down, rate limiting
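The categorization step above can be approximated with keyword matching on the exception text. This is a minimal sketch under stated assumptions: the signature lists are illustrative guesses, not an exhaustive taxonomy, and real triage should read the full traceback rather than match strings:

```python
# Illustrative keyword signatures per failure category (assumed, not from the skill)
FAILURE_SIGNATURES = {
    "data": ["schema", "null value", "constraint", "no such column"],
    "code": ["syntaxerror", "importerror", "typeerror", "nameerror"],
    "infrastructure": ["timeout", "connection refused", "permission denied", "out of memory"],
    "dependency": ["upstream_failed", "rate limit", "service unavailable"],
}

def categorize_failure(error_text):
    """Return the first matching failure category, or 'unknown'."""
    lowered = error_text.lower()
    for category, signatures in FAILURE_SIGNATURES.items():
        if any(sig in lowered for sig in signatures):
            return category
    return "unknown"

print(categorize_failure("psycopg2.errors.NotNullViolation: null value in column"))
```

A match on the "data" list, for example, points you toward upstream source checks, while a "code" match points at the DAG file itself; ambiguous logs fall through to "unknown" and need manual reading.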