AI Summary
Extracts structured datasets from academic papers by leveraging OpenAlex search and citation graph traversal. Researchers and AI scientists benefit from automating literature review and dataset creation workflows.
Install
Copy this and paste it into Claude Code, Cursor, or any AI assistant:
I want to install the "scientific-papers-to-dataset" skill in my project. Please run this command in my terminal:

# Install skill into the correct directory (6 files)
mkdir -p .claude/skills/skill && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/SKILL.md "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/SKILL.md" &&
mkdir -p .claude/skills/skill/references && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/references/OPENALEX.md "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/references/OPENALEX.md" &&
mkdir -p .claude/skills/skill/references && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/references/WORKFLOW.md "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/references/WORKFLOW.md" &&
mkdir -p .claude/skills/skill/scripts && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/scripts/bfs_queue.py "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/scripts/bfs_queue.py" &&
mkdir -p .claude/skills/skill/scripts && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/scripts/download_pdf.py "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/scripts/download_pdf.py" &&
mkdir -p .claude/skills/skill/scripts && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/scripts/search_openalex.py "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/scripts/search_openalex.py"

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.
Description
Build structured datasets from academic papers. Use when the user wants to extract structured data from scientific literature, traverse citation graphs, search OpenAlex for papers, or create datasets from PDFs for research purposes.
scientific-papers-to-dataset
Build datasets by extracting structured data from academic papers and traversing citation graphs.
When to Use This Skill
Use this skill when the user wants to:
• Create a dataset from academic papers
• Extract structured information from PDFs
• Search for papers on a topic using OpenAlex
• Traverse citation graphs to find related papers
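As a rough illustration of the OpenAlex search use case, the sketch below builds a query URL for the public `/works` endpoint and restores a readable abstract from the `abstract_inverted_index` field that OpenAlex returns. The function names (`build_search_url`, `rebuild_abstract`, `search_works`) are hypothetical and are not taken from the skill's `search_openalex.py`, whose contents are not shown here.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_search_url(query: str, per_page: int = 25) -> str:
    """Build a search URL for the OpenAlex /works endpoint."""
    params = {"search": query, "per-page": per_page}
    return f"{OPENALEX_WORKS_URL}?{urlencode(params)}"


def rebuild_abstract(inverted_index: Optional[Dict[str, List[int]]]) -> str:
    """Turn OpenAlex's {word: [positions]} abstract back into plain text."""
    if not inverted_index:
        return ""
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


def search_works(query: str, per_page: int = 25) -> List[dict]:
    """Hypothetical helper: fetch one page of results (needs network access)."""
    with urlopen(build_search_url(query, per_page)) as response:
        payload = json.load(response)
    return [
        {
            "id": work.get("id"),
            "title": work.get("display_name"),
            "abstract": rebuild_abstract(work.get("abstract_inverted_index")),
        }
        for work in payload.get("results", [])
    ]
```

Title and abstract are the fields a relevance check would most plausibly read, which is why `search_works` keeps only those alongside the paper ID.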
Architecture: Subagent Pattern
> [!IMPORTANT]
> Use subagents for PDF download, relevance checking, data extraction, and citation traversal to keep the main context clean.
Recommended Subagents
• pdf-downloader - Downloads PDF for a paper ID
• relevance-checker - Evaluates paper relevance from title/abstract
• data-extractor - Reads PDF and extracts structured data (use thinking model)
• citation-traverser - Fetches related/cited/citing papers from OpenAlex
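One way the subagents listed above could fit together is a breadth-first traversal of the citation graph, sketched below. This is an assumption about the control flow, not the skill's actual `bfs_queue.py`: the callables `is_relevant`, `extract`, and `neighbors` are hypothetical stand-ins for the relevance-checker, the data-extractor (which would rely on pdf-downloader), and the citation-traverser. Skipping expansion of irrelevant papers is likewise an assumed policy.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(
    seed_ids: List[str],
    is_relevant: Callable[[str], bool],
    extract: Callable[[str], Dict],
    neighbors: Callable[[str], Iterable[str]],
    max_papers: int = 50,
) -> List[Dict]:
    """Breadth-first walk over a citation graph, collecting one row per relevant paper."""
    queue = deque(seed_ids)
    seen = set(seed_ids)  # every ID ever queued, so no paper is processed twice
    rows: List[Dict] = []
    visited = 0

    while queue and visited < max_papers:
        paper_id = queue.popleft()
        visited += 1

        # Stand-in for the relevance-checker subagent.
        if not is_relevant(paper_id):
            continue  # assumed policy: irrelevant papers are not expanded

        # Stand-in for pdf-downloader followed by data-extractor.
        rows.append(extract(paper_id))

        # Stand-in for the citation-traverser subagent.
        for next_id in neighbors(paper_id):
            if next_id not in seen:
                seen.add(next_id)
                queue.append(next_id)

    return rows


if __name__ == "__main__":
    # Toy citation graph used purely for illustration.
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["A"], "D": []}
    rows = crawl(
        ["A"],
        is_relevant=lambda pid: pid != "C",
        extract=lambda pid: {"id": pid},
        neighbors=lambda pid: graph[pid],
    )
    print([row["id"] for row in rows])
```

Because each callable only receives a paper ID and returns a small result, the main loop never holds PDF text, which matches the goal of keeping the main context clean.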