Skill

scientific-papers-to-dataset

by eamag

AI Summary

Extracts structured datasets from academic papers by leveraging OpenAlex search and citation graph traversal. Researchers and AI scientists benefit from automating literature review and dataset creation workflows.

Install

Copy this and paste it into Claude Code, Cursor, or any AI assistant:

I want to install the "scientific-papers-to-dataset" skill in my project.

Please run this command in my terminal:
# Install skill into the correct directory (6 files)
mkdir -p .claude/skills/skill/references .claude/skills/skill/scripts && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/SKILL.md "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/SKILL.md" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/references/OPENALEX.md "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/references/OPENALEX.md" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/references/WORKFLOW.md "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/references/WORKFLOW.md" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/scripts/bfs_queue.py "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/scripts/bfs_queue.py" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/scripts/download_pdf.py "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/scripts/download_pdf.py" && \
curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/skill/scripts/search_openalex.py "https://raw.githubusercontent.com/eamag/papers2dataset/main/skill/scripts/search_openalex.py"

Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.
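
After running the install command, it can help to confirm that all six files actually landed. The following is a minimal sketch, not part of the skill itself: the function name `missing_skill_files` is hypothetical, and the file list is simply copied from the install command above.

```python
from pathlib import Path

# Relative paths taken from the install command (6 files).
EXPECTED_FILES = [
    "SKILL.md",
    "references/OPENALEX.md",
    "references/WORKFLOW.md",
    "scripts/bfs_queue.py",
    "scripts/download_pdf.py",
    "scripts/search_openalex.py",
]


def missing_skill_files(skill_dir: str = ".claude/skills/skill") -> list[str]:
    """Return the expected files that are absent or empty under skill_dir.

    Hypothetical helper for illustration; the skill does not ship it.
    """
    root = Path(skill_dir)
    return [
        name
        for name in EXPECTED_FILES
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_skill_files()
    if missing:
        print(f"missing or empty: {missing}")
    else:
        print("all files present")
```

An empty file is treated as missing because a failed download can leave a zero-byte file behind.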

Description

Build structured datasets from academic papers. Use when the user wants to extract structured data from scientific literature, traverse citation graphs, search OpenAlex for papers, or create datasets from PDFs for research purposes.

scientific-papers-to-dataset

Build datasets by extracting structured data from academic papers and traversing citation graphs.

When to Use This Skill

Use this skill when the user wants to:
• Create a dataset from academic papers
• Extract structured information from PDFs
• Search for papers on a topic using OpenAlex
• Traverse citation graphs to find related papers
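
The OpenAlex search use case can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of the skill's `search_openalex.py`: the function names are hypothetical, and only the public `https://api.openalex.org/works` endpoint with its `search` and `per-page` query parameters is assumed.

```python
import json
import urllib.parse
import urllib.request

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_search_url(query: str, per_page: int = 25) -> str:
    """Build a full-text search URL for the OpenAlex works endpoint."""
    params = {"search": query, "per-page": per_page}
    return f"{OPENALEX_WORKS_URL}?{urllib.parse.urlencode(params)}"


def summarize_works(payload: dict) -> list[dict]:
    """Keep only the fields a dataset pipeline is likely to need."""
    return [
        {
            "id": work.get("id"),
            "title": work.get("display_name"),
            "year": work.get("publication_year"),
            # OpenAlex IDs of the works this paper cites.
            "referenced_works": work.get("referenced_works", []),
        }
        for work in payload.get("results", [])
    ]


def search_openalex(query: str, per_page: int = 25, timeout: float = 30.0) -> list[dict]:
    """Run the search over the network (hypothetical helper, needs internet access)."""
    url = build_search_url(query, per_page)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return summarize_works(json.load(response))
```

Separating URL construction and response parsing from the network call keeps the first two pieces testable offline.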

Architecture: Subagent Pattern

> [!IMPORTANT]
> Use subagents for PDF download, relevance checking, data extraction, and citation traversal to keep the main context clean.

Recommended Subagents

• pdf-downloader - Downloads PDF for a paper ID
• relevance-checker - Evaluates paper relevance from title/abstract
• data-extractor - Reads PDF and extracts structured data (use thinking model)
• citation-traverser - Fetches related/cited/citing papers from OpenAlex
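
One way the relevance-checker and citation-traverser roles could fit together is a breadth-first traversal over the citation graph. The sketch below is an assumption about the overall shape, not the skill's actual `bfs_queue.py`: `bfs_papers`, `get_neighbors`, and `is_relevant` are hypothetical names, and the two callables stand in for whatever the subagents return.

```python
from collections import deque
from typing import Callable, Iterable


def bfs_papers(
    seeds: Iterable[str],
    get_neighbors: Callable[[str], Iterable[str]],
    is_relevant: Callable[[str], bool],
    max_papers: int = 100,
) -> list[str]:
    """Breadth-first walk over a citation graph, expanding only relevant papers.

    get_neighbors plays the citation-traverser role and is_relevant the
    relevance-checker role; both are placeholders supplied by the caller.
    """
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    queue = deque(unique_seeds)
    seen = set(unique_seeds)
    accepted: list[str] = []

    while queue and len(accepted) < max_papers:
        paper_id = queue.popleft()
        if not is_relevant(paper_id):
            continue  # irrelevant papers are neither kept nor expanded
        accepted.append(paper_id)
        for neighbor in get_neighbors(paper_id):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return accepted


# Toy in-memory graph standing in for OpenAlex responses.
toy_graph = {"W1": ["W2", "W3"], "W2": ["W4"], "W3": ["W4", "W1"], "W4": []}
found = bfs_papers(["W1"], lambda p: toy_graph.get(p, []), lambda p: p != "W3")
print(found)  # ['W1', 'W2', 'W4']
```

Skipping expansion of irrelevant papers keeps the frontier from drifting off-topic, and the `seen` set guards against cycles in the citation graph.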


Health Signals

Maintenance: Committed 2mo ago (Active)
Adoption: Under 100 stars (21 ★ · Niche)
Docs: README + description (Well-documented)

GitHub Signals

Stars: 21
Forks: 2
Issues: 0
Updated: 2mo ago
No License


Works With

Claude Code