AI Summary
A Cursor rule for NeMo Curator projects that documents the project's coding standards and Ruff linting configuration, helping teams keep data processing code consistent with its Ruff-based rules and exceptions.
Install
# Install Cursor rule to .cursor/rules/
mkdir -p .cursor/rules && curl --retry 3 --retry-delay 2 --retry-all-errors -o .cursor/rules/coding-standards.mdc "https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/.cursor/rules/coding-standards.mdc"
Run this in your IDE terminal (bash). On Windows, use Git Bash, WSL, or your IDE's built-in terminal. If curl fails with an SSL error, your network may be blocking raw.githubusercontent.com. In that case, try a VPN or download the file directly from the source repository.
Description
Scalable data preprocessing and curation toolkit for LLMs
`examples/` directory
• No `__init__.py` required (INP001 ignored)
Linting and Formatting
The project uses Ruff for linting and formatting, with a line length of 119 characters.
Allowed Patterns
• ✅ Print statements (T20 ignored)
• ✅ Boolean positional arguments in functions (FBT ignored)
• ✅ `df` as a variable name for DataFrames (PD901 ignored)
• ✅ TODOs without an author or issue link (TD002, TD003 ignored)
• ✅ Long exception messages (TRY003 ignored)
• ✅ Accessing private attributes (SLF001 ignored)
• ✅ `else` branches after `return`, `raise`, `continue`, or `break` (RET505-RET508 ignored)
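The following sketch shows several of these allowed patterns side by side. `RecordCounter` and `count_records` are hypothetical names, not NeMo Curator APIs. Each marked line uses a pattern that the corresponding rule would otherwise report. The sketch does not claim that the module passes every other Ruff rule.

```python
# Hypothetical helpers, not NeMo Curator APIs. Each marked line uses a pattern
# that one of the ignored rules would otherwise report.


class RecordCounter:
    def __init__(self) -> None:
        self._total = 0  # underscore prefix marks the attribute as private

    def add(self, n: int) -> None:
        self._total += n


def count_records(records: list[str], verbose: bool = False) -> int:  # FBT: boolean positional argument
    if not records:
        # TRY003: a long, descriptive message written directly in the raise
        raise ValueError("count_records expected at least one record but received an empty list")
    counter = RecordCounter()
    counter.add(len(records))
    if verbose:
        print(f"counted {counter._total} records")  # T20: print; SLF001: private access
    # TODO: cap the batch size from a config value (TD002/TD003: no author or link)
    if counter._total > 1000:
        return 1000
    else:  # RET505: else branch after return
        return counter._total
```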
Required Patterns
• ❌ No docstrings required (D ignored)
• ❌ No pathlib enforcement (PTH ignored)
• ❌ No logging format enforcement (G ignored)
• ✅ Type annotations for functions (except `*args`, `**kwargs`, and special methods)
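Here is a small sketch of the annotation rule. It assumes the exceptions correspond to Ruff's ANN002 (`*args`), ANN003 (`**kwargs`), and ANN204 (return type of special methods). `TextFilter` and `output_path` are hypothetical names. In the sketch, ordinary parameters and return values are annotated, docstrings are omitted, and `os.path` is used instead of `pathlib`.

```python
# Hypothetical example of the annotation rule. Assumption: the exceptions map to
# Ruff's ANN002 (*args), ANN003 (**kwargs), and ANN204 (special-method returns).
import os


class TextFilter:
    def __init__(self, min_words: int):  # ANN204 ignored: no "-> None" required
        self.min_words = min_words  # parameters themselves are still annotated

    def keep(self, text: str, *args, **kwargs) -> bool:  # ANN002/ANN003 ignored
        # No docstring needed (D ignored); extra arguments are accepted and unused
        return len(text.split()) >= self.min_words


def output_path(base_dir: str, name: str) -> str:
    # PTH ignored: os.path is acceptable because pathlib is not enforced
    return os.path.join(base_dir, f"{name}.jsonl")


f = TextFilter(min_words=3)
print(f.keep("one two three"))  # True
print(f.keep("too short"))  # False
print(output_path("out", "shard_0"))
```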
Quality Score
Good
76/100
Trust & Transparency
Open Source — Apache-2.0
Source code publicly auditable
Verified Open Source
Hosted on GitHub — publicly auditable
Actively Maintained
Last commit Yesterday
1.4k stars — Strong Community
232 forks