justhtml — Copilot Instructions

by EmilStenstrom

AI Summary

1. Spec compliance first: Follow the WHATWG HTML5 spec exactly. No heuristics, no shortcuts.
2. No exceptions in hot paths: Use deterministic control flow, not try/except for branching.
3. No reflective probing: No hasattr, getattr, or delattr - all data structures used are deterministic.

Install

Copy this and paste it into Claude Code, Cursor, or any AI assistant:

I want to add the "justhtml — Copilot Instructions" prompt rules to my project.
Repository: https://github.com/EmilStenstrom/justhtml

Please read the repo to find the rules/prompt file, then:
1. Download it to the correct location (.cursorrules, .windsurfrules, .github/prompts/, or project root — based on the file type)
2. If there's an existing rules file, merge the new rules in rather than overwriting
3. Confirm what was added

Description

A pure Python HTML5 parser that just works. No C extensions to compile. No system dependencies to install. No complex API to learn.

Decision & Clarification Policy (Overrides)

• Replace "propose a follow-up" with "propose and execute the best alternative by default; ask only for destructive/irreversible choices."
• Keep preambles to a single declarative sentence ("I'm scanning the repo and then drafting a minimal fix."), with no approval requests.

Architecture Snapshot

• Tokenizer (tokenizer.py): HTML5 spec state machine (~60 states). Handles RCDATA, RAWTEXT, CDATA, script escaping, comments, DOCTYPE, etc.
• Tree builder (treebuilder.py): Token sink that constructs the DOM tree following the HTML5 tree construction rules.
• Node tree (node.py): DOM-like structure. Always use append_child() / insert_before() for tree operations.
• Entities (entities.py): HTML5 character reference decoding (named & numeric entities).
• Constants (constants.py): HTML5 element categories, void elements, formatting elements, etc.
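The rule about always going through append_child() / insert_before() can be illustrated with a minimal sketch. The `Node` class below is hypothetical and is not justhtml's actual node.py; it only assumes that each node tracks a parent pointer and a child list, which is why direct list manipulation would risk leaving the two out of sync.

```python
from __future__ import annotations


class Node:
    """Hypothetical DOM-like node (illustration only, not justhtml's class)."""

    __slots__ = ("name", "parent", "children")

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Node | None = None
        self.children: list[Node] = []

    def _detach(self) -> None:
        # Remove this node from its current parent so it never has two parents.
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def append_child(self, child: Node) -> None:
        # Keep the parent pointer and the child list consistent in one place.
        child._detach()
        child.parent = self
        self.children.append(child)

    def insert_before(self, child: Node, reference: Node | None) -> None:
        # With no reference node, behave like append_child.
        if reference is None:
            self.append_child(child)
            return
        child._detach()
        index = self.children.index(reference)
        child.parent = self
        self.children.insert(index, child)


body = Node("body")
p = Node("p")
h1 = Node("h1")
body.append_child(p)
body.insert_before(h1, p)
print([c.name for c in body.children])  # ['h1', 'p']
```

Because both methods detach the child first, moving a node between parents cannot leave a stale entry behind in the old parent's child list.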

Golden Rules

• Spec compliance first: Follow the WHATWG HTML5 spec exactly. No heuristics, no shortcuts.
• No exceptions in hot paths: Use deterministic control flow, not try/except for branching.
• No reflective probing: No hasattr, getattr, or delattr - all data structures used are deterministic.
• Minimal allocations: Reuse buffers, avoid per-token object creation in tokenizer.
• Token reuse: Create new token objects when emitting (don't reuse references).
• State machine purity: Tokenizer state transitions follow spec state machine exactly.
• No test-specific code: No references to test files in comments or code.
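A few of these rules can be sketched in code. Everything below is hypothetical (the `Token` and `TextCollector` classes, the `decode_entity` helper, and the three-entry entity table are illustrative names, not justhtml's API). The sketch shows a fixed-shape object instead of hasattr probing, a dictionary lookup instead of try/except branching, and a reused buffer that still emits a fresh token object each time.

```python
from __future__ import annotations


class Token:
    """Hypothetical token with a fixed, declared shape (no hasattr needed)."""

    __slots__ = ("kind", "data")

    def __init__(self, kind: str, data: str) -> None:
        self.kind = kind
        self.data = data


# Hypothetical three-entry subset of a named-entity table.
NAMED_ENTITIES = {"amp": "&", "lt": "<", "gt": ">"}


def decode_entity(name: str) -> str | None:
    # Deterministic branch: dict.get returns None for unknown names,
    # so no try/except KeyError is used for control flow.
    return NAMED_ENTITIES.get(name)


class TextCollector:
    """Reuses one buffer across tokens but emits a new Token on each flush."""

    def __init__(self) -> None:
        self._buffer: list[str] = []
        self.emitted: list[Token] = []

    def feed(self, chunk: str) -> None:
        self._buffer.append(chunk)

    def flush(self) -> None:
        if not self._buffer:
            return
        # A new Token per emit, so consumers never share a mutable reference.
        self.emitted.append(Token("text", "".join(self._buffer)))
        # Clear in place: the same list object serves the next token.
        self._buffer.clear()


collector = TextCollector()
collector.feed("a")
collector.feed("b")
collector.flush()
collector.feed("c")
collector.flush()
print([t.data for t in collector.emitted])  # ['ab', 'c']
```

Read this way, "minimal allocations" and "create new token objects when emitting" do not conflict: the scratch buffer is reused, while each emitted token is its own object.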

Testing Workflow

• Target failures: Use --test-specs file:indices to run specific tests.

```bash
python run_tests.py --test-specs test2.test:5,10 -v
```

• Check test output: Use -v for diffs, -vv for debug output.

```bash
python run_tests.py --test-specs test3.test -vv
```

• Run full suite: Always check for regressions.

```bash
python run_tests.py -q              # Quick overview
python run_tests.py --regressions   # Check for new failures vs baseline
```

• Quick iteration: Test snippet without full suite (full suite runs in ~1s).

```bash
python -c 'from justhtml import JustHTML, to_test_format; print(to_test_format(JustHTML("<html>").root))'
```

• Benchmark performance: After changes, verify speed impact.

```bash
python benchmarks/performance.py --iterations 1 --parser justhtml --no-mem
```

• Profile hotspots: For performance optimization.

```bash
python benchmarks/profile.py  # Profiles on web100k dataset
```
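The `file:indices` form accepted by --test-specs can be read as a file name, optionally followed by a colon and a comma-separated list of test indices. The helper below is a hypothetical sketch of that reading; `parse_test_spec` is an illustrative name, and how run_tests.py actually parses the argument is not shown here.

```python
from __future__ import annotations


def parse_test_spec(spec: str) -> tuple[str, list[int] | None]:
    """Split a 'file:indices' spec into a file name and optional indices.

    Hypothetical helper: None for the indices means "run every test in the file".
    """
    filename, separator, indices = spec.partition(":")
    if not separator:
        # No colon present, e.g. "test3.test": select the whole file.
        return filename, None
    return filename, [int(part) for part in indices.split(",")]


print(parse_test_spec("test2.test:5,10"))  # ('test2.test', [5, 10])
print(parse_test_spec("test3.test"))       # ('test3.test', None)
```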

GitHub Signals

Stars: 1.1k
Forks: 36
Issues: 1
Updated: 2d ago
No License

Works With

Any AI assistant that accepts custom rules or system prompts

Claude
ChatGPT
Cursor
Windsurf
Copilot
+ more