AI Summary
1. Spec compliance first: Follow WHATWG HTML5 spec exactly. No heuristics, no shortcuts.
2. No exceptions in hot paths: Use deterministic control flow, not try/except for branching.
3. No reflective probing: No hasattr, getattr, or delattr - all data structures used are deterministic.
Install
Copy this and paste it into Claude Code, Cursor, or any AI assistant:
I want to add the "justhtml — Copilot Instructions" prompt rules to my project. Repository: https://github.com/EmilStenstrom/justhtml
Please read the repo to find the rules/prompt file, then:
1. Download it to the correct location (.cursorrules, .windsurfrules, .github/prompts/, or project root, based on the file type)
2. If there's an existing rules file, merge the new rules in rather than overwriting
3. Confirm what was added
Description
A pure Python HTML5 parser that just works. No C extensions to compile. No system dependencies to install. No complex API to learn.
Decision & Clarification Policy (Overrides)
• Replace "propose a follow-up" with "propose and execute the best alternative by default; ask only for destructive/irreversible choices."
• Keep preambles to a single declarative sentence ("I'm scanning the repo and then drafting a minimal fix.") with no approval requests.
Architecture Snapshot
• Tokenizer (tokenizer.py): HTML5 spec state machine (~60 states). Handles RCDATA, RAWTEXT, CDATA, script escaping, comments, DOCTYPE, etc.
• Tree builder (treebuilder.py): Token sink that constructs DOM tree following HTML5 construction rules.
• Node tree (node.py): DOM-like structure. Always use append_child() / insert_before() for tree operations.
• Entities (entities.py): HTML5 character reference decoding (named & numeric entities).
• Constants (constants.py): HTML5 element categories, void elements, formatting elements, etc.
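To illustrate why tree operations should go through append_child() / insert_before() rather than touching a child list directly, here is a minimal sketch. The Node class below is hypothetical and is not justhtml's actual node.py; only the two method names come from the text above. It assumes each node tracks a parent pointer and a children list, which the helper methods keep in sync.

```python
from __future__ import annotations


class Node:
    """Illustrative DOM-like node (hypothetical, not justhtml's real class)."""

    __slots__ = ("name", "parent", "children")

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent = None
        self.children = []

    def append_child(self, child: Node) -> None:
        # Detach from any previous parent so the node lives in one place only.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)

    def insert_before(self, child: Node, reference: Node | None) -> None:
        # With no reference node, fall back to appending at the end.
        if reference is None:
            self.append_child(child)
            return
        if child.parent is not None:
            child.parent.children.remove(child)
        index = self.children.index(reference)
        child.parent = self
        self.children.insert(index, child)


html = Node("html")
body = Node("body")
head = Node("head")
html.append_child(body)
html.insert_before(head, body)
print([c.name for c in html.children])  # ['head', 'body']
```

Mutating children directly would skip the parent bookkeeping, which is presumably the kind of inconsistency the "always use" rule is meant to prevent.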
Golden Rules
• Spec compliance first: Follow WHATWG HTML5 spec exactly. No heuristics, no shortcuts.
• No exceptions in hot paths: Use deterministic control flow, not try/except for branching.
• No reflective probing: No hasattr, getattr, or delattr - all data structures used are deterministic.
• Minimal allocations: Reuse buffers, avoid per-token object creation in tokenizer.
• Token reuse: Create new token objects when emitting (don't reuse references).
• State machine purity: Tokenizer state transitions follow spec state machine exactly.
• No test-specific code: No references to test files in comments or code.
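The "no exceptions in hot paths" and "no reflective probing" rules can be sketched as follows. Everything here is hypothetical and for illustration only: the three-entry NAMED_ENTITIES table, the Tag class, and the function names are assumptions, not justhtml's code, and the fallback of echoing the reference text is a simplification of the spec's handling.

```python
# Hypothetical three-entry table; a real one would hold the full HTML5 list.
NAMED_ENTITIES = {"amp": "&", "lt": "<", "gt": ">"}


def decode_with_exceptions(name):
    # Discouraged in a hot path: the miss case is handled by raise/catch.
    try:
        return NAMED_ENTITIES[name]
    except KeyError:
        return "&" + name + ";"


def decode_deterministic(name):
    # Preferred: one lookup, then an ordinary branch on the result.
    value = NAMED_ENTITIES.get(name)
    if value is None:
        return "&" + name + ";"
    return value


class Tag:
    """Hypothetical token with a fixed set of fields, all set in __init__."""

    __slots__ = ("name", "attrs", "self_closing")

    def __init__(self, name):
        self.name = name
        self.attrs = {}
        self.self_closing = False


def is_self_closing(token):
    # Every Tag always has the field, so no hasattr/getattr probe is needed.
    return token.self_closing


print(decode_deterministic("lt"), decode_deterministic("bogus"))
```

Both decode functions return the same results; the second simply keeps the miss case on the normal control-flow path.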
Testing Workflow
• Target failures: Use --test-specs file:indices to run specific tests
```bash
python run_tests.py --test-specs test2.test:5,10 -v
```
• Check test output: Use -v for diffs, -vv for debug output
```bash
python run_tests.py --test-specs test3.test -vv
```
• Run full suite: Always check for regressions
```bash
python run_tests.py -q             # Quick overview
python run_tests.py --regressions  # Check for new failures vs baseline
```
• Quick iteration: Test snippet without full suite (full suite runs in ~1s)
```bash
python -c 'from justhtml import JustHTML, to_test_format; print(to_test_format(JustHTML("<html>").root))'
```
• Benchmark performance: After changes, verify speed impact
```bash
python benchmarks/performance.py --iterations 1 --parser justhtml --no-mem
```
• Profile hotspots: For performance optimization
```bash
python benchmarks/profile.py  # Profiles on web100k dataset
```
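For readers unfamiliar with the file:indices form used by --test-specs, the sketch below shows one way such a spec string could be split. The parse_test_spec helper is hypothetical and is not taken from run_tests.py; in particular, treating a spec without indices as "all tests in the file" (an empty list here) is an assumption.

```python
from __future__ import annotations


def parse_test_spec(spec: str) -> tuple[str, list[int]]:
    """Split 'file:i,j,...' into (file, [i, j, ...]).

    Hypothetical helper for illustration; an empty list stands for
    "no indices given".
    """
    filename, sep, indices = spec.partition(":")
    if not sep:
        return filename, []
    return filename, [int(i) for i in indices.split(",")]


print(parse_test_spec("test2.test:5,10"))  # ('test2.test', [5, 10])
print(parse_test_spec("test3.test"))       # ('test3.test', [])
```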
Works With
Any AI assistant that accepts custom rules or system prompts