AI SummaryThe html-to-markdown library provides comprehensive, single-pass metadata extraction during HTML-to-Markdown conversion. This enables content analysis, SEO optimization, document indexing, and structured data processing without extra parsing passes. Metadata extraction uses the same pattern as inli
Install
Copy this and paste it into Claude Code, Cursor, or any AI assistant:
I want to install the "metadata-extraction" skill in my project. Please run this command in my terminal: # Install skill into your project mkdir -p .claude/skills/metadata-extraction && curl --retry 3 --retry-delay 2 --retry-all-errors -o .claude/skills/metadata-extraction/SKILL.md "https://raw.githubusercontent.com/kreuzberg-dev/html-to-markdown/main/.codex/skills/metadata-extraction/SKILL.md" Then restart Claude Code (or reload the window in Cursor) so the skill is picked up.
Description
Metadata Extraction for html-to-markdown
Overview
The html-to-markdown library provides comprehensive, single-pass metadata extraction during HTML-to-Markdown conversion. This enables content analysis, SEO optimization, document indexing, and structured data processing without extra parsing passes.
Use Cases
• Table of contents generation: Build TOC from headers and IDs • Document outline: Create hierarchical structure • SEO analysis: Verify H1 presence, hierarchy correctness • Navigation: Generate internal anchor links
Single-Pass Collection
Metadata extraction uses the same MetadataCollector pattern as inline image collection: `rust // From lib.rs line 445 let metadata_collector = Rc::new(RefCell::new(metadata::MetadataCollector::new(metadata_cfg))); // Passed to converter during tree walk let markdown = converter::convert_html_with_metadata( normalized_html.as_ref(), &options, Rc::clone(&metadata_collector) )?; // After conversion, recover and return metadata let metadata = metadata_collector.finish(); ` Key Benefits: • Zero overhead when disabled: Entire module compilable out via feature flags • Single tree traversal: No separate metadata extraction pass • Memory efficient: Pre-allocated buffers (typical: 32 headers, 64 links, 16 images) • Configurable granularity: Extract only needed metadata types
MetadataConfig Structure
Located in /crates/html-to-markdown/src/metadata.rs: `rust pub struct MetadataConfig { pub extract_document: bool, // <head> meta tags, title, etc. pub extract_headers: bool, // h1-h6 with hierarchy pub extract_links: bool, // All hyperlinks with classification pub extract_images: bool, // All images with dimensions pub extract_structured_data: bool, // JSON-LD, Microdata, RDFa pub max_structured_data_size: usize, // Prevent memory exhaustion } `
Discussion
Health Signals
My Fox Den
Community Rating
Sign in to rate this booster