— Use case

Turn web pages into clean structured documents

Crawl, extract, and deliver structured web intelligence via API.

Pain points

  • PDF/HTML hybrids break parsers
  • Footers and nav pollute LLM context
  • Need blocks, not raw DOM

Architecture

  1. Fetch URL
  2. Strip boilerplate
  3. Return content[] blocks and metadata
  4. Optional export to Parquet
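
As a rough illustration of steps 2 and 3, the sketch below strips boilerplate and collects content blocks using only Python's standard library. It is not the service's implementation: the BOILERPLATE_TAGS and BLOCK_TAGS sets, the BlockExtractor class, and the extract function are all assumptions made for this example. Fetching (step 1) and Parquet export (step 4) are left out so the example runs offline.

```python
from html.parser import HTMLParser

# Assumption: tags treated as boilerplate. The real service's rules may differ.
BOILERPLATE_TAGS = {"nav", "footer", "header", "aside", "script", "style"}
# Assumption: tags whose text becomes one content block each.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li"}


class BlockExtractor(HTMLParser):
    """Hypothetical extractor: collects title, text blocks, and links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.content = []
        self.links = []
        self._skip_depth = 0   # > 0 while inside a boilerplate element
        self._in_title = False
        self._buffer = None    # text pieces of the block being read

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        if tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS:
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth:
            return
        if tag == "title":
            self._in_title = False
        elif tag in BLOCK_TAGS and self._buffer is not None:
            # Collapse runs of whitespace before storing the block.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.content.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data.strip()
        elif self._buffer is not None:
            self._buffer.append(data)


def extract(html):
    """Return a dict shaped like the example output: title, content, links."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": parser.content, "links": parser.links}
```

Tracking a skip depth rather than a single flag keeps nested boilerplate elements (a nav inside a header, for instance) from ending the skip too early.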

Example output

{ "title": "Annual report", "content": ["section 1..."], "links": [] }
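
One possible way to consume a response of this shape is to load it into a small typed container. The Document class and parse_document function below are hypothetical names for this sketch; only the title, content, and links fields come from the example above, and a real response may carry additional metadata.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical container mirroring the fields in the example output."""
    title: str
    content: list = field(default_factory=list)
    links: list = field(default_factory=list)


def parse_document(raw):
    """Parse a JSON string into a Document, tolerating missing fields."""
    data = json.loads(raw)
    return Document(
        title=data.get("title", ""),
        content=list(data.get("content", [])),
        links=list(data.get("links", [])),
    )


doc = parse_document(
    '{ "title": "Annual report", "content": ["section 1..."], "links": [] }'
)
```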

FAQ

How fast can I start?

Sign up free, create an API key, and call /graph/domain-context or /scrape in minutes. See /docs for curl examples.
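
For readers who prefer Python to curl, a call to /scrape might be prepared along these lines. Everything except the /scrape path is an assumption for illustration: the base URL, the Bearer authorization header, the JSON body with a "url" field, and the build_scrape_request helper are all hypothetical. Check /docs for the actual request format.

```python
import json
import urllib.request

# Assumption: placeholder base URL, not the real API host.
BASE_URL = "https://api.example.com"


def build_scrape_request(api_key, target_url):
    """Build (but do not send) a POST request to the /scrape endpoint.

    The header and body layout here are hypothetical.
    """
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/scrape",
        data=body,
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request("test-key", "https://example.com/report")
# To send it: urllib.request.urlopen(req), then json.load() the response.
```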

Is output AI-ready?

Yes. Responses include structured JSON, context_for_ai summaries, and link graphs designed for agents and RAG pipelines.