Turn web pages into clean structured documents
Crawl, extract, and deliver structured web intelligence via API.
Pain points
- PDF/HTML hybrids break parsers
- Footers and nav pollute LLM context
- Need blocks, not raw DOM
Architecture
- Fetch URL
- Strip boilerplate
- Return content[] blocks and metadata
- Optional export to Parquet
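The middle two steps can be illustrated with a minimal sketch. It is not the service's actual implementation: the class name BlockExtractor, the function extract, the boilerplate and block tag lists, and the fields inside "metadata" are all assumptions made for illustration. The fetch step is left out so the example runs offline on an HTML string, and the Parquet export is not shown.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as boilerplate and skipped entirely.
BOILERPLATE_TAGS = {"nav", "footer", "aside", "script", "style"}
# Assumption: each of these tags opens one content block.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li"}


class BlockExtractor(HTMLParser):
    """Hypothetical extractor: collects title, text blocks, and links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.content = []
        self.links = []
        self._skip_depth = 0   # > 0 while inside a boilerplate element
        self._in_title = False
        self._buffer = None    # text pieces of the open block, or None

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self._skip_depth += 1
        elif self._skip_depth:
            return
        elif tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS:
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._skip_depth:
            return
        elif tag == "title":
            self._in_title = False
        elif tag in BLOCK_TAGS and self._buffer is not None:
            # Collapse runs of whitespace before storing the block.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.content.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data
        elif self._buffer is not None:
            self._buffer.append(data)


def extract(html, url):
    """Strip boilerplate and return content blocks plus metadata."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "content": parser.content,
        "links": parser.links,
        # Assumption: the metadata fields here are illustrative only.
        "metadata": {"url": url, "block_count": len(parser.content)},
    }


SAMPLE = """<html><head><title>Annual report</title></head><body>
<nav><a href="/home">Home</a></nav>
<h1>Annual report</h1>
<p>Revenue grew   in <a href="/q4">Q4</a>.</p>
<footer><p>Copyright</p></footer>
</body></html>"""

doc = extract(SAMPLE, "https://example.com/report")
```

In this sample the nav link and the footer paragraph are dropped, so only the heading and the body paragraph end up in doc["content"].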
Example output
{ "title": "Annual report", "content": ["section 1..."], "links": [] }
FAQ
How fast can I start?
Sign up free, create an API key, and call /graph/domain-context or /scrape in minutes. See /docs for curl examples.
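As a rough idea of what a first call to /scrape might look like from Python, here is a hedged sketch. Only the /scrape path comes from this page. The host api.example.com, the POST method, the bearer-token header, and the {"url": ...} request body are assumptions, and build_scrape_request is a hypothetical helper; /docs has the authoritative request format.

```python
import json
import urllib.request

# Placeholder host, not the real API base URL (assumption).
API_BASE = "https://api.example.com"


def build_scrape_request(api_key, target_url):
    """Build, but do not send, a request for the /scrape endpoint."""
    # Assumption: the endpoint accepts a JSON body with a "url" field.
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/scrape",
        data=body,
        headers={
            # Assumption: the API key is sent as a bearer token.
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request("YOUR_API_KEY", "https://example.com/report")
```

Once the real host and key are filled in, the request could be sent with urllib.request.urlopen(req) and the response body parsed with json.loads.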
Is output AI-ready?
Yes. The output is structured JSON with context_for_ai summaries and link graphs, designed for agents and RAG pipelines.