Crawl Orchestration With Queues
How CragData queues discover and crawl jobs, handles retries, and delivers graph updates, so you never have to run workers yourself.
A production crawl is not a loop that calls fetch(url) for every URL in a list. It is a queued system with backpressure, retries, and delivery guarantees.
Enqueue
POST /v1/crawl
POST /v1/discover
Both endpoints return a job_id immediately. The heavy work runs asynchronously on CragData's workers.
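Below is a minimal sketch of submitting a job with Python's standard library. It builds the request without sending it. Only the endpoint paths and the job_id in the response come from this page. The base URL, the bearer-token Authorization header, and the url field in the request body are hypothetical placeholders, not CragData's documented request format.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the real API host from the API reference.
BASE_URL = "https://api.cragdata.example"


def build_enqueue_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/crawl or /v1/discover."""
    if endpoint not in ("/v1/crawl", "/v1/discover"):
        raise ValueError(f"unsupported endpoint: {endpoint}")
    return urllib.request.Request(
        url=BASE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def job_id_from_response(body: bytes) -> str:
    """Read job_id from the immediate enqueue response (assumed JSON shape)."""
    return json.loads(body)["job_id"]


# Usage: the call returns as soon as the job is queued, not when the crawl ends.
req = build_enqueue_request("/v1/crawl", {"url": "https://example.com"}, api_key="sk_test")
# with urllib.request.urlopen(req) as resp:
#     job_id = job_id_from_response(resp.read())
```

Because the response only carries an identifier, the client stays responsive. Progress is tracked separately with that job_id.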
Workers
- Concurrency caps per account
- robots.txt compliance
- Anti-bot backoff on HTTP 429 and 503 responses
- scrapable flags in graph responses
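These policies run on CragData's workers, not in your code. The sketch below only illustrates the kind of per-fetch logic the list describes. It is not CragData's implementation. It shows a robots.txt check using Python's urllib.robotparser and a retry decision for 429/503 responses. The full-jitter backoff strategy and the base, cap, and max_attempts values are assumptions chosen for illustration.

```python
import random
import urllib.robotparser
from typing import Callable

# Statuses treated as anti-bot / overload signals, per the list above.
RETRYABLE_STATUSES = {429, 503}


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against a robots.txt body that has already been fetched."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full-jitter exponential backoff: rng() * min(cap, base * 2**attempt).

    The values for base and cap are illustrative assumptions, not CragData settings.
    """
    return rng() * min(cap, base * (2 ** attempt))


def should_retry(status: int, attempt: int, max_attempts: int = 5) -> bool:
    """Retry only on 429/503, and only while attempts remain (attempt is 0-based)."""
    return status in RETRYABLE_STATUSES and attempt + 1 < max_attempts
```

Jitter spreads retries out, so many throttled fetches don't hit the same host at the same moment. The cap keeps a long run of 503s from producing unbounded waits.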
Observe
- GET /crawl/{job_id} for progress
- GET /jobs for history
- Webhooks: crawl.completed, page.extracted
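The two observation styles can be sketched as a polling loop and a webhook router. The HTTP fetcher is injected as get_json, a hypothetical callable that takes a path and returns decoded JSON. The status field, its terminal values, and the type/data layout of webhook payloads are assumptions, not documented response shapes. The paths and event names come from the list above.

```python
import time
from typing import Callable, Dict

# Assumed terminal values of a hypothetical "status" field in GET /crawl/{job_id}.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    get_json: Callable[[str], dict],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll GET /crawl/{job_id} until the job reaches a terminal status."""
    deadline = clock() + timeout
    while True:
        # get_json is expected to prefix the API host and decode the body.
        job = get_json(f"/crawl/{job_id}")
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        sleep(interval)


def route_webhook(event: dict, handlers: Dict[str, Callable[[dict], None]]) -> bool:
    """Dispatch a webhook (e.g. crawl.completed, page.extracted) by event type."""
    handler = handlers.get(event.get("type"))
    if handler is None:
        return False  # ignore event types you have no handler for
    handler(event.get("data", {}))
    return True
```

Webhooks avoid polling traffic. The polling loop is a simple fallback when you cannot expose a public endpoint to receive webhooks.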
When the queue is full
Enqueue requests return HTTP 409. Finish or cancel the current job before starting another, unless your plan allows parallel jobs.
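A client can treat 409 as "wait, then resubmit" instead of a hard failure. The sketch below makes a bounded number of attempts and waits for the running job between them, for example by polling GET /crawl/{job_id}. Three callables are hypothetical hooks for illustration: post returns a (status, decoded body) pair, wait_for_current_job blocks until the running job finishes, and QueueFullError is defined here. The job_id field in a success body is assumed.

```python
from typing import Callable, Tuple


class QueueFullError(Exception):
    """Raised when the enqueue endpoint keeps answering HTTP 409."""


def enqueue_with_409_handling(
    post: Callable[[str, dict], Tuple[int, dict]],
    payload: dict,
    wait_for_current_job: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Submit to POST /v1/crawl; on 409, wait for the running job and retry."""
    for attempt in range(max_attempts):
        status, body = post("/v1/crawl", payload)
        if status == 409:
            # Queue is full: only wait if another attempt will follow.
            if attempt + 1 < max_attempts:
                wait_for_current_job()
            continue
        if status >= 400:
            raise RuntimeError(f"enqueue failed with HTTP {status}: {body}")
        return body["job_id"]
    raise QueueFullError(f"queue still full after {max_attempts} attempts")
```

Bounding the attempts keeps a stuck job from blocking the caller indefinitely. Raising a dedicated error lets the caller decide whether to cancel the current job instead.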
See Queues & retries in the docs.