Web scraping API for LLM applications
URL in, LLM-ready content out. Markdown by default, structured data on demand.
$ curl https://api.scrape2llm.com/jobs \
-H "X-API-Key: s2l_..." \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
{
"job_id": "abc-123",
"status": "queued"
}
What you get
Markdown by default
Clean, LLM-ready content out of the box. No HTML noise, no JavaScript, just text and structure.
Rich parse mode
Opt into ParsedDocument: raw HTML, link graph, and structured blocks for advanced pipelines.
Sync or webhook
Poll for status or pass a callback URL - works for both interactive queries and batch ingestion.
Selector targeting
Pinpoint a specific section with CSS selectors when full-page content is too much.
R2-backed HTML
Raw HTML offloaded to Cloudflare R2 - keep job records small, fetch raw bytes on demand.
Multiple keys per account
Rotate keys safely, label them per environment, revoke individually from the dashboard.
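The options listed above map naturally onto fields in the job request body. The sketch below builds such a body in Python. Only "url" and "parse" appear in the curl examples on this page; the "selector" and "callback_url" field names, and the build_job_payload helper itself, are assumptions made for illustration and should be checked against the API reference before use.

```python
from typing import Any, Dict, Optional


def build_job_payload(
    url: str,
    parse: bool = False,
    selector: Optional[str] = None,
    callback_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a JSON-serializable body for POST /jobs.

    "url" and "parse" are taken from the curl examples on this page.
    "selector" and "callback_url" are HYPOTHETICAL field names.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")

    payload: Dict[str, Any] = {"url": url}
    if parse:
        # Opt into the richer parse mode instead of Markdown only.
        payload["parse"] = True
    if selector is not None:
        # Hypothetical field: restrict output to a CSS-selected section.
        payload["selector"] = selector
    if callback_url is not None:
        # Hypothetical field: webhook to notify instead of polling.
        payload["callback_url"] = callback_url
    return payload
```

Calling build_job_payload("https://example.com", parse=True) yields the same body as the curl example that sets "parse": true. Optional fields are left out when unset, so the default request stays as small as the Markdown-only case.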
See it in action
Same API, three flavors. Pick your stack.
# Submit a job
curl -X POST https://api.scrape2llm.com/jobs \
-H "X-API-Key: s2l_..." \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "parse": true}'
# Response
{"job_id":"abc-123","status":"queued"}
# Poll until ready
curl https://api.scrape2llm.com/jobs/abc-123 \
-H "X-API-Key: s2l_..."
scrape2llm is part of the 2LLM Suite: focused APIs that turn messy inputs into LLM-ready data. Also try files2llm, html2media, html2reel, and stream2llm.
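The submit-then-poll flow from the curl example can be sketched in Python using only the standard library. The endpoints, the X-API-Key header, and the "queued" status come from this page. The helper names, the "running" status, the rule that any other status means the job has finished, and the polling interval and timeout are all assumptions, so treat this as a starting point rather than an official client.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

API_BASE = "https://api.scrape2llm.com"  # base URL from the curl examples

# "queued" is shown on this page; "running" is an ASSUMED in-progress status.
PENDING_STATUSES = {"queued", "running"}


def http_json(method: str, path: str, api_key: str,
              body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send one request and decode the JSON response."""
    headers = {"X-API-Key": api_key}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(API_BASE + path, data=data,
                                 headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_job(url: str, api_key: str, parse: bool = False) -> Dict[str, Any]:
    """POST /jobs, mirroring the 'Submit a job' curl call."""
    body: Dict[str, Any] = {"url": url}
    if parse:
        body["parse"] = True
    return http_json("POST", "/jobs", api_key, body)


def get_job(job_id: str, api_key: str) -> Dict[str, Any]:
    """GET /jobs/<job_id>, mirroring the 'Poll until ready' curl call."""
    return http_json("GET", f"/jobs/{job_id}", api_key)


def wait_for_job(fetch: Callable[[str], Dict[str, Any]], job_id: str,
                 interval: float = 2.0, timeout: float = 120.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch(job_id) until the status is no longer pending.

    Raises TimeoutError if the job is still pending after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") not in PENDING_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still pending after {timeout}s")
        sleep(interval)
```

A typical call would be job = submit_job("https://example.com", key, parse=True), followed by wait_for_job(lambda jid: get_job(jid, key), job["job_id"]). Passing the fetch function in keeps the polling loop free of network code, so it can be exercised with a stub. The shape of a finished job record is not documented on this page, so the sketch returns it unchanged.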