Firecrawl

developer tools, ai

Open-source web scraping and crawling API that converts any website into clean, LLM-ready markdown or structured JSON. Handles JavaScript rendering, full-site crawls, and browser interaction automatically.

#scraping #ai #web #api #llm #crawling #self-hosted

Quick Start

git clone https://github.com/firecrawl/firecrawl && cd firecrawl && cp apps/api/.env.example apps/api/.env && docker compose up

Overview

Firecrawl is an API that converts websites into clean, structured data suitable for AI systems. Pass it a URL and it returns markdown, HTML, a screenshot, or structured JSON — with JavaScript rendered, dynamic content loaded, and the noise stripped. For AI agents, RAG pipelines, and any application that needs to read the live web, it handles the infrastructure work that makes web scraping genuinely reliable.

The output format is the core value. Rather than returning raw HTML that an LLM has to parse through, Firecrawl produces clean markdown without navigation, footers, cookie banners, or ad boilerplate. The content that matters — headings, body text, code blocks, tables — is preserved in a format that fits efficiently into a context window.

Four endpoints cover most use cases. /scrape turns a single URL into clean content. /crawl follows links from a starting point and scrapes pages across an entire site or section. /search queries the web and returns full-page markdown for each result in one call, eliminating the separate search-then-scrape step. /interact handles pages that require browser actions — clicking, typing, scrolling, navigating multi-step flows — to reach content that a static scrape cannot see.
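As a rough illustration of the single-URL case, the sketch below builds (but does not send) a POST request for /scrape using only the Python standard library. The base URL, the `v1` path prefix, the bearer-token header, and the `url` / `formats` field names are assumptions made for this example and are not taken from the text above; check them against the current API reference before relying on them. The helper names `build_request` and `build_scrape_request` are hypothetical.

```python
import json
import urllib.request

# Assumption: hosted API base URL with a versioned path prefix.
# A self-hosted instance would use its own host and port instead.
DEFAULT_BASE_URL = "https://api.firecrawl.dev/v1"


def build_request(endpoint: str, payload: dict, api_key: str,
                  base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one endpoint."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the API key is passed as a bearer token.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def build_scrape_request(page_url: str, api_key: str,
                         base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Request markdown for a single page via the /scrape endpoint."""
    # Assumption: the payload names the target page and the desired formats.
    payload = {"url": page_url, "formats": ["markdown"]}
    return build_request("/scrape", payload, api_key, base_url)
```

Sending the request would then be a matter of passing the returned object to `urllib.request.urlopen` and decoding the JSON response. Keeping construction separate from sending makes it easy to inspect the payload, or to point the same code at a self-hosted base URL.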

Structured extraction works by passing a JSON schema to the scrape endpoint. Instead of getting markdown and asking an LLM to extract the relevant data, you define the shape you want and get typed data back directly. Product listings, pricing tables, contact details, and any predictable page structure are candidates for this approach.
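To make the schema-driven approach concrete, here is a minimal sketch of a scrape payload that carries a JSON schema for a product page. The schema itself is standard JSON Schema; the payload field names (`formats`, `jsonOptions`, `schema`) are assumptions for illustration and have differed between API versions, so treat them as placeholders. `PRODUCT_SCHEMA` and `build_extraction_payload` are hypothetical names.

```python
import json

# Hypothetical schema describing the shape we want back for a product page.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}


def build_extraction_payload(page_url: str, schema: dict) -> dict:
    """Build a scrape payload that asks for schema-shaped JSON.

    Assumption: the scrape endpoint accepts a "json" format plus a
    schema nested under "jsonOptions"; verify against the API reference.
    """
    return {
        "url": page_url,
        "formats": ["json"],
        "jsonOptions": {"schema": schema},
    }


if __name__ == "__main__":
    payload = build_extraction_payload("https://example.com/product/1", PRODUCT_SCHEMA)
    print(json.dumps(payload, indent=2))
```

The design point is that the schema, not a follow-up LLM prompt, defines the output shape, so downstream code can read fields such as `name` and `price` directly.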

The MCP server integrates Firecrawl directly into Claude, Cursor, and other MCP-compatible clients, giving AI agents native web access without custom tooling. SDKs cover Python, Node.js, Go, Rust, Java, and Elixir.

Self-hosting deploys the full stack via Docker Compose. The self-hosted version handles most sites well but uses Playwright for browser rendering rather than Firecrawl’s hosted Fire-engine infrastructure, which means some heavily protected sites that the cloud version handles may not work as reliably on a self-hosted instance.

Hosted pricing starts free at 1,000 credits per month (1 credit per scraped page) with paid plans from a Hobby tier up to a Scale plan at $599/month for 1M credits. Credits do not roll over between months.
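The arithmetic behind these figures is simple enough to sketch. The example below uses only the numbers quoted above (1,000 free credits, $599 for 1M credits, 1 credit per scraped page); the function names are hypothetical, and other plan prices are deliberately left out because they are not stated here.

```python
FREE_CREDITS_PER_MONTH = 1_000
SCALE_PRICE_USD = 599
SCALE_CREDITS = 1_000_000


def cost_per_page_usd(plan_price_usd: float, plan_credits: int,
                      credits_per_page: int = 1) -> float:
    """Nominal cost per page if every credit in the plan is used."""
    return plan_price_usd / plan_credits * credits_per_page


def effective_cost_per_page_usd(plan_price_usd: float, pages_scraped: int) -> float:
    """Actual cost per page in a month; unused credits do not roll over."""
    if pages_scraped <= 0:
        raise ValueError("pages_scraped must be positive")
    return plan_price_usd / pages_scraped


def fits_free_tier(pages_per_month: int, credits_per_page: int = 1) -> bool:
    """True if the monthly volume stays within the free credit allowance."""
    return pages_per_month * credits_per_page <= FREE_CREDITS_PER_MONTH


if __name__ == "__main__":
    print(cost_per_page_usd(SCALE_PRICE_USD, SCALE_CREDITS))
    print(effective_cost_per_page_usd(SCALE_PRICE_USD, 250_000))
    print(fits_free_tier(800))
```

At full utilisation the Scale plan works out to roughly $0.0006 per page; because credits expire monthly, using only a quarter of the allowance quadruples the effective per-page cost.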

Firecrawl: Pros & Cons

Pros (The Wins)

LLM-ready output: Clean markdown without nav, ads, or HTML noise.
JS rendering built in: SPAs and dynamic pages work with no extra configuration.
Search + scrape in one: Full-page markdown returned alongside search results.
125.6k stars: Used by Apple, Canva, Shopify; 1M+ users, 80,000+ companies.

Cons (The Friction)

AGPL licence: Commercial embedding needs a licence review.
Self-hosted rendering limits: Playwright vs Fire-engine; some anti-bot sites less reliable.
Credit-based pricing: No pay-per-use; high volume costs accumulate on monthly plans.
Multi-container self-hosting: API, Playwright, Redis, and Flower all required.

Use Cases

Specific ways to use Firecrawl for your workflow.

01. Give an AI agent access to live web content by pointing it at Firecrawl's search and scrape endpoints.
02. Build a RAG pipeline that ingests documentation sites or knowledge bases as clean markdown without writing a scraper.
03. Extract structured data from product listings, pricing pages, or directories using a JSON schema instead of parsing HTML.
04. Crawl an entire website and convert every page to markdown for indexing, analysis, or LLM context.

Deployment Strategy

Recommended ways to host Firecrawl in your own environment.

docker
self-hosted