FlareSolverr Guide: Scrape and Bypass Cloudflare
In this article, I’ll show you exactly how to set up and use FlareSolverr to scrape data without getting blocked. Let’s dive in!
What is FlareSolverr?
FlareSolverr is an open-source tool that helps bypass Cloudflare’s security challenges. Websites use Cloudflare to prevent bots from scraping data, but FlareSolverr can bypass these protections by acting as a real browser. It loads web pages like a human would, passing Cloudflare’s verification checks.
How FlareSolverr Works
FlareSolverr sits between your scraper and the Cloudflare-protected website. Here is what happens when you send it a request:
- Acts as a Proxy: FlareSolverr runs as a proxy server that intercepts your requests and forwards them to the target website.
- Mimics a Real Browser: It launches a headless browser session to load the webpage and solve the Cloudflare challenge.
- Returns the Scraped Data: Once the challenge is solved, FlareSolverr sends back the page content so you can extract the data you need.
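In practice this means you talk to FlareSolverr over a small HTTP API: you POST a JSON command and get JSON back. Roughly, the request and response look like this (trimmed to the most useful fields; a full working example follows later in this guide):
# What you send to FlareSolverr (POST http://localhost:8191/v1)
request_payload = {
    "cmd": "request.get",               # the command to run
    "url": "https://www.example.com",   # the Cloudflare-protected page
    "maxTimeout": 60000                 # give the browser up to 60 seconds to solve the challenge
}

# Roughly what comes back once the challenge is solved
example_response = {
    "status": "ok",
    "message": "Challenge solved!",
    "solution": {
        "url": "https://www.example.com/",
        "status": 200,
        "response": "<html>...</html>",   # the rendered page HTML
        "cookies": [],                    # cookies collected during the browser session
        "userAgent": "Mozilla/5.0 ..."
    }
}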
How to Install and Set Up FlareSolverr
The recommended way to install FlareSolverr is by using Docker. Docker helps package all dependencies into a single container, making installation and usage much easier.
Installing Docker
If you haven’t installed Docker yet, download and install it from the official website: https://www.docker.com/get-started
Once Docker is installed, you can proceed with setting up FlareSolverr.
Downloading and Running FlareSolverr
Run the following command in your terminal to download the FlareSolverr image used in this guide (a community build based on the nodriver backend):
docker pull 21hsmw/flaresolverr:nodriver
Once the image is downloaded, create and start a container by running:
docker run -d --name flaresolverr -p 8191:8191 21hsmw/flaresolverr:nodriver
This command does the following:
- -d runs the container in detached mode (in the background).
- --name flaresolverr gives the container a name so you can reference it later (for example, when restarting it).
- -p 8191:8191 maps port 8191 on your machine to the container; this is the port FlareSolverr listens on.
Verifying FlareSolverr Installation
To check if FlareSolverr is running, open your browser and visit:
http://localhost:8191
If everything is set up correctly, you will see a confirmation message that FlareSolverr is running.
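You can also verify it from a script. Here is a quick sketch using requests (assuming the default port 8191; the exact wording of the message may vary between versions):
import requests

# The index endpoint returns a small JSON payload when the service is up
resp = requests.get("http://localhost:8191")
print(resp.status_code)   # expect 200
print(resp.json())        # includes the version and a readiness message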
Scraping Data with FlareSolverr
FlareSolverr works by handling requests for you. Instead of sending requests directly to a website, you send them to FlareSolverr, which processes them and returns the response.
Here’s how to scrape a Cloudflare-protected website using Python:
Step 1: Install Required Libraries
First, install the requests and BeautifulSoup libraries if you haven’t already:
pip install requests bs4
Step 2: Write the Scraping Script
Create a new Python file and add the following script:
import requests
# Define FlareSolverr URL
url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}
# Define request parameters
data = {
    "cmd": "request.get",
    "url": "https://www.example.com",
    "maxTimeout": 60000
}
# Send request to FlareSolverr
response = requests.post(url, headers=headers, json=data)
result = response.json()

# Print response details
print("Status:", result.get('status', ''))
print("Status Code:", response.status_code)
print("FlareSolverr Message:", result.get('message', ''))

# Extract the rendered page HTML
page_content = result.get('solution', {}).get('response', '')
print(page_content)
This script:
- Sends a request to FlareSolverr to bypass Cloudflare.
- Waits for the challenge to be solved.
- Returns the HTML content of the page.
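The solution object also carries the cookies and user agent FlareSolverr used while solving the challenge, which you can reuse with plain requests for follow-up calls. A minimal sketch, continuing from the script above (whether the site accepts these cookies outside the browser depends on how strictly it re-validates, but it can save a lot of browser overhead):
# Reuse the cookies and user agent FlareSolverr obtained
solution = result.get('solution', {})
cookies = {c['name']: c['value'] for c in solution.get('cookies', [])}
user_agent = solution.get('userAgent', '')

# Follow-up request with plain requests, sending the solved-session cookies
follow_up = requests.get(
    "https://www.example.com",
    cookies=cookies,
    headers={"User-Agent": user_agent}
)
print("Follow-up status:", follow_up.status_code)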
Parsing the Scraped Data
Once you have the page content, you can extract specific data using BeautifulSoup.
Modify your script to extract information:
from bs4 import BeautifulSoup
# Parse HTML content
soup = BeautifulSoup(page_content, 'html.parser')
# Extract specific data
title = soup.find('title').text
print("Page Title:", title)
This will extract and display the page title from the scraped webpage.
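The same approach extends to any element BeautifulSoup can select. For example, to collect every link on the page:
# Collect all hyperlinks from the scraped page
links = [a.get('href') for a in soup.find_all('a', href=True)]
print("Found", len(links), "links")
for link in links[:10]:   # print the first ten
    print(link)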
Using Proxies with FlareSolverr
Using proxies helps you avoid IP bans and geographical restrictions.
Adding a Proxy to FlareSolverr
You can use a proxy by modifying the request payload:
import random

# Example proxy IPs (replace with your own; most proxies also need a port, e.g. ip:port)
proxy_list = [
    '185.150.85.170',
    '45.154.194.148',
    '104.244.83.140'
]

# Pick a random proxy for this request
proxy_ip = random.choice(proxy_list)

data = {
    "cmd": "request.get",
    "url": "https://www.example.com",
    "maxTimeout": 60000,
    "proxy": {"url": f"http://{proxy_ip}"}   # FlareSolverr routes its browser through this proxy
}

response = requests.post(url, headers=headers, json=data)
print("Status Code:", response.status_code)
This script selects a random proxy from the list and passes it to FlareSolverr, which routes the headless browser's traffic through it. Note that the proxy goes inside the JSON payload: passing a proxies argument to requests.post would only proxy your connection to FlareSolverr itself, not to the target website.
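If your proxies require authentication, recent FlareSolverr versions also accept credentials inside the proxy object. A minimal sketch with a hypothetical proxy endpoint and credentials (replace them with your own):
data = {
    "cmd": "request.get",
    "url": "https://www.example.com",
    "maxTimeout": 60000,
    "proxy": {
        "url": "http://proxy.example.com:8080",   # hypothetical proxy address
        "username": "proxy_user",                 # hypothetical credentials
        "password": "proxy_pass"
    }
}
response = requests.post(url, headers=headers, json=data)
print("Status Code:", response.status_code)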
Managing Sessions and Cookies
Cloudflare uses cookies to track user sessions. FlareSolverr can store and reuse cookies, improving scraping efficiency.
Creating a Session
To create a session:
data = {
    "cmd": "sessions.create"
}
response = requests.post(url, headers=headers, json=data)
session_id = response.json().get('session', '')
print("Session ID:", session_id)
Using the Session
Once you have a session ID, use it for subsequent requests:
data = {
    "cmd": "request.get",
    "url": "https://www.example.com",
    "maxTimeout": 60000,
    "session": session_id
}
response = requests.post(url, headers=headers, json=data)
print(response.json())
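Each session keeps a browser instance alive inside the container, so destroy it once you are done to free resources:
# Destroy the session to release the browser it holds
data = {
    "cmd": "sessions.destroy",
    "session": session_id
}
response = requests.post(url, headers=headers, json=data)
print(response.json().get('message', ''))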
Troubleshooting Common Issues
Here are common FlareSolverr issues and how to fix them:
FlareSolverr is not running
Check if the Docker container is running:
docker ps
Restart the container:
docker restart flaresolverr
Request fails with CAPTCHA challenge
- Use rotating proxies.
- Reduce request frequency to mimic real user behavior.
Cookies not working
Make sure you create a FlareSolverr session and pass its ID with every request (as shown above) so cookies persist between calls.
FlareSolverr Alternatives
If FlareSolverr does not work for your use case, consider these alternatives:
- Playwright & Puppeteer — Automate browser interactions. Learn more about web scraping with Playwright and Puppeteer.
- Web Scraper APIs — Use web scraping APIs to fully automate the whole scraping infrastructure.
- Scraping Browsers — Tools like ScrapingBee provide built-in Cloudflare bypass.
Conclusion
FlareSolverr is a powerful tool for bypassing Cloudflare challenges and scraping protected websites. By using Docker, proxies, and session management, you can efficiently collect data without getting blocked. If you need a more robust scraping solution, consider using alternative tools like Playwright, Puppeteer, or Scraping APIs. Happy Scraping!