FlareSolverr Guide: Scrape and Bypass Cloudflare

In this article, I’ll show you exactly how to set up and use FlareSolverr to scrape data without getting blocked. Let’s dive in!

What is FlareSolverr?

FlareSolverr is an open-source tool that helps bypass Cloudflare’s security challenges. Websites use Cloudflare to prevent bots from scraping data, but FlareSolverr can bypass these protections by acting as a real browser. It loads web pages like a human would, passing Cloudflare’s verification checks.

How FlareSolverr Works

FlareSolverr sits between your scraper and the Cloudflare-protected website. The process has three steps:

  1. Acts as a Proxy: FlareSolverr runs as a proxy server that intercepts your requests and forwards them to the target website.
  2. Mimics a Real Browser: It launches a headless browser session to load the webpage and solve the Cloudflare challenge.
  3. Returns the Scraped Data: Once the challenge is solved, FlareSolverr sends back the page content so you can extract the data you need.

How to Install and Set Up FlareSolverr

The recommended way to install FlareSolverr is by using Docker. Docker helps package all dependencies into a single container, making installation and usage much easier.

Installing Docker

If you haven’t installed Docker yet, download and install it from the official website: https://www.docker.com/get-started

Once Docker is installed, you can proceed with setting up FlareSolverr.

Downloading and Running FlareSolverr

Run the following command in your terminal to download the FlareSolverr image used in this guide:

docker pull 21hsmw/flaresolverr:nodriver

Once the image is downloaded, create and start a container by running:

docker run -d --name flaresolverr -p 8191:8191 21hsmw/flaresolverr:nodriver

This command does the following:

  • -d runs the container in detached mode (in the background).
  • --name flaresolverr names the container so you can refer to it later.
  • -p 8191:8191 maps port 8191 on your machine to port 8191 in the container, where FlareSolverr listens.

Verifying FlareSolverr Installation

To check if FlareSolverr is running, open your browser and visit:

http://localhost:8191

If everything is set up correctly, you will see a short JSON response confirming that FlareSolverr is ready.
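The same check can be scripted, which is handy in a setup script. Here is a minimal sketch that uses only the Python standard library, so it runs before you install anything else. The function name flaresolverr_is_up and the two-second timeout are my own choices, and the sketch only assumes that the root URL answers with HTTP 200 when the service is healthy.

```python
import urllib.error
import urllib.request


def flaresolverr_is_up(base_url="http://localhost:8191", timeout=2.0):
    """Return True if something answers with HTTP 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


if __name__ == "__main__":
    print("FlareSolverr reachable:", flaresolverr_is_up())
```

If this prints False, jump to the troubleshooting section and check that the container is running.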

Scraping Data with FlareSolverr

FlareSolverr works by handling requests for you. Instead of sending requests directly to a website, you send them to FlareSolverr, which processes them and returns the response.

Here’s how to scrape a Cloudflare-protected website using Python:

Step 1: Install Required Libraries

First, install the requests and BeautifulSoup libraries if you haven’t already:

pip install requests beautifulsoup4

Step 2: Write the Scraping Script

Create a new Python file and add the following script:

import requests

# FlareSolverr API endpoint
url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

# Ask FlareSolverr to fetch the page; maxTimeout is in milliseconds
data = {
    "cmd": "request.get",
    "url": "https://www.example.com",
    "maxTimeout": 60000
}

# Send the request to FlareSolverr and parse the JSON reply once
response = requests.post(url, headers=headers, json=data)
result = response.json()

# Print response details
print("Status:", result.get("status"))
print("Status Code:", response.status_code)
print("FlareSolverr Message:", result.get("message"))

# Extract page content
page_content = result.get("solution", {}).get("response", "")
print(page_content)

This script:

  • Sends a request to FlareSolverr to bypass Cloudflare.
  • Waits for the challenge to be solved.
  • Returns the HTML content of the page.
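In practice you will want to stop early when FlareSolverr could not solve the challenge instead of parsing an empty string. The sketch below is one way to do that. The helper extract_html and the exception FlareSolverrError are names I made up, the sample replies are hand-written, and the check assumes FlareSolverr sets status to "ok" on success.

```python
class FlareSolverrError(RuntimeError):
    """Raised when FlareSolverr reports that it could not solve the request."""


def extract_html(result):
    """Return the page HTML from a parsed FlareSolverr JSON reply."""
    if result.get("status") != "ok":
        raise FlareSolverrError(result.get("message", "unknown error"))
    return result.get("solution", {}).get("response", "")


# Hand-written sample replies, limited to the fields the script above reads
ok = {
    "status": "ok",
    "message": "",
    "solution": {"response": "<html><title>Hi</title></html>"},
}
failed = {"status": "error", "message": "Error: timeout"}

print(extract_html(ok))
try:
    extract_html(failed)
except FlareSolverrError as exc:
    print("FlareSolverr failed:", exc)
```

With the real script you would call extract_html(response.json()) in place of the chained .get() calls.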

Parsing the Scraped Data

Once you have the page content, you can extract specific data using BeautifulSoup.

Modify your script to extract information:

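from bs4 import BeautifulSoup

# Parse HTML content
soup = BeautifulSoup(page_content, 'html.parser')

# Extract specific data; guard against pages without a <title> tag
title_tag = soup.find('title')
title = title_tag.text if title_tag else "(no title found)"
print("Page Title:", title)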

This will extract and display the page title from the scraped webpage.

Using Proxies with FlareSolverr

Using proxies helps you avoid IP bans and geographical restrictions.

Adding a Proxy to FlareSolverr

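You can route FlareSolverr's traffic through a proxy by adding a proxy field to the request payload:

import random
import requests

url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}

# Example addresses; replace them with your own proxies (add :port as your provider requires)
proxy_list = [
    '185.150.85.170',
    '45.154.194.148',
    '104.244.83.140'
]
proxy_ip = random.choice(proxy_list)

data = {
    "cmd": "request.get",
    "url": "https://www.example.com",
    "maxTimeout": 60000,
    # FlareSolverr's browser uses this proxy to reach the target site
    "proxy": {"url": f"http://{proxy_ip}"}
}
response = requests.post(url, headers=headers, json=data)
print("Status Code:", response.status_code)

This script selects a random proxy from the list and passes it in the payload, so FlareSolverr's browser uses it when loading the target site. The request from your script to FlareSolverr on localhost stays direct; passing proxies= to requests.post would only proxy that local call, not the scrape itself.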
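A random pick can still land on a blocked proxy, so it can help to try the next one when a request fails. Here is a rough sketch of that idea. It assumes the payload accepts a proxy object with a url key, as FlareSolverr's README describes. The function fetch_with_rotation, the send callable, and the .invalid proxy hostnames are all hypothetical, and fake_send stands in for FlareSolverr so the example runs offline.

```python
def fetch_with_rotation(send, target_url, proxy_urls, max_timeout_ms=60000):
    """Try each proxy in turn until FlareSolverr reports success.

    `send` is any callable that takes a payload dict and returns the parsed
    JSON reply, e.g. lambda p: requests.post(url, json=p).json()
    """
    last_message = "no proxies supplied"
    for proxy_url in proxy_urls:
        payload = {
            "cmd": "request.get",
            "url": target_url,
            "maxTimeout": max_timeout_ms,
            "proxy": {"url": proxy_url},
        }
        reply = send(payload)
        if reply.get("status") == "ok":
            return reply
        last_message = reply.get("message", "unknown error")
    raise RuntimeError(f"all proxies failed: {last_message}")


# Stand-in for FlareSolverr so the sketch runs offline: the first proxy "fails"
def fake_send(payload):
    if payload["proxy"]["url"] == "http://first.invalid:8080":
        return {"status": "error", "message": "blocked"}
    return {"status": "ok", "solution": {"response": "<html></html>"}}


reply = fetch_with_rotation(
    fake_send,
    "https://www.example.com",
    ["http://first.invalid:8080", "http://second.invalid:8080"],
)
print(reply["status"])
```

Keeping the HTTP call behind send makes the rotation logic easy to test without a running container.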

Managing Sessions and Cookies

Cloudflare uses cookies to track user sessions. FlareSolverr can store and reuse cookies, improving scraping efficiency.

Creating a Session

To create a session:

data = {
    "cmd": "sessions.create"
}
response = requests.post(url, headers=headers, json=data)
# The reply carries the new session's ID under the "session" key
session_id = response.json().get('session')
print("Session ID:", session_id)

Using the Session

Once you have a session ID, use it for subsequent requests:

data = {
    "cmd": "request.get",
    "url": "https://www.example.com",
    "maxTimeout": 60000,
    "session": session_id
}
response = requests.post(url, headers=headers, json=data)
print(response.json())
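Each session keeps a browser instance alive inside FlareSolverr, so it is worth closing sessions you no longer need. FlareSolverr exposes a sessions.destroy command for this. The sketch below wraps create and destroy in a context manager. The name flaresolverr_session and the send callable are my own, and fake_send replaces the real HTTP call so the example runs without a container.

```python
from contextlib import contextmanager


@contextmanager
def flaresolverr_session(send):
    """Create a FlareSolverr session and always destroy it afterwards.

    `send` takes a payload dict and returns the parsed JSON reply.
    """
    session_id = send({"cmd": "sessions.create"})["session"]
    try:
        yield session_id
    finally:
        # Release the session even if a request inside the block raised
        send({"cmd": "sessions.destroy", "session": session_id})


# Offline stand-in that records which commands were sent
sent = []


def fake_send(payload):
    sent.append(payload["cmd"])
    return {"status": "ok", "session": "demo-session"}


with flaresolverr_session(fake_send) as sid:
    fake_send({
        "cmd": "request.get",
        "url": "https://www.example.com",
        "maxTimeout": 60000,
        "session": sid,
    })

print(sent)  # ['sessions.create', 'request.get', 'sessions.destroy']
```

To use it for real, pass something like lambda p: requests.post(url, headers=headers, json=p).json() as send.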

Troubleshooting Common Issues

Here are common FlareSolverr issues and how to fix them:

FlareSolverr is not running

Check if the Docker container is running:

docker ps

Restart the container:

docker restart flaresolverr

Request fails with CAPTCHA challenge

Cookies not working

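Ensure you are using session-based scraping: create a session once and pass the same session ID with every request, as shown in Managing Sessions and Cookies.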

FlareSolverr Alternatives

If FlareSolverr does not work for your use case, consider these alternatives:

  • Playwright & Puppeteer: Automate browser interactions directly from your own code.
  • Web Scraper APIs: Hand the whole scraping infrastructure over to a hosted API.
  • Scraping Browsers: Tools like ScrapingBee provide built-in Cloudflare bypass.

Conclusion

FlareSolverr is a powerful tool for bypassing Cloudflare challenges and scraping protected websites. By using Docker, proxies, and session management, you can efficiently collect data without getting blocked. If you need a more robust scraping solution, consider using alternative tools like Playwright, Puppeteer, or Scraping APIs. Happy Scraping!
