How to Implement Proxies with Undetected ChromeDriver
In this guide, I’ll show you how to set up proxies with Undetected ChromeDriver, including handling proxy authentication, rotating proxies, and making the process more efficient. Let’s dive in!
Why Use Proxies with Undetected ChromeDriver?
Proxies are essential when web scraping because they allow you to:
- Bypass IP Blocking: Websites often block IPs they detect as bots. You can avoid these blocks and maintain access to the website by using proxies.
- Avoid Rate Limiting: Proxies help bypass traffic limits that websites impose on a single IP, preventing your scraper from getting restricted.
- Maintain Anonymity: Proxies obscure your real IP, ensuring your scraping activity remains anonymous and undetected.
Setting Up Proxies with Undetected ChromeDriver
To get started with Bright Data proxies and Undetected ChromeDriver, follow these simple steps:
Install Required Libraries
Before using Bright Data proxies with Undetected ChromeDriver, install the necessary libraries:
pip install undetected-chromedriver selenium
Configure Proxy in Chrome Options
Next, add your proxy settings to the Chrome options. You will use the add_argument() method on the Chrome options object to configure the proxy.
Here’s how to set up a basic proxy:
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
# Define your Bright Data proxy details
proxy = "http://<username>:<password>@<host>:<port>"
if __name__ == "__main__":
# Set Chrome options
options = uc.ChromeOptions()
# Run Chrome in headless mode
options.headless = True
# Add proxy to Chrome options
options.add_argument(f"--proxy-server={proxy}")
# Create a Chrome instance
driver = uc.Chrome(
options=options,
use_subprocess=False,
)
# Visit the test URL to check your proxy IP
driver.get("https://httpbin.io/ip")
# Select the body tag containing the current IP address
ip_address = driver.find_element(By.TAG_NAME, "body").text
# Print your current IP
print(ip_address)
When you run the code, you should see the IP address of the proxy server, confirming that the proxy setup is working.
Handling Proxy Authentication with Bright Data
Bright Data requires authentication to use its proxies. You will need your username, password, proxy host, and port, which are all provided when you sign up for the service.
The authenticated proxy URL will look like this:
http://<username>:<password>@<proxy_ip>:<proxy_port>
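Because Bright Data passwords can contain characters like `@` or `:` that have special meaning in a URL, it can help to build and sanity-check the proxy URL programmatically. Here is a small helper sketch (the host and port below are illustrative placeholders, not real Bright Data values):

```python
from urllib.parse import quote, urlparse

def build_proxy_url(username, password, host, port):
    """Build an authenticated proxy URL, percent-encoding the credentials.

    A small helper sketch (not part of any library) -- special characters
    in the password would otherwise break the URL.
    """
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"

# Placeholder credentials for illustration only
url = build_proxy_url("user", "p@ss:word", "proxy.example.com", 22225)
print(url)  # http://user:p%40ss%3Aword@proxy.example.com:22225

# urlparse confirms the host and port survive the encoding
parsed = urlparse(url)
print(parsed.hostname, parsed.port)  # proxy.example.com 22225
```

The percent-encoded URL can then be passed anywhere the guide uses a proxy address string.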
Here’s how to set up an authenticated proxy using Selenium and Undetected ChromeDriver.
Install Selenium Wire
Chrome's --proxy-server flag does not accept embedded credentials, so to manage proxy authentication you need to use selenium-wire. It extends Selenium's functionality and makes it easy to pass a username and password along with the proxy address.
Install selenium-wire using this command:
pip install selenium-wire
Add Proxy Authentication
Now, you can use selenium-wire to pass your authentication credentials for Bright Data proxies:
import seleniumwire.undetected_chromedriver as uc
from selenium.webdriver.common.by import By
# Define your Bright Data proxy credentials
proxy_username = ""
proxy_password = ""
proxy_host = ""
proxy_port = ""
# Form the proxy address
proxy_address = f"http://{proxy_username}:{proxy_password}@{proxy_host}:{proxy_port}"
# Add the proxy address to proxy options
proxy_options = {
"proxy": {
"http": proxy_address,
"https": proxy_address,
}
}
if __name__ == "__main__":
# Set Chrome options
options = uc.ChromeOptions()
# Run Chrome in headless mode
options.headless = True
# Create a Chrome instance with the proxy options
driver = uc.Chrome(
seleniumwire_options=proxy_options,
options=options,
use_subprocess=False,
)
# Visit the test URL to check your proxy IP
driver.get("https://httpbin.io/ip")
# Select the body tag containing the current IP address
ip_address = driver.find_element(By.TAG_NAME, "body").text
# Print your current IP
print(ip_address)
This code routes your scraping traffic through the authenticated Bright Data proxy. When you visit the test URL, you should see the IP address of the proxy server instead of your own.
Rotating Proxies with Bright Data
A single proxy isn’t enough for large-scale scraping. Websites may block your IP after too many requests. To avoid this, you need to rotate proxies, so you don’t rely on just one IP.
Rotating proxies helps you distribute requests across multiple IP addresses, mimicking different users’ behavior and reducing the chances of being blocked.
Here’s how to implement proxy rotation using Bright Data:
Define a List of Proxies
Create a list of Bright Data proxy addresses. You can get these from your Bright Data dashboard.
import itertools
import random
# Define a list of Bright Data proxies
proxy_pool = [
"http://<username>:<password>@<host>:<port>",
"http://<username>:<password>@<host>:<port>",
"http://<username>:<password>@<host>:<port>",
]
# Function to rotate proxies
def rotate_proxy(proxy_list):
random.shuffle(proxy_list)
return itertools.cycle(proxy_list)
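To see how the generator behaves, here is a quick standalone check (the addresses are placeholders, not real proxies): itertools.cycle walks the shuffled list in a fixed order, so every proxy is used exactly once before any proxy repeats.

```python
import itertools
import random

def rotate_proxy(proxy_list):
    # Shuffle once, then cycle through that same order forever
    random.shuffle(proxy_list)
    return itertools.cycle(proxy_list)

# Placeholder addresses for illustration only
pool = ["http://proxy-a:8000", "http://proxy-b:8000", "http://proxy-c:8000"]
gen = rotate_proxy(pool)

first_pass = [next(gen) for _ in range(3)]
second_pass = [next(gen) for _ in range(3)]

# Each pass covers every proxy once, and the order repeats between passes
print(sorted(first_pass) == sorted(pool))  # True
print(first_pass == second_pass)           # True
```

Note that random.shuffle mutates the list in place; if you need the original order elsewhere, pass a copy to rotate_proxy.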
Use the Rotated Proxy
Now, use the rotated proxy in your script:
# Create a proxy generator from the list
proxy_generator = rotate_proxy(proxy_pool)
if __name__ == "__main__":
# Set Chrome options
options = uc.ChromeOptions()
# Add the rotated proxy to Chrome options
options.add_argument(f"--proxy-server={next(proxy_generator)}")
# Run Chrome in headless mode
options.headless = True
# Create a Chrome instance
driver = uc.Chrome(
options=options,
use_subprocess=False,
)
# Visit the test URL to check your proxy IP
driver.get("https://httpbin.io/ip")
# Select the body tag containing the current IP address
ip_address = driver.find_element(By.TAG_NAME, "body").text
# Print the current IP address
print(ip_address)
This code picks the next proxy from the shuffled pool each time it runs, so your IP changes between browser sessions. Note that the proxy is fixed for the lifetime of a Chrome instance; to switch IPs you start a new session with the next proxy. Learn more about the difference between rotating and static proxies.
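For multi-session scraping, you can wrap the snippet above in a loop that restarts the browser with a fresh proxy each time. A sketch of that loop's rotation logic (the uc.Chrome launch is the same as in the script above, so it is shown only as comments; the pool addresses are placeholders):

```python
import itertools
import random

def rotate_proxy(proxy_list):
    random.shuffle(proxy_list)
    return itertools.cycle(proxy_list)

# Placeholder pool -- substitute your real Bright Data addresses
proxy_pool = ["http://proxy-a:8000", "http://proxy-b:8000", "http://proxy-c:8000"]
proxy_generator = rotate_proxy(proxy_pool)

def proxy_argument(generator):
    """Return the Chrome command-line flag for the next proxy in rotation."""
    return f"--proxy-server={next(generator)}"

for session in range(3):
    arg = proxy_argument(proxy_generator)
    print(arg)
    # In the real script, each session launches a fresh browser:
    # options = uc.ChromeOptions()
    # options.add_argument(arg)
    # driver = uc.Chrome(options=options, use_subprocess=False)
    # ... scrape ...
    # driver.quit()
```

Quitting the driver between sessions matters: each uc.Chrome instance holds onto its proxy for its entire lifetime.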
Premium Residential Proxies from Bright Data
Providers such as Bright Data offer high-quality residential proxies, which are ideal for web scraping. Because these proxies use IP addresses assigned to real residential users, they are far less likely to be blocked than free proxies and harder for websites to detect.
With Bright Data, you get:
- Reliable Proxies: These proxies are rarely blocked because they use IPs from real users.
- Automatic Rotation: Bright Data automatically rotates proxies, so you don’t have to manually handle it in your code.
- Geo-targeting: You can target specific regions to make requests from a particular country or city, which is useful for localized scraping.
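Per Bright Data's documentation, geo-targeting is typically selected by appending flags such as -country-us to the proxy username; the exact syntax depends on your account and zone, so verify it in your dashboard and treat the strings below as illustrative placeholders.

```python
def geo_username(base_username, country=None, city=None):
    """Append Bright Data-style geo-targeting flags to a proxy username.

    The '-country-xx' / '-city-name' suffix convention follows Bright Data's
    docs at the time of writing; confirm the exact format in your dashboard.
    """
    username = base_username
    if country:
        username += f"-country-{country.lower()}"
    if city:
        username += f"-city-{city.lower()}"
    return username

# Placeholder base username -- yours comes from the Bright Data dashboard
print(geo_username("brd-customer-XXXX-zone-residential", country="US"))
# brd-customer-XXXX-zone-residential-country-us
```

The geo-targeted username then replaces proxy_username when you form the proxy address, with the password, host, and port unchanged.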
Integrating Bright Data Proxies with Undetected ChromeDriver
To use Bright Data residential proxies, you need to sign up for an account, generate proxy credentials, and use them in your Undetected ChromeDriver script.
Here’s how to integrate Bright Data proxies with your script:
import seleniumwire.undetected_chromedriver as uc
from selenium.webdriver.common.by import By
# Define your Bright Data proxy credentials
proxy_username = ""
proxy_password = ""
proxy_host = ""
proxy_port = ""
# Form the proxy address
proxy_address = f"http://{proxy_username}:{proxy_password}@{proxy_host}:{proxy_port}"
# Add the proxy address to proxy options
proxy_options = {
"proxy": {
"http": proxy_address,
"https": proxy_address,
}
}
if __name__ == "__main__":
# Set Chrome options
options = uc.ChromeOptions()
# Run Chrome in headless mode
options.headless = True
# Create a Chrome instance with the proxy options
driver = uc.Chrome(
seleniumwire_options=proxy_options,
options=options,
use_subprocess=False,
)
# Visit the test URL to check your proxy IP
driver.get("https://httpbin.io/ip")
# Select the body tag containing the current IP address
ip_address = driver.find_element(By.TAG_NAME, "body").text
# Print your current IP
print(ip_address)
When you run this code, it routes traffic through your Bright Data residential proxy and prints the proxy server's IP address, helping keep your web scraping anonymous and unblocked.
Conclusion
Implementing proxies in your web scraping workflow is crucial for avoiding detection and improving the efficiency of your scraping tasks. Following the steps above, you can easily integrate residential proxies with Undetected ChromeDriver and start scraping without issues!