How to Set Up Proxies in PuppeteerSharp
Here, I’ll show you how to find and set up a proxy with PuppeteerSharp and go over how to use rotating proxies for smoother, more effective scraping in 2024. Whether you’re new to scraping or looking to improve, these steps will help you get started easily!
Why Use Proxies with PuppeteerSharp?
Proxies are intermediaries between your code and the target website, effectively masking your IP address. By doing this, they allow you to bypass some of the security measures websites put in place to prevent bot access. Here are some specific benefits of using proxies with PuppeteerSharp:
- IP Masking: Proxies allow you to browse anonymously by hiding your original IP address.
- Geolocation: You can access region-specific data by choosing proxies from different geographical locations.
- Rate Limiting Circumvention: By rotating proxies, you avoid IP blocks, since no single address sends too many requests in a short window.
Best Proxies to Use for Large Projects
For large projects, I recommend using residential proxies. These proxies use IP addresses assigned to real residential devices and can be rotated automatically, which makes them a strong fit for most project types, especially web scraping.
Here’s a list of the best residential proxies to use (prices don’t include enterprise plans):
- Bright Data — Largest provider, precise targeting, Proxy Manager tool, starting at $5.88/GB
- Oxylabs — Extensive network, precise targeting, dedicated support, starting at $6.98/GB
- Smartproxy — Large pool, broad locations, self-service, starting at $4.50/GB
- Webshare — Customization options, self-service, affordable, starting at $5.50/GB
- SOAX — Flexible rotation, precise targeting, 24/7 support, starting at $4/GB
Step 1: Setting Up PuppeteerSharp
Before configuring proxies, let’s set up a simple PuppeteerSharp project. PuppeteerSharp is a .NET port of the Puppeteer Node.js library: it drives a headless Chromium browser, making it well suited to web scraping and automation.
Create a Console Project: Create a new console project in your C# environment.
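For example, with the .NET CLI (the project name here is just a placeholder you can change):
dotnet new console -n PuppeteerSharpProxyDemo
cd PuppeteerSharpProxyDemo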
Install PuppeteerSharp: Use the command below in your terminal to install PuppeteerSharp:
dotnet add package PuppeteerSharp
Basic PuppeteerSharp Setup: Below is a minimal PuppeteerSharp setup that visits httpbin.io to fetch your IP address:
using PuppeteerSharp;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Download a compatible Chromium build for PuppeteerSharp to drive
        using var browserFetcher = new BrowserFetcher();
        await browserFetcher.DownloadAsync();

        // Launch a headless browser and open a new page
        await using var browser = await Puppeteer.LaunchAsync(
            new LaunchOptions { Headless = true });
        await using var page = await browser.NewPageAsync();

        // Visit httpbin's IP endpoint and print the page content
        await page.GoToAsync("https://httpbin.io/ip");
        var pageContent = await page.GetContentAsync();
        Console.WriteLine(pageContent);

        await browser.CloseAsync();
    }
}
This code downloads a Chromium build if needed, launches a headless browser, navigates to httpbin’s IP endpoint, and prints the page content, which contains your current public IP address.
Step 2: Configuring a Proxy with PuppeteerSharp
To hide your actual IP address or avoid rate limits, configure a proxy in PuppeteerSharp. Here’s how:
Get a Proxy: For this example, you can use a free HTTP proxy from Free Proxy List (for testing). For production-level projects, consider premium proxy services for reliability and security.
Define Proxy Options: In PuppeteerSharp, you pass Chromium’s --proxy-server flag through the Args property of LaunchOptions. Update your previous code to specify a proxy server:
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    // Route all browser traffic through the proxy
    Args = new[] { "--proxy-server=<PROXY_IP_ADDRESS>:<PROXY_PORT>" }
});
Replace <PROXY_IP_ADDRESS>:<PROXY_PORT> with your actual proxy details. For instance, a proxy might look like 8.219.97.248:80.
After configuring the proxy, your IP should reflect the proxy’s IP when running the script.
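If everything is wired up correctly, the JSON body returned by httpbin.io/ip should show the proxy’s address rather than your own, roughly of the form:
{
  "origin": "8.219.97.248"
}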
Step 3: Proxy Authentication for Premium Proxies
Premium proxies often require authentication with a username and password. In PuppeteerSharp, you supply those credentials on a page instance using the AuthenticateAsync method.
Here’s how to modify your code to add proxy authentication:
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    Args = new[] { "--proxy-server=<PROXY_IP_ADDRESS>:<PROXY_PORT>" }
});

await using var page = await browser.NewPageAsync();

// Supply the proxy credentials before navigating
await page.AuthenticateAsync(new Credentials
{
    Username = "<YOUR_USERNAME>",
    Password = "<YOUR_PASSWORD>"
});

await page.GoToAsync("https://httpbin.io/ip");
var pageContent = await page.GetContentAsync();
Console.WriteLine(pageContent);

await browser.CloseAsync();
Replace <YOUR_USERNAME> and <YOUR_PASSWORD> with your premium proxy credentials.
Step 4: Rotating Proxies in PuppeteerSharp
When scraping larger amounts of data, rotating proxies can prevent detection and reduce the chances of being blocked. By switching between several proxy IPs, each request appears to come from a different location, helping you stay under the radar.
Define a Proxy List: Start by creating a list of proxies, which you can source from sites like Free Proxy List.
var proxies = new List<string>
{
    "http://34.140.70.242:8080",
    "http://118.69.111.51:8080",
    "http://15.204.161.192:18080",
    "http://186.121.235.66:8080",
};
Select a Random Proxy: Pick a proxy at random each time you launch the browser, since the proxy applies per browser instance. The code below selects a random proxy from the list and launches PuppeteerSharp with it.
// Pick a proxy at random for this browser session
var random = new Random();
int randomIndex = random.Next(proxies.Count);
string randomProxy = proxies[randomIndex];

await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    Args = new[] { $"--proxy-server={randomProxy}" }
});
Make Multiple Requests: To observe the rotation, you can loop through requests, each time selecting a different proxy.
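Here’s a minimal sketch of that loop, assuming the proxies list defined above and a Chromium build already downloaded via BrowserFetcher (error handling omitted for brevity):
// Launch a fresh browser with a randomly chosen proxy on each iteration
var random = new Random();
for (int i = 0; i < 3; i++)
{
    string proxy = proxies[random.Next(proxies.Count)];

    await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
    {
        Headless = true,
        Args = new[] { $"--proxy-server={proxy}" }
    });

    await using var page = await browser.NewPageAsync();
    await page.GoToAsync("https://httpbin.io/ip");
    Console.WriteLine(await page.GetContentAsync());

    await browser.CloseAsync();
}
Each iteration should print a different origin IP, depending on which proxies in the list are still alive.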
Step 5: Real-World Scenarios with Rotating Proxies
Many commercial websites, such as G2, have advanced anti-bot mechanisms and are protected by services like Cloudflare. Free proxies are less effective in such cases, as these websites quickly block them.
Consider using residential proxies instead of free ones to scrape more challenging targets. Residential proxies use real IP addresses associated with residential users, making them less likely to be flagged.
If you’re implementing a proxy rotator in a real-world scenario, your code structure would look like this:
using PuppeteerSharp;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Pool of proxies to rotate through
        var proxies = new List<string>
        {
            "http://34.140.70.242:8080",
            "http://118.69.111.51:8080",
            "http://15.204.161.192:18080",
            "http://186.121.235.66:8080",
        };

        // Pick a random proxy for this run
        var random = new Random();
        int randomIndex = random.Next(proxies.Count);
        string randomProxy = proxies[randomIndex];

        // Make sure a Chromium build is available
        var browserFetcher = new BrowserFetcher();
        await browserFetcher.DownloadAsync();

        // Launch the browser with the selected proxy
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            Args = new[] { $"--proxy-server={randomProxy}" }
        });

        await using var page = await browser.NewPageAsync();
        await page.GoToAsync("https://httpbin.io/ip");
        var pageContent = await page.GetContentAsync();
        Console.WriteLine(pageContent);

        await browser.CloseAsync();
    }
}
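To adapt this rotator to a premium residential service, you would typically replace the free proxy list with the provider’s gateway endpoint and add the authentication step from Step 3. Here’s a rough sketch, where the gateway host, port, and credentials are placeholders you’d take from your provider’s dashboard:
// Hypothetical residential gateway - replace with your provider's endpoint
const string proxyServer = "<GATEWAY_HOST>:<GATEWAY_PORT>";

await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    Args = new[] { $"--proxy-server={proxyServer}" }
});

await using var page = await browser.NewPageAsync();

// Authenticate against the proxy before the first navigation
await page.AuthenticateAsync(new Credentials
{
    Username = "<YOUR_USERNAME>",
    Password = "<YOUR_PASSWORD>"
});

await page.GoToAsync("https://httpbin.io/ip");
Console.WriteLine(await page.GetContentAsync());

await browser.CloseAsync();
Many residential providers rotate the exit IP for you behind a single gateway address, so you often don’t need to maintain a proxy list of your own.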
Conclusion
Managing proxies carefully makes PuppeteerSharp a powerful tool for web scraping while reducing the risk of getting blocked. I’ve found that free proxies are useful for testing and simple tasks, but most serious applications need premium rotating proxies. Whether I’m collecting market data or building automation tools, learning to use proxies with PuppeteerSharp has opened up many opportunities for reliable data gathering. With the right setup, you can gather valuable insights without facing constant restrictions.