Web Scraping with SeleniumBase and Python in 2025

In this article, I’ll walk you through SeleniumBase’s key features, show you how to use it for scraping, and share some practical tips for handling the web’s challenges today. Whether you’re new to scraping or looking to level up, SeleniumBase offers a straightforward way to get started.

What is SeleniumBase?

SeleniumBase is an open-source Python framework built on top of Selenium, designed to simplify web automation tasks, including testing and scraping. Unlike standard Selenium, which often requires extensive boilerplate code, SeleniumBase minimizes repetitive coding with built-in tools for data extraction, interaction automation, and anti-bot evasion.

Alternative Solution — Automated Web Scraping Tools

If you need to scrape at large scale or to handle complex websites with advanced anti-scraping technologies, I can suggest trying one of the following web scraping tools that my agency works with:

  1. Bright Data — Best overall for advanced scraping; features extensive proxy management and reliable APIs.
  2. Octoparse — User-friendly no-code tool for automated data extraction from websites.
  3. ScrapingBee — Developer-oriented API that handles proxies, browsers, and CAPTCHAs efficiently.
  4. Scrapy — Open-source Python framework ideal for data crawling and scraping tasks.
  5. ScraperAPI — Handles tough scrapes with advanced anti-bot technologies; great for developers.
  6. Apify — Versatile platform offering ready-made scrapers and robust scraping capabilities.

To learn more about each service, I recommend reading my full article about web scraping tools.

Now, let’s start with installing SeleniumBase.

Setting Up SeleniumBase

Before diving into web scraping, you need to set up SeleniumBase on your system. Here’s how:

Install Python: Make sure you have Python installed. If not, download it from the official Python website.

Install SeleniumBase: Use the following command to install SeleniumBase via pip:

pip3 install seleniumbase

Install WebDriver: SeleniumBase manages WebDriver installation automatically, but you can also download a driver manually and place it in your PATH if you prefer.
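If you'd rather fetch a driver ahead of time instead of letting SeleniumBase download it on first run, the bundled CLI can do so (chromedriver shown as an example):

seleniumbase get chromedriver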

Verify the installation: To ensure SeleniumBase is set up correctly, run the following command to see available options:

seleniumbase --help

Now that you have SeleniumBase installed, let’s create a basic scraper.

Building a Basic Scraper with SeleniumBase

Creating a simple web scraper with SeleniumBase involves setting up a Python script to interact with the web page. In this example, we’ll scrape product details from an e-commerce demo site.

Here’s a step-by-step guide to building a basic scraper:

Create a new file called scraper.py in your project directory.

Write the following code to extract the full HTML content of the target page:

from seleniumbase import BaseCase

class Scraper(BaseCase):
    def test_get_html(self):
        self.open("https://www.scrapingcourse.com/ecommerce/")
        page_html = self.get_page_source()
        print(page_html)

Run the scraper using the pytest command:

pytest scraper.py -s

The above code opens the e-commerce page, extracts the HTML, and prints it to the terminal (the -s flag tells pytest to show print output instead of capturing it). This serves as the foundation for more complex scraping tasks.

Extracting Specific Data

To make the scraper more useful, you’ll often need to target specific elements, such as product names, images, and URLs. SeleniumBase supports CSS selectors, XPath, and IDs to locate elements. Here’s how to extract different types of information:
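As a quick illustration before the specific examples, here is the same heading located two ways. SeleniumBase auto-detects XPath when a selector starts with "/" or "./", so no extra flag is needed; the XPath below is just an equivalent of the CSS version:

from seleniumbase import BaseCase

class SelectorDemo(BaseCase):
    def test_selector_flavors(self):
        self.open("https://www.scrapingcourse.com/ecommerce/")
        # CSS selector (the default)
        by_css = self.find_element("h2.product-name")
        # Selectors starting with "/" are treated as XPath automatically
        by_xpath = self.find_element('//h2[contains(@class, "product-name")]')
        print(by_css.text, by_xpath.text)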

Scraping Product Names

Identify the HTML structure of the product name, which in this example is inside an h2 tag with the class product-name. The code below demonstrates how to scrape the product names:

from seleniumbase import BaseCase

class Scraper(BaseCase):
    def test_get_product_names(self):
        self.open("https://www.scrapingcourse.com/ecommerce/")
        product_names = self.find_elements("h2.product-name")
        names = [product.text for product in product_names]
        print(names)

Scraping Product Images

To scrape image URLs, locate the img tags with the class product-image. The following code extracts the image URLs:

from seleniumbase import BaseCase

class Scraper(BaseCase):
    def test_get_image_urls(self):
        self.open("https://www.scrapingcourse.com/ecommerce/")
        image_elements = self.find_elements("img.product-image")
        image_urls = [image.get_attribute("src") for image in image_elements]
        print(image_urls)

Scraping Product Links

Similarly, extract product URLs from a tags with the class woocommerce-LoopProduct-link:

from seleniumbase import BaseCase

class Scraper(BaseCase):
    def test_get_product_links(self):
        self.open("https://www.scrapingcourse.com/ecommerce/")
        link_elements = self.find_elements("a.woocommerce-LoopProduct-link")
        links = [link.get_attribute("href") for link in link_elements]
        print(links)
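These extracts can also be combined into structured records. Here's a minimal sketch that pairs names with links and writes them to a CSV file; the zip assumes the two lists line up one-to-one, which holds on this demo page but is worth verifying on other sites:

import csv
from seleniumbase import BaseCase

class Scraper(BaseCase):
    def test_save_products(self):
        self.open("https://www.scrapingcourse.com/ecommerce/")
        names = [el.text for el in self.find_elements("h2.product-name")]
        links = [el.get_attribute("href") for el in self.find_elements("a.woocommerce-LoopProduct-link")]
        # Pair each product name with its URL and save to disk
        with open("products.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["name", "url"])
            writer.writerows(zip(names, links))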

Automating Browser Interactions

In some cases, scraping static content isn’t enough, especially for websites that load content dynamically or require user interactions. SeleniumBase supports a range of browser actions, including clicking, scrolling, and form submission.
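For example, scrolling can trigger lazy-loaded content before you extract anything. Here's a small sketch using the same demo site; whether this particular page actually lazy-loads items is an assumption for illustration:

from seleniumbase import BaseCase

class Scraper(BaseCase):
    def test_scroll_and_scrape(self):
        self.open("https://www.scrapingcourse.com/ecommerce/")
        # Scroll to the bottom to trigger any lazy-loaded items
        self.scroll_to_bottom()
        self.sleep(1)  # give late-rendering elements a moment
        print(len(self.find_elements("h2.product-name")), "products visible")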

Example: Automating Form Submission

Suppose a site requires login credentials. Here’s a script to log in and scrape content:

from seleniumbase import BaseCase

class Scraper(BaseCase):
    def test_login_and_scrape(self):
        self.open("https://www.scrapingcourse.com/login")
        self.type("#email", "[email protected]")
        self.type("#password", "password")
        self.click("button[type='submit']")
        self.save_screenshot("after_login.png")

This script fills in login details, submits the form, and captures a screenshot after logging in. The save_screenshot function helps confirm that the scraper navigates the site correctly.
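To actually scrape the protected content after logging in, wait for a post-login element before reading the page. A hedged extension of the script above (the #protected-content selector is a hypothetical placeholder; inspect the real page for the right one):

from seleniumbase import BaseCase

class Scraper(BaseCase):
    def test_login_and_scrape_content(self):
        self.open("https://www.scrapingcourse.com/login")
        self.type("#email", "[email protected]")  # placeholder credentials
        self.type("#password", "password")
        self.click("button[type='submit']")
        # Hypothetical selector - replace with a real post-login element
        self.wait_for_element("#protected-content")
        print(self.get_page_source())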

Avoiding Anti-Bot Measures

Websites often use various anti-bot measures, such as CAPTCHAs, IP blocking, or JavaScript challenges. SeleniumBase includes features that help bypass some of these measures, but there are limitations.

Using UC Mode

SeleniumBase’s UC (Undetected ChromeDriver) mode allows scrapers to mimic real user behavior by modifying browser fingerprints. Here’s how to enable it:

from seleniumbase import Driver

# UC Mode works with the Driver manager directly, not a BaseCase test class
driver = Driver(uc=True)
try:
    driver.open("https://www.scrapingcourse.com/antibot-challenge")
    driver.uc_gui_click_captcha()
    page_html = driver.get_page_source()
    print(page_html)
finally:
    driver.quit()

This script uses UC mode to bypass bot detection and handle CAPTCHA challenges. Since it drives the browser through the Driver manager rather than a BaseCase test class, run it directly with python instead of pytest. Note that while UC mode can help evade detection, it isn't foolproof, especially against highly sophisticated anti-bot systems.

Proxy Configuration

Routing requests through a proxy helps avoid IP bans, especially if you rotate addresses between runs. SeleniumBase lets you configure a proxy at launch:

pytest scraper.py --proxy=IP:PORT

This command sets up a proxy server, which SeleniumBase uses for all requests in the scraping session. To use a SOCKS proxy instead of HTTP, prefix the address with the protocol, for example --proxy="socks5://IP:PORT".
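If your proxy requires authentication, SeleniumBase also accepts credentials embedded in the proxy string (replace the placeholders with real values):

pytest scraper.py --proxy="username:password@IP:PORT"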

Advanced Techniques and Best Practices

To make your web scraper more robust and less likely to be blocked, follow these best practices (a short sketch combining the first three appears after the list):

  1. Use Random Delays: Introduce random sleep intervals between actions to simulate human browsing patterns.
  2. Rotate User Agents: Change the user-agent string to mimic different devices and browsers.
  3. Handle JavaScript Rendering: Use self.wait_for_element to ensure all dynamic content loads before scraping.
  4. Respect the Robots.txt File: Always check a website’s robots.txt file to understand what sections are allowed for scraping.
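Here's a minimal sketch of random delays plus an explicit wait, reusing the demo site and selector from earlier; the delay range is an arbitrary illustration:

import random
from seleniumbase import BaseCase

class PoliteScraper(BaseCase):
    def test_scrape_politely(self):
        self.open("https://www.scrapingcourse.com/ecommerce/")
        # Wait until dynamic content has rendered before scraping
        self.wait_for_element("h2.product-name")
        names = [el.text for el in self.find_elements("h2.product-name")]
        # Random pause between actions to simulate human pacing
        self.sleep(random.uniform(1.5, 4.0))
        print(names)

For user-agent rotation, SeleniumBase accepts an --agent option at launch (for example, pytest scraper.py --agent="Mozilla/5.0 ..."), so you can vary the string across runs.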

Limitations of SeleniumBase

While SeleniumBase is powerful, it has some drawbacks:

  • Detection in Headless Mode: Some websites can detect headless browsers.
  • Scale Limitations: It isn’t suitable for scraping large volumes of data across numerous pages due to speed constraints.
  • Anti-bot Adaptation: Because the tool is open source, anti-bot vendors can study its evasion techniques and update their detection algorithms accordingly.

Conclusion

Web scraping with SeleniumBase is a flexible and powerful way to gather data, automate tasks, and interact with websites. I find it especially appealing because it’s easy to use and offers strong automation features, making it ideal for both beginners and experienced developers. While there are some challenges, like anti-bot detection and limitations with scaling, using smart strategies like UC mode and proxy settings can greatly improve your success.

If you’re serious about web scraping, learning SeleniumBase is a must. As automation tools continue to evolve, being able to adapt and use these frameworks is key to staying ahead in the ever-changing world of web scraping.
