Selenium in Ruby for Web Scraping Guide

In this guide, I’ll show you how to set up Selenium with Ruby, scrape data, and handle dynamic content. By the end, you’ll have a working web scraper that collects data from a real website.

If you’re looking for alternatives to building your own web scrapers, check out our Best Web Scraping Tools article, which covers top-rated tools that can simplify your data extraction projects.

Let’s get started!

Why Use Selenium for Web Scraping?

Web scraping involves extracting data from web pages automatically. While basic scraping can be done with libraries like Nokogiri, some websites use JavaScript to load content dynamically. Traditional scrapers struggle to retrieve data from such sites.

This is where Selenium helps. It allows Ruby scripts to control real browsers, interact with JavaScript-based websites, and extract the required data. You can also learn how to scrape websites with Selenium and PHP in this article.

Some advantages of Selenium for web scraping:

  • Handles dynamic content: It can interact with JavaScript-based pages.
  • Mimics human behavior: Selenium can click buttons, scroll, and fill forms (see the sketch after this list).
  • Cross-browser support: Works with Chrome, Firefox, Edge, and other browsers.
  • Flexible: Supports multiple programming languages, including Ruby.
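
As a taste of that human-like interaction, here is a minimal sketch of filling a search form, scrolling, and clicking a button. The URL and selectors are placeholders, not from a real site:

require "selenium-webdriver"

driver = Selenium::WebDriver.for :chrome
driver.navigate.to "https://example.com/search" # placeholder URL

# Type into a search box and submit the form
search_box = driver.find_element(:name, "q") # assumed field name
search_box.send_keys("ruby selenium")
search_box.submit

# Scroll down and click a "load more" button, as a user would
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
driver.find_element(:css, "button.load-more").click # assumed selector

driver.quit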

Setting Up Selenium in Ruby

Step 1: Install Selenium for Ruby

Install Ruby

First, check if you have Ruby installed. Open a terminal and type:

ruby -v

If Ruby is not installed, download it from ruby-lang.org.

Install Required Gems

Selenium requires the selenium-webdriver gem. Install it using:

gem install selenium-webdriver

Recent versions of selenium-webdriver (4.6 and later) ship with Selenium Manager, which automatically downloads a browser driver matching your installed browser; on older versions you may need to install chromedriver yourself. You are now ready to start scraping.
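
If you prefer managing dependencies with Bundler (optional; this guide only needs the gem install above), a minimal Gemfile would look like this:

# Gemfile
source "https://rubygems.org"

gem "selenium-webdriver"
gem "csv" # used in Step 5; needs an explicit entry on Ruby 3.4+

Run bundle install, then execute your script with bundle exec ruby scraper.rb.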

Step 2: Create a Selenium Web Scraper

Set Up a Ruby Project

Create a new folder for your project:

mkdir selenium-ruby-scraper
cd selenium-ruby-scraper

Inside the folder, create a new Ruby file:

touch scraper.rb

Now, open scraper.rb in your preferred text editor.

Initialize Selenium WebDriver

In scraper.rb, write the following code:

require "selenium-webdriver"
# Set up browser options
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument(" - headless") # Run in headless mode (no GUI)
# Initialize WebDriver
driver = Selenium::WebDriver.for :chrome, options: options
# Open the target website
driver.navigate.to "https://scrapingclub.com/exercise/list_infinite_scroll/"
# Get and print page source
puts driver.page_source
# Close the browser
driver.quit

Save the file and run it:

ruby scraper.rb

If everything is correct, the script will output the page’s HTML in the terminal.
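
One refinement worth adopting early: if the script raises an error partway through, driver.quit never runs and a headless Chrome process is left behind. A begin/ensure block (a sketch of the same script, restructured) guarantees cleanup:

require "selenium-webdriver"

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
driver = Selenium::WebDriver.for :chrome, options: options

begin
  driver.navigate.to "https://scrapingclub.com/exercise/list_infinite_scroll/"
  puts driver.page_source
ensure
  driver.quit # always close the browser, even if scraping fails
end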

Step 3: Extract Data from the Webpage

Now, let’s extract specific information. The target webpage contains product listings. Each product has a name and a price.

Modify scraper.rb:

require "selenium-webdriver"

# Define a simple Product structure
Product = Struct.new(:name, :price)

# Set up browser options
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")

# Initialize WebDriver and open the target website
driver = Selenium::WebDriver.for :chrome, options: options
driver.navigate.to "https://scrapingclub.com/exercise/list_infinite_scroll/"

# Find all product cards
products = []
html_products = driver.find_elements(:css, ".post")

# Extract the name and price from each card
html_products.each do |html_product|
  name = html_product.find_element(:css, "h4").text
  price = html_product.find_element(:css, "h5").text
  products << Product.new(name, price)
end

# Print extracted data
products.each { |product| puts "Name: #{product.name}, Price: #{product.price}" }

# Close the browser
driver.quit

Run the script again:

ruby scraper.rb

Now, you should see product names and prices in the terminal.
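
Elements expose more than visible text. If you also want each product’s URL, the Element#attribute method reads any HTML attribute; the sketch below assumes each .post card contains an anchor tag, which you should confirm in your browser’s DevTools:

html_products.each do |html_product|
  name = html_product.find_element(:css, "h4").text
  price = html_product.find_element(:css, "h5").text
  # Assumed structure: the card wraps its content in an <a> tag
  link = html_product.find_element(:css, "a").attribute("href")
  puts "#{name} (#{price}) -> #{link}"
end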

Step 4: Handle Infinite Scrolling

Some websites use infinite scrolling, meaning content loads as the user scrolls. Selenium allows us to mimic this behavior.

Modify scraper.rb, inserting the following between the driver.navigate.to call and the find_elements call:

# Scroll down to load more products
10.times do
  driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
  sleep(1) # Allow time for content to load
end

# Wait for the 60th (last) product to load
wait = Selenium::WebDriver::Wait.new(timeout: 10)
wait.until { driver.find_element(:css, ".post:nth-child(60)") }
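
Scrolling a fixed ten times works here because the exercise page stops at 60 products, but on most sites you will not know the final count in advance. A more general sketch scrolls until the page height stops growing:

# Keep scrolling until no new content loads
previous_height = driver.execute_script("return document.body.scrollHeight")
loop do
  driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
  sleep(1) # give the page time to fetch and render more items
  current_height = driver.execute_script("return document.body.scrollHeight")
  break if current_height == previous_height
  previous_height = current_height
end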

Step 5: Save Data to CSV

Extracted data is more useful if stored in a structured format like CSV.

Modify scraper.rb, adding require "csv" at the top of the file and the following block before driver.quit:

require "csv"

# Save data to CSV
CSV.open("products.csv", "wb", write_headers: true, headers: ["Name", "Price"]) do |csv|
  products.each { |product| csv << product.to_a }
end

puts "Data saved to products.csv"

Run the script again, and you will find a products.csv file in your project folder.

Step 6: Use a Proxy for Anonymity

Some websites block scrapers by detecting multiple requests from the same IP. Using a proxy can help avoid bans.

Modify scraper.rb, adding the proxy option before the driver is created:

# Route traffic through a proxy (substitute one you control)
proxy = "http://72.10.160.174:22669"
options.add_argument("--proxy-server=#{proxy}")

This routes all browser requests through the specified proxy.
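
Hardcoding a single proxy still funnels every request through one IP. A common refinement is to rotate through a pool; here is a minimal sketch (the addresses are placeholders, and the option must be set before the driver is created):

# Hypothetical proxy pool; replace with proxies you actually control
PROXIES = [
  "http://198.51.100.10:8080",
  "http://198.51.100.11:8080",
  "http://198.51.100.12:8080"
].freeze

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--headless")
# Pick a random proxy for this browser session
options.add_argument("--proxy-server=#{PROXIES.sample}")

driver = Selenium::WebDriver.for :chrome, options: options

Note that Chrome’s --proxy-server flag ignores embedded username/password credentials, so authenticated proxies require a different setup.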

Step 7: Avoid Getting Blocked

To avoid being blocked:

  • Set a realistic user agent so requests look like they come from a normal browser.
  • Throttle and randomize request timing (see the sketch at the end of this step).
  • Rotate IPs with proxies, as shown in Step 6.
  • Use a scraping API when one is available.

Example of setting a user agent (add it before the driver is created):

user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
options.add_argument("--user-agent=#{user_agent}")

For a scraping API, navigate through the provider’s endpoint instead of the target site directly:

driver.navigate.to "https://api.brightdata.com/v1/?apikey=YOUR_API_KEY&url=https://targetsite.com"

Conclusion

Selenium is a powerful tool for automating web interactions and extracting data. However, it is important to scrape responsibly: respect website terms of service, avoid overloading servers, and use official APIs when available.

Now that you know the basics of Selenium in Ruby, try scraping different websites and experimenting with more advanced interactions!
