How to Handle Pagination With Selenium WebDriver
In this guide, I’ll walk you through tackling pagination using Selenium WebDriver with Python. We’ll look at different methods, from scraping websites with simple navigation bars to dealing with more complex setups like infinite scrolling and “Load More” buttons. Whether you’re scraping product listings or any other type of data, understanding how to handle pagination will help you scrape the entire website efficiently. Let’s dive in!
Easier Pagination With Bright Data’s Web Scraper API
Handling pagination with Selenium can be slow and prone to blocks. Bright Data’s Web Scraper API automates the process, handling pagination, proxies, and CAPTCHAs for you.
✅ Auto-pagination — No need to click “Next” or modify URLs
✅ Bypass blocks — Built-in proxy rotation & CAPTCHA solving
✅ Faster & scalable — Extract structured data without loading full pages
I am not affiliated with any of the providers mentioned here!
Handling Pagination with Navigation Bars
The most common form of pagination is a navigation bar, where the user can click “Next” or jump to a specific page using page numbers. You can scrape data from these pages using Selenium by automating clicks on the “Next” page button or modifying the URL’s page number.
Scraping Data with the “Next Page” Button
Let’s say you’re scraping product data from a website like ScrapingCourse.com, which has a simple navigation bar. You can automate clicks on the “Next” button to scrape multiple pages.
Step-by-Step Implementation
Open the Website and Inspect the Page Structure
First, load the webpage using Selenium, and inspect the HTML structure to identify the elements containing the data you want to scrape. Typically, products are included in specific HTML tags or classes.
Extract the Data from the First Page
Use Selenium to extract data from the first page. For example, you might extract product names and prices:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
product_containers = driver.find_elements(By.CLASS_NAME, "woocommerce-LoopProduct-link")
for product in product_containers:
    name = product.find_element(By.CLASS_NAME, "woocommerce-loop-product__title")
    price = product.find_element(By.CLASS_NAME, "price")
    print(name.text)
    print(price.text)
driver.quit()
Implement Pagination Logic
Next, add logic to click the “Next” button and scrape data from subsequent pages. Use a while loop to continue scraping until no more pages are available:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://www.scrapingcourse.com/ecommerce/")
def scraper():
    product_containers = driver.find_elements(By.CLASS_NAME, "woocommerce-LoopProduct-link")
    for product in product_containers:
        name = product.find_element(By.CLASS_NAME, "woocommerce-loop-product__title")
        price = product.find_element(By.CLASS_NAME, "price")
        print(name.text)
        print(price.text)

while True:
    scraper()
    try:
        next_page = driver.find_element(By.CLASS_NAME, "next.page-numbers")
        next_page.click()
    except NoSuchElementException:
        print("No more pages available.")
        break

driver.quit()
This approach automatically clicks the “Next” button to scrape all pages until it reaches the end.
Scraping by Changing the URL Page Number
Some websites encode the page number directly into the URL. Instead of clicking the “Next” button, you can change the page number in the URL and reload the page to scrape the next data set.
For instance, if the URL is https://www.scrapingcourse.com/ecommerce/page/1, you can increment the page number and load the new page:
max_page_count = 13
for page in range(1, max_page_count + 1):
    driver.get(f"https://www.scrapingcourse.com/ecommerce/page/{page}")
    scraper()
This method is straightforward but requires knowing the structure of the website’s URLs.
Handling JavaScript Pagination
Many websites use JavaScript-based pagination, which loads more content dynamically. This type of pagination typically includes features like infinite scrolling or “Load More” buttons.
Infinite Scrolling Pagination
Websites with infinite scrolling automatically load more content when the user scrolls to the bottom of the page. To scrape data from such pages, you need to simulate scrolling in your Selenium script.
Step-by-Step Implementation
Set Up the WebDriver
Set up your WebDriver in headless mode to scrape without opening a browser window:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://scrapingclub.com/exercise/list_infinite_scroll/")
Scroll Down and Scrape Data
Use JavaScript to scroll down to the bottom of the page. After each scroll, wait for new content to load, and then scrape the page:
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
# Now, scrape the content
products = driver.find_elements(By.CLASS_NAME, "post")
for product in products:
    name = product.find_element(By.CSS_SELECTOR, "h4 a").text
    price = product.find_element(By.CSS_SELECTOR, "h5").text
    print(f"Name: {name}")
    print(f"Price: {price}")
This method simulates scrolling and waits for new products to load.
Load More Button Pagination
Some websites use a “Load More” button to reveal more content when clicked. This is another form of JavaScript pagination.
Step-by-Step Implementation
Set Up the WebDriver
Open the website using Selenium and define the scraper function to extract product names and IDs:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome()
driver.get("https://www.3m.com.au/3M/en_AU/p/c/medical/")
driver.implicitly_wait(10)
Click the “Load More” Button
Use a loop to click the “Load More” button and scrape content after each click:
def scraper():
    product_containers = driver.find_elements(By.CLASS_NAME, "sps2-content")
    for product in product_containers:
        name = product.find_element(By.CLASS_NAME, "sps2-content_name")
        item_id = product.find_element(By.CLASS_NAME, "sps2-content_data--number")
        print(f"Name: {name.text}")
        print(f"Product ID: {item_id.text}")

scroll_count = 5
for _ in range(scroll_count):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    load_more = driver.find_element(By.CLASS_NAME, "mds-button_tertiary--medium")
    load_more.click()

scraper()
This method simulates clicking the “Load More” button multiple times, allowing you to scrape data as the page loads more content.
Avoiding Detection When Scraping
Web scraping can trigger anti-bot measures, which may block your IP address or prevent you from scraping content. To avoid getting blocked, here are some strategies:
1. Use Proxies
Rotating proxies help you avoid detection by masking your IP address. Services like Bright Data offer proxy rotation along with other tools like CAPTCHA bypass.
2. Mimic Real User Behavior
Techniques like rotating user agents, adding random delays between actions, and using headless browsers can make your scraping behavior appear more like a real user's.
3. Use Web Scraping APIs
If scraping is blocked, consider using a web scraping API. These services automatically handle proxies, CAPTCHAs, and other anti-bot measures, making it easier to extract content without facing blocks. View my full list of the best scraping APIs.
Conclusion
Pagination is a key part of web scraping, and with Selenium, you can easily manage different types like navigation bars, page numbers, infinite scrolling, and “Load More” buttons. These methods help you scrape data from websites with multiple pages smoothly.
It’s also important to guard against anti-bot measures. To avoid being blocked, rotate user agents, use proxies, and add delays between actions to mimic human browsing behavior. With these techniques, you’ll be able to scrape multi-page sites efficiently and with far fewer interruptions. Happy scraping!