Get Structured Data From Popular Websites
In this article, I’ll show you how to scrape data from popular websites like Amazon, Walmart, and Zillow with Bright Data’s scraping APIs. Let’s dive in and see how it works.
Important note: I am not affiliated with Bright Data. I use the platform myself and want to show others how it works.
Why Scraping Websites Can Be Challenging
Web scraping is a technique used to extract data from websites. In simple terms, it allows you to gather structured information (like product details, prices, reviews, etc.) from websites automatically. However, scraping websites comes with its challenges. Here’s why:
- Getting Blocked: Many popular websites use anti-bot measures to prevent scraping. If they detect that you’re scraping, they may block your IP address.
- Complex Websites: E-commerce and real estate sites often have complex page structures, which makes it difficult to extract the data you need.
- Constant Maintenance: Parsers (scripts that extract data) break frequently because websites change their layout or structure. This means you spend more time fixing these scripts than working on your project.
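To make the maintenance problem concrete, here is a minimal sketch of a hand-written parser that breaks after a small layout change. The two HTML snippets and the class names are made up for illustration and do not come from any real site.

```python
import re

# Two versions of the same hypothetical product page: the site renamed a CSS class.
PAGE_V1 = '<div><span class="price">$12.34</span></div>'
PAGE_V2 = '<div><span class="product-price">$12.34</span></div>'

# A hand-written parser tied to the exact markup of version 1.
PRICE_PATTERN = re.compile(r'<span class="price">\$([0-9]+\.[0-9]{2})</span>')


def extract_price(html):
    """Return the price as a float, or None if the expected markup is missing."""
    match = PRICE_PATTERN.search(html)
    return float(match.group(1)) if match else None


print(extract_price(PAGE_V1))  # price found in the old layout
print(extract_price(PAGE_V2))  # None: the parser no longer matches after the redesign
```

Nothing crashes in the second call. The parser simply stops returning data, which is why this kind of breakage is easy to miss.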
How to Easily Get Structured Data from Popular Websites?
Here’s how you can start scraping structured data from websites using Bright Data:
Sign Up and Set Up Your Account
To get started, visit the Bright Data website and sign up for an account. Once you sign up, you'll gain access to the API catalog, which lists different APIs based on the website you want to scrape. This makes it easy to choose the correct API for your use case.
Choose the Website API
After signing up, browse the available API options in Bright Data’s catalog. For example, if you want to scrape product data from Amazon, you can select the Amazon Product Information API.
Bright Data offers APIs for various sites, including:
- Amazon: Product details, reviews, price information, and more.
- Walmart: Product catalog, availability, and pricing.
- Zillow: Real estate listings, property details, prices, and location information.
- Idealista: Property data, listings, and rental information.
Paste the URL and Get the Code
Once you’ve chosen the API, you will be prompted to enter the URL or the specific product identifier (e.g., ASIN for Amazon products). After entering the URL, you can click to generate the code in your preferred programming language.
For example, if you want to scrape an Amazon product, you can paste the product URL into the box and then select your programming language (e.g., Python). Bright Data will generate a code snippet for you to copy.
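If you prefer to pass the ASIN instead of the full URL, it can usually be read from the product URL itself. The sketch below assumes the common Amazon URL shapes where a 10-character ASIN follows /dp/ or /gp/product/. The helper name extract_asin is hypothetical and not part of any Bright Data tooling.

```python
import re
from urllib.parse import urlparse

# Assumption: the ASIN is 10 uppercase letters/digits following /dp/ or /gp/product/.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(product_url):
    """Return the ASIN found in an Amazon product URL, or None if there is none."""
    path = urlparse(product_url).path  # drops the query string such as ?th=1
    match = ASIN_PATTERN.search(path)
    return match.group(1) if match else None


url = "https://www.amazon.com/Logitech-920-002478-K120-USB-Keyboard/dp/B003ELVLKU?th=1"
print(extract_asin(url))
```

URLs that do not follow this pattern return None, so check the result before using it.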
Implement the Code
Once you have the generated code, you can add it to your project. Below is example Python code that scrapes product data from Amazon using Bright Data's API.
# pip3 install requests
import requests
# URL of the product to scrape
url = "https://www.amazon.com/Logitech-920-002478-K120-USB-Keyboard/dp/B003ELVLKU?th=1"
# Your Bright Data API key
apikey = ""
# Parameters to pass to the API
params = {
"apikey": apikey,
"url": url,
}
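# Make a request to the Bright Data API (the timeout avoids waiting forever)
response = requests.get("https://ecommerce.api.brightdata.com/v1/targets/amazon/products/", params=params, timeout=30)
# Raise an exception on HTTP errors (4xx/5xx) instead of printing an error body
response.raise_for_status()
# Print the response text (structured data in JSON format)
print(response.text)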
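The script above leaves apikey as an empty string to fill in by hand. One common alternative is to read the key from an environment variable so it never ends up in your source code. This is only a sketch: the variable name BRIGHTDATA_API_KEY and the helper build_params are my own choices, not something Bright Data defines.

```python
import os

# Hypothetical environment variable name; pick whatever fits your setup.
API_KEY_ENV_VAR = "BRIGHTDATA_API_KEY"


def build_params(product_url):
    """Build the query parameters used in the script above, reading the key from the environment."""
    apikey = os.environ.get(API_KEY_ENV_VAR, "")
    if not apikey:
        # Fail early with a clear message rather than sending a request with no key.
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first")
    return {"apikey": apikey, "url": product_url}
```

You would then pass params=build_params(url) to requests.get in place of the hard-coded dictionary.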
Get Structured Data
When you run the script, the Bright Data API will return the data in a structured format like JSON. Here’s an example of the data you might get back from Amazon:
{
"amazon_choice": true,
"availability_status": "In Stock",
"badge": "Amazon's Choice",
"brand": "Logitech",
"buybox_seller": "Amazon.com",
"category_breadcrumb": [
"Electronics",
"Computers & Accessories",
"Computer Accessories & Peripherals",
"Keyboards, Mice & Accessories",
"Keyboards"
],
"is_available": true,
"manufacturer": "Logitech",
"parent_asin": "B0CZXVN37Q",
"price_currency_code": "USD",
"price_currency_symbol": "$",
"product_description": "With comfortable, quiet typing, a sleek yet sturdy design…",
"product_images": [
"https://m.media-amazon.com/images/I/61j3wQheLXL._AC_SL1500_.jpg",
"https://m.media-amazon.com/images/I/61j3wQheLXL.__AC_SX300_SY300_QL70_FMwebp_.jpg"
],
"product_model_number": "920-002478",
"product_name": "Logitech K120 Wired Keyboard",
"product_price": 12.34,
"product_price_before_discount": 12.99,
"product_top_review": "Great keyboard for the price…",
"product_url": "https://www.amazon.com/Logitech-920-002478-K120-USB-Keyboard/dp/B003ELVLKU",
"rating_score": 4.6,
"review_count": 7888,
"sku": "B003ELVLKU"
}
As you can see, the data is clearly structured and organized. You can easily access the product name, price, rating, review count, availability, images, and more from the JSON. This makes it easier to analyze the data and integrate it into your projects.
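Here is a minimal sketch of working with that JSON in Python. It parses an excerpt of the sample response shown above and computes the discount. The discount_percent helper is my own addition, and a real script would parse response.text instead of a hard-coded string.

```python
import json

# Excerpt of the sample response shown above.
sample_response = """
{
  "product_name": "Logitech K120 Wired Keyboard",
  "product_price": 12.34,
  "product_price_before_discount": 12.99,
  "price_currency_symbol": "$",
  "is_available": true,
  "rating_score": 4.6,
  "review_count": 7888
}
"""

product = json.loads(sample_response)


def discount_percent(item):
    """Return the discount as a percentage, or None if either price is missing."""
    before = item.get("product_price_before_discount")
    now = item.get("product_price")
    if not before or now is None:
        return None
    return round((before - now) / before * 100, 1)


# Use .get() so a missing field yields None instead of raising KeyError.
summary = {
    "name": product.get("product_name"),
    "price": product.get("product_price"),
    "discount_percent": discount_percent(product),
    "in_stock": product.get("is_available"),
}
print(summary)
```

Using .get() is a deliberate choice here, since I would not assume every product page returns every field.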
Benefits of Using Bright Data for Web Scraping
- Bypass Anti-Bot Measures: Bright Data allows you to scrape data from websites without worrying about being blocked. It handles proxies and CAPTCHAs for you.
- No Configuration Needed: Once you set up your API key, you can scrape data with minimal effort.
- Structured Data: Get data in a well-organized format (JSON) that’s easy to use and integrate.
- Supports Multiple Websites: Bright Data supports scraping data from a variety of websites, including Amazon, Walmart, Zillow, and Idealista.
- Save Time and Effort: Bright Data reduces the time spent maintaining parsers and dealing with broken scripts. You can focus on analyzing the data rather than fixing issues.
Use Cases for Structured Data
The structured data you extract from websites can be used in various applications. Here are some examples:
- Price Comparison: Compare prices of products across different e-commerce sites.
- Market Research: Collect data on product availability, pricing trends, and reviews for competitive analysis.
- Real Estate Analysis: Gather property details like prices, sizes, and locations for real estate projects.
- Product Review Aggregation: Collect product reviews and ratings from various sources to make informed purchasing decisions.
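As a rough illustration of the price comparison use case, the sketch below picks the cheapest offer from a list of records. The records are made-up values, and the assumption that every site's API returns the same product_name and product_price fields is mine. In practice you may need to map each API's fields to a common shape first.

```python
# Hypothetical records, one per retailer, with illustrative values only.
listings = [
    {"source": "amazon", "product_name": "Logitech K120 Wired Keyboard", "product_price": 12.34},
    {"source": "walmart", "product_name": "Logitech K120 Wired Keyboard", "product_price": 13.49},
    {"source": "other-shop", "product_name": "Logitech K120 Wired Keyboard", "product_price": None},
]


def cheapest(records):
    """Return the record with the lowest price, ignoring records without a price."""
    priced = [r for r in records if r.get("product_price") is not None]
    return min(priced, key=lambda r: r["product_price"]) if priced else None


best = cheapest(listings)
if best is not None:
    print(f"Cheapest: {best['source']} at {best['product_price']}")
```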
Conclusion
Scraping data from popular websites doesn't have to be hard or take up a lot of time. With the right tool, such as a scraping API, you can gather structured data from sites like Amazon, Walmart, and Zillow without worrying about getting blocked or fixing broken parsers. The setup is simple, and you get clean data in well-organized JSON that you can use in your projects right away. Whether you're working on e-commerce analysis, market research, or real estate, the right tool helps you gather the data quickly and accurately.