tokopedia-scrapers

Description

This repository contains Python-based scrapers for extracting product listings and detailed product information from Tokopedia. These scrapers leverage the Crawlbase Crawling API to handle JavaScript rendering, CAPTCHA challenges, and anti-bot protections. The extracted data is processed using BeautifulSoup for HTML parsing and Pandas for structured storage.

➡ Read the full blog here to learn more.

Scrapers Overview

Tokopedia Product Listing Scraper

The Tokopedia Product Listing Scraper (tokopedia_listing_scraper.py) extracts:

Product Name
Price
Product URL
Shop Name

The scraper supports pagination, so it can collect results from multiple search result pages in one run. The extracted data is saved to a JSON file (tokopedia_search_results.json).
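The pagination flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in tokopedia_listing_scraper.py: the search URL pattern, the `q` and `page` query parameters, and the helper names (`build_search_url`, `scrape_pages`, `save_json`, `fetch_page`) are assumptions made for the example.

```python
import json
from urllib.parse import urlencode

# Assumed search endpoint and query parameters; verify against the live site.
BASE_SEARCH_URL = "https://www.tokopedia.com/search"


def build_search_url(query, page):
    """Return the search URL for one results page (hypothetical URL pattern)."""
    return f"{BASE_SEARCH_URL}?{urlencode({'q': query, 'page': page})}"


def scrape_pages(query, max_pages, fetch_page):
    """Collect product dicts from page 1 up to max_pages.

    fetch_page is a caller-supplied function that takes a URL and returns
    a list of product dicts; it stands in for the fetch-and-parse step.
    """
    products = []
    for page in range(1, max_pages + 1):
        items = fetch_page(build_search_url(query, page))
        if not items:  # stop early when a page yields no products
            break
        products.extend(items)
    return products


def save_json(records, path):
    """Write the records to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

Passing the fetch step in as a function keeps the loop independent of how pages are downloaded and parsed, which also makes it easy to try out with stubbed data.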

Tokopedia Product Detail Scraper

The Tokopedia Product Detail Scraper (tokopedia_product_scraper.py) extracts detailed product information, including:

Product Name
Store Name
Full Description
Price
Image URLs

The extracted data is saved to a JSON file (tokopedia_product_data.json).
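To make the shape of a product record concrete, here is a small sketch of how the five fields listed above could be held and serialized. The class name `ProductDetail`, its field names, and the `to_json` helper are hypothetical; the actual keys written by tokopedia_product_scraper.py may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProductDetail:
    """One scraped product; field names are illustrative, not the script's."""
    name: str
    store_name: str
    description: str
    price: str  # kept as text, since prices carry currency formatting
    image_urls: List[str] = field(default_factory=list)


def to_json(product):
    """Serialize a ProductDetail to a JSON string, keeping non-ASCII text."""
    return json.dumps(asdict(product), ensure_ascii=False, indent=2)
```

Calling `to_json(ProductDetail(...))` yields a JSON object with one key per field, and `image_urls` as a list.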

Environment Setup

Ensure that Python is installed on your system. Check the version using:

# Use python3 if required (for Linux/macOS)
python --version

Next, install the required dependencies:

pip install crawlbase beautifulsoup4

Crawlbase – Handles JavaScript rendering and bypasses bot protections.
BeautifulSoup – Parses and extracts structured data from HTML.

Running the Scrapers

Get Your Crawlbase Access Token

Sign up for Crawlbase here to get an API token.
Use the JS token for Tokopedia scraping, as the site uses JavaScript-rendered content.
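For illustration, a fetch through the Crawling API with a JS token might look like the sketch below. The function name `fetch_rendered_html`, the wait time, and the example URL in the comment are assumptions; the `ajax_wait` and `page_wait` options and the `pc_status` header are Crawling API features, but check the Crawlbase documentation for the values that suit your case. The client is passed in as a parameter so the function can be exercised without network access.

```python
def fetch_rendered_html(api, url, wait_ms=5000):
    """Fetch a JavaScript-rendered page through a Crawlbase-style client.

    api is expected to behave like crawlbase.CrawlingAPI: api.get(url, options)
    returns a dict with 'headers' and 'body' entries.
    """
    options = {
        "ajax_wait": "true",        # wait for AJAX content to load
        "page_wait": str(wait_ms),  # extra wait in milliseconds before capture
    }
    response = api.get(url, options)
    if response["headers"].get("pc_status") != "200":
        return None  # Crawlbase did not report a successful crawl
    body = response["body"]
    return body.decode("utf-8") if isinstance(body, bytes) else body


# With the real client (requires `pip install crawlbase` and a valid JS token):
#   from crawlbase import CrawlingAPI
#   api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
#   html = fetch_rendered_html(api, "https://www.tokopedia.com/search?q=headset")
```

The function returns `None` on failure rather than raising, so a caller can decide whether to retry or skip the page.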

Update the Scraper with Your Token

Replace "YOUR_CRAWLBASE_TOKEN" in the script with your Crawlbase JS Token.
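If you would rather not hardcode the token in the script, one option is to read it from an environment variable. This is a sketch of that alternative, not something the scripts in this repository do; the variable name `CRAWLBASE_JS_TOKEN` and the `load_token` helper are made up for the example.

```python
import os

PLACEHOLDER = "YOUR_CRAWLBASE_TOKEN"  # the placeholder string used in the scripts


def load_token(env_var="CRAWLBASE_JS_TOKEN"):
    """Return the token from the environment, or raise if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(f"Set {env_var} to your Crawlbase JS token")
    return token
```

Failing early with a clear message avoids sending requests with the placeholder value, and keeps the token out of version control.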

Run the Scraper

# For product listing scraping
python tokopedia_listing_scraper.py

# For product detail scraping
python tokopedia_product_scraper.py

The listing scraper saves its output to tokopedia_search_results.json, and the product detail scraper saves its output to tokopedia_product_data.json.

To-Do List

Expand scrapers to extract additional product details like discounted prices, seller reputation, and available promotions.
Optimize data storage and add support for CSV and database integration.
Implement asynchronous requests to speed up data extraction.
Enhance scraper efficiency with Crawlbase Smart Proxy to prevent blocks.
Automate scheduled scraping for real-time price monitoring and product tracking.

Why Use This Scraper?

✔ Bypasses anti-bot protections with Crawlbase.
✔ Handles JavaScript-rendered content seamlessly.
✔ Extracts accurate and structured product data efficiently.
