Scrapper-Go is a powerful and flexible Go application that acts as a wrapper around Playwright, enabling you to define and execute web scraping pipelines using simple YAML configuration files. It provides a robust engine for automating browser interactions, extracting data, and handling various web scenarios.
- YAML-driven Scraping: Define complex scraping workflows using intuitive YAML configurations.
- Playwright Integration: Leverages the full power of Playwright for browser automation, supporting Chromium, Firefox, and WebKit.
- API Server: Expose your scraping capabilities as a RESTful API endpoint.
- Interactive Shell: Interact with the scrapper in a live shell environment for testing and development.
- Dependency Management: Easily install Playwright browsers and drivers with a dedicated setup command.
- Go (1.18 or higher)
- Node.js (for Playwright dependencies)
- Clone the repository:

  ```shell
  git clone https://github.com/fmotalleb/scrapper-go.git
  cd scrapper-go
  ```

- Install Playwright dependencies:

  ```shell
  go run main.go setup
  ```

  You can specify which browsers to install:

  ```shell
  go run main.go setup --browsers chromium,firefox
  ```

  Or skip browser installation:

  ```shell
  go run main.go setup --skip-browsers
  ```

- Build the application:

  ```shell
  go build -o scrapper-go .
  ```
See the Documentation for more information on pipelines and logic.
You can run a YAML-defined scraping pipeline directly:

```shell
./scrapper-go -c path/to/your/config.yaml
```

Example `config.yaml`:

```yaml
# Your YAML scraping configuration here
```

Run Scrapper-Go as an API service. By default, it listens on `127.0.0.1:8080`.
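The placeholder above is intentionally empty. As a purely hypothetical illustration of what a YAML-driven pipeline could look like, here is a sketch; the field names (`pipeline`, `goto`, `extract`, `selector`) are assumptions for illustration only, not Scrapper-Go's actual schema — consult the Documentation for the real format.

```yaml
# Hypothetical illustration only — these keys are NOT the actual schema.
pipeline:
  - goto: https://example.com   # navigate the Playwright page
  - extract:                    # pull data out of the DOM
      selector: h1
      as: page_title
```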
Note: This application does not support authentication. It is recommended to run it behind a reverse proxy for production use.
```shell
./scrapper-go serve
# Or specify address and port
./scrapper-go serve -a 0.0.0.0 -p 8081
```

For API usage, see the API Documentation (AI-generated and may contain errors; check the code for the actual implementation).
As described in the installation section, this command helps manage Playwright's browsers and drivers.
```shell
./scrapper-go setup --browsers webkit
```

Start an interactive shell for direct interaction and testing of scraping steps.
```shell
./scrapper-go shell
```

Scrapper-Go looks for a configuration file named `.scrapper-go.yaml` in your home directory by default. You can specify a different configuration file using the `-c` or `--config` flag.
The docs are generated using Gemini, so there will be hiccups here and there. I won't be writing any docs manually because this software is mainly used to bypass the JS challenge on our internal hot-spot login page, plus some minor scraping situations; this is mostly an experiment :).
We welcome contributions! Please see CONTRIBUTING.md (if available) for details on how to contribute.
This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.