Scrapper-Go

Scrapper-Go is a powerful and flexible Go application that acts as a wrapper around Playwright, enabling you to define and execute web scraping pipelines using simple YAML configuration files. It provides a robust engine for automating browser interactions, extracting data, and handling various web scenarios.

Features

YAML-driven Scraping: Define complex scraping workflows using intuitive YAML configurations.
Playwright Integration: Leverages the full power of Playwright for browser automation, supporting Chromium, Firefox, and WebKit.
API Server: Expose your scraping capabilities as a RESTful API endpoint.
Interactive Shell: Interact with Scrapper-Go in a live shell environment for testing and development.
Dependency Management: Easily install Playwright browsers and drivers with a dedicated setup command.

Installation

Prerequisites

Go (1.18 or higher)
Node.js (for Playwright dependencies)

Build from Source

Clone the repository:

```
git clone https://github.com/fmotalleb/scrapper-go.git
cd scrapper-go
```

Install Playwright dependencies:

```
go run main.go setup
```

You can specify which browsers to install:

```
go run main.go setup --browsers chromium,firefox
```

Or skip browser installation:

```
go run main.go setup --skip-browsers
```

Build the application:
```
go build -o scrapper-go .
```

Usage

Executing a Scraping Pipeline

See DOCUMENTATION.md for more information on pipelines and their logic.

You can run a YAML-defined scraping pipeline directly:

```
./scrapper-go -c path/to/your/config.yaml
```

Example config.yaml:

```
# Your YAML scraping configuration here
```

Subcommands

`serve` - Start the API Server

Run Scrapper-Go as an API service. By default, it listens on 127.0.0.1:8080. Note: This application does not support authentication. It is recommended to run it behind a reverse proxy for production use.

```
./scrapper-go serve
# Or specify address and port
./scrapper-go serve -a 0.0.0.0 -p 8081
```
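Since the server has no authentication of its own, one way to follow the reverse proxy recommendation is a small Go proxy that checks HTTP Basic credentials before forwarding to the default `127.0.0.1:8080` address. This is only a sketch built on the standard library: the `:9090` listen port, the `admin` / `change-me` credentials, and the helper names `credentialsMatch` and `withBasicAuth` are placeholders invented for illustration and are not part of Scrapper-Go.

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// credentialsMatch reports whether the supplied user and password equal the
// expected ones. ConstantTimeCompare avoids leaking how many leading bytes
// matched (it still returns early when the lengths differ).
func credentialsMatch(gotUser, gotPass, wantUser, wantPass string) bool {
	userOK := subtle.ConstantTimeCompare([]byte(gotUser), []byte(wantUser)) == 1
	passOK := subtle.ConstantTimeCompare([]byte(gotPass), []byte(wantPass)) == 1
	return userOK && passOK
}

// withBasicAuth wraps a handler and answers 401 unless the request carries
// the expected HTTP Basic credentials.
func withBasicAuth(next http.Handler, user, pass string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok || !credentialsMatch(u, p, user, pass) {
			w.Header().Set("WWW-Authenticate", `Basic realm="scrapper-go"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Upstream is the default address that `scrapper-go serve` listens on.
	upstream, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// Placeholder port and credentials (assumptions); replace before real use.
	log.Fatal(http.ListenAndServe(":9090", withBasicAuth(proxy, "admin", "change-me")))
}
```

With this running, clients would talk to port 9090 while Scrapper-Go itself stays bound to the loopback address. For production, a dedicated reverse proxy with TLS is likely a better fit than this sketch.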

For API usage, see API_DOCUMENTATION.md (it is AI-generated and might be slop, so look at the code for the actual implementation).

`setup` - Install Playwright Dependencies

As described in the installation section, this command helps manage Playwright's browsers and drivers.

```
./scrapper-go setup --browsers webkit
```

`shell` - Interactive Shell

Start an interactive shell for direct interaction and testing of scraping steps.

```
./scrapper-go shell
```

Configuration

Scrapper-Go looks for a configuration file named .scrapper-go.yaml in your home directory by default. You can specify a different configuration file using the -c or --config flag.

Docs are generated using Gemini, so there will be hiccups somewhere. I won't be writing any docs manually, because this software is used to bypass a JS challenge on our internal hotspot login page and for some minor scraping situations; this is mostly an experiment :).

Contributing

We welcome contributions! Please see CONTRIBUTING.md (if available) for details on how to contribute.

License

This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.
