
crawly

Crawly is a simple web crawler I made.

Features

  • Multi-threaded crawling
  • robots.txt support
  • Crawl depth control
  • Polite delay between requests
  • Domain-restricted crawling
  • JSON output
  • CLI options with --verbose and --version
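
To give a rough idea of how the features above (multi-threaded fetching with a polite delay) fit together, here is a minimal, illustrative sketch using libcurl and std::thread. It is not the actual crawly.cpp code; the names (write_cb, frontier, worker) and the hard-coded defaults are placeholders.

// Illustrative sketch only, not crawly's implementation: a tiny
// multi-threaded fetch loop with a polite delay between requests.
#include <curl/curl.h>
#include <chrono>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Append the response body into a std::string supplied via CURLOPT_WRITEDATA.
static size_t write_cb(char* data, size_t size, size_t nmemb, void* userp) {
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);

    std::queue<std::string> frontier;   // URLs waiting to be fetched
    frontier.push("https://example.com");
    std::mutex mtx;                     // guards the shared frontier

    auto worker = [&]() {
        while (true) {
            std::string url;
            {
                std::lock_guard<std::mutex> lock(mtx);
                if (frontier.empty()) return;   // nothing left to crawl
                url = frontier.front();
                frontier.pop();
            }
            std::string body;
            CURL* curl = curl_easy_init();
            if (!curl) return;
            curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
            curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
            if (curl_easy_perform(curl) == CURLE_OK)
                std::cout << url << ": " << body.size() << " bytes\n";
            curl_easy_cleanup(curl);
            // Polite delay between requests (crawly's --delay, 200 ms by default).
            std::this_thread::sleep_for(std::chrono::milliseconds(200));
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) pool.emplace_back(worker);   // --threads default
    for (auto& t : pool) t.join();

    curl_global_cleanup();
    return 0;
}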

Requirements

  • A C++17 compiler
  • libcurl
  • Gumbo HTML parser (gumbo)
  • pthreads

Build

g++ crawly.cpp -o crawly -I./include -lcurl -lgumbo -pthread -std=c++17

Usage

./crawly --start-url https://example.com --max-pages 50 --threads 4 --verbose

CLI options

--start-url : Starting URL (required)

--max-pages : Maximum pages to crawl (default: 50)

--max-depth : Maximum link depth (default: 3)

--threads : Number of threads (default: 4)

--output : Output JSON file (default: crawly_results.json)

--delay : Delay between requests in ms (default: 200)

--verbose : Print detailed crawl info

-v, --version : Show version
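
The snippet below is only an illustrative sketch of how "--flag value" style options like these could be handled; the option names and defaults mirror the list above, but the parsing code and the placeholder version string are not taken from crawly.cpp.

// Illustrative sketch, not crawly's actual option handling.
#include <iostream>
#include <map>
#include <string>

int main(int argc, char* argv[]) {
    // Defaults mirror the documented ones above.
    std::map<std::string, std::string> opts = {
        {"--start-url", ""},      {"--max-pages", "50"},
        {"--max-depth", "3"},     {"--threads", "4"},
        {"--output", "crawly_results.json"}, {"--delay", "200"}};
    bool verbose = false;

    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (arg == "--verbose") {
            verbose = true;
        } else if (arg == "-v" || arg == "--version") {
            std::cout << "crawly <version>\n";   // placeholder version string
            return 0;
        } else if (opts.count(arg) && i + 1 < argc) {
            opts[arg] = argv[++i];               // consume the option's value
        } else {
            std::cerr << "unknown or incomplete option: " << arg << "\n";
            return 1;
        }
    }

    if (opts["--start-url"].empty()) {           // --start-url is required
        std::cerr << "--start-url is required\n";
        return 1;
    }
    if (verbose)
        std::cout << "crawling " << opts["--start-url"] << " with "
                  << opts["--threads"] << " threads\n";
    return 0;
}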
