Crawly is a simple web crawler I made.

## Features
- Multi-threaded crawling
- `robots.txt` support
- Crawl depth control
- Polite delay between requests
- Domain-restricted crawling
- JSON output
- CLI options, including `--verbose` and `--version`
## Requirements

- C++17 compatible compiler
- libcurl
- Gumbo Parser
- nlohmann/json.hpp
- cxxopts.hpp
## Build

```sh
g++ crawly.cpp -o crawly -I./include -lcurl -lgumbo -pthread -std=c++17
```

## Usage

```sh
./crawly --start-url https://example.com --max-pages 50 --threads 4 --verbose
```

## Options

- `--start-url` : Starting URL (required)
- `--max-pages` : Maximum pages to crawl (default: 50)
- `--max-depth` : Maximum link depth (default: 3)
- `--threads` : Number of threads (default: 4)
- `--output` : Output JSON file (default: `crawly_results.json`)
- `--delay` : Delay between requests in ms (default: 200)
- `--verbose` : Print detailed crawl info
- `-v, --version` : Show version