Assume you have some .csv/.json files from running a few crawls. This directory contains tools to generate a .pdf report with a range of statistics on the crawl.
Note that the evaluation may take anywhere from several hours to a day or more, depending on how much crawl data you have!
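Before starting a run that may take hours, it can help to confirm that the data directory actually contains crawl output. The following is a minimal Python sketch, not part of the evaluation tooling. The function name find_crawl_files is hypothetical, and it assumes the crawl results are .csv/.json files somewhere below the given directory. The default, output_data_crawls, only resolves correctly when the sketch is run from ipfs-crawler/.

```python
from pathlib import Path


def find_crawl_files(data_dir="output_data_crawls"):
    """Return a sorted list of .csv/.json files found recursively under data_dir.

    Hypothetical pre-flight helper: it only checks file extensions and does
    not validate the contents of the crawl files.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"crawl data directory not found: {root}")
    # rglob also walks subdirectories, so nested crawl runs are included.
    matches = [
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".csv", ".json"}
    ]
    return sorted(matches)
```

For example, len(find_crawl_files()) == 0 is a cheap signal that the evaluation would have nothing to work on.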
- Check that you're in the ipfs-crawler/eval/ directory and build the image with
  docker build -t scriptkitty/ipfs-crawl-eval .
  This takes some time and requires 4GB of disk space, since we use texlive-full to build a PDF in the end. Any contribution towards making this image smaller is appreciated, since it's not on our priority list.
- After the build, go to the parent directory (ipfs-crawler/) and run the container with the provided script:
  ./run_docker_eval.sh
  The script automatically uses the data in ipfs-crawler/output_data_crawls. To provide a custom data folder, pass its absolute path:
  ./run_docker_eval.sh /path/to/crawl/data
The geoIP lookup in particular can take 1-2 days, so the evaluation is best run on a dedicated server.
The run will populate the directories on the host and output a report.pdf in the end, containing the computed statistics, tables and plots.
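The script expects an absolute path for a custom data folder. If you launch the evaluation from another tool, a small Python wrapper along these lines could resolve the path first. This is a sketch that assumes it is run from the ipfs-crawler/ directory. The helper names build_eval_command and run_eval are hypothetical; only ./run_docker_eval.sh and its optional path argument come from this README.

```python
import os
import subprocess


def build_eval_command(data_dir=None):
    """Build the argument list for ./run_docker_eval.sh.

    Without data_dir, the script falls back to ipfs-crawler/output_data_crawls.
    A given data_dir is expanded (~) and made absolute, as the script requires.
    """
    cmd = ["./run_docker_eval.sh"]
    if data_dir is not None:
        cmd.append(os.path.abspath(os.path.expanduser(data_dir)))
    return cmd


def run_eval(data_dir=None):
    # Hypothetical wrapper: check=True raises CalledProcessError if the
    # script exits non-zero. Expect this call to block for a long time.
    return subprocess.run(build_eval_command(data_dir), check=True)
```

Separating command construction from execution keeps the path handling easy to check without starting a multi-hour run.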
For details on the generated files, see the READMEs for the figures and tables.
To run the evaluation without Docker, install the following packages (names as on Ubuntu):
r-base
python3
texlive-full
latexmk
r-base and python3 are used for computing statistics and plotting, texlive-full and latexmk to build the report in the end. The required R and Python libraries can then be installed with:
Rscript -e "install.packages(c(\"data.table\", \"reshape2\", \"ggplot2\", \"scales\", \
\"tikzDevice\", \"stringr\", \"pbapply\", \"igraph\", \"jsonlite\", \"tidyr\"))"
pip3 install geoip2 numpy ip2location
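To verify a manual setup before running make, a check like the following may help. It is a sketch, not part of the repository. It assumes the Python modules are importable as geoip2, numpy and IP2Location (the module name provided by the ip2location package), and that Rscript and latexmk should be on the PATH. It does not check the R packages themselves.

```python
import importlib.util
import shutil

# Module names assumed from the pip install line above; note that the
# ip2location package is imported as IP2Location.
PYTHON_MODULES = ["geoip2", "numpy", "IP2Location"]
# Command-line tools provided by r-base and latexmk.
EXECUTABLES = ["Rscript", "latexmk"]


def missing_python_modules(names=PYTHON_MODULES):
    """Return the module names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_executables(names=EXECUTABLES):
    """Return the programs that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def report():
    """Print every missing dependency; return True if nothing is missing."""
    problems = missing_python_modules() + missing_executables()
    for name in problems:
        print(f"missing: {name}")
    return not problems
```

find_spec only locates the modules without importing them, so the check stays fast even for heavy packages like numpy.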
The evaluation consists of basic statistics, figures, tables and the final report. To build everything, run
make all
which will output a report.pdf in the eval/ directory.
Figures are also written as .png files to the figures/ directory. To build only the figures, for example, use
make plots
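When driving the build from a script, for instance to rebuild only the plots and then collect them, a Python sketch might look like the following. It assumes it is run from the eval/ directory. The helper names build and list_figures are hypothetical; only the make targets all and plots and the figures/ directory come from this README.

```python
import subprocess
from pathlib import Path


def build(target="all", cwd="."):
    """Run `make <target>` in cwd, raising CalledProcessError on failure."""
    subprocess.run(["make", target], cwd=cwd, check=True)


def list_figures(figures_dir="figures"):
    """Return the names of all .png files directly inside figures_dir, sorted."""
    return sorted(p.name for p in Path(figures_dir).glob("*.png"))


# Example use (from eval/):
#   build("plots")
#   print(list_figures())
```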