In this repository, we explain how to train a Bicephalous Convolutional
AutoEncoder model enabling variable compression ratio for Sparse data
(BCAE-VS). The motivation for designing the model is to compress highly
sparse data collected from the time project chamber (TPC), the main
tracking device in colliders.
The key feature of BCAE-VS is that it compresses data not by
downsizing the input tensor to a uniform (smaller) size but by
downsampling the nonzero values (signal) in the sparse input.
More specifically, the encoder of BCAE-VS tags each signal
with an importance score, and only those with high importance will
be saved and used later for decoding. The compression scheme of
BCAE-VS implies the sparser the input, the smaller the compressed
data.
The encoder part of BCAE-VS is implemented with sparse convolution
which utilizes the sparsity of the input by avoiding
matrix multiplications with all-zero operands. In this study,
we use the MinkowskiEngine's
implementation for sparse convolution kernels.
In the remainder of this read-me file, we will show how to install
the MinkowskiEngine sparse convolution library and how to train
a BCAE-VS model on TPC data.
Installing MinkowskiEngine may not be super straightforward.
Please see the documentation here.
After installing the MinkowskiEngine following the instruction,
we should have a conda environment called py311-me.
Please activate the environment by running
conda activate py311-me
Get a clone of the NeuralCompression_v3 repository by running
git clone https://github.com/BNL-DAQ-LDRD/NeuralCompression_v3/tree/main
then
cd NeuralCompression_v3
Install the package by running
python setup.py develop
Again, please consider forking the repo and clone the fork. Let us make it better together.
The TPC data can be downloaded from Zenodo.
Please download both the occupancy_by_wedge.csv and outer.tgz.
The outer.tgz contains training and test data for the neural compression model.
And the occupancy_by_wedge.csv is needed for evaluation.
Decompress outer.tgz by running
tar -xvzf outer.tgz
It will produce a folder called outer.
Please move the occupancy_by_wedge.csv into the outer folder.
Later on, the root of the TPC data will be path_to_outer/outer.
Set the root to the data by running
export DATAROOT=/path/to/your/data
For example, for the TPC data, we can run
export DATAROOT=path_to_outer/outer
To train a BCAE-VS model on TPC data with the default config, cd into the train folder and run
python train.py config.yaml
Note: during training, if the
keep_ratio_softremains around .5, we may consider restart the training.
Assuming a pretrained model is saved at checkpoints/bi_lambda-30_lb-10/model_last.pth, to evaluate its performance on TPC data, we can run the following command:
python evaluate/evaluate.py checkpoints/bi_lambda-30_lb-10 --split test --device cuda --gpu-id 0 --precision full --compressed-path ./compresse --result-csv-path ./result.csv
If --compressed-path is not used, compressed data will not be saved.
If we just want to compress the data, we can run the following command:
python compress.py ../checkpoints/bi_lambda-30_lb-10/ --split test --device cuda --gpu-id 1 --compressed-path ./compresse