ShaneGPUCountMinSketch.jl is a Julia package for fast, GPU-accelerated motif counting using the Count-Min Sketch probabilistic data structure. It is designed for large-scale sequence analysis, efficiently counting motif instances in sparse codes by leveraging CUDA-enabled GPUs.
We define a configuration
$c_K$ as a$(2K+1)$ -tuple for$K = 1,2,3,\ldots$ :
$$ c_K = \big( c[1],; c[2],; c[3],; \ldots,; c[2K+1] \big) $$
- The components
$c[2k+1]$ (for$1 \leq k \leq K$ ) represent filter indices.- The components
$c[2k]$ (for$1 \leq k \leq K-1$ ) represent the distances (e.g., number of nucleotides) between filters$c[2(k-1)+1]$ and$c[2k+1]$ .This structure allows flexible motif representation, encoding both filter indices and the distances between them.
- High-performance motif counting on GPU
- Probabilistic counting with bounded error using Count-Min Sketch
- Flexible configuration for motif structure and filter length
- Easy integration with Julia data structures
using Pkg
Pkg.add("ShaneGPUCountMinSketch")using ShaneGPUCountMinSketch
# Prepare your data as a Dict{Int, Vector{CartesianIndex{2}}}
nz_dict = Dict(
1 => [CartesianIndex(5, 2), CartesianIndex(10, 3)],
2 => [CartesianIndex(7, 1)]
)
num_fils = 2 # Number of filters in configuration
fil_len = 10 # Filter length
configs = obtain_enriched_configurations(nz_dict, num_fils, fil_len; min_count=2)obtain_enriched_configurations(
nz_dict::Dict{Int, Vector{CartesianIndex{2}}},
num_fils::Int,
fil_len::Int;
min_count=default_min_count
) -> Set{Tuple{Int, ...}}nz_dict: Dictionary mapping sequence indices to vectors of(position, filter index)pairs.num_fils: Number of filters in the configuration (motif size).fil_len: Length of each filter.min_count: Minimum count threshold for a configuration to be considered enriched.
Returns a set of integer tuples, each representing a unique motif configuration.
- This package uses a probabilistic data structure (Count-Min Sketch), so results may vary slightly between runs. In practice, the error is negligible and theoretically bounded.
MIT License. See LICENSE for details.