Releases: ChristopherWilks/megadepth
1.2.0
This release goes further than the last to substantially speed up BigWig coverage-over-an-annotation computations.
The speedup comes from collapsing annotated regions in the given BED file that lie within N bases of each other into larger regions, which are then used to query the index. This avoids querying the index for many small regions that are very close to each other. N is controlled by the new option --distance, with a default of 2200 bases, which appears from my tests to be more or less optimal, though this will vary depending on the BED file used and the density of coverage in the BigWig.
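The collapsing step can be illustrated with a minimal Python sketch. This is not Megadepth's actual code: the function name `collapse_regions`, the half-open BED-style coordinates, and the reading of "within N bases" as a gap of at most N between one region's end and the next region's start are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end); BED-style half-open coordinates are assumed here.
Interval = Tuple[str, int, int]


def collapse_regions(sorted_regions: Iterable[Interval],
                     distance: int = 2200) -> List[Interval]:
    """Merge neighbouring regions on the same chromosome whose gap is <= distance.

    Hypothetical helper: input must already be sorted by chromosome, then start.
    """
    merged: List[Interval] = []
    for chrom, start, end in sorted_regions:
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= distance:
            # Close enough to the previous region: extend it instead of adding a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


regions = [("chr1", 100, 200), ("chr1", 1000, 1500),
           ("chr1", 10000, 10100), ("chr2", 50, 80)]
print(collapse_regions(regions))
```

With the default distance, the first two chr1 regions are merged into one query region (100 to 1500), while the distant chr1 region and the chr2 region stay separate. Each merged region would then stand for a single index query in place of several small ones.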
For this to work, the BED file passed to --annotation needs to be sorted according to sort -k1,1 -k2,2n. Megadepth will check to determine if this is the case. It will automatically fall back to using a slower version of the coverage calculation where the BigWig index is queried for each interval in the BED file independently, if it detects either or both of the following conditions:
- chromosomes are not contiguous (e.g. chr1 appears both before and after chr2)
- the previous check is OK but coordinates are not in order for a given chromosome
Also, in the first case (non-contiguous chromosomes), the order of output will not be the order of input; rather, regions from the same chromosome will be grouped together in the output.
Finally, --unsorted has been added to force Megadepth to fall back to the slower version, either when the argument to --annotation is unsorted or when the user wants to force querying of the BigWig index for each region.
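The two sortedness conditions described above can be sketched in Python as follows. This is an illustration only, not Megadepth's implementation: the function name `bed_sort_status` and its return strings are hypothetical, and the sketch reports only the first problem it encounters.

```python
from typing import Iterable, Tuple


def bed_sort_status(regions: Iterable[Tuple[str, int, int]]) -> str:
    """Classify a stream of (chrom, start, end) records.

    Hypothetical helper mirroring the two fallback conditions:
    non-contiguous chromosomes, or start coordinates out of order
    within a chromosome.
    """
    seen = set()          # chromosomes whose block has already started
    prev_chrom = None
    prev_start = -1
    for chrom, start, _end in regions:
        if chrom != prev_chrom:
            if chrom in seen:
                # e.g. chr1 appears both before and after chr2
                return "chromosomes_not_contiguous"
            seen.add(chrom)
            prev_chrom, prev_start = chrom, start
        else:
            if start < prev_start:
                return "coordinates_out_of_order"
            prev_start = start
    return "sorted"


print(bed_sort_status([("chr1", 1, 5), ("chr2", 0, 4), ("chr1", 3, 9)]))
```

A caller would use the collapsed-region path only when the result is "sorted" and otherwise query the index once per interval.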
1.1.3
This release attempts to speed up the BigWig calculations over an annotation, by using the BigWig index.
This mode was very substantially slowed down after the major bug fix in https://github.com/ChristopherWilks/megadepth/releases/tag/1.1.1
While this mode should be substantially faster than it was right after 1.1.1, it's still quite a bit slower than pre-1.1.1.
It also includes the experimental feature to get the true strand of the junction via the XS:A tag in BAM files that support this, as in pre-release 1.1.2a.
1.1.2a
This is a minor feature release which adds support for the XS:A tag that some RNA-seq aligners output in their BAM files.
If present in the BAM, the output of --junctions and --all-junctions will use the value of the XS:A tag (+ or -) to indicate the true strand, otherwise the mapping strand is used (as before).
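The selection rule can be sketched as a small Python function. The name `junction_strand` and the representation of tags as a plain mapping are assumptions for illustration; the mapping strand is passed through unchanged, so the sketch makes no claim about how Megadepth encodes it.

```python
from typing import Mapping


def junction_strand(tags: Mapping[str, str], mapping_strand: str) -> str:
    """Prefer the aligner-reported XS:A strand; otherwise keep the mapping strand.

    Hypothetical helper: `tags` maps two-letter tag names to their values.
    """
    xs = tags.get("XS")
    if xs in ("+", "-"):
        return xs
    return mapping_strand


print(junction_strand({"XS": "-"}, mapping_strand="+"))  # XS:A present, so it is used
print(junction_strand({}, mapping_strand="+"))           # no XS:A, so the mapping strand is used
```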
The junctions/process_jx_output.sh script will continue to just set the strand to 0 for the present.
1.1.1
This is a bug-fix only release.
If you are at all using the BigWig part of Megadepth we strongly suggest you switch to this release.
The BAM aspects of Megadepth are not affected by this release.
UPDATE 2021-12-06
Additional details about this release that were not included in the initial post:
An additional bug was fixed with this release that can have a much larger impact on both the performance of Megadepth and the number of intervals it will scan when searching for overlaps to compute over when --annotation is used with a BigWig file. Previous versions of Megadepth would stop searching for interval overlaps between a BigWig and an annotation for a particular chromosome too early if one or both of the following were present:
- an annotation had no overlaps
- the annotation's end coordinate was greater than the greatest (last) coordinate in the BigWig's regions for that chromosome
Depending on the BigWig file and set of annotation intervals used, Megadepth may have missed most or all of the interval overlaps and severely undercomputed the desired calculation. This also means that, with this bug fixed, Megadepth will be substantially slower in many cases than what's reported in the paper.
This release also fixes a bug first reported in #9 by @osakarya.
The bug is that Megadepth would (until now) miss a number of overlapping scenarios when operating over a BigWig for a set of annotation intervals/regions (all operation types).
Two primary examples where Megadepth would miss overlaps are (not exhaustive):
- when the annotated region's start coordinate did not overlap a region in the BigWig but the end coordinate did
- when the annotated region strictly contained the BigWig region
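The two missed scenarios can be illustrated with a general interval-overlap test. This Python sketch assumes half-open coordinates. The `start_inside` check is a hypothetical example of a test that is too narrow, included only for contrast; it is not a description of what earlier Megadepth versions actually did.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """General overlap test for half-open intervals [start, end)."""
    return a_start < b_end and b_start < a_end


def start_inside(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Hypothetical too-narrow test: only checks whether a's start falls inside b."""
    return b_start <= a_start < b_end


bigwig_region = (100, 200)

# Annotation start lies outside the BigWig region, but its end lies inside.
print(overlaps(50, 150, *bigwig_region), start_inside(50, 150, *bigwig_region))

# Annotation strictly contains the BigWig region.
print(overlaps(50, 300, *bigwig_region), start_inside(50, 300, *bigwig_region))
```

In both cases the general test reports an overlap while the start-only test does not.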
1.1.0c
Very minor bug-fix release: --alts now fully supports CIGAR ops H (hard clipping) and P (padding).
1.1.0b
- adds --add-chr-prefix to convert chromosome short names to use the chr prefix for compatibility with recount3
- better handling of when the argument for --annotation isn't present, now prints error rather than segfaulting(!)
- better handling of some runtime errors
1.1.0
- adds --all-junctions to output every potential junction found in the BAM file
- adds post-processing script junctions/process_jx_output.sh for creating STAR-like formatted junction output
This is in addition to the --junctions option which continues to exist separately for outputting only those junctions which co-occur in a read or read pair. However, this option's output has also been updated to report whether the alignment used was unique or not. Please see the README for more details.
1.0.9b
Extends the --alts option to print out the read IDs/names when --double-count is not enabled and there are known overlaps between two mates in a read pair. This informs downstream tools that try to call genotypes or do ASE which calls are duplicates or are conflicting.
1.0.9
- adds --annotation <window_in_bp> option to specify a contiguous set of regions to calculate coverage across
- now uses the increment/decrement strategy of Mosdepth for whole genome and windowed coverage
- added --no-index to use with certain dense region queries where performance may be better not jumping around the BAM index
- when --gzip is used, generate Tabix index inline with the coverage output rather than at the end (similar to Mosdepth); this is faster
- adding SIMD and memset code from @dnbaker for count performance
- fixed major performance bug in resetting chromosome coverage array
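The increment/decrement strategy mentioned above can be sketched in Python as a difference array followed by a running sum. This is a simplified illustration, not Megadepth's or Mosdepth's code: the function name `coverage` is hypothetical, and alignments are reduced to single half-open (start, end) spans with no CIGAR handling.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def coverage(chrom_length: int, alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Per-base depth from half-open (start, end) spans on one chromosome."""
    delta = [0] * (chrom_length + 1)   # one extra slot so end == chrom_length is valid
    for start, end in alignments:
        delta[start] += 1              # depth rises where an alignment begins
        delta[end] -= 1                # and falls just past where it ends
    # A running sum over the deltas yields the depth at every base.
    return list(accumulate(delta[:-1]))


print(coverage(10, [(2, 6), (4, 8)]))
```

Each alignment costs two array updates regardless of its length, and the per-base depth is recovered in a single pass at the end.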
1.0.8b
- adds --fasta to support local genome reference FASTA file for CRAM input files (to avoid downloading the references)
- skip decoding of unused CRAM fields (MD tags, qualities, sequence, tlen, auxiliary and readgroups) for coverage and region performance unless --alts is passed in, then all those fields get decoded
- now uses the countlut u32toa method from https://github.com/miloyip/itoa-benchmark to write out base level coverage faster
- fixed regression buffer overflow bug introduced in 1.0.7 in the BAMIterator code when running with a set of regions (--annotation)