Releases · ChristopherWilks/megadepth

02 Feb 15:35

1.2.0

2b3048a

1.2.0 Latest

This release goes further than the last to substantially speed up BigWig coverage-over-an-annotation computations.

The speedup comes from collapsing annotated regions in the given BED file that lie within N bases of each other into larger regions, which are then used to query the index. This avoids querying the index for many small, closely spaced regions. N is controlled by the new option --distance, with a default of 2200 bases, which appears from my tests to be more or less optimal, though this will vary depending on the BED file used and the density of coverage in the BigWig.
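
As a rough illustration of the collapsing step, here is a minimal Python sketch. It is not Megadepth's actual implementation: the function name `collapse_regions`, the tuple layout, and the exact boundary rule (a gap of at most `distance` bases triggers a merge) are assumptions made for this example, and the input is assumed to be sorted as described for --annotation.

```python
from typing import List, Tuple

# Hypothetical record layout for this sketch: (chrom, start, end), BED-style.
Interval = Tuple[str, int, int]

def collapse_regions(regions: List[Interval], distance: int = 2200) -> List[Interval]:
    """Merge consecutive sorted regions on the same chromosome whose gap
    is at most `distance` bases (the boundary rule is an assumption)."""
    merged: List[Interval] = []
    for chrom, start, end in regions:
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= distance:
            # Close enough to the previous region: extend it instead of
            # creating a new index query.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged

regions = [
    ("chr1", 100, 200),
    ("chr1", 1000, 1100),   # gap of 800 from the previous region: merged
    ("chr1", 5000, 5100),   # gap of 3900: kept separate
    ("chr2", 50, 60),       # new chromosome: never merged across chromosomes
]
print(collapse_regions(regions))
```

Each merged region then costs one index query rather than one query per original interval, which is the effect the release describes.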

For this to work, the BED file passed to --annotation needs to be sorted according to sort -k1,1 -k2,2n. Megadepth will check to determine if this is the case. It will automatically fall back to using a slower version of the coverage calculation where the BigWig index is queried for each interval in the BED file independently, if it detects either or both of the following conditions:

1) chromosomes are not contiguous (e.g. chr1 appears both before and after chr2)
2) the previous check is OK but coordinates are not in order for a given chromosome

Also, in case 1), the order of output will not be the order of input; rather, regions from the same chromosome will be grouped together in the output.
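
The two fallback conditions can be sketched as a single pass over the BED records. This is only an illustration under assumptions: the function name `classify_bed_order` and its return labels are invented for this example, and Megadepth's real check may differ in detail.

```python
def classify_bed_order(regions):
    """Classify a list of (chrom, start, end) tuples.

    Returns one of (labels are hypothetical, chosen for this sketch):
      'noncontiguous_chroms' - a chromosome reappears after another one
      'unsorted_coords'      - chromosomes are contiguous, but starts go backwards
      'sorted'               - neither condition was detected
    """
    seen = set()
    prev_chrom = None
    prev_start = None
    coords_ok = True
    for chrom, start, _end in regions:
        if chrom != prev_chrom:
            if chrom in seen:
                # Condition 1: chromosome blocks are not contiguous.
                return "noncontiguous_chroms"
            seen.add(chrom)
            prev_chrom, prev_start = chrom, start
        else:
            if start < prev_start:
                # Condition 2: coordinates out of order within a chromosome.
                coords_ok = False
            prev_start = start
    return "sorted" if coords_ok else "unsorted_coords"

print(classify_bed_order([("chr1", 1, 5), ("chr2", 1, 2), ("chr1", 7, 9)]))
```

Either of the two non-sorted results corresponds to the case where the slower per-interval query path would be used.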

Finally, --unsorted has been added to force Megadepth to fall back to the slower version, either because the argument to --annotation is unsorted or because the user wants to force querying of the BigWig index for each region.

29 Dec 20:25

ChristopherWilks

1.1.3

89641a1

1.1.3

This release attempts to speed up the BigWig calculations over an annotation, by using the BigWig index.
This mode was very substantially slowed down after the major bug fix in https://github.com/ChristopherWilks/megadepth/releases/tag/1.1.1

While this mode should be substantially faster than it was right after 1.1.1, it's still quite a bit slower than pre-1.1.1.

It also includes the experimental feature to get the true strand of the junction via the XS:A tag in BAM files that support this, as in pre-release 1.1.2a.

13 Oct 01:14

ChristopherWilks

1.1.2a

dd5b91c

1.1.2a Pre-release

This is a minor feature release which adds support for the XS:A tag that some RNA-seq aligners output in their BAM files.

If present in the BAM, the output of --junctions and --all-junctions will use the value of the XS:A tag (+ or -) to indicate the true strand, otherwise the mapping strand is used (as before).
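
The strand selection rule described above can be sketched as follows. This is an illustrative assumption, not Megadepth's code: the function `junction_strand` and the plain dict standing in for a read's auxiliary tags are hypothetical.

```python
def junction_strand(tags: dict, is_reverse: bool) -> str:
    """Pick the strand to report for a junction.

    `tags` is a hypothetical dict of a read's auxiliary BAM tags
    (e.g. {"XS": "+"}); `is_reverse` is the read's mapping orientation.
    """
    xs = tags.get("XS")
    if xs in ("+", "-"):
        # XS:A present: use it as the true strand.
        return xs
    # Otherwise fall back to the mapping strand, as before.
    return "-" if is_reverse else "+"

print(junction_strand({"XS": "-"}, is_reverse=False))  # XS:A wins
print(junction_strand({}, is_reverse=True))            # mapping strand used
```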

The junctions/process_jx_output.sh script will continue to set the strand to 0 for now.

24 Aug 19:33

ChristopherWilks

1.1.1

65fc71b

1.1.1

This is a bug-fix only release.
If you use the BigWig part of Megadepth at all, we strongly suggest you switch to this release.
The BAM aspects of Megadepth are not affected by this release.

UPDATE 2021-12-06
Additional details about this release that were not included in the initial post:
An additional bug was fixed in this release that can have a much larger impact on both the performance of Megadepth and the number of intervals it scans when searching for overlaps to compute over when --annotation is used with a BigWig file. Previous versions of Megadepth would stop searching for interval overlaps between a BigWig and an annotation for a particular chromosome too early if one or both of the following were true:

an annotation had no overlaps
the annotation's end coordinate was greater than the greatest (last) coordinate in the BigWig's regions for that chromosome

Depending on the BigWig file and the set of annotation intervals used, earlier versions may have missed most or all of the interval overlaps and severely undercomputed the desired result. It also means that, with this bug fixed, Megadepth will in many cases be substantially slower than what's reported in the paper.

This release also fixes a bug first reported in #9 by @osakarya.
The bug is that Megadepth would (until now) miss a number of overlapping scenarios when operating over a BigWig for a set of annotation intervals/regions (all operation types).

Two primary examples where Megadepth would miss overlaps are (not exhaustive):

when the annotated region's start coordinate did not overlap a region in the BigWig but the end coordinate did
when the annotated region strictly contained the BigWig region
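
Both missed scenarios are covered by the standard interval overlap test. The sketch below is a generic illustration, assuming half-open (start, end) coordinates; the function names are hypothetical and this is not the code from the fix itself.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def overlap_length(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of shared bases between the two half-open intervals (0 if none)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

bigwig_region = (100, 200)

# Annotation start lies outside the BigWig region, but its end falls inside it.
print(overlaps(50, 150, *bigwig_region), overlap_length(50, 150, *bigwig_region))

# Annotation strictly contains the BigWig region.
print(overlaps(0, 1000, *bigwig_region), overlap_length(0, 1000, *bigwig_region))
```

A test that only checks whether the annotation's start falls inside a BigWig region would miss both of these cases, while the symmetric comparison catches them.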

Contributors

osakarya

07 Apr 04:05

ChristopherWilks

1.1.0c

631c885

1.1.0c

Very minor bug-fix release: --alts now fully supports the CIGAR ops H (hard clipping) and P (padding).

22 Jan 15:38

ChristopherWilks

1.1.0b

61d4d6c

1.1.0b

adds --add-chr-prefix to convert chromosome short names to use chr prefix for compatibility with recount3
better handling when the argument for --annotation isn't present; now prints an error rather than segfaulting(!)
better handling of some runtime errors

19 Jan 16:07

ChristopherWilks

1.1.0

7c92824

1.1.0

adds --all-junctions to output every potential junction found in the BAM file
adds post-processing script junctions/process_jx_output.sh for creating STAR-like formatted junction output

This is in addition to the --junctions option which continues to exist separately for outputting only those junctions which co-occur in a read or read pair. However, this option's output has also been updated to report whether the alignment used was unique or not. Please see the README for more details.

17 Dec 09:16

ChristopherWilks

1.0.9b

6007f28

1.0.9b

Extends the --alts option to print out the read IDs/names when --double-count is not enabled and there are known overlaps between the two mates in a read pair. This tells downstream tools that try to call genotypes or do ASE which calls are duplicates or conflicting.

10 Nov 22:00

ChristopherWilks

1.0.9

4b33178

1.0.9

adds --annotation <window_in_bp> option to specify a contiguous set of regions to calculate coverage across
now uses the increment/decrement strategy of Mosdepth for whole genome and windowed coverage
added --no-index to use with certain dense region queries where performance may be better not jumping around the BAM index
when --gzip is used, generates the Tabix index inline with the coverage output rather than at the end (similar to Mosdepth); this is faster
adding SIMD and memset code from @dnbaker for count performance
fixed major performance bug in resetting chromosome coverage array
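
The increment/decrement strategy mentioned in the list above can be illustrated with a small Python sketch. It shows the general technique only (add 1 at each alignment start, subtract 1 at each end, then take a running sum), under the assumption of half-open (start, end) alignments; the function name is hypothetical and neither Megadepth's nor Mosdepth's code is reproduced here.

```python
from itertools import accumulate

def coverage_from_alignments(chrom_len, alignments):
    """Per-base coverage for half-open (start, end) alignments on one chromosome."""
    # One extra slot so an alignment ending exactly at chrom_len stays in bounds.
    deltas = [0] * (chrom_len + 1)
    for start, end in alignments:
        deltas[start] += 1   # coverage rises where the alignment begins
        deltas[end] -= 1     # and falls just past its last base
    # The running sum of the deltas is the depth at each position.
    return list(accumulate(deltas[:-1]))

print(coverage_from_alignments(10, [(2, 5), (4, 8)]))
```

Each alignment costs two array updates regardless of its length, and the per-base depth is recovered in one linear pass at the end.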

30 Oct 14:17

ChristopherWilks

1.0.8b

e56504c

1.0.8b

adds --fasta to support local genome reference FASTA file for CRAM input files (to avoid downloading the references)
skip decoding of unused CRAM fields (MD tags, qualities, sequence, tlen, auxiliary and readgroups) for coverage and region performance unless --alts is passed in, then all those fields get decoded
now uses the countlut u32toa method from https://github.com/miloyip/itoa-benchmark to write out base level coverage faster
fixed regression buffer overflow bug introduced in 1.0.7 in the BAMIterator code when running with a set of regions (--annotation)
