Hi,
Thanks so much for making this package available. It is a brilliant resource, especially for neoantigen prediction in mice. We are trying to call neoantigens in a tumor derived from a BALB/c background, and this creates certain issues around reference sequences.
I note that you recommend aligning to the BALB/c-specific reference genome from the Sanger Institute. I believe that assembly uses different coordinates from the Mouse Genome Project SNP file that you recommend (ftp://ftp-mouse.sanger.ac.uk/current_snps/strain_specific_vcfs/BALB_cJ.mgp.v5.snps.dbSNP142.vcf.gz), because that file describes BALB/c-specific variants called from reads aligned to the GRCm38 genome. Unless I am mistaken, the two are therefore incompatible. To complicate matters further, in our experience the BALB/c reference has significant gaps even in coding regions, and indeed the Sanger paper in which these strain-specific assemblies were published alludes to a substantially higher error rate versus GRCm38 (https://www.nature.com/articles/s41588-018-0223-8#Sec2).
We have come to the conclusion that we should align our BALB/c reads to GRCm38, especially as GRCm38 (cf. GRCm39) includes patches that correspond to strain-specific haplotypes. We use the pan-strain SNPs and indels from the Sanger Mouse Genome Project for base quality score recalibration and then call mutations using Strelka2.
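To illustrate the coordinate concern, here is a minimal sketch of how one might check whether a VCF and a reference FASTA share a coordinate system, by comparing the `##contig` lines in the VCF header against the reference's `.fai` index (as produced by `samtools faidx`). This is not part of the package: the function names are hypothetical, and it assumes the VCF header actually declares its contigs and that the contig fields contain no quoted commas. Matching names and lengths are necessary but not sufficient for compatibility.

```python
import gzip
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Matches VCF header lines such as ##contig=<ID=1,length=195471971>
CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def parse_vcf_contigs(header_lines: Iterable[str]) -> Dict[str, Optional[int]]:
    """Collect contig name -> declared length (None if absent) from a VCF header."""
    contigs: Dict[str, Optional[int]] = {}
    for line in header_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = CONTIG_RE.match(line)
        if not match:
            continue
        # Assumption: simple key=value pairs with no quoted commas.
        fields = dict(kv.split("=", 1) for kv in match.group(1).split(",") if "=" in kv)
        if "ID" in fields:
            contigs[fields["ID"]] = int(fields["length"]) if "length" in fields else None
    return contigs


def parse_fai(lines: Iterable[str]) -> Dict[str, int]:
    """Collect contig name -> length from a FASTA .fai index (first two columns)."""
    contigs: Dict[str, int] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            contigs[parts[0]] = int(parts[1])
    return contigs


def compare_contigs(
    vcf: Dict[str, Optional[int]], ref: Dict[str, int]
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Return (contig, vcf_length, ref_length) for every VCF contig that is
    missing from the reference or whose declared length differs."""
    problems = []
    for name, vcf_len in vcf.items():
        ref_len = ref.get(name)
        if ref_len is None or (vcf_len is not None and vcf_len != ref_len):
            problems.append((name, vcf_len, ref_len))
    return problems


def check_files(vcf_path: str, fai_path: str):
    """Convenience wrapper: compare a VCF header with a .fai index on disk."""
    with open_text(vcf_path) as vcf_handle, open_text(fai_path) as fai_handle:
        return compare_contigs(parse_vcf_contigs(vcf_handle), parse_fai(fai_handle))
```

An empty list from `check_files` would suggest the SNP file and the reference at least agree on contig names and lengths; any entries returned point to contigs that are missing or differ in length, which is what I would expect when pairing the GRCm38-based SNP file with the strain-specific assembly.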
I was wondering if you had any advice about neoantigen calling for BALB/c data along the lines we are planning. My feeling is that the best universal approach is to align everything to GRCm38 and then use the cDNA and peptide sequences derived from this reference (available here: http://ftp.ensembl.org/pub/release-89/fasta/mus_musculus/), as this is designed to capture the majority of variation across most strains. I would really appreciate your thoughts on the question.
Kind regards,
Dr Sam Kleeman MD
PhD Student
Cold Spring Harbor Laboratory, NY