Sample.idError with seqVCF2GDS

Hello, I am trying to use seqVCF2GDS and am getting the following error: 

library(SeqArray)
library(data.table)

seqVCF2GDS(high_mod_vcf, "r4_chr1_high_mod.gds", parallel=6L)
Mon Nov  6 16:09:06 2023
Variant Call Format (VCF) Import:
    file(s):
        r4_PASS_chr1_updated_varID_dups_drop_updated_IDs_nhw_hwe6_noNHWrelateds_high_mod_impact.vcf (198.8M)
    file format: VCFv4.2
    the number of sets of chromosomes (ploidy): 2
    the number of samples: 14,306
    genotype storage: bit2
    compression method: LZMA_RA
    # of samples: 14306
    calculating the total number of variants ...
    the total number of variants for import: 3,632
    Writing to 6 files:
        r4_chr1_high_mod_tmp01_ad336f56fc72 [1..606]
        r4_chr1_high_mod_tmp02_ad3315e862b7 [607..1,212]
        r4_chr1_high_mod_tmp03_ad33613818b1 [1,213..1,818]
        r4_chr1_high_mod_tmp04_ad33473817c6 [1,819..2,424]
        r4_chr1_high_mod_tmp05_ad334e0fea8c [2,425..3,030]
        r4_chr1_high_mod_tmp06_ad33607634f8 [3,031..3,632]
    Done (Mon Nov  6 16:09:10 2023).
Output:
    r4_chr1_high_mod.gds
Merging:
    opening 'r4_chr1_high_mod_tmp01_ad336f56fc72' ... [done]
    opening 'r4_chr1_high_mod_tmp02_ad3315e862b7' ... [done]
    opening 'r4_chr1_high_mod_tmp03_ad33613818b1' ... [done]
    opening 'r4_chr1_high_mod_tmp04_ad33473817c6' ... [done]
    opening 'r4_chr1_high_mod_tmp05_ad334e0fea8c' ... [done]
    opening 'r4_chr1_high_mod_tmp06_ad33607634f8' ... [done]
Digests:
    sample.idError: segfault from C stack overflow

Do the sample IDs need to be in a particular format? I created my VCF with plink and used the double-id option. IDs are in the format A-[Cohort]-[A#####]. A .gds file is written, but I don't know whether it is incorrect because of the segfault.

gds <- seqOpen("r4_chr1_high_mod.gds")
gds
Object of class "SeqVarGDSClass"
File: r4_chr1_high_mod.gds (294.4K)
+    [  ] *
|--+ description   [  ] *
|--+ sample.id   { Str8 14306 LZMA_ra(2.94%), 12.6K }
|--+ variant.id   { Int32 3632 LZMA_ra(12.7%), 1.8K }
|--+ position   { Int32 3632 LZMA_ra(62.3%), 8.8K }
|--+ chromosome   { Str8 3632 LZMA_ra(1.62%), 125B }
|--+ allele   { Str8 3632 LZMA_ra(24.4%), 4.0K }
|--+ genotype   [  ] *
|  |--+ data   { Bit2 2x14306x3632 LZMA_ra(0.95%), 242.2K }
|  |--+ extra.index   { Int32 3x0 LZMA_ra, 18B } *
|  \--+ extra   { Int16 0 LZMA_ra, 18B }
|--+ phase   [  ]
|  |--+ data   { Bit1 14306x3632 LZMA_ra(0.02%), 1.3K }
|  |--+ extra.index   { Int32 3x0 LZMA_ra, 18B } *
|  \--+ extra   { Bit1 0 LZMA_ra, 18B }
|--+ annotation   [  ]
|  |--+ id   { Str8 3632 LZMA_ra(28.1%), 16.0K }
|  |--+ qual   { Float32 3632 LZMA_ra(0.92%), 141B }
|  |--+ filter   { Int32 3632 LZMA_ra(0.92%), 141B }
|  |--+ info   [  ]
|  |  \--+ PR   { Bit1 3632 LZMA_ra(18.9%), 93B } *
|  \--+ format   [  ]
\--+ sample.annotation   [  ]
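
One check that might help decide whether the sample IDs in the output file are intact is to compare them against the IDs in the VCF header. The following is only a sketch under assumptions: `compare_sample_ids` is a hypothetical helper written for this illustration (it is not part of SeqArray), and the sketch assumes that `seqVCF_SampID()` and `seqGetData()` behave as described in the SeqArray documentation for the installed version. The SeqArray part is guarded so it only runs when the package and both files are available.

```r
# Hypothetical helper (not part of SeqArray): compare two vectors of sample IDs.
compare_sample_ids <- function(vcf_ids, gds_ids) {
  vcf_ids <- as.character(vcf_ids)
  gds_ids <- as.character(gds_ids)
  list(
    same_length       = length(vcf_ids) == length(gds_ids),
    identical         = identical(vcf_ids, gds_ids),
    missing_in_gds    = setdiff(vcf_ids, gds_ids),
    duplicated_in_vcf = unique(vcf_ids[duplicated(vcf_ids)])
  )
}

# File names taken from the import log above.
vcf_fn <- "r4_PASS_chr1_updated_varID_dups_drop_updated_IDs_nhw_hwe6_noNHWrelateds_high_mod_impact.vcf"
gds_fn <- "r4_chr1_high_mod.gds"

# Only touch SeqArray if it is installed and both files exist.
if (requireNamespace("SeqArray", quietly = TRUE) &&
    file.exists(vcf_fn) && file.exists(gds_fn)) {
  vcf_ids <- SeqArray::seqVCF_SampID(vcf_fn)         # sample IDs from the VCF header
  gds     <- SeqArray::seqOpen(gds_fn)
  gds_ids <- SeqArray::seqGetData(gds, "sample.id")  # sample IDs stored in the GDS
  SeqArray::seqClose(gds)
  str(compare_sample_ids(vcf_ids, gds_ids))
}
```

If `identical` is TRUE, the sample IDs at least survived the import. That would not by itself show that the rest of the file is valid, since the crash happened during the "Digests:" step.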


sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /cvmfs/priv.accre.vanderbilt.edu/mirror/optimized/sandy_bridge/easybuild/software/MPI/intel/2019.1.144/impi/2018.4.274/R/3.6.0/lib64/R/lib/libR.so
LAPACK: /cvmfs/priv.accre.vanderbilt.edu/mirror/optimized/sandy_bridge/easybuild/software/MPI/intel/2019.1.144/impi/2018.4.274/R/3.6.0/lib64/R/modules/lapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.8 SeqArray_1.26.2   gdsfmt_1.22.0    

loaded via a namespace (and not attached):
 [1] zlibbioc_1.32.0        compiler_3.6.0         IRanges_2.20.2        
 [4] XVector_0.26.0         parallel_3.6.0         GenomicRanges_1.38.0  
 [7] GenomeInfoDbData_1.2.2 RCurl_1.95-4.12        Biostrings_2.54.0     
[10] S4Vectors_0.24.4       BiocGenerics_0.32.0    GenomeInfoDb_1.22.1   
[13] bitops_1.0-6           stats4_3.6.0          


Thank you,
Alexis
