Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault when uniting large methylrawlistDB object #269

Open
ethan-shealy opened this issue Oct 20, 2022 · 2 comments
Open

segfault when uniting large methylrawlistDB object #269

ethan-shealy opened this issue Oct 20, 2022 · 2 comments
Labels

Comments

@ethan-shealy
Copy link

Hi, I am having an issue where a my analysis is proceeding normally, then a crash happens which terminates the process. I am working with a large methylation dataset from a whole-genome bisulfite experiment, so have something like 10 million CpGs over 40 samples. The first "unite" function I tried worked fine,

methOBJ <- methRead(location = fileLOC.list,
                     sample.id = as.list(file.list),
                     assembly = "AnoSag2.1", 
                     context = "CpG",
                     mincov = 5,
                     treatment = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))  
Received list of locations.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
 									
 methOBJ <- filterByCoverage(methOBJ,lo.count = 5,lo.perc = NULL,
                             hi.count = NULL,hi.perc = 99.9)
 
 methOBJ <- normalizeCoverage(methOBJ)
 
 methFrame.total <- unite(methOBJ, destrand = TRUE, mc.cores = 1, min.per.group = 30L,
                          save.db = TRUE, suffix = "total_30L_", dbdir = getwd())
destranding...
uniting...
creating directory /scratch/es88065/anolis_methylseq/methylkit/methylDB_2022-10-20_f40
compressing the file with bgzip...
making tabix index...
flatfile located at: /scratch/es88065/anolis_methylseq/methylkit/methylDB_2022-10-20_f40/methylBase_total_30L_.txt.bgz
 
mat.methFrame.total <- percMethylation(methFrame.total)
 
 write.table(mat.methFrame.total, file="/scratch/es88065/anolis_methylseq/methylkit/total_mat_30L.data", quote = F, sep = "\t", row.names = F)
 
 methFrame.total.raw <- as(methFrame.total, "methylBase")
Uncompressing file.
Reading file.
 
 write.table(methFrame.total.raw, file="/scratch/es88065/anolis_methylseq/methylkit/total_meth_30L.data", quote = F, sep = "\t", row.names = F)
 
 rm("methFrame.total", "mat.methFrame.total", "methFrame.total.raw")

but since I want to utilize two different filtering steps I used the 'reorganize' function and tried to unite again. Upon doing this, I received this recursive error and the process crashed.

methOBJ.sex.age <- reorganize(methOBJ,
                           sample.ids=as.list(file.list),
                           treatment= c(4,4,4,10,10,4,11,5,11,5,11,5,9,3,3,3,9,9,1,7,1,7,8,2,2,2,2,8,8,8,8,12,12,6,12,6,6))
 
 methFrame.sex.age <- unite(methOBJ.sex.age, destrand = TRUE, mc.cores = 1, min.per.group = 2L,
                            save.db = TRUE, suffix = "sex_ageDiff_2L", dbdir = getwd())
destranding...

 *** caught segfault ***
address 0x2ba042876020, cause 'memory not mapped'
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation

Traceback:
 1: paste(cpgF$chr, cpgF$start, cpgF$end)
 2: .CpG.dinuc.unify(df)
 3: unite(methOBJ.sex.age, destrand = TRUE, mc.cores = 1, min.per.group = 2L,     save.db = TRUE, suffix = "sex_ageDiff_2L", dbdir = getwd())
 4: unite(methOBJ.sex.age, destrand = TRUE, mc.cores = 1, min.per.group = 2L,     save.db = TRUE, suffix = "sex_ageDiff_2L", dbdir = getwd())
An irrecoverable exception occurred. R is aborting now ...

I only utilized ~50% of my allocated memory, so I do not think this is the issue. I previously ran into a segfault error using methylkit's "as" command to extract methylbase data to a matrix. I had to use
methylation.matrix <- methylKit:::fread.gzipped(methFrame.total@dbpath, stringsAsFactors = FALSE, data.table = FALSE,skipDecompress=FALSE)
to resolve that issue and avoid the segfault error. Is this issue arising from something similar? Thanks for your help.

@alexg9010
Copy link
Collaborator

HI @ethan-shealy,

Thanks for reporting this issue. Unfortunately, these kinds of errors are rather hard to debug without the actual data at hand. Is this error actually reproducible and happens repeatedly? Would you be able to generate a minimal, reproducible example? Also, are you running the latest package versions of methylKit and data.table?

Related to the fix of your previous error, it seems that the use of data.table objects is causing these segfaults.
From what I read online, one reoccurring solution was to limit the number of threads using setDTthreads. Maybe you could test if this fixes the problem?

Best,
Alex

@ethan-shealy
Copy link
Author

Hi @alexg9010, this error does occur consistently on the unite step, but unfortunately I wasn't able to create a feasible reproducible example not involving a huge data set. I have been running these large computations on my university's computing cluster, and it seems the only solution was bringing it to my local machine and letting it chug along for a while. Because of this, I'm guessing that the issue is due to whatever version of R, data.table, and methylKit that our cluster is running. Just for the reference of anyone finding this thread, here are the software versions.
R version 4.2.1 methylKit version 1.22.0 data.table version 1.14.2 Running on CentOS Linux 7 kernel 3.10.0-1160.36.2.el7.x86_64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants