Download fails after several hours, 'processData' not found #3
Comments
Follow-up post: I just tried to download annotations for sample mgm4824992.3 and received a similar error when downloading the RefSeq annotation file, except this time it was caused by a timeout rather than a 404.
Are these issues related to the MG-RAST server? Every download I've tried so far has failed like this, and I'm not sure how to address it. |
Hi @zbendiks, Chordomics creates a
# $HOME is wherever ~ evaluates to on Windows, usually something like C:\Users\<username>\
$HOME\chordomics\mgm4824985.3\ontology # download http://api.metagenomics.anl.gov/annotation/sequence/mgm4824985.3?evalue=10&type=ontology&source=COG
$HOME\chordomics\mgm4824985.3/organism # download http://api.metagenomics.anl.gov/annotation/sequence/mgm4824985.3?evalue=10&type=organism&source=RefSeq
$HOME\chordomics\mgm4824985.3/input_data # download http://api.metagenomics.anl.gov/annotation/sequence/mgm4824992.3?evalue=10&type=organism&source=RefSeq
It looks like those first two files will already be there; if
Thanks again for letting us know about the issue; let me know how this goes!
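For anyone who wants to script the manual download, here is a minimal Python sketch (not part of Chordomics, which is written in R). It rebuilds the COG and RefSeq annotation URLs quoted above and retries on timeouts or other network errors. The function names, the retry count, the timeout, and the `.part` temporary suffix are all assumptions made for illustration.

```python
import os
import shutil
import time
import urllib.parse
import urllib.request

# Base URL taken from the annotation links quoted in this thread.
BASE = "http://api.metagenomics.anl.gov/annotation/sequence"


def annotation_url(sample_id, annotation_type, source, evalue=10):
    """Build an annotation URL of the same shape as the ones quoted above."""
    query = urllib.parse.urlencode(
        {"evalue": evalue, "type": annotation_type, "source": source}
    )
    return f"{BASE}/{sample_id}?{query}"


def download(url, dest, retries=3, timeout=600, wait=30):
    """Stream url to dest, retrying on network errors (hypothetical helper).

    Data is written to a temporary ".part" file first, so a failed attempt
    never leaves a truncated file at the final path.
    """
    partial = dest + ".part"
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                with open(partial, "wb") as out:
                    shutil.copyfileobj(resp, out)
            os.replace(partial, dest)
            return True
        except OSError as err:  # URLError, HTTPError and timeouts are OSErrors
            print(f"attempt {attempt}/{retries} failed: {err}")
            if attempt < retries:
                time.sleep(wait)
    return False
```

Calling `annotation_url("mgm4824985.3", "ontology", "COG")` reproduces the first link above, and `download(url, some_path)` would then fetch it. Where the resulting file has to live for Chordomics to pick it up is described in the comment above, not by this sketch.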
|
Hi @nickp60 , Thanks for the prompt response! I downloaded the COG, RefSeq, and FASTA files for sample 'mgm4824993.3' via the MG-RAST API with the following commands:
I then ran Chordomics in 'Automatic' mode with the MG-RAST ID#. Chordomics correctly recognized the folder and input files. It did spit out some warnings regarding 'single-line footers' but everything kept running so I figure it wasn't a big deal:
Chordomics then matched the taxids, and now it is trying to merge the data.
But it's been stuck here for ~4 hours now. I'll let it run overnight and see if it works. |
Hi @nickp60 It's been ~18 hours since the data merging step began, but it doesn't seem like there's been any progress. Realistically, how long should I expect this step to take? |
Hmm, I will give it a go on my machine and try to see what the story is. |
... still downloading ... |
Hey, so sorry for the delay; I wasn't paying attention, and didn't realize this wasn't an assembled metagenome. Right now, Chordomics is really only geared to deal with assembled metagenomes rather than raw reads. The code just isn't built to handle raw reads. Your best bet would probably be to assemble the reads yourself and upload them to MG-RAST as a companion project. I'll keep trying to process it here on my end, but I don't know how far I'll get. |
Hi, These are RNA-Seq metatranscriptome samples. Previously I assembled them with Trinity but ran into problems with chimeric sequences, and only a small percentage of reads were successfully annotated. An MG-RAST developer suggested that I skip the upstream assembly + abundance estimation and just submit my metatranscriptomes as short reads rather than assembled contigs, and this improved my data quality quite a bit. To my knowledge, MG-RAST doesn't provide any means to map short reads back to assembled contigs to estimate contig abundance. How is Chordomics able to describe changes in function across time, experimental group, etc. with just contig annotations but no abundance information? |
Hi @zbendiks, @KevinMcDonnell6, how hard would it be to add an option to display coverage as color, based on an additional column in the input data? I may be forgetting some of the details, but say we have a dataset with 100 taxa-function links, and 50 of those taxa link to "No COG". I believe we would currently end up with a big arc for that link, representing that it is 50% of the data. Say, however, that all 50 of those links had low values in a "coverage" column. Could we use either color alpha or an intensifying color scale to display that, rather than assigning colors based only on COG? I have attached a small dataset of 20 rows, where I have added a column for "Coverage". The Thermococcaceae have much higher coverage than the others, despite representing only 15% of the data. Could we (optionally) display this as color intensity? |
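To make the color-intensity idea concrete, here is a small Python sketch (illustration only, not Chordomics code) that min-max scales a coverage column into an alpha value and formats it as a CSS-style `rgba()` string. The function names, the minimum alpha of 0.2, and the example numbers are assumptions, not values from the attached dataset.

```python
def coverage_to_alpha(coverages, min_alpha=0.2):
    """Min-max scale coverage values into the range [min_alpha, 1.0].

    The lowest coverage gets min_alpha (faint but still visible) and the
    highest gets 1.0 (fully opaque). If all values are equal, every link
    is drawn fully opaque.
    """
    if not coverages:
        return []
    lo, hi = min(coverages), max(coverages)
    if hi == lo:
        return [1.0 for _ in coverages]
    span = hi - lo
    return [min_alpha + (1.0 - min_alpha) * (c - lo) / span for c in coverages]


def rgba(rgb, alpha):
    """Format an (r, g, b) tuple plus alpha as a CSS-style rgba() string."""
    r, g, b = rgb
    return f"rgba({r}, {g}, {b}, {alpha:.2f})"


# Hypothetical coverage values for three links that share one base color.
alphas = coverage_to_alpha([1, 5, 9])
colors = [rgba((31, 119, 180), a) for a in alphas]
```

With this scheme the arc width would still reflect the share of links, while opacity reflects coverage, so a wide but low-coverage "No COG" arc would appear faint. A log scale may suit coverage values that span several orders of magnitude.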
Thanks @nickp60 , that cleared things up for me. It's unfortunate that my data won't work with Chordomics as is, but I'll look into submitting contig assemblies to MG-RAST. It looks like my data is getting hung up around here (from MG-RAST_preprocess.R)
I haven't played around with your code directly, but R has a bunch of different aggregation methods (https://stackoverflow.com/questions/3685492/r-speeding-up-group-by-operations) and I'm wondering if we could speed up the merging step. |
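As a rough illustration of why the choice of aggregation method matters, here is a Python sketch of a single-pass, hash-based group-by that counts taxon-function pairs. It is not the code from MG-RAST_preprocess.R and is written in Python rather than R; the function name and the two-column row layout are assumptions. The point is only that one pass with a hash table scales linearly with the number of rows, whereas repeatedly subsetting a large table for each group does not.

```python
from collections import Counter


def count_links(rows):
    """Count how often each (taxon, function) pair occurs, in one pass.

    rows is any iterable of (taxon, function) tuples, so it can be a
    generator that streams a large annotation file line by line.
    """
    counts = Counter()
    for taxon, function in rows:
        counts[(taxon, function)] += 1
    return counts


# Hypothetical rows standing in for merged read annotations.
example = [
    ("Thermococcaceae", "COG0001"),
    ("Thermococcaceae", "COG0001"),
    ("Bacillaceae", "No COG"),
]
link_counts = count_links(example)
```

In R, the approaches discussed in the linked Stack Overflow thread play the same role.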
Hi @zbendiks
|
Hi @zbendiks, I'll submit a pull request updating some of the documentation. Attached are the first 250k lines of the data. |
Thank you so much! I'll use the updated script to create a merged object for mgm4824993.3 and continue with the Chordomics pipeline. Assuming that all goes well, I'll continue with my remaining 13 samples. I'll update you guys once I do. |
Hi,
I'm having trouble downloading annotations from MG-RAST. Here's an example of the terminal output I got when trying to download MG-RAST sample ID 'mgm4824985.3'
after running for several hours
Chordomics is giving errors when it tries to locate the MG-RAST FASTA files, and this has happened for several of my samples now. When the error occurs, the entire Chordomics browser goes grey and I can no longer interact with it. I'm not sure how to move forward with my analysis, and I was hoping to get some advice.