|CyVerse_logo|_

|Home_Icon|_ Learning Center Home

Sample dataset and preprocessing

In this tutorial, we analyze 1M reads from an Arabidopsis thaliana leaf RNA-seq dataset (SRR7947123) published by Zhao et al., 2018. The example data is available from the CyVerse Data Store.


Input Data:

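+-------------------+----------------------------------+-----------------------------------------------------------------+
| Input             | Description                      | Example                                                         |
+===================+==================================+=================================================================+
| Leaf RNA-seq data | 1M reads dataset from SRR7947123 | iplantcollaborative > example_data > HAMR_tutorial > fastqfiles |
+-------------------+----------------------------------+-----------------------------------------------------------------+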
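
If you keep a local copy of the sample files, a quick sanity check is to confirm that each file holds the expected number of reads (about 1M). The following is a minimal Python sketch, not part of the Discovery Environment workflow. It assumes the common four-lines-per-record FASTQ layout; the function names are hypothetical, and file paths are taken from the command line rather than assumed.

```python
import gzip
import sys


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz compression by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_fastq_records(lines):
    """Count records in an iterable of FASTQ lines.

    Assumption: unwrapped FASTQ, i.e. exactly four lines per record
    (header, sequence, '+', qualities).
    """
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} non-empty lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Paths are supplied by the user; no file names are assumed here.
    for fastq_path in sys.argv[1:]:
        with open_fastq(fastq_path) as handle:
            print(fastq_path, count_fastq_records(handle))
```

A count far from 1,000,000 would suggest an incomplete download or a file other than the 1M-read subset.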

Preprocessing

Evaluate the quality of your sequencing data using FastQC

This preprocessing step assesses the quality of the raw reads to identify possible sequencing errors or biases. FastQC provides an overview of the data quality.

  1. Log in to the Discovery Environment.
  2. Click the "Apps" tab in the Discovery Environment and search for "fastqc".
  3. Click the app icon.
  4. Change the analysis name and output folder as needed, or keep the defaults.
  5. Under "Input", click Add to provide the input files. The sample dataset is located at iplantcollaborative > example_data > HAMR_tutorial > fastqfiles. Check both files and click 'OK'.
  6. In the next section, "Resource Requirements", request resources as needed or keep the defaults.
  7. Click Launch Analysis. You will receive a notification that the job has been submitted and is running. Click the Analyses tab to check the status of your job. When the analysis completes, open the three-dot menu on the right and click 'Go to output folder' to access your output files.
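
The steps above use the Discovery Environment app. For readers who also have FastQC installed locally, a roughly equivalent run can be sketched in Python as below. This is an illustration under stated assumptions, not the tutorial's method: it assumes the `fastqc` executable is on your PATH, the helper name `build_fastqc_command` and the script name `run_fastqc.py` are hypothetical, and the input files are passed on the command line.

```python
import shutil
import subprocess
import sys


def build_fastqc_command(fastq_files, outdir=".", threads=1):
    """Return the argument list for a FastQC run on the given files.

    Uses FastQC's --outdir and --threads options; the output directory
    must already exist.
    """
    if not fastq_files:
        raise ValueError("at least one FASTQ file is required")
    return ["fastqc", "--outdir", outdir, "--threads", str(threads), *fastq_files]


if __name__ == "__main__":
    files = sys.argv[1:]
    if not files:
        print("usage: python run_fastqc.py reads_1.fastq [reads_2.fastq ...]")
    elif shutil.which("fastqc") is None:
        # Assumption: a local FastQC installation; the tutorial itself uses the DE app.
        print("fastqc not found on PATH; install FastQC or use the Discovery Environment app")
    else:
        subprocess.run(build_fastqc_command(files), check=True)
```

Passing both sample files in one call mirrors step 5, where both files are checked as input.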

Output/Results

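+--------------------+---------------+---------------------------+
| Output             | Description   | Example                   |
+====================+===============+===========================+
| HTML and ZIP files | FastQC report | SRR7947123_1M_fastqc.html |
+--------------------+---------------+---------------------------+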

Description of output and results

Open the HTML report files and check whether your sequencing data raises any red flags, such as modules marked with a warning or a failure. For more details on each module of the FastQC report, see the FastQC documentation.
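
When there are several reports to review, the red flags can also be listed programmatically from the ZIP output. The sketch below is a hedged example. It assumes each FastQC ZIP archive contains a `summary.txt` file with tab-separated lines of the form status, module name, file name, where status is PASS, WARN, or FAIL. The function names are hypothetical, and ZIP paths are taken from the command line.

```python
import sys
import zipfile


def parse_summary(text):
    """Parse summary.txt content into (status, module) tuples.

    Assumption: each line is 'STATUS<TAB>Module name<TAB>File name'.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            rows.append((parts[0], parts[1]))
    return rows


def flagged_modules(text):
    """Return only the modules whose status is not PASS (WARN or FAIL)."""
    return [(status, module) for status, module in parse_summary(text) if status != "PASS"]


def read_summary_from_zip(zip_path):
    """Read the summary.txt member from a FastQC ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("summary.txt"):
                return archive.read(name).decode("utf-8")
    raise FileNotFoundError(f"no summary.txt inside {zip_path}")


if __name__ == "__main__":
    # ZIP paths are supplied by the user; no file names are assumed here.
    for zip_path in sys.argv[1:]:
        for status, module in flagged_modules(read_summary_from_zip(zip_path)):
            print(f"{zip_path}\t{status}\t{module}")
```

A flagged module is a prompt to inspect the corresponding plot in the HTML report, not an automatic verdict on the data.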



