Repository for code and small datasets derived from the TERRA REF program

genophenoenvo/terraref-datasets

TERRA-REF and related data for GenoPhenoEnvo research project

Data Use, Licenses, and Preferred Citations

  • TERRA-REF data from Seasons 4 and 6 contained in this repository are licensed under CC-0. Please cite LeBauer et al. (2020) when using these data.
  • KSU data are unpublished and should not be reused without permission from Geoff Morris.
  • Clemson data are from Brenton et al. (2016).
  • This repository is licensed under MIT (see file LICENSE).

LeBauer, David et al. (2020), Data From: TERRA-REF, An open reference data set from high resolution genomics, phenomics, and imaging sensors, v6, Dryad, Dataset, https://doi.org/10.5061/dryad.4b8gtht99

Brenton, Zachary W., et al. "A genomic resource for the development, improvement, and exploitation of sorghum for bioenergy." Genetics 204.1 (2016): 21-33. https://doi.org/10.1534/genetics.115.183947

Corresponding Author

David LeBauer, University of Arizona, dlebauer@arizona.edu

Summary

This repository contains source code for accessing and curating TERRA-REF data used to support the GenoPhenoEnvo project and machine learning research. It supports one of our goals: providing open data and reproducible code that follow FAIR data principles and contribute to open science.

This repository focuses on Sorghum bicolor trait data collected from four experiments and the associated weather data for those locations, listed below.

  • Maricopa Agricultural Center, University of Arizona, Season 4
    • Coordinates: 33.069, -111.972
    • Elevation: 362 meters
    • Planting: 2017-04-20, Day 110
    • Last Day of Harvest: 2017-09-16, Day 259
  • Maricopa Agricultural Center, University of Arizona, Season 6
    • Coordinates: 33.068941, -111.972244
    • Elevation: 362 meters
    • Planting: 2018-04-25, Day 115
    • Harvest: 2018-08-01, Day 213
  • Kansas State University, Ashland Bottoms
    • Coordinates: 39.126, -96.677
    • Elevation: 325 meters
    • Planting: 2016-06-17, Day 169
    • Harvest: 2016-10-21, Day 295
  • Clemson University Pee Dee Research and Education Center, South Carolina
    • Coordinates: 34.289, -79.737
    • Elevation: 42 meters
    • Planting: 2014-05-06, Day 126
    • Latest date in Clemson trait data: 2014-10-15, Day 288
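
The day-of-year values listed above follow directly from the calendar dates. As a minimal sketch using only Python's standard library (the helper name `day_of_year` and the `season_dates` dictionary are illustrative and not part of this repository), they can be reproduced like this:

```python
from datetime import date


def day_of_year(iso_date: str) -> int:
    """Return the 1-based day of year for a YYYY-MM-DD date string."""
    return date.fromisoformat(iso_date).timetuple().tm_yday


# Start and end dates as listed above. For Clemson, the end date is the
# latest date in the trait data rather than a stated harvest date.
season_dates = {
    "MAC Season 4": ("2017-04-20", "2017-09-16"),
    "MAC Season 6": ("2018-04-25", "2018-08-01"),
    "KSU": ("2016-06-17", "2016-10-21"),
    "Clemson": ("2014-05-06", "2014-10-15"),
}

for site, (start, end) in season_dates.items():
    print(site, day_of_year(start), day_of_year(end))
```

Note that 2016 was a leap year, so the KSU day numbers are one higher than the same calendar dates would give in the other seasons.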

Trait Data

Trait data prepared for this analysis can be downloaded in .csv format from CyVerse.

MAC Season 4

MAC Season 6

KSU

Clemson

Content of trait data CSV files

These tables have the following structure ...

Data Processing

The following traits and units were selected from the raw data for analysis; each trait is listed with the sites that collected it. The calculation used for growing degree days (gdd) can be found here.

  • days_to_flowering: days, gdd
    • MAC Season 4, KSU, Clemson
  • days_to_flag_leaf_emergence: days, gdd
    • MAC Season 4
  • canopy_height: cm
    • MAC Season 4, MAC Season 6, KSU, Clemson
  • aboveground_dry_biomass: kg/ha
    • MAC Season 4, MAC Season 6, Clemson

The following Jupyter notebooks contain code to process the raw trait data

Weather Data

This section describes the weather data covering the season dates of the sorghum experiments at these locations.

Parameters and Units

  • Date: YYYY-MM-DD format
  • Day of year
  • Minimum temperature: Celsius
  • Maximum temperature: Celsius
  • Mean temperature: Celsius
  • Accumulated growing degree days (gdd): heat units
    • 10 degrees Celsius is base temperature for sorghum
    • Daily gdd value = ((max temp + min temp) / 2) - 10 (base temperature)
    • Accumulated growing degree days = cumulative sum of daily gdd values
  • Minimum relative humidity: percentage
  • Maximum relative humidity: percentage
  • Mean relative humidity: percentage
  • Vapor pressure deficit: Kilopascals
    es = (6.11 * np.exp((2500000/461) * (1/273 - 1/(273 + temp_avg))))
    vpd = (((100 - rh_avg)/1000) * es)
    
  • Precipitation: millimeters
  • Cumulative precipitation: millimeters
  • First water deficit treatment: boolean value
    • True values only found in MAC Season 4
  • Second water deficit treatment: boolean value
    • True values only found in MAC Season 4
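
The growing degree day and vapor pressure deficit expressions listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's processing code: the function names are hypothetical, `math.exp` stands in for `np.exp` so the example runs without NumPy, and the formulas are applied exactly as written (whether the actual script floors negative daily gdd values at zero is not stated here).

```python
import math
from itertools import accumulate

BASE_TEMP_C = 10.0  # base temperature for sorghum, as stated above


def daily_gdd(temp_max, temp_min, base=BASE_TEMP_C):
    """Daily gdd = ((max temp + min temp) / 2) - base temperature."""
    return (temp_max + temp_min) / 2 - base


def accumulated_gdd(temp_max_series, temp_min_series):
    """Cumulative sum of daily gdd values, one entry per day."""
    daily = [daily_gdd(hi, lo) for hi, lo in zip(temp_max_series, temp_min_series)]
    return list(accumulate(daily))


def saturation_vapor_pressure(temp_avg):
    """Same expression as `es` above, with math.exp instead of np.exp."""
    return 6.11 * math.exp((2500000 / 461) * (1 / 273 - 1 / (273 + temp_avg)))


def vapor_pressure_deficit(temp_avg, rh_avg):
    """Same expression as `vpd` above; rh_avg is a percentage."""
    return ((100 - rh_avg) / 1000) * saturation_vapor_pressure(temp_avg)


# Two made-up days of temperatures in Celsius, for illustration only.
print(accumulated_gdd([35.0, 30.0], [20.0, 10.0]))
print(vapor_pressure_deficit(temp_avg=25.0, rh_avg=40.0))
```

Reading the constant 6.11 as hectopascals (the saturation vapor pressure of water near 0 degrees Celsius), the division by 1000 combines the percent-to-fraction conversion (divide by 100) with the hectopascal-to-kilopascal conversion (divide by 10), which is consistent with the kilopascal unit given for vapor pressure deficit.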

Information about MAC season 4 water deficit treatments can be found here

Weather data sources

Data Processing

  • Weather variables that were not available for all four seasons were dropped during processing, but can still be accessed in the raw data
  • The Python3 code used to process weather data can be found in src/weather_data_cleaning.py. This script will produce the following output data:
    • mac_season_4_weather.csv
    • mac_season_6_weather.csv
    • ksu_weather.csv
    • clemson_weather.csv
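
The step of dropping variables that are not available for every season amounts to keeping the intersection of the column sets. The sketch below illustrates that idea only; it is not taken from the processing script, and the helper names and column names (`date`, `temp_max`, `wind_speed`) are hypothetical. Each table is represented as a list of row dictionaries, as `csv.DictReader` would yield.

```python
def shared_columns(tables):
    """Return the columns present in every table, in the first table's order."""
    column_sets = [set(rows[0].keys()) for rows in tables.values()]
    shared = set.intersection(*column_sets)
    first_rows = next(iter(tables.values()))
    return [col for col in first_rows[0] if col in shared]


def keep_shared(tables):
    """Drop every column that is missing from at least one table."""
    cols = shared_columns(tables)
    return {
        name: [{col: row[col] for col in cols} for row in rows]
        for name, rows in tables.items()
    }


# Hypothetical example: only one site reports wind speed, so it is dropped.
raw = {
    "site_a": [{"date": "2017-04-20", "temp_max": 31.0, "wind_speed": 2.4}],
    "site_b": [{"date": "2016-06-17", "temp_max": 29.5}],
}
cleaned = keep_shared(raw)
print(shared_columns(raw))
print(cleaned["site_a"])
```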

Folder and file structure


Project Organization

Data on CyVerse

Code in Repository

TODO: what does this refer to?

├── LICENSE
│
├── README.md          <- The top-level README for developers using this project.
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
└── scripts            <- Source code for use in this project.
