
Writing Single Profile NetCDF Files

Russell Senior edited this page Jun 4, 2015 · 30 revisions


In this post, we'll walk through the process of writing single-profile NetCDF files suitable for submission to the IOOS National Glider Data Assembly Center (NGDAC). Detailed information on the IOOS NGDAC and the NetCDF file format can be found here.

The following is a detailed discussion of the entire process. This process has been wrapped up into a single function:

DbdGroup2IoosNc.m

which provides a convenient and relatively painless way to write NetCDF files for the profiles contained in a DbdGroup instance. However, I recommend reading the following carefully to understand what's going on under the hood. Documentation on the script's usage can be found here.

#Contents#

##Loading Glider Data Sets##

We'll be using an instance of the DbdGroup class and the instance's toProfiles() method to create a structured array in which each element of the array represents an individual profile from the data set.

Let's load the DbdGroup and get started:

>> load ./deployments-test/2013/ru29-401/ru29-401_DbdGroup_sci-qc0.mat
>> dgroup

dgroup = 

  DbdGroup with properties:

            segments: {695x1 cell}
         sourceFiles: {695x1 cell}
               bytes: [695x1 double]
                rows: [695x1 double]
    timestampSensors: {695x1 cell}
        depthSensors: {695x1 cell}
          startTimes: {695x1 cell}
            endTimes: {695x1 cell}
       startDatenums: [695x1 double]
         endDatenums: [695x1 double]
             sensors: {45x1 cell}
         sensorUnits: [1x1 struct]
         numProfiles: [695x1 double]
                dbds: [1x695 Dbd]
      hasBoundingGps: [695x1 logical]
         newSegments: {}
             scratch: [1x1 struct]

Once we've got the data set loaded, we need to export it to a structured array using the DbdGroup.toProfiles method:

>> pStruct = dgroup.toProfiles

pStruct = 

1x474 struct array with fields:

    meta
    timestamp
    depth
    drv_distance_along_track
    drv_latitude
    drv_longitude
    drv_m_gps_lat
    drv_m_gps_lon
    drv_m_present_time_datenum
    drv_m_pressure
    drv_proDir
    drv_proInds
    drv_sci_m_present_time_datenum
    drv_sci_water_pressure
    drv_sea_water_density
    drv_sea_water_electrical_conductivity
    drv_sea_water_potential_temperature
    drv_sea_water_salinity
    drv_sea_water_temperature
    drv_speed_of_sound_in_seawater
    m_altitude
    m_avg_speed
    m_battery_inst
    m_coulomb_amphr
    m_de_oil_vol
    m_final_water_vx
    m_final_water_vy
    m_gps_lat
    m_gps_lon
    m_hdg_derror
    m_hdg_ierror
    m_lat
    m_lon
    m_pitch
    m_present_time
    m_pressure
    m_roll
    m_science_on
    m_tot_num_inflections
    m_vacuum
    m_water_depth
    m_water_vx
    m_water_vy
    sci_m_disk_free
    sci_m_present_time
    sci_water_cond
    sci_water_pressure
    sci_water_temp

The return value is a structured array in which each element of the array is a single profile from the deployment. Each profile contains a metadata field:

>> pStruct(1).meta

ans = 

             glider: 'ru29'
            segment: 'ru29_2013_313_2_0'
         sourceFile: '/home/kerfoot/sandbox/glider/deployments-test/2013/ru29-401/ascii/queue/ru29_2013_313_2_0_sbd.dat'
     the8x3filename: '01260000'
    timestampSensor: 'drv_sci_m_present_time_datenum'
        depthSensor: 'drv_sci_water_pressure'
       startDatenum: 735548.595819052
         endDatenum: 735548.59918617
          startTime: '2013-11-10 14:17:58'
            endTime: '2013-11-10 14:22:49'
           minDepth: 9
           maxDepth: 69.75
             lonLat: [-14.4442897142857 -7.86986685714286]
          direction: 'd'

The remaining fields are 1-D arrays containing all of the available sensor data. These fields will vary depending on the glider's science payload configuration. Two fields, timestamp and depth, are always present; they are aliases for the sensors named by the Dbd.timestampSensor and Dbd.depthSensor properties, respectively, of each Dbd instance in the DbdGroup.dbds array.

This data structure will be used to map native glider or DbdGroup sensor names to their NetCDF file equivalents.
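Because every profile carries the generic timestamp and depth fields, downstream code can work with them without knowing which native sensors they alias. The sketch below builds a small, made-up two-element stand-in for pStruct (all values are invented) and then counts profiles by direction and checks that timestamp and depth line up. It assumes upcasts are flagged 'u'; only 'd' appears in the output above.

```matlab
% Made-up stand-in for the pStruct returned by dgroup.toProfiles.
% Only the fields used below are included; values are invented.
if ~exist('pStruct', 'var')
    pStruct = struct( ...
        'meta',      {struct('direction', 'd'), struct('direction', 'u')}, ...
        'timestamp', {[735548.5958; 735548.5975; 735548.5992], [735548.6010; 735548.6030]}, ...
        'depth',     {[9.0; 40.2; 69.75], [68.9; 10.4]});
end

% Profile direction from each element's metadata
directions = cellfun(@(m) m.direction, {pStruct.meta}, 'UniformOutput', false);
nDown = sum(strcmp(directions, 'd'));
nUp   = sum(strcmp(directions, 'u'));   % assumption: upcasts are flagged 'u'

% The generic aliases should have one depth value per timestamp
sameLength = arrayfun(@(p) numel(p.timestamp) == numel(p.depth), pStruct);

% Depth range of each profile, computed from the generic depth alias
depthRanges = arrayfun(@(p) [min(p.depth), max(p.depth)], pStruct, 'UniformOutput', false);

fprintf('%d downcasts, %d upcasts\n', nDown, nUp);
```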

##Mapping Glider Sensors to NetCDF Variables##

We're going to create a very simple NetCDF file to represent a single profile from the glider dataset. The CDL description of this format is here.

We can use mapIoosGliderFlatNcSensors.m to create a data structure in which each element represents a profile and maps the native glider sensors we want to include to the appropriate NetCDF file variables. Internally, this routine calls getIoosGliderFlatNcSensorMappings, which returns a structured array mapping each NetCDF variable in our schema to a list of candidate sensors that may be present in the pStruct structured array we created in the previous step. getIoosGliderFlatNcSensorMappings contains the default list of candidate sensors for each NetCDF variable, and it can be modified if, for example, new sensors resulting from some level of QA/QC have been added to the DbdGroup. Here's what the return value looks like:

>> sensorMap = getIoosGliderFlatNcSensorMappings

sensorMap = 

            time: {3x1 cell}
             lat: {2x1 cell}
             lon: {2x1 cell}
        pressure: {3x1 cell}
           depth: {2x1 cell}
     temperature: {3x1 cell}
    conductivity: {3x1 cell}
        salinity: {'drv_sea_water_salinity'}
         density: {'drv_sea_water_density'}
               u: {3x1 cell}
               v: {3x1 cell}
         time_uv: {}
          lat_uv: {}
          lon_uv: {}
      profile_id: {}
    profile_time: {}
     profile_lat: {}
     profile_lon: {}
      trajectory: {}  

Let's take a closer look at the time field to see how mapIoosGliderFlatNcSensors.m uses the list of candidate sensors to select the appropriate data array:

>> sensorMap.time

ans = 

    'timestamp'
    'drv_sci_m_present_time_datenum'
    'drv_m_present_time_datenum'

The order of the sensors in the cell array sets their priority: the function selects the first sensor in the list that is found in the profile. In this case, timestamp is present in the first profile in pStruct (pStruct(1)):

>> pStruct(1)

ans = 

                                     meta: [1x1 struct]
                                timestamp: [35x1 double] << SENSOR FOUND
                                    depth: [35x1 double]
                 drv_distance_along_track: [35x1 double]
                             drv_latitude: [35x1 double]
                            drv_longitude: [35x1 double]
                            drv_m_gps_lat: [35x1 double]
                            drv_m_gps_lon: [35x1 double]
               drv_m_present_time_datenum: [35x1 double]
                           drv_m_pressure: [35x1 double]
                               drv_proDir: [35x1 double]
                              drv_proInds: [35x1 double]
           drv_sci_m_present_time_datenum: [35x1 double]
                   drv_sci_water_pressure: [35x1 double]
                    drv_sea_water_density: [35x1 double]
    drv_sea_water_electrical_conductivity: [35x1 double]
      drv_sea_water_potential_temperature: [35x1 double]
                   drv_sea_water_salinity: [35x1 double]
                drv_sea_water_temperature: [35x1 double]
           drv_speed_of_sound_in_seawater: [35x1 double]
                               m_altitude: [35x1 double]
                              m_avg_speed: [35x1 double]
                           m_battery_inst: [35x1 double]
                          m_coulomb_amphr: [35x1 double]
                             m_de_oil_vol: [35x1 double]
                         m_final_water_vx: [35x1 double]
                         m_final_water_vy: [35x1 double]
                                m_gps_lat: [35x1 double]
                                m_gps_lon: [35x1 double]
                             m_hdg_derror: [35x1 double]
                             m_hdg_ierror: [35x1 double]
                                    m_lat: [35x1 double]
                                    m_lon: [35x1 double]
                                  m_pitch: [35x1 double]
                           m_present_time: [35x1 double]
                               m_pressure: [35x1 double]
                                   m_roll: [35x1 double]
                             m_science_on: [35x1 double]
                    m_tot_num_inflections: [35x1 double]
                                 m_vacuum: [35x1 double]
                            m_water_depth: [35x1 double]
                               m_water_vx: [35x1 double]
                               m_water_vy: [35x1 double]
                          sci_m_disk_free: [35x1 double]
                       sci_m_present_time: [35x1 double]
                           sci_water_cond: [35x1 double]
                       sci_water_pressure: [35x1 double]
                           sci_water_temp: [35x1 double]

So the timestamp data array is used for the NetCDF time variable. If timestamp were not found in the profile element, the function would next look for drv_sci_m_present_time_datenum and use that data array, if found. Failing that, it would finally look for drv_m_present_time_datenum.
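Here is a minimal sketch of that first-match rule. It assumes a candidate counts as available whenever its field exists in the profile; the actual mapIoosGliderFlatNcSensors.m may also inspect the data itself (for example, skipping all-NaN arrays). The profile struct is a made-up stand-in that lacks the first two time candidates, so the fallback is exercised.

```matlab
% Candidate list in priority order, mirroring sensorMap.time above
candidates = {'timestamp'; 'drv_sci_m_present_time_datenum'; 'drv_m_present_time_datenum'};

% Made-up profile containing only the lowest-priority time sensor
profile = struct('drv_m_present_time_datenum', linspace(735548.5958, 735548.5992, 5)');

% isfield with a cell array returns one logical per candidate name
present = isfield(profile, candidates);
idx = find(present, 1, 'first');   % highest-priority sensor that is present

if isempty(idx)
    selectedSensor = '';
    data = [];
else
    selectedSensor = candidates{idx};
    data = profile.(selectedSensor);   % dynamic field access
end

fprintf('time <- %s (%d values)\n', selectedSensor, numel(data));
```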

Here's an example of the use of mapIoosGliderFlatNcSensors.m:

>> trajectoryTs = datenum(2013,11,10,14,0,0)

trajectoryTs =

          735548.583333333

>> ncStruct = mapIoosGliderFlatNcSensors(pStruct, trajectoryTs)

ncStruct = 

1x527 struct array with fields:

    profile_id
    meta
    vars

There are 2 input arguments to mapIoosGliderFlatNcSensors: (1) the profiles structured array we created using the DbdGroup.toProfiles method and (2) a Matlab datenum specifying the deployment start date. The return value, ncStruct, is a structured array containing 3 fields:

  • profile_id: Sequential profile number corresponding to the element index. This field must contain a numeric scalar in order for a NetCDF file to be written. It is also a variable in the NetCDF file (i.e., it is contained in the ncStruct.vars structured array), and the value stored in ncStruct.vars is the one written to the NetCDF file.
  • meta: a structured array containing metadata about the profile
  • vars: a structured array in which each element describes a NetCDF variable, its corresponding native glider sensor and the 1-D data array

Here's what the meta and vars fields look like, respectively:

>> ncStruct(1).meta

ans = 

             glider: 'ru29'
            segment: 'ru29_2013_313_2_0'
         sourceFile: '/home/kerfoot/sandbox/glider/deployments-test/2013/ru29-401/ascii/queue/ru29_2013_313_2_0_sbd.dat'
     the8x3filename: '01260000'
    timestampSensor: 'drv_sci_m_present_time_datenum'
        depthSensor: 'drv_sci_water_pressure'
       startDatenum: 735548.595819052
         endDatenum: 735548.59918617
          startTime: '2013-11-10 14:17:58'
            endTime: '2013-11-10 14:22:49'
           minDepth: 9
           maxDepth: 69.75
             lonLat: [-14.4442897142857 -7.86986685714286]
          direction: 'd'

>> ncStruct(1).vars

ans = 

1x19 struct array with fields:

    ncVarName
    sensor
    data
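Because vars is an ordinary struct array, a single NetCDF variable can be pulled out by its ncVarName. The sketch below uses a made-up three-element stand-in for ncStruct(1).vars (the real array has 19 elements), and it assumes that a variable generated by the toolbox rather than read from a glider sensor, such as profile_id, has an empty sensor field.

```matlab
% Made-up stand-in for ncStruct(1).vars; values are invented
vars = struct( ...
    'ncVarName', {'time', 'temperature', 'profile_id'}, ...
    'sensor',    {'timestamp', 'drv_sea_water_temperature', ''}, ...
    'data',      {[735548.5958; 735548.5960], [24.1; 23.8], 1});

% Select the element(s) whose ncVarName matches the requested name
findVar = @(v, name) v(strcmp({v.ncVarName}, name));

tempVar = findVar(vars, 'temperature');
fprintf('%s <- %s\n', tempVar.ncVarName, tempVar.sensor);

% NetCDF variables with no native glider sensor recorded
unmapped = {vars(cellfun(@isempty, {vars.sensor})).ncVarName};
```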

Now that we've got the glider sensor data mapped to the NetCDF file variable schema, it's time to write some NetCDF files.

##Writing Profile NetCDF Files##

The NetCDF file format we're using stores all sensor data records from a single profile in one NetCDF file. This file specification is used by the IOOS National Glider Data Assembly Center to collect and aggregate individual profiles from a glider deployment into a single data set containing all data from the specified deployment. Once aggregated, the deployment data set is relatively simple to access and can be searched and retrieved by time, geographic location, depth and sensor name. The aggregation is done using NOAA's Environmental Research Division's Data Access Program (ERDDAP).

Before we get to NetCDF file creation, I want to mention an important point regarding this file specification. As mentioned before, each file represents a single profile from the deployment data set. In fact, the specification contains a profile_id variable, a numeric scalar representing the profile number. This number must be unique within the data set; if it's not, the ERDDAP aggregation will fail. For example, if you have 10 profiles that you want to write NetCDF files for, the ncStruct.vars element representing the profile_id variable must contain a different value for each profile, e.g., 1 through 10.
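A quick pre-flight check for duplicate IDs can save a failed aggregation. This sketch checks only the top-level profile_id field of a made-up ncStruct stand-in; since the value actually written comes from the profile_id element of vars, you may want to run the same check against those values as well.

```matlab
% Made-up stand-in for ncStruct; the third profile reuses ID 2
if ~exist('ncStruct', 'var')
    ncStruct = struct('profile_id', {1, 2, 2, 4});
end

ids = [ncStruct.profile_id];

% Count how many times each distinct ID occurs
[uniqueIds, ~, groupIdx] = unique(ids);
counts = accumarray(groupIdx(:), 1);
duplicateIds = uniqueIds(counts > 1);

if isempty(duplicateIds)
    disp('All profile_id values are unique.');
else
    fprintf('Duplicate profile_id values: %s\n', mat2str(duplicateIds));
end
```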

This toolbox provides a function, writeIoosGliderFlatNc.m, that takes a single element from the data structure created using mapIoosGliderFlatNcSensors.m and writes the corresponding NetCDF file. Let's first take a look at the documentation for this routine:

>> help writeIoosGliderFlatNc
  outFile = writeIoosGliderFlatNc(pStruct[,varargin])
 
  Accepts a single profile contained in pStruct, returned from 
  mapIoosGliderFlatNcSensors.m, and writes a NetCDF file conforming to
  the IOOS National Glider Data Assembly Standard Specification, version 2.
 
  Options:
  'clobber', [true or false]: by default, existing NetCDF files are not 
    overwritten.  Set to true to overwrite existing files.
  'ncschema', STRUCT: structured array mapping global NetCDF file
    attributes to values.  If not specified, default values are taken from 
    the NetCDF template file.
  'outfile', STRING: the NetCDF filename is constructed from the .meta
    field.  Use this option to specify a custom filename.
  'outdirectory', STRING: NetCDF files are written to the current working
    directory.  Use this option to specify an alternate path.
  'mode', STRING: the state of the dataset, which can be either 'rt' for
    real-time (i.e.: sbd/tbd  files) or 'delayed' for a recovered dataset (i.e.:
    dbd/ebd files).  Default is rt.
 
  See also mapIoosGliderFlatNcSensors getIoosGliderFlatNcSensorMappings loadNcJsonSchema
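The help text says only that the default filename is constructed from the .meta field. Judging from the example output further down (ru29-20131110T1417.nc for a profile starting at 2013-11-10 14:17:58), the pattern appears to be the glider name plus the profile start time. The sketch below reproduces that inferred pattern together with the default no-clobber behavior; it is an assumption based on one example, not the routine's actual code.

```matlab
% Made-up stand-in for ncStruct(1).meta, using values from the example above
meta = struct('glider', 'ru29', 'startDatenum', 735548.595819052);
outDirectory = tempdir;   % plays the role of the 'outdirectory' option
clobber = false;          % plays the role of the 'clobber' option (default false)

% Inferred naming pattern: <glider>-<profile start as yyyymmddTHHMM>.nc
outName = sprintf('%s-%s.nc', meta.glider, datestr(meta.startDatenum, 'yyyymmddTHHMM'));
outFile = fullfile(outDirectory, outName);

if exist(outFile, 'file') && ~clobber
    fprintf('Skipping existing file: %s\n', outFile);
else
    fprintf('Would write: %s\n', outFile);
end
```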

By default, the routine creates a new NetCDF file in the current working directory, provided a file with that name doesn't already exist. There are a few options that can be used to modify this behavior and the resulting NetCDF file. Three of them (clobber, outfile and outdirectory) are self-explanatory, and mode selects between a real-time ('rt') and a delayed-mode ('delayed') dataset. The remaining option, ncschema, allows us to specify a structured array defining the global and variable attributes of the NetCDF file we want to write. The structured array uses the format returned by Matlab's ncinfo routine. Let's take a closer look at the how's and why's of its use.

One of the strengths of the NetCDF file format is its ability to be a fully self-contained and self-describing file. This is accomplished through the use of global and variable attributes.

writeIoosGliderFlatNc.m uses a NetCDF template file to construct an empty file and then writes the sensor data to it. The downside is that the template's global and variable attributes are copied into the new file, which may result in incorrect descriptions of the file's metadata and variables. The ncschema option to writeIoosGliderFlatNc.m lets us replace these defaults with global and variable attributes appropriate to the current data set.

The most painless, accurate and configurable way to do this is as follows:

  1. Copy the CDL representation of a template file to a directory.

  2. Open the CDL template in a text editor and modify the global and variable attributes to reflect the proper metadata and attribution.

  3. Use ncgen to create an empty NetCDF-4 (classic model) file containing the description provided by the CDL file. This is done from the command line using:

     > ncgen -k 4 -o MY_GLIDER_TEMPLATE.nc MY_GLIDER_TEMPLATE.cdl
    

This will create a file, MY_GLIDER_TEMPLATE.nc, whose structure can be loaded into a Matlab structured array:

>> ncSchema = ncinfo('/tmp/MY_GLIDER_TEMPLATE.nc')

ncSchema = 

      Filename: '/tmp/MY_GLIDER_TEMPLATE.nc'
          Name: '/'
    Dimensions: [1x2 struct]
     Variables: [1x38 struct]
    Attributes: [1x34 struct]
        Groups: []
        Format: 'netcdf4_classic'
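Because ncSchema is an ordinary struct in the layout returned by ncinfo, where Attributes is a struct array with Name and Value fields, a single global attribute can also be adjusted in memory before the schema is used. This is a sketch only: the attribute name and value are hypothetical, the stand-in struct replaces a real ncinfo result, and it assumes writeIoosGliderFlatNc honors in-memory changes. Editing the CDL file remains the more reproducible route.

```matlab
% Made-up stand-in with the same Attributes layout that ncinfo returns
if ~exist('ncSchema', 'var')
    ncSchema = struct('Filename', '/tmp/MY_GLIDER_TEMPLATE.nc', ...
        'Attributes', struct('Name',  {'title', 'institution'}, ...
                             'Value', {'Template title', 'Template institution'}));
end

name  = 'institution';      % hypothetical attribute to override
value = 'My Institution';   % hypothetical replacement value

idx = find(strcmp({ncSchema.Attributes.Name}, name), 1);
if isempty(idx)
    % Attribute not present in the template: append it
    ncSchema.Attributes(end+1) = struct('Name', name, 'Value', value);
else
    ncSchema.Attributes(idx).Value = value;
end
```

Variable attributes can be handled the same way through ncSchema.Variables(k).Attributes.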

This object can then be passed to writeIoosGliderFlatNc via the ncschema option. Remember that writeIoosGliderFlatNc.m writes a single NetCDF file representing a single profile, so we must specify the profile element we'd like to operate on:

>> outFile = writeIoosGliderFlatNc(ncStruct(1), 'clobber', true, 'ncschema', ncSchema, 'outdirectory', '/tmp')
Clobbering existing file: /tmp/ru29-20131110T1417.nc

outFile =

/tmp/ru29-20131110T1417.nc

This file conforms to the file format specification described here and contains the global and variable attributes described in the ncSchema object passed via the ncschema option.
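To write every profile rather than just ncStruct(1), the same call can be wrapped in a loop. DbdGroup2IoosNc.m wraps the whole process for you; this is only a sketch of what a manual loop might look like. The three-element ncStruct is a made-up stand-in used when no real ncStruct exists, and if writeIoosGliderFlatNc is not on the path the loop just reports what it would do. Each call is wrapped in try/catch so that one bad profile does not stop the batch.

```matlab
% Use the real ncStruct if it exists; otherwise a made-up stand-in
if ~exist('ncStruct', 'var')
    ncStruct = struct('profile_id', {1, 2, 3});
end
outDirectory = tempdir;
haveWriter = exist('writeIoosGliderFlatNc', 'file') == 2;

% Options passed to every call; add ncschema only if a schema is loaded
opts = {'clobber', true, 'outdirectory', outDirectory};
if exist('ncSchema', 'var')
    opts = [opts, {'ncschema', ncSchema}];
end

outFiles = cell(1, numel(ncStruct));
failed = [];
for k = 1:numel(ncStruct)
    if ~haveWriter
        % Toolbox not on the path: report and move on
        fprintf('Would write profile %d to %s\n', ncStruct(k).profile_id, outDirectory);
        continue;
    end
    try
        outFiles{k} = writeIoosGliderFlatNc(ncStruct(k), opts{:});
    catch err
        % Record the failure and keep going with the remaining profiles
        failed(end+1) = k; %#ok<AGROW>
        fprintf('Profile %d failed: %s\n', ncStruct(k).profile_id, err.message);
    end
end
fprintf('%d of %d profiles failed\n', numel(failed), numel(ncStruct));
```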

Once the files have been written, they may be submitted to the IOOS NGDAC after you have registered as a data provider. Details on the file submission process can be found here.

##DbdGroup2IoosNc##
