Writing Single Profile NetCDF Files
In this post, we'll walk through the process of writing single profile NetCDF files suitable for submission to the IOOS National Glider Data Assembly Center. Detailed information on the IOOS NGDAC and the NetCDF file format can be found here.
The following is a detailed discussion of the entire process. This process has been wrapped up into a single function:
which provides a convenient and relatively painless way to write NetCDF files for profiles contained in a DbdGroup instance. However, I recommend reading the following carefully to understand what's going on under the hood. Documentation on the script's usage can be found here.
#Contents#
- Loading Glider Data Sets
- Mapping Glider Sensors to NetCDF Variables
- Writing Profile NetCDF Files
- DbdGroup2IoosNc: Automating the Process
##Loading Glider Data Sets##
We'll be using an instance of the DbdGroup class and the instance's toProfiles() method to create a structured array in which each element of the array represents an individual profile from the data set.
Let's load the DbdGroup and get started:
>> load ./deployments-test/2013/ru29-401/ru29-401_DbdGroup_sci-qc0.mat
>> dgroup
dgroup =
DbdGroup with properties:
segments: {695x1 cell}
sourceFiles: {695x1 cell}
bytes: [695x1 double]
rows: [695x1 double]
timestampSensors: {695x1 cell}
depthSensors: {695x1 cell}
startTimes: {695x1 cell}
endTimes: {695x1 cell}
startDatenums: [695x1 double]
endDatenums: [695x1 double]
sensors: {45x1 cell}
sensorUnits: [1x1 struct]
numProfiles: [695x1 double]
dbds: [1x695 Dbd]
hasBoundingGps: [695x1 logical]
newSegments: {}
scratch: [1x1 struct]
Once we've got the data set loaded, we need to export it to a structured array using the DbdGroup.toProfiles method:
>> pStruct = dgroup.toProfiles
pStruct =
1x474 struct array with fields:
meta
timestamp
depth
drv_distance_along_track
drv_latitude
drv_longitude
drv_m_gps_lat
drv_m_gps_lon
drv_m_present_time_datenum
drv_m_pressure
drv_proDir
drv_proInds
drv_sci_m_present_time_datenum
drv_sci_water_pressure
drv_sea_water_density
drv_sea_water_electrical_conductivity
drv_sea_water_potential_temperature
drv_sea_water_salinity
drv_sea_water_temperature
drv_speed_of_sound_in_seawater
m_altitude
m_avg_speed
m_battery_inst
m_coulomb_amphr
m_de_oil_vol
m_final_water_vx
m_final_water_vy
m_gps_lat
m_gps_lon
m_hdg_derror
m_hdg_ierror
m_lat
m_lon
m_pitch
m_present_time
m_pressure
m_roll
m_science_on
m_tot_num_inflections
m_vacuum
m_water_depth
m_water_vx
m_water_vy
sci_m_disk_free
sci_m_present_time
sci_water_cond
sci_water_pressure
sci_water_temp
The return value is a structured array in which each element of the array is a single profile from the deployment. Each profile contains a metadata field:
>> pStruct(1).meta
ans =
glider: 'ru29'
segment: 'ru29_2013_313_2_0'
sourceFile: '/home/kerfoot/sandbox/glider/deployments-test/2013/ru29-401/ascii/queue/ru29_2013_313_2_0_sbd.dat'
the8x3filename: '01260000'
timestampSensor: 'drv_sci_m_present_time_datenum'
depthSensor: 'drv_sci_water_pressure'
startDatenum: 735548.595819052
endDatenum: 735548.59918617
startTime: '2013-11-10 14:17:58'
endTime: '2013-11-10 14:22:49'
minDepth: 9
maxDepth: 69.75
lonLat: [-14.4442897142857 -7.86986685714286]
direction: 'd'
The remaining fields are 1-D arrays containing all of the available sensor data. These fields will vary depending on the glider's science payload configuration. Two fields, timestamp and depth, are always present; they are aliases for the sensors named by the Dbd.timestampSensor and Dbd.depthSensor properties, respectively, of each Dbd instance in the DbdGroup.dbds array.
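A quick way to list the sensor arrays a profile carries, and to confirm the timestamp/depth aliasing, is sketched below. The profile is a small hand-built stand-in for one pStruct element (the values are made up) so that it runs without the toolbox; with real data you'd start from pStruct(1) instead.

```matlab
% Hypothetical stand-in for one pStruct element (values are made up);
% with real data, use p = pStruct(1) instead.
t = [735548.595819052; 735548.596];
z = [9; 10.5];
p = struct();
p.meta = struct('timestampSensor', 'drv_sci_m_present_time_datenum', ...
                'depthSensor', 'drv_sci_water_pressure');
p.timestamp = t;
p.depth = z;
p.drv_sci_m_present_time_datenum = t;
p.drv_sci_water_pressure = z;
p.sci_water_temp = [24.1; 24.0];

% timestamp and depth should hold the same data as the sensors named in meta
isAlias = isequal(p.timestamp, p.(p.meta.timestampSensor)) && ...
          isequal(p.depth, p.(p.meta.depthSensor));

% Every field other than meta is a 1-D sensor data array
names = fieldnames(p);
sensorNames = names(~strcmp(names, 'meta'));
fprintf('%d sensor arrays, aliases consistent: %d\n', numel(sensorNames), isAlias);
```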
This data structure will be used to map native glider or DbdGroup sensor names to their NetCDF file equivalents.
##Mapping Glider Sensors to NetCDF Variables##
We're going to create a very simple NetCDF file to represent a single profile from the glider data set. The CDL description of this format is here.
We can use mapIoosGliderFlatNcSensors.m to create a data structure in which each element represents a profile and maps the native glider sensors we want to include to the appropriate NetCDF file variables. Internally, this routine calls getIoosGliderFlatNcSensorMappings, which returns a structured array mapping the NetCDF variables in our schema to the sensors available in the pStruct structured array we created in the previous step. getIoosGliderFlatNcSensorMappings contains the default list of sensors that may be selected for each NetCDF variable; this list can be modified if, for example, new sensors resulting from some level of QA/QC have been added to the DbdGroup. Here's what the return value looks like:
>> sensorMap = getIoosGliderFlatNcSensorMappings
sensorMap =
time: {3x1 cell}
lat: {2x1 cell}
lon: {2x1 cell}
pressure: {3x1 cell}
depth: {2x1 cell}
temperature: {3x1 cell}
conductivity: {3x1 cell}
salinity: {'drv_sea_water_salinity'}
density: {'drv_sea_water_density'}
u: {3x1 cell}
v: {3x1 cell}
time_uv: {}
lat_uv: {}
lon_uv: {}
profile_id: {}
profile_time: {}
profile_lat: {}
profile_lon: {}
trajectory: {}
Let's take a closer look at the time field and at how mapIoosGliderFlatNcSensors.m uses its list of candidate sensors to select the appropriate data array:
>> sensorMap.time
ans =
'timestamp'
'drv_sci_m_present_time_datenum'
'drv_m_present_time_datenum'
The order of the sensors in the cell array sets their priority: the first valid time sensor found in the profile is selected. In this case, timestamp is present in the first profile in pStruct (pStruct(1)):
>> pStruct(1)
ans =
meta: [1x1 struct]
timestamp: [35x1 double] << SENSOR FOUND
depth: [35x1 double]
drv_distance_along_track: [35x1 double]
drv_latitude: [35x1 double]
drv_longitude: [35x1 double]
drv_m_gps_lat: [35x1 double]
drv_m_gps_lon: [35x1 double]
drv_m_present_time_datenum: [35x1 double]
drv_m_pressure: [35x1 double]
drv_proDir: [35x1 double]
drv_proInds: [35x1 double]
drv_sci_m_present_time_datenum: [35x1 double]
drv_sci_water_pressure: [35x1 double]
drv_sea_water_density: [35x1 double]
drv_sea_water_electrical_conductivity: [35x1 double]
drv_sea_water_potential_temperature: [35x1 double]
drv_sea_water_salinity: [35x1 double]
drv_sea_water_temperature: [35x1 double]
drv_speed_of_sound_in_seawater: [35x1 double]
m_altitude: [35x1 double]
m_avg_speed: [35x1 double]
m_battery_inst: [35x1 double]
m_coulomb_amphr: [35x1 double]
m_de_oil_vol: [35x1 double]
m_final_water_vx: [35x1 double]
m_final_water_vy: [35x1 double]
m_gps_lat: [35x1 double]
m_gps_lon: [35x1 double]
m_hdg_derror: [35x1 double]
m_hdg_ierror: [35x1 double]
m_lat: [35x1 double]
m_lon: [35x1 double]
m_pitch: [35x1 double]
m_present_time: [35x1 double]
m_pressure: [35x1 double]
m_roll: [35x1 double]
m_science_on: [35x1 double]
m_tot_num_inflections: [35x1 double]
m_vacuum: [35x1 double]
m_water_depth: [35x1 double]
m_water_vx: [35x1 double]
m_water_vy: [35x1 double]
sci_m_disk_free: [35x1 double]
sci_m_present_time: [35x1 double]
sci_water_cond: [35x1 double]
sci_water_pressure: [35x1 double]
sci_water_temp: [35x1 double]
The timestamp data array is therefore used for the NetCDF time variable. If timestamp were not found in the profile element, the function would next look for drv_sci_m_present_time_datenum and use that data array if found; failing that, it would finally look for drv_m_present_time_datenum.
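The priority lookup just described can be sketched in a few lines. This is a reading of the behavior described above, not the source of mapIoosGliderFlatNcSensors.m: the hypothetical prof struct below has no timestamp field, so the second candidate wins. The sketch only checks that a field exists; the real routine may apply additional validity checks.

```matlab
% Candidate sensors for the NetCDF time variable, in priority order
% (the contents of sensorMap.time shown above)
candidates = {'timestamp'; 'drv_sci_m_present_time_datenum'; 'drv_m_present_time_datenum'};

% Hypothetical profile without a timestamp field (values are made up)
prof = struct('drv_sci_m_present_time_datenum', [735548.5958; 735548.5960], ...
              'drv_m_present_time_datenum', [735548.5957; 735548.5961]);

% Select the first candidate that exists as a field of the profile
hit = find(isfield(prof, candidates), 1);
if isempty(hit)
    selectedSensor = '';
    data = [];
else
    selectedSensor = candidates{hit};
    data = prof.(selectedSensor);
end
fprintf('time <- %s\n', selectedSensor);
```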
Here's an example of the use of mapIoosGliderFlatNcSensors.m:
>> trajectoryTs = datenum(2013,11,10,14,0,0)
trajectoryTs =
735548.583333333
>> ncStruct = mapIoosGliderFlatNcSensors(pStruct, trajectoryTs)
ncStruct =
1x527 struct array with fields:
profile_id
meta
vars
There are 2 input arguments to mapIoosGliderFlatNcSensors: (1) the profiles structure array we created using the DbdGroup.toProfiles method and (2) a Matlab datenum number specifying the deployment start date. The return value, ncStruct, is a structured array containing 3 fields:
- profile_id: Sequential profile number corresponding to the element index. This field must contain a numeric scalar in order for a NetCDF file to be written. profile_id is also a NetCDF file variable (i.e., it is contained in the ncStruct.vars structured array), and it is the value stored in ncStruct.vars that is written to the NetCDF file.
- meta: a structured array containing metadata about the profile
- vars: a structured array in which each element describes a NetCDF variable, its corresponding native glider sensor and the 1-D data array
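Pulling a particular variable out of vars comes down to matching on ncVarName. The vars array below is a hypothetical three-element stand-in for ncStruct(1).vars (the real one has 19 elements), and the empty sensor name given for profile_id is an assumption made purely for illustration.

```matlab
% Hypothetical stand-in for ncStruct(1).vars: each element pairs a NetCDF
% variable with the glider sensor chosen for it and that sensor's data
vars = struct('ncVarName', {'time', 'pressure', 'profile_id'}, ...
              'sensor',    {'timestamp', 'drv_sci_water_pressure', ''}, ...
              'data',      {[735548.5958; 735548.5960], [9; 10.5], 1});

% Find the element describing the NetCDF pressure variable
idx = find(strcmp({vars.ncVarName}, 'pressure'));
fprintf('%s <- %s (%d records)\n', vars(idx).ncVarName, vars(idx).sensor, numel(vars(idx).data));
```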
Here's what the meta and vars fields look like, respectively:
>> ncStruct(1).meta
ans =
glider: 'ru29'
segment: 'ru29_2013_313_2_0'
sourceFile: '/home/kerfoot/sandbox/glider/deployments-test/2013/ru29-401/ascii/queue/ru29_2013_313_2_0_sbd.dat'
the8x3filename: '01260000'
timestampSensor: 'drv_sci_m_present_time_datenum'
depthSensor: 'drv_sci_water_pressure'
startDatenum: 735548.595819052
endDatenum: 735548.59918617
startTime: '2013-11-10 14:17:58'
endTime: '2013-11-10 14:22:49'
minDepth: 9
maxDepth: 69.75
lonLat: [-14.4442897142857 -7.86986685714286]
direction: 'd'
>> ncStruct(1).vars
ans =
1x19 struct array with fields:
ncVarName
sensor
data
Now that we've got the glider sensor data mapped to the NetCDF file variable schema, it's time to write some NetCDF files.
##Writing Profile NetCDF Files##
The NetCDF file format we're using stores all sensor data records from a single profile in one NetCDF file. The IOOS National Glider Data Assembly Center uses this file specification to collect individual profiles from a glider deployment and aggregate them into a single data set containing all data from that deployment. Once aggregated, the deployment data set is relatively simple to access and can be searched and retrieved by time, geographic location, depth and sensor name. The aggregation is done using NOAA's Environmental Research Division's Data Access Program (ERDDAP).
Before we get to NetCDF file creation, I want to mention an important point regarding this file specification. As mentioned before, each file represents a single profile from the deployment data set. In fact, the specification contains a profile_id variable, a numeric scalar holding the profile number. This number must therefore be unique within the data set; if it's not, the ERDDAP aggregation will fail. For example, if you have 10 profiles that you want to write NetCDF files for, the ncStruct.vars element representing the profile_id variable must contain a different value for each profile, e.g., 1 through 10.
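Before writing anything, it can be worth checking that the profile_id values really are distinct. The sketch below uses a hypothetical ncStruct-like array (only profile_id and vars are filled in, with one duplicated id), reads each id from the vars element named profile_id, since that is the value written to the file, and renumbers the profiles 1..N if a duplicate turns up.

```matlab
% Hypothetical ids: profile 3 accidentally repeats profile 2's id
oldIds = [1 2 2 4];

% Minimal ncStruct-like array: profile_id is stored at the top level and
% again in the vars element whose ncVarName is 'profile_id'
nc = struct('profile_id', num2cell(oldIds), 'vars', []);
for k = 1:numel(nc)
    nc(k).vars = struct('ncVarName', {'time', 'profile_id'}, ...
                        'data', {[], oldIds(k)});
end

% Collect the ids that would actually be written to the NetCDF files
ids = zeros(1, numel(nc));
for k = 1:numel(nc)
    j = find(strcmp({nc(k).vars.ncVarName}, 'profile_id'), 1);
    ids(k) = nc(k).vars(j).data;
end

% ERDDAP aggregation needs distinct ids, so renumber 1..N if any repeat,
% updating both copies of the id
if numel(unique(ids)) < numel(ids)
    for k = 1:numel(nc)
        j = find(strcmp({nc(k).vars.ncVarName}, 'profile_id'), 1);
        nc(k).profile_id = k;
        nc(k).vars(j).data = k;
    end
end
```

Renumbering 1..N only guarantees uniqueness among the profiles being processed; if files from an earlier run of the same deployment have already been submitted, new ids also need to avoid those values.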
This toolbox provides a function, writeIoosGliderFlatNc.m, that takes a single element of the data structure created by mapIoosGliderFlatNcSensors.m and writes the corresponding NetCDF file. Let's first take a look at the documentation for this routine:
>> help writeIoosGliderFlatNc
outFile = writeIoosGliderFlatNc(pStruct[,varargin])
Accepts a single profile contained in pStruct, returned from
mapIoosGliderFlatNcSensors.m, and writes a NetCDF file conforming to
the IOOS National Glider Data Assembly Standard Specification, version 2.
Options:
'clobber', [true or false]: by default, existing NetCDF files are not
overwritten. Set to true to overwrite existing files.
'ncschema', STRUCT: structured array mapping global NetCDF file
attributes to values. If not specified, default values are taken from
the NetCDF template file.
'outfile', STRING: the NetCDF filename is constructed from the .meta
field. Use this option to specify a custom filename.
'outdirectory', STRING: NetCDF files are written to the current working
directory. Use this option to specify an alternate path.
'mode', STRING: the state of the dataset, which can be either 'rt' for
real-time (i.e.: sbd/tbd files) or 'delayed' for a recovered dataset (i.e.:
dbd/ebd files). Default is rt.
See also mapIoosGliderFlatNcSensors getIoosGliderFlatNcSensorMappings loadNcJsonSchema
By default, the routine creates a new NetCDF file in the current working directory, provided the file doesn't already exist. There are a few options that can be used to modify this behavior and the resulting NetCDF file. Three of them (clobber, outfile and outdirectory) are self-explanatory, and mode selects real-time ('rt') or delayed-mode ('delayed') processing. The remaining option, ncschema, allows us to specify a structured array defining the global and variable attributes of the NetCDF file we want to write. The structured array has the format returned by Matlab's ncinfo routine. Let's take a closer look at the how's and why's of its use.
One of the strengths of the NetCDF file format is its ability to be fully self-contained and self-describing. This is accomplished through the use of global and variable attributes.
writeIoosGliderFlatNc.m uses a NetCDF template file to construct an empty file and then writes the sensor data to it. The downside is that the template's global and variable attributes are copied into every file written, which may result in incorrect descriptions of the file's metadata and variables. The ncschema option to writeIoosGliderFlatNc.m lets us specify the global and variable attributes for the current data set, overwriting the template defaults.
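Since ncschema takes the structure returned by ncinfo, small changes can also be made in Matlab before the structure is passed in. The sketch below assumes the global attributes live in the Attributes field as a struct array of Name/Value pairs (which is how ncinfo reports them) and that writeIoosGliderFlatNc.m reads them from there; the stand-in schema and the 'My Institution' value are placeholders. For anything beyond a quick tweak, editing the CDL as described below keeps the metadata in one reviewable file.

```matlab
% Hypothetical stand-in for the Attributes field of an ncinfo structure
ncSchema.Attributes = struct('Name',  {'title', 'institution'}, ...
                             'Value', {'Template title', 'Template institution'});

% Overwrite one global attribute, or append it if it is missing
attrName = 'institution';
attrValue = 'My Institution';  % placeholder value, replace with your own
k = find(strcmp({ncSchema.Attributes.Name}, attrName), 1);
if isempty(k)
    k = numel(ncSchema.Attributes) + 1;
end
ncSchema.Attributes(k).Name = attrName;
ncSchema.Attributes(k).Value = attrValue;
```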
The most painless, accurate and configurable way to do this is as follows:
- Copy the CDL representation of a template file to a directory.
- Open the CDL template in a text editor and modify the global and variable attributes to reflect the proper metadata and attribution.
- Use ncgen to create an empty NetCDF-4 (classic model) file from the description in the CDL file. This is done from the command line using:
> ncgen -k 4 -o MY_GLIDER_TEMPLATE.nc MY_GLIDER_TEMPLATE.cdl
This will create a file, MY_GLIDER_TEMPLATE.nc, whose structure can be loaded into a Matlab structured array:
>> ncSchema = ncinfo('/tmp/MY_GLIDER_TEMPLATE.nc')
ncSchema =
Filename: '/tmp/MY_GLIDER_TEMPLATE.nc'
Name: '/'
Dimensions: [1x2 struct]
Variables: [1x38 struct]
Attributes: [1x34 struct]
Groups: []
Format: 'netcdf4_classic'
This object can then be passed to writeIoosGliderFlatNc via the ncschema option. Remember that writeIoosGliderFlatNc.m writes a single NetCDF file representing a single profile, so we must specify the profile element we'd like to operate on:
>> outFile = writeIoosGliderFlatNc(ncStruct(1), 'clobber', true, 'ncschema', ncSchema, 'outdirectory', '/tmp')
Clobbering existing file: /tmp/ru29-20131110T1417.nc
outFile =
/tmp/ru29-20131110T1417.nc
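The default output filename in the example above appears to be the glider name followed by the profile start time. The sketch below rebuilds that name from the meta field so you can, for instance, check whether a profile has already been written before deciding on clobber. The pattern is inferred solely from this one example (default 'rt' mode) and is an assumption, not documented behavior of writeIoosGliderFlatNc.m.

```matlab
% Hypothetical meta values copied from the ncStruct(1).meta listing above
meta = struct('glider', 'ru29', 'startDatenum', 735548.595819052);

% ASSUMED naming pattern, inferred from the single example output
% (ru29-20131110T1417.nc): <glider>-<start time as yyyymmddTHHMM>.nc
outName = sprintf('%s-%s.nc', meta.glider, datestr(meta.startDatenum, 'yyyymmddTHHMM'));
outPath = fullfile('/tmp', outName);

% true if a file with that name is already in the output directory
alreadyWritten = exist(outPath, 'file') == 2;
```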
This file conforms to the file format specification described here and contains the global and variable attributes described in the ncSchema object passed via the ncschema option.
Once the files have been written, they may be submitted to the IOOS NGDAC after registering as a data provider. Details on the file submission process can be found here.
##DbdGroup2IoosNc##