
Quick Start Guide

John Kerfoot edited this page Jul 15, 2020 · 8 revisions


Quick start guide for getting up and running...


Installation

Conda:

> git clone https://github.com/kerfoot/gdutils.git
> cd ./gdutils
> conda env create -f environment.yml

You'll need to add the path of the cloned repository to your PYTHONPATH in order to import the package from within Python. Conda provides instructions on how to do this.
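As a quick alternative for a single session, the clone can also be added to the module search path from inside Python. This is a minimal sketch: the `repo_dir` location is a hypothetical example and should point to wherever you cloned the repository.

```python
import sys
from pathlib import Path

# Hypothetical location of the cloned repository; adjust to your checkout.
repo_dir = Path.home() / "gdutils"

# Prepend the clone to the module search path for this interpreter session only.
if str(repo_dir) not in sys.path:
    sys.path.insert(0, str(repo_dir))
```

This change does not persist after the interpreter exits; setting the PYTHONPATH environment variable is the lasting option.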

Querying Glider DAC Datasets

Next, we'll want to connect to the Glider DAC ERDDAP server using the GdacClient class:

> from gdutils import GdacClient
>
> client = GdacClient()

You are now connected to the server, but you need to perform a dataset search before you can work with one or more datasets. This is done with the search_datasets() method, which performs an ERDDAP advanced search to find either all of the datasets on the server or a subset meeting particular criteria (time bounds, text search, bounding box, etc.).

Here's how we fetch all of the dataset records:

> client.search_datasets()

Be patient: the search may take a few minutes if you're fetching all of the datasets. Once it finishes, the client reports how many datasets were found:

> client
<GdacClient(server='https://gliders.ioos.us/erddap', response='csv', num_datasets=557)>

For those familiar with ERDDAP's advanced search functionality, this search fetches all available real-time datasets using the following query:

https://gliders.ioos.us/erddap/search/advanced.html?page=1&itemsPerPage=1000&searchFor=all&protocol=%28ANY%29&cdm_data_type=%28ANY%29&institution=%28ANY%29&ioos_category=%28ANY%29&keywords=%28ANY%29&long_name=%28ANY%29&standard_name=%28ANY%29&variableName=%28ANY%29&maxLat=&minLon=&maxLon=&minLat=&minTime=&maxTime=
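To make the structure of that query easier to see, here is a minimal sketch that rebuilds the same URL with the standard library. The helper name `build_advanced_search_url` and its arguments are hypothetical and are not part of gdutils; only the query parameter names are taken from the URL above.

```python
from urllib.parse import urlencode

SERVER = "https://gliders.ioos.us/erddap"


def build_advanced_search_url(search_for="all", min_time="", max_time="",
                              min_lat="", max_lat="", min_lon="", max_lon="",
                              items_per_page=1000):
    """Build an ERDDAP advanced search URL (hypothetical helper, not gdutils API)."""
    params = {
        "page": 1,
        "itemsPerPage": items_per_page,
        "searchFor": search_for,
        # "(ANY)" leaves a category unconstrained.
        "protocol": "(ANY)",
        "cdm_data_type": "(ANY)",
        "institution": "(ANY)",
        "ioos_category": "(ANY)",
        "keywords": "(ANY)",
        "long_name": "(ANY)",
        "standard_name": "(ANY)",
        "variableName": "(ANY)",
        # Empty strings leave the bounding box and time window open.
        "maxLat": max_lat,
        "minLon": min_lon,
        "maxLon": max_lon,
        "minLat": min_lat,
        "minTime": min_time,
        "maxTime": max_time,
    }
    # urlencode percent-encodes the parentheses in "(ANY)" as %28 and %29.
    return f"{SERVER}/search/advanced.html?{urlencode(params)}"


# With no arguments this reproduces the "fetch everything" query.
all_url = build_advanced_search_url()

# Constraining the time window and latitude range narrows the search.
bounded_url = build_advanced_search_url(min_time="2020-01-01", min_lat=35, max_lat=45)
```

Filling in the empty bounding box and time parameters is how a search is narrowed to a subset of the datasets.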

For each dataset ID returned by this query, an additional query fetches the time, latitude, longitude and WMO ID of each profile. The results are merged into a pandas DataFrame available in the client.datasets property:

> client.datasets.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 557 entries, 0 to 556
Data columns (total 26 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   dataset_id       557 non-null    object
 1   glider           557 non-null    object
 2   wmo_id           557 non-null    object
 3   start_date       557 non-null    datetime64[ns, UTC]
 4   end_date         557 non-null    datetime64[ns, UTC]
 5   deployment_lat   557 non-null    float64
 6   deployment_lon   557 non-null    float64
 7   lat_min          557 non-null    float64
 8   lat_max          557 non-null    float64
 9   lon_min          557 non-null    float64
 10  lon_max          557 non-null    float64
 11  num_profiles     557 non-null    int64
 12  days             557 non-null    int64
 13  subset           557 non-null    object
 14  tabledap         557 non-null    object
 15  make_a_graph     557 non-null    object
 16  files            0 non-null      float64
 17  title            557 non-null    object
 18  summary          557 non-null    object
 19  fgdc             557 non-null    object
 20  iso_19115        557 non-null    object
 21  info             557 non-null    object
 22  background_info  557 non-null    object
 23  rss              557 non-null    object
 24  email            557 non-null    object
 25  institution      557 non-null    object
dtypes: datetime64[ns, UTC](2), float64(7), int64(2), object(15)
memory usage: 113.3+ KB
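The per-dataset query mentioned above can be pictured as an ERDDAP tabledap request. The sketch below only builds such a URL and makes no network request. The helper name, the dataset ID and the variable names (`time`, `latitude`, `longitude`, `wmo_id`) are assumptions for illustration, and the request gdutils actually issues may differ.

```python
SERVER = "https://gliders.ioos.us/erddap"


def build_profiles_url(dataset_id, variables=("time", "latitude", "longitude", "wmo_id")):
    """Build a tabledap CSV request URL (hypothetical helper, not gdutils API)."""
    # tabledap requests take the form <server>/tabledap/<datasetID>.<fileType>?<variables>
    return f"{SERVER}/tabledap/{dataset_id}.csv?{','.join(variables)}"


# "example_glider-20200101T0000" is a made-up dataset ID.
profiles_url = build_profiles_url("example_glider-20200101T0000")
```

A URL like this can be read with pandas.read_csv; ERDDAP's .csv responses normally carry a units row directly under the header, which you would likely want to skip.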

The majority of the properties and non-plotting methods return either a Pandas DataFrame or Series, which allows for quick slicing, dicing and indexing using normal Pandas kung fu.
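As an illustration of that kind of slicing, here is a minimal sketch that runs on a small synthetic stand-in for client.datasets. The rows are made up, and only a few of the columns listed above are included.

```python
import pandas as pd

# Synthetic stand-in for client.datasets; all values are made up for illustration.
datasets = pd.DataFrame(
    {
        "dataset_id": [
            "glider_a-20190601T0000",
            "glider_b-20200115T0000",
            "glider_a-20200610T0000",
        ],
        "glider": ["glider_a", "glider_b", "glider_a"],
        "start_date": pd.to_datetime(["2019-06-01", "2020-01-15", "2020-06-10"], utc=True),
        "end_date": pd.to_datetime(["2019-06-21", "2020-02-14", "2020-07-05"], utc=True),
        "num_profiles": [400, 900, 650],
        "days": [20, 30, 25],
    }
)

# Boolean indexing: deployments that started in 2020.
in_2020 = datasets[datasets["start_date"].dt.year == 2020]

# Sorting: the deployment with the most profiles.
most_profiles = datasets.sort_values("num_profiles", ascending=False).iloc[0]

# Aggregation: total glider days per glider.
days_per_glider = datasets.groupby("glider")["days"].sum()
```

The same expressions should work on the real client.datasets frame, since they rely only on column names shown in the info() listing.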

Visualizing Search Results

Once the search is completed, the client exposes several calendar properties for plotting, including ym_deployments_calendar and ym_glider_days_calendar.

Each of these properties stores a calendar DataFrame that is passed as the first argument to the plot_calendar function, which draws the calendar as a heatmap. For example, let's look at the distribution of deployments grouped by year and month:

> from gdutils.plot import plot_calendar
> import matplotlib.pyplot as plt
> calendar = client.ym_deployments_calendar
> fig, ax = plt.subplots(figsize=(8., 8.))
> plot_calendar(calendar, ax=ax)
> ax.invert_yaxis()
> plt.show()

[Figure: ym_datasets, heatmap of deployments by year and month]

or the number of glider days in each month:

> calendar = client.ym_glider_days_calendar
> fig, ax = plt.subplots(figsize=(8., 8.))
> plot_calendar(calendar, ax=ax)
> ax.invert_yaxis()
> plt.show()

[Figure: ym_glider_days, heatmap of glider days by year and month]

or the number of profiles in each month:

> calendar = client.ym_profiles_calendar  # property name assumed by analogy with ym_glider_days_calendar
> fig, ax = plt.subplots(figsize=(8., 8.))
> plot_calendar(calendar, ax=ax)
> ax.invert_yaxis()
> plt.show()

[Figure: ym_profiles, heatmap of profiles by year and month]
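For readers curious about what a year-by-month calendar table looks like, here is a minimal sketch that derives one with pandas alone. It is not the gdutils implementation: the start dates are synthetic, and the layout (years as rows, months as columns) is an assumption.

```python
import pandas as pd

# Synthetic deployment start dates; made up for illustration.
deployments = pd.DataFrame(
    {
        "start_date": pd.to_datetime(
            ["2019-06-01", "2019-06-18", "2020-01-15", "2020-06-10"], utc=True
        )
    }
)

# Split each start date into the year and month used as the calendar axes.
deployments["year"] = deployments["start_date"].dt.year
deployments["month"] = deployments["start_date"].dt.month

# Count deployments per (year, month), then pivot months into columns.
# Year/month combinations with no deployments are filled with 0.
calendar = deployments.groupby(["year", "month"]).size().unstack(fill_value=0)
```

A table shaped like this is the kind of input seaborn's heatmap function accepts directly.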

Finally, we can summarize the number of deployments, glider days and profiles by year:

> client.plot_yearly_totals()
> plt.show()

[Figure: datasets calendar, yearly totals of deployments, glider days and profiles]
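The numbers behind a yearly summary like this can also be computed directly with pandas. The sketch below uses synthetic rows with the dataset_id, start_date, days and num_profiles columns from client.datasets. Attributing each deployment to the year it started is a simplifying assumption and may not match how plot_yearly_totals() handles deployments that span two years.

```python
import pandas as pd

# Synthetic stand-in for client.datasets; all values are made up for illustration.
datasets = pd.DataFrame(
    {
        "dataset_id": ["a-2019", "b-2020", "c-2020"],
        "start_date": pd.to_datetime(["2019-06-01", "2020-01-15", "2020-06-10"], utc=True),
        "days": [20, 30, 25],
        "num_profiles": [400, 900, 650],
    }
)

# Group by the year each deployment started and total the three quantities.
yearly_totals = datasets.groupby(datasets["start_date"].dt.year.rename("year")).agg(
    deployments=("dataset_id", "count"),
    glider_days=("days", "sum"),
    profiles=("num_profiles", "sum"),
)
```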

This quick introduction provides a look at some of the higher-level functionality provided by gdutils, pandas, erddapy and seaborn.