Quick Start Guide
Quick start guide for getting up and running...
Conda:
> git clone https://github.com/kerfoot/gdutils.git
> cd ./gdutils
> conda env create -f environment.yml
You'll need to add this path to your PYTHONPATH in order to import the package from within Python. The conda documentation describes how to do this.
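If you'd rather not touch PYTHONPATH while you're trying things out, one alternative is to extend sys.path at runtime. This is a minimal sketch, not part of gdutils: the helper name add_to_sys_path is hypothetical, and it assumes the clone lives at ./gdutils relative to your working directory with the package at the top of the clone.

```python
import sys
from pathlib import Path


def add_to_sys_path(repo_dir):
    """Prepend repo_dir to sys.path once and return the resolved path string."""
    resolved = str(Path(repo_dir).expanduser().resolve())
    if resolved not in sys.path:
        # Put the clone first so it wins over any other copy of the package
        sys.path.insert(0, resolved)
    return resolved


# Assumed clone location; adjust to wherever you ran git clone
repo_path = add_to_sys_path("./gdutils")
```

The change only lasts for the current interpreter session, so setting PYTHONPATH is still the better choice for regular use.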
Next, we'll want to connect to the Glider DAC ERDDAP server using the GdacClient class:
> from gdutils import GdacClient
>
> client = GdacClient()
You are now connected to the server, but you need to perform a dataset search before you can work with one or more datasets. This is done with the search_datasets() method, which performs an ERDDAP advanced search to find either all of the datasets on the server or a subset meeting particular criteria (time bounds, text search, bounding box, etc.).
Here's how we fetch all of the dataset records:
> client.search_datasets()
Be patient: the search may take a few minutes if you're grabbing all of the datasets. Once it finishes, the client reports the number of datasets found:
> client
<GdacClient(server='https://gliders.ioos.us/erddap', response='csv', num_datasets=557)>
For those familiar with ERDDAP's advanced search functionality, this search fetches all available real-time datasets using the following query:
https://gliders.ioos.us/erddap/search/advanced.html?page=1&itemsPerPage=1000&searchFor=all&protocol=%28ANY%29&cdm_data_type=%28ANY%29&institution=%28ANY%29&ioos_category=%28ANY%29&keywords=%28ANY%29&long_name=%28ANY%29&standard_name=%28ANY%29&variableName=%28ANY%29&maxLat=&minLon=&maxLon=&minLat=&minTime=&maxTime=
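To see how the pieces of that query fit together, here's a small standard-library sketch that rebuilds the same URL from a dictionary of parameters. It is an illustration only and not how gdutils builds the request internally; the function name build_advanced_search_url is hypothetical, and the ISO 8601 time values in the usage lines are an assumption about what you might pass.

```python
from urllib.parse import urlencode

SERVER = "https://gliders.ioos.us/erddap"


def build_advanced_search_url(server=SERVER, **overrides):
    """Return an ERDDAP advanced search URL, starting from the defaults shown above."""
    # Parameter names and order are copied from the query in this guide
    params = {
        "page": 1,
        "itemsPerPage": 1000,
        "searchFor": "all",
        "protocol": "(ANY)",
        "cdm_data_type": "(ANY)",
        "institution": "(ANY)",
        "ioos_category": "(ANY)",
        "keywords": "(ANY)",
        "long_name": "(ANY)",
        "standard_name": "(ANY)",
        "variableName": "(ANY)",
        "maxLat": "",
        "minLon": "",
        "maxLon": "",
        "minLat": "",
        "minTime": "",
        "maxTime": "",
    }
    unknown = set(overrides) - set(params)
    if unknown:
        raise ValueError(f"Unknown search parameters: {sorted(unknown)}")
    params.update(overrides)  # existing keys keep their original position
    return f"{server}/search/advanced.html?{urlencode(params)}"


# Default query: matches the URL shown above
all_datasets_url = build_advanced_search_url()

# Assumed example of narrowing the search to a time window
url_2020 = build_advanced_search_url(
    minTime="2020-01-01T00:00:00Z", maxTime="2020-12-31T23:59:59Z"
)
```

urlencode percent-encodes the parentheses in (ANY) as %28 and %29, which is why the URL above looks the way it does.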
An additional query is performed for each Dataset ID returned from the above query to fetch the time, latitude, longitude and WMO ID for each profile and the results are merged into a pandas DataFrame available in the client.datasets property:
> client.datasets.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 557 entries, 0 to 556
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 dataset_id 557 non-null object
1 glider 557 non-null object
2 wmo_id 557 non-null object
3 start_date 557 non-null datetime64[ns, UTC]
4 end_date 557 non-null datetime64[ns, UTC]
5 deployment_lat 557 non-null float64
6 deployment_lon 557 non-null float64
7 lat_min 557 non-null float64
8 lat_max 557 non-null float64
9 lon_min 557 non-null float64
10 lon_max 557 non-null float64
11 num_profiles 557 non-null int64
12 days 557 non-null int64
13 subset 557 non-null object
14 tabledap 557 non-null object
15 make_a_graph 557 non-null object
16 files 0 non-null float64
17 title 557 non-null object
18 summary 557 non-null object
19 fgdc 557 non-null object
20 iso_19115 557 non-null object
21 info 557 non-null object
22 background_info 557 non-null object
23 rss 557 non-null object
24 email 557 non-null object
25 institution 557 non-null object
dtypes: datetime64[ns, UTC](2), float64(7), int64(2), object(15)
memory usage: 113.3+ KB
The majority of the properties and non-plotting methods return either a Pandas DataFrame or Series, which allows for quick slicing, dicing and indexing using normal Pandas kung fu.
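As a quick illustration of that kind of slicing, here's a sketch that runs against a small stand-in DataFrame rather than a live search. The column names come from the client.datasets listing above, but the rows (dataset IDs, institutions and counts) are made up for the example.

```python
import pandas as pd

# Stand-in for client.datasets with a handful of its columns; all values are invented
datasets = pd.DataFrame(
    {
        "dataset_id": [
            "glider_a-20200101T0000",
            "glider_b-20200615T0000",
            "glider_c-20210301T0000",
        ],
        "institution": ["Inst A", "Inst B", "Inst A"],
        "start_date": pd.to_datetime(
            ["2020-01-01", "2020-06-15", "2021-03-01"], utc=True
        ),
        "num_profiles": [1200, 350, 800],
        "days": [30, 10, 21],
    }
)

# Deployments that started in 2020
in_2020 = datasets[datasets["start_date"].dt.year == 2020]

# Total profiles per institution
profiles_by_institution = datasets.groupby("institution")["num_profiles"].sum()

# The longest deployment, as a single row (Series)
longest = datasets.sort_values("days", ascending=False).iloc[0]
```

The same expressions should work on client.datasets after a search, since it exposes these columns.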
Once the search is completed, we can access the following properties to retrieve a calendar for plotting:
- ymd_profiles_calendar: Profile counts organized by year, month and day.
- ym_profiles_calendar: Profile counts organized by year and month.
- md_profiles_calendar: Profile counts organized by month and day of the month.
- ymd_glider_days_calendar: Glider day counts organized by year, month and day.
- ym_glider_days_calendar: Glider day counts organized by year and month.
- md_glider_days_calendar: Glider day counts organized by month and day of the month.
- ymd_deployments_calendar: Deployment counts organized by year, month and day.
- ym_deployments_calendar: Deployment counts organized by year and month.
- md_deployments_calendar: Deployment counts organized by month and day of the month.
These properties store a calendar DataFrame that is the first argument to the plot_calendar function. This function creates a heatmap displaying the calendar. For example, let's look at the distribution of deployments grouped by year and month:
> from gdutils.plot import plot_calendar
> import matplotlib.pyplot as plt
> calendar = client.ym_deployments_calendar
> fig, ax = plt.subplots(figsize=(8., 8.))
> plot_calendar(calendar, ax=ax)
> ax.invert_yaxis()
> plt.show()
or the number of glider days in each month:
> calendar = client.ym_glider_days_calendar
> fig, ax = plt.subplots(figsize=(8., 8.))
> plot_calendar(calendar, ax=ax)
> ax.invert_yaxis()
> plt.show()
or the number of profiles in each month:
> calendar = client.ym_profiles_calendar
> fig, ax = plt.subplots(figsize=(8., 8.))
> plot_calendar(calendar, ax=ax)
> ax.invert_yaxis()
> plt.show()
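For a sense of what a year and month calendar holds, here's a minimal pandas sketch that counts deployments by start year and month. It is an assumption about the general shape of such a table (years as rows, months as columns), not the gdutils implementation, and the start dates are invented.

```python
import pandas as pd

# Invented deployment start dates standing in for the start_date column
start_dates = pd.to_datetime(
    ["2020-01-05", "2020-01-20", "2020-03-02", "2021-03-15"], utc=True
)
deployments = pd.DataFrame({"start_date": start_dates})

# Count deployments per (year, month), then pivot months into columns
calendar = (
    deployments.groupby(
        [
            deployments["start_date"].dt.year.rename("year"),
            deployments["start_date"].dt.month.rename("month"),
        ]
    )
    .size()
    .unstack("month", fill_value=0)
)
```

Each cell is the number of deployments that started in that year and month, with zeros where nothing started. A table of this shape is the kind of input a heatmap expects.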
Finally, we can summarize the number of deployments, glider days and profiles by year:
> client.plot_yearly_totals()
> plt.show()
This quick introduction covers some of the higher-level functionality provided by gdutils, pandas, erddapy and seaborn.