ENH/SEAEXP: allow dropna for very large raw payload files #117

jklymak · 2022-09-27T21:15:06Z

We had a bunch of sea explorer files like:

PLD_REALTIMECLOCK;NAV_RESOURCE;NAV_LONGITUDE;NAV_LATITUDE;NAV_DEPTH;AROD_FT_TEMP;AROD_FT_DO;FLBBCD_CHL_COUNT;FLBBCD_CHL_SCALED;FLBBCD_BB_700_COUNT;FLBBCD_BB_700_SCALED;FLBBCD_CDOM_COUNT;FLBBCD_CDOM_SCALED;GPCTD_CONDUCTIVITY;GPCTD_TEMPERATURE;GPCTD_PRESSURE;GPCTD_DOF;
05/08/2020 23:42:41.851;117;-12837.081;5125.986;8.476;;;202;1.0792;283;0.000417303;59;0.7245;3.95489;15.4739;3.01;;
05/08/2020 23:42:41.873;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:41.881;117;-12837.081;5125.986;8.476;15.457;319.59;;;;;;;;;;;
05/08/2020 23:42:41.891;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:41.901;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:41.911;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:41.920;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:41.929;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:41.939;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:41.947;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:41.957;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:41.967;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:41.976;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:41.986;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:41.996;117;-12837.081;5125.986;8.476;;;;;;;;;3.95534;15.4766;2.90;;
05/08/2020 23:42:42.012;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.021;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.031;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.039;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.049;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.059;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.068;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.078;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.087;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.095;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.105;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.115;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.125;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.134;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.144;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.154;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.164;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.173;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.183;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.192;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.202;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.210;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.220;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.230;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.240;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.249;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.265;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.275;117;-12837.081;5125.986;8.476;15.458;319.61;;;;;;;;;;;
05/08/2020 23:42:42.284;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;
05/08/2020 23:42:42.294;117;-12837.081;5125.986;8.476;;;;;;;;;;;;;

where a) many of the lines had no data, and b) the AROD sensor was sampled far more often than it was actually outputting new data.

The change here allows us to use pandas dropna (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html) to clean up the data set before moving to xarray, in a relatively flexible way:

sea.raw_to_rawnc('test/', 'testout/', 'boo',
                 dropna_subset=['GPCTD_TEMPERATURE', 'FLBBCD_CHL_COUNT'])

So if both GPCTD_TEMPERATURE and FLBBCD_CHL_COUNT are NaN, the line will be dropped. This makes the data sets massively smaller, with no loss of actual collected data (I hope ;-). There is a slight loss of data synchrony, but that should be on the order of 8 Hz, so it should not affect data processing.
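To make the "both are NaN" behaviour concrete, here is a minimal pandas-only sketch. It does not show the code in this PR: it assumes the subset is applied with `how='all'`, which is what the description above implies, and it uses a cut-down, made-up excerpt with only a few of the columns from the sample file.

```python
import io

import pandas as pd

# Cut-down excerpt in the same ';'-delimited layout as the payload file above.
# Only a handful of columns are kept; this is illustrative, not a real file.
RAW = """PLD_REALTIMECLOCK;NAV_DEPTH;AROD_FT_TEMP;FLBBCD_CHL_COUNT;GPCTD_TEMPERATURE;
05/08/2020 23:42:41.851;8.476;;202;15.4739;
05/08/2020 23:42:41.873;8.476;;;;
05/08/2020 23:42:41.881;8.476;15.457;;;
05/08/2020 23:42:41.996;8.476;;;15.4766;
"""

df = pd.read_csv(io.StringIO(RAW), sep=';', index_col=False)
# The trailing ';' on every line produces an empty, unnamed last column.
df = df.loc[:, ~df.columns.str.startswith('Unnamed')]

subset = ['GPCTD_TEMPERATURE', 'FLBBCD_CHL_COUNT']

# Assumption: how='all' drops a line only when *every* subset column is NaN.
kept = df.dropna(subset=subset, how='all')

# For contrast, the pandas default (how='any') drops a line as soon as
# *one* of the subset columns is NaN.
kept_any = df.dropna(subset=subset)

print(len(df), len(kept), len(kept_any))
```

Note that the AROD-only line is dropped in this sketch because neither subset column has a value there, so the choice of columns in `dropna_subset` decides which sensors' samples survive.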

ENH/SEAEXP: allow dropna for very large raw payload files

95ec8ca

hvdosser merged commit 524d6e5 into c-proof:main Sep 27, 2022

jklymak deleted the enh-allow-dropna-seaexplorer branch September 27, 2022 21:35

jklymak added the glider: seaexplorer label Sep 27, 2022
