v2.00
Bob Simons committed Jun 24, 2019
1 parent efe4bd7 commit 5bcc7d1
Showing 9 changed files with 508 additions and 414 deletions.
2 changes: 1 addition & 1 deletion WEB-INF/classes/com/cohort/util/File2.java
Original file line number Diff line number Diff line change
@@ -37,7 +37,7 @@ public class File2 {
//compressed and image ext from wikipedia
//many ext from http://www.fileinfo.com/filetypes/common
public static final String BINARY_EXT[] = {
".accdb", ".bin", ".bufr", ".cab", ".cer", ".class", ".cpi", ".csr",
".accdb", ".bin", ".bufr", ".cab", ".cdf", ".cer", ".class", ".cpi", ".csr",
".db", ".dbf", ".dll", ".dmp", ".drv", ".dwg", ".dxf", ".fnt", ".fon",
".grb", ".grib", ".grib2", ".ini", ".keychain",
".lnk", ".mat", ".mdb", ".mim", ".nc",
Original file line number Diff line number Diff line change
@@ -26175,6 +26175,8 @@ public void orderByMax(int keyColumns[]) throws Exception {
new StringArray(new String[]{getColumnName(lastKeyColumn)}),
new StringArray(new String[]{"!="}),
new StringArray(new String[]{lastKCMV}));
if (nRows == 0)
return;

//sort based on keys
ascendingSort(keyColumns);
@@ -26246,7 +26248,7 @@ public void orderByMin(int keyColumns[]) throws Exception {
new StringArray(new String[]{getColumnName(lastKeyColumn)}),
new StringArray(new String[]{"!="}),
new StringArray(new String[]{lastKCMV}));
if (nRows <= 1)
if (nRows == 0)
return;

//sort based on keys
@@ -26323,6 +26325,8 @@ public void orderByMinMax(int keyColumns[]) throws Exception {
new StringArray(new String[]{getColumnName(lastKeyColumn)}),
new StringArray(new String[]{"!="}),
new StringArray(new String[]{lastKCMV}));
if (nRows == 0)
return;

//sort based on keys
ascendingSort(keyColumns);
4 changes: 2 additions & 2 deletions WEB-INF/classes/gov/noaa/pfel/erddap/Erddap.java
Original file line number Diff line number Diff line change
@@ -17678,9 +17678,9 @@ public static void testHammerGetDatasets() throws Throwable {
*
*/
public static void test() throws Throwable {
/* for releases, this line should have open/close comment
/* for releases, this line should have open/close comment */
testBasic();
*/ testJsonld();
testJsonld();
}

}
Original file line number Diff line number Diff line change
@@ -35,7 +35,7 @@ public class EDDTableFromFilesCallable implements Callable {
* not by changing the code here)
* if you want every possible diagnostic message sent to String2.log.
*/
public static boolean debugMode = true;
public static boolean debugMode = false;

String identifier;
int task;
Original file line number Diff line number Diff line change
@@ -339,7 +339,8 @@ else if (fileType.equals(".timeGaps"))
} else if (extension.equals(".mpp")) {
response.setContentType("application/vnd.ms-project");

} else if (extension.equals(".nc")) {
} else if (extension.equals(".nc") ||
extension.equals(".cdf")) {
response.setContentType("application/x-netcdf");

} else if (extension.equals(".odb")) {
@@ -735,9 +736,9 @@ else if (fileType.equals(".timeGaps"))
if (characterEncoding != null && characterEncoding.length() > 0)
response.setCharacterEncoding(characterEncoding);

//specify the file's name (this may force show File Save As dialog box in user's browser)

//specify the file's name (this may force show File Save As dialog box in user's browser)
if (genericCompressed || //include all genericCompressed types
extension.equals(".cdf") ||
extension.equals(".csv") ||
extension.equals(".itx") ||
extension.equals(".js") ||
2 changes: 1 addition & 1 deletion WEB-INF/classes/gov/noaa/pfel/erddap/util/EDStatic.java
Original file line number Diff line number Diff line change
@@ -164,7 +164,7 @@ public class EDStatic {
* <br>1.78 released on 2017-05-27
* <br>1.80 released on 2017-08-04
* <br>1.82 released on 2018-01-26
* <br>2.00 released on 2019-06-18
* <br>2.00 released on 2019-06-24
*
* For master branch releases, this will be a floating point
* number with 2 decimal digits, with no additional text.
84 changes: 62 additions & 22 deletions download/changes.html
Original file line number Diff line number Diff line change
@@ -161,6 +161,9 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
<br>It is an OAuth 2.0 authentication system, much like Google authentication.
ORCID is widely used by researchers to uniquely identify themselves.
ORCID accounts are free and don't have the privacy issues that Google accounts have.
See ERDDAP's <a rel="help"
href="https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#authenticationOrcid"
>Orcid authentication instructions</a>.
Thanks to BCO-DMO (Adam Shepard, Danie Kinkade, etc.).
<br>&nbsp;
<li>NEW: A new URL converter converts out-of-date URLs into up-to-date URLs.
@@ -192,7 +195,7 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
This feature is also used by the standardizeWhat feature of EDDTableFromFiles.
Thanks to Bob Simons.
<br>&nbsp;
<li>NEW: For non-lat/lon maps and non-surface graphs on griddap's and tabledap's
<li>NEW: For graphs (other than surface graphs) on griddap's and tabledap's
Make A Graph web pages, when the x axis isn't a time axis,
if only a subset of the x axis variable's range is visible,
there are now buttons above the graph to shift the X Axis leftwards or rightwards.
@@ -283,7 +286,7 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
the -XX:MaxPermSize setting, which (Adopt)OpenJDK no longer supports.
<br>&nbsp;
<li>TO DO: The new default and recommend &lt;fontFamily&gt; setting in setup.xml is
<kbd>DejaVu Sans</kbd> which are built into AdoptOpenJDK's Java. See the
<br><kbd>DejaVu Sans</kbd> which are built into AdoptOpenJDK's Java. See the
<br><a rel="help"
href="https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#fonts"
>revised font installation instructions</a>.
@@ -337,7 +340,7 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
(other than for &lt;startBodyHtml5&gt; and &lt;theShortDescriptionHtml&gt;),
please consider switching to the new default.
After you copy each value, delete the tag and its description
from datasets.xml
from setup.xml
They (and other similar tags) are now better described in
<a rel="help"
href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#basicStructure"
@@ -364,30 +367,38 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
Usually, users will just want the latest version of the dataset, with all
changes applied. But there is the option for users to request data from the
dataset as it was at any point in time. This facilitates reproducible science.
Thus, unlike most other near-real-time datasets, these datasets are eligible for
<a rel="bookmark" href="https://en.wikipedia.org/wiki/Digital_object_identifier"
>DOIs<img src="../images/external.png" alt=" (external link)"
title="This link to an external website does not constitute an endorsement."></a>.
because they meet the DOI requirement
that the dataset by unchanging, except by aggregation.
See
<a rel="help"
href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#EDDTableFromHttpGet"
>EDDTableFromHttpGet</a>.
Thanks to OOI (from long ago and now) for talking about the need for this
and Eugene Burger for the reminder about working on what is important.
<br>&nbsp;
<li>BIG NEW FEATURE: ERDDAP can now serve data directly from compressed data files,
<br>including .tgz, .tar.gz, .tar.gzip, .gz, .gzip, .zip, .bz2, or .Z.
Datasets may include a mix of compressed files (perhaps the older data files?)
and non-compressed files, and you can compress/decompress a file at any time.

<li>BIG NEW FEATURE: ERDDAP can now serve data directly from externally-compressed data files,
including .tgz, .tar.gz, .tar.gzip, .gz, .gzip, .zip, .bz2, or .Z.
Datasets may include a mix of externally-compressed files (perhaps the older data files?)
and non-externally-compressed files, and you can compress/decompress a file at any time.
<p>This works great!
<br>In most cases, the slowdown related to decompressing the
files is minor. We strongly encourage you to try this.
files is minor. We strongly encourage you to try this, notably for datasets
and/or data files that are infrequently used.
<p>This may save you $30,000 or more!
<br>This is one of the few ERDDAP features that can save you lots of money --
if you compress a lot of data files, you will need far fewer RAIDs/hard drives
to store the data, or conversely, you can serve far more data (10x ?) with the
to store the data, or conversely, you can serve far more data (up to 10x) with the
RAIDs you already have. If this feature saves you from buying another RAID,
then it has saved you about $30,000.

<p>See the <a rel="help"
href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#CompressedFiles"
>Compressed Files documentation</a>.
href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#ExternallyCompressedFiles"
>Externally Compressed Files documentation</a>.
Thanks to Benoit Perrimond and Paloma Delavallee.

<li>BIG NEW FEATURE: All EDDGridFromFiles and all EDDTableFromFiles datasets support a
@@ -398,7 +409,9 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
this will download files from the remote dataset, as needed,
into a local cache with a limited size,
which is useful when working with cloud-based (e.g., S3) data files.
See the <a href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#cacheFromUrl" rel="bookmark">cacheFromUrl documentation</a> for details.
See the
<a href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#cacheFromUrl" rel="bookmark"
>cacheFromUrl documentation</a> for details.
Thanks to Bob Simons and Roy Mendelssohn (who for years have been writing scripts
to handle making local copies of remote dataset files), Eugene Burger,
Conor Delaney (when he was at Amazon Web Services), and the Google Cloud Platform.
@@ -417,18 +430,36 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan

<li>NEW: All EDDGrid and all EDDTableFromFiles datasets support an <kbd>&lt;nThreads&gt;</kbd> setting,
which tells ERDDAP how many threads to use when responding to a request.
<p>tl;dr Rule of Thumb:
<p>Rule of Thumb:
For most datasets on most systems, use nThreads=1, the default.
If you have a powerful computer (lots of CPU cores, lots of memory),
then set nThreads to 2, 3, or 4 for datasets that will benefit the most
(e.g., lots of small files stored on a high-bandwidth parallel file system),
but keep an eye on ERDDAP's memory use, thread use, and overall responsiveness
then consider setting nThreads to 2, 3, or 4 for datasets that will benefit the most,
notably datasets where something causes a lag before a chunk of data can actually
be processed. For example:
<ul>
<li>Datasets with
<a rel="help" href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#ExternallyCompressedFiles"
>externally-compressed (e.g., .gz)</a>
binary (e.g., .nc) files, because ERDDAP has to decompress the whole file
before it can start to read the file.
<li>Datasets that use
<a href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#cacheFromUrl" rel="bookmark"
>cacheFromUrl and cacheSizeGB</a>,
because ERDDAP often has to download the file before it can read it.
<li>Datasets with data files stored on a high-bandwidth parallel file system,
because it can deliver more data, faster, when requested. Examples of
parallel file systems include JBOD, pNFS, GlusterFS,
Amazon S3, and Google Cloud Storage.
</ul>
Warning: When using nThreads&gt;1, keep an eye on ERDDAP's memory use, thread use, and overall responsiveness
(see <a rel="help" href="https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#statusPage"
>ERDDAP's status page</a>).
See the
<a href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#nThreads" rel="bookmark">nThreads</a> documentation for details.
<a href="https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#nThreads" rel="bookmark">nThreads</a>
documentation for details.
Thanks to Rob Bochenek of Axiom Data Science, Eugene Burger,
Conor Delaney (when he was at Amazon Web Services), and Google Cloud Platform.
<br>&nbsp;

<li>NEW standardizeWhat for all EDDTableFromFiles subclasses -
<br>Previously, if for a given variable, the values of the important attributes (e.g.,
@@ -471,13 +502,16 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
<li>IMPROVED -- The system for automatically converting non-ISO 8601 times
into ISO 8601 times (introduced in v1.82) has been greatly expanded
to deal with a large number of additional formats.
This affects GenerateDatasetsXml and ERDDAP's handling of source metadata.
<br>&nbsp;

<li>IMPROVED -- With its third major revision of the String time parsing system
(and hopefully the last),
ERDDAP no longer uses Java's DateTimeFormatter because of bugs
which sometimes affect extreme times (years &lt;=0000).
ERDDAP now uses its own system for parsing time strings.
<br>&nbsp;

<li>WARNING: The new String time parsing system is somewhat stricter.
If one of your datasets suddenly has only missing values for time values,
the cause is almost certainly that the time format string is slightly wrong.
@@ -487,22 +521,24 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
which "Convert[s] any common string time into an ISO 8601 string time" --
it indicates the format that the converter used to parse the source string.
<br>&nbsp;

<li>RECOMMENDATION: The quickest, easiest, and cheapest way to speed up ERDDAP's
access to tabular data is to put the data files on a Solid State Drive (SSD).
Most tabular datasets are relatively small, so a 1 or 2 TB SSD is probably
sufficient to hold all of the data files for all of your tabular datasets.
SSD's eventually wear out if you write data to a cell, delete it,
and write new data to that cell too many times.
Instead, I recommend that you just use your SSD to write the data once
Instead, I recommend that (as much as possible) you just use your SSD to write the data once
and read it many times. Then,
even a consumer-grade SSD should last a very long time, probably much longer
than any Hard Disk Drive (HDD).
Consumer-grade SSD's are now cheap (in 2018, ~$200 for 1 TB or ~$400 for 2 TB)
and prices are still falling fast.
When ERDDAP accesses a data file, an SSD offers both
<br>shorter latency (~0.1ms, versus ~3ms for an HDD, versus ~10(?)ms for a RAID, versus ~55ms for Amazon S3)
<br>and higher throughput (~500 MB/S, versus ~75 MB/s for an HDD versus ~500 MB/s for a RAID).
<br>So you can get up to a ~10X performance boost (vs a HDD) for $200!
<ul>
<li>shorter latency (~0.1ms, versus ~3ms for an HDD, versus ~10(?)ms for a RAID, versus ~55ms for Amazon S3), and
<li>higher throughput (~500 MB/S, versus ~75 MB/s for an HDD versus ~500 MB/s for a RAID).
</ul>So you can get up to a ~10X performance boost (vs a HDD) for $200!
Compared to most other possible changes to your system
(a new server for $10,000? a new RAID for $35,000? a new network switch for $5,000? etc.),
this is by far the best Return On Investment (ROI).
@@ -514,7 +550,7 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
<li>NEW: Everyone who is logged in gets role=[anyoneLoggedIn], even if there
is no &lt;user&gt; tag for them in datasets.xml.
If you set dataset's &lt;accessibleTo&gt; to <kbd>[anyoneLoggedIn]</kbd>,
then anyone who has logged in to ERDDAP (e.g., via their Gmail account)
then anyone who has logged in to ERDDAP (e.g., via their Gmail or Orcid account)
will be authorized to access the dataset,
even if you haven't specified a &lt;user&gt; tag for them in datasets.xml.
Thanks to Maurice Libes.
@@ -679,6 +715,10 @@ <h2><a class="selfLink" id="changes2.00" href="#changes2.00" rel="bookmark">Chan
<li>BUG FIX: Two bugs related to EDDTableCopy.
Thanks to Sam McClatchie.
<br>&nbsp;
<li>CHANGE: The number of failed requests shown on the status.html page
will increase because more things
are counted as failures than before.
<br>&nbsp;
<li>CHANGE: ERDDAP's status.html now shows "Requests (median times in ms)" in the time series.
Previously, it showed median times truncated to integer seconds.
<br>&nbsp;