Skip to content

Latest commit

 

History

History
362 lines (272 loc) · 16.7 KB

whats-new.md

File metadata and controls

362 lines (272 loc) · 16.7 KB

This is a major release featuring several optimizations, new features, and bug fixes. Nearly 400 pull requests were merged.

For the complete list of changes, see the changelog: https://github.com/gem/oq-engine/blob/engine-3.11/debian/changelog

Here are the highlights.

New features

The classical PSHA calculator has a brand new optimization called point source gridding, based on the idea of using a raw grid of point sources for distant sites and a fine grid for close sites. The feature is still experimental and not enabled by default, but the first results are very encouraging: up to 3× improvement in the runtimes of a few large continental models without reduction in the accuracy of the results. The point source gridding optimization is documented here:

https://docs.openquake.org/oq-engine/advanced/point-source-gridding.html

and you are invited to try it.

There is also a new syntax to perform sensitivity analysis, i.e. to run multiple calculations with different values of one (or more) parameters. This can be used to test the sensitivity to the parameters used in the point source gridding approximation, but in general it works for any global parameter. An example to assess the sensitivity to the integration distance is the following:

sensitivity_analysis = {'maximum_distance': [200, 300]}

Finally, now the engine can automatically download and run calculations from an URL containing a .zip archive. For instance

$ oq engine --run "https://github.com/gem/oq-engine/blob/engine-3.11/openquake/server/tests/data/classical.zip?raw=true"

Hazard calculators

The memory occupation of classical PSHA calculations has been reduced significantly; for instance, models that used to require over 100 GB of RAM on the master node now runs with less than 32 GB on the master node. We also optimized the calculation of the probability of exceedence by carefully generating arrays with a size smaller than the CPU cache (that can give a speedup of a factor 2 or 3).

A time-honored performance hack in event based calculations with full enumeration of the logic-tree has been finally removed: now the number of generated ruptures is consistent with the case of sampling of the logic-tree. This helps removing a source of confusion that was always present before. Event based calculations involving full-enumeration of the logic-tree are likely to require additional runtime now, but the calculations are manageable, whereas previously they might not have run to completion.

The scenario and event based calculators have been fully unified within the engine internals. As a consequence, the parameter controlling the rupture seeds is now always ses_seed. Before it was ses_seed in event based but random_seed in scenario, a potential source of confusion for the users since random_seed was also used for the logic-tree sampling. Please read the FAQ section on the seeds used by the engine for more details.

We changed the algorithm generating the rupture seeds, thus the engine will not produce the same GMFs as before, but they will still be statistically equivalent.

We refined the "minimum_intensity" approximation, by making it more precise: ground motion values below the specified threshold are replaced with zeros and not stored if and only if they are below the threshold for all intensity measure types.

We improved the task distribution in event based calculations, because sometimes it was producing too many tasks and sometimes sending the tasks to the workers was excessively slow.

We changed the task distribution also in scenario calculations: now we parallelize by number_of_ground_motion_fields if there are more than 10 sites. This improved a lot the performance in cases with many thousands of sites.

The scenario and event_based calculators (including ebrisk) now generate and store as a pandas-friendly dataset the (geometrically) averaged GMF on the events. This is useful for plotting and debugging purposes.

We reduced the data transfer due to the GMPEs (in particular the Kotha GMPEs): in some cases, this can make a huge difference (we saw a 10x reduction in the newest model for Europe) while for most models you may not see any sensible difference.

The preclassical calculator has been made faster and improved to determine the source weights more reliably, thus reducing the slow task issue in classical calculations. Now all sources are split and prefiltered with a KDTree in the preclassical phase.

We changed the semantics of the pointsource_distance approximation: before it was ignoring finite size effects, now it is just averaging them, so it is much more precise than before.

For calculations with a few sites now we store the classical ruptures in a single pandas-friendly dataset, including information about the generating sources.

We worked on improving the UCERF calculator, doing some minor optimizations, but a lot more could be done to improve its performance.

Risk calculators

The scenario_risk and event_based_risk calculator have been unified, as well as the scenario_damage and event_based_damage calculators, so do not worry if when running a scenario_risk calculation the progress log will say that you are running an event_based_risk calculation. The core calculation logic used for the calculations within the engine is the same now.

We added the ability to compute aggregate losses to the scenario calculators and aggregate loss curves to the event_based_risk calculator. Notice, however, that they are still less efficient than the ebrisk calculator, which should be the preferred calculator when attempting to compute aggregated loss curves.

We optimized the case of many tags so that now it is possible to aggregate by asset ID or by site ID by setting in the job.ini file

  aggregate_by = id  # compute loss curves for each asset
  aggregate_by = site_id  # compute loss curves aggregated by site

This works up to many thousands of assets/sites; previously it was simply impossible due to memory issues.

There was a huge speedup in large ebrisk calculations due to the removal of zero losses (we measured a 7x speedup in a calculation in test runs for NRCan).

The risk model (which comprises fragility functions, vulnerability functions, consequence models, and taxonomy mapping tables) is now stored in a pandas-friendly way; that improves by two orders of magnitude the saving time in calculations with many thousands of vulnerability/fragility functions.

The scenario_damage calculator is more efficient than before and it stores the damage distributions in a pandas-friendly way. It also stores a dataset avg_portfolio_damage useful for comparison purposes.

The CSV exporters have been updated to use pandas, thus improving the performance. Moreover various exporters have been changed in order to unify the aggregate losses outputs between ebrisk, event_based_risk and scenario_risk calculators. The most notable change is that the exporter for the loss curves aggregated by tag now also exports the total loss curve (in the same file). Here is an example:

https://github.com/gem/oq-engine/blob/engine-3.11/openquake/qa_tests_data/event_based_risk/case_6c/expected/agg_curves_eb.csv

Logic trees

We made the engine smarter in the presence of different sources with the same ID, which are unavoidable in presence of logic trees changing the source parameters. Now internally the engine uses an unique ID. For instance in the case of two different sources with ID "A", the engine will generate two IDs: "A;0" and "A;1". The information about the sources is now stored in a pandas-friendly dataset source_info with an unique index source_id.

Whe changed the internal storage of the PoEs in classical calculations to allow a substantial optimization of performance and memory occupation: this improvement is visible only in calculations with particularly complex logic trees. While at it, we fixed a bug in the sampling logic affecting some models in engine 3.10.

We changed the string representation of the realizations to make it more compact (before it was practically impossible to print out the full names of the realizations for some models, because the strings were too long).

We added a check on valid branch ID names: only letters, digits and the caracter "-", "_" an "." are accepted.

We added a new type of uncertainty for the seismic sources called TruncatedGRFromSlipAbsolute. That required adding a classmethod TruncatedGRMFD.from_slip_rate and to update the I/O routines to recognize the slip_rate parameter.

hazardlib/HTMK

We introduced a new MFD parameter slipRate and implemented a new GMPE AvgPoeGMPE performing averages on the probabilities of exceedence: this is an alternative approach to the AvgGMPE, which performs geometric averages on the GMFs.

The AvgGMPE, introduced over an year ago, has been extended to work also for scenario and event based calculations and has been documented here:

https://docs.openquake.org/oq-engine/advanced/mean-ground-motion-field.html

The ModifiableGMPE was enhanced with new methods set_scale_median_scalar, set_scale_median_vector, set_scale_total_sigma_scalar, set_scale_total_sigma_vector, set_fixed_total_sigma, set_total_std_as_tau_plus_delta, add_delta_std_to_total_std.

Richard Styron introduced a tapered Gutenberg-Richter MFD, closely following its implementation in the USGS NSHMP-HAZ code.

Marco Pagani introduced a new distance called 'closest_point' and a method to create a TruncatedGRMFD from a value of scalar seismic moment. Moreover he also introduced a KiteSurface class and and KiteFaultSource class, which at the moment are still considered experimental.

Viktor Polak contributed the GMPE Parker et al (2020). He also contributed the Hassani and Atkinson (2020) GMPE and added a new site parameter fpeak. Finally he contributed the GMPEs Chao et al. (2020) and Phung et al. (2020).

Laurentiu Danciu and Athanasios Papadopoulos contributed several intensity prediction equations for use in the Swiss National Earthquake Risk Model. The new IPEs refer to models obtained from the ECOS (2009), Faccioli and Cauzzi (2006), Bindi et al. (2011), and Baumont et al. (2018) studies. They also extended the ModifiableGMPE class to allow amplification of the intensity of the parent IPE based on a new amplfactor site parameter.

Graeme Weatherill made some updates to the GMPEs used in the European Seismic Hazard Model 2020 (ESHM20).

Claudia Mascandola contributed the GMPE Lanzano et al. (2019) and the NI15 regional GMPE by Lanzano et al. (2016).

We changed the SourceWriter not to save the area_source_discretization on each source when writing the XML files, otherwise the same parameter in the job.ini file would be ignored, which is normally undesirable.

Bugs

A regression entered in the classical_risk and classical_damage calculators in engine 3.10 causing an increase of the data transfer in hazard curves. That was killing the performance in the case of calculations with many thousands of sites. Fixed after a report by the EUCENTRE.

The exporters for the hazard maps and UHS were exporting zeros in the case of individual_curves = true. Fixed after the report by Jian Ma (https://groups.google.com/g/openquake-users/c/43flYFzOMoo/m/tpYFqv1pBAAJ).

In presence of an unknown parameter in the job.ini file - typically because of a mispelling - the log was disappearing; this has been fixed.

The boolean fields vs30measured and backarc were not cast correctly when read from a CSV field (the engine was reading the zeros as true values). Fixed after the report by Peter Pažák (https://groups.google.com/u/0/g/openquake-users/c/-8Abgea_Pu8/m/IHM0o68rDgAJ).

We fixed a wrong check raising a ValueError incorrectly in the case of multi-exposures with multiple cost types.

We fixed a bug in the calculation of average losses in scenario_risk computations: events with zero losses that were incorrectly discarded.

Now ignore_covs = 0 effectively sets all the coefficients of variation in the input vulnerability functions to zero, even when using the Beta distribution, which was not the case in previous versions.

New checks and warnings

We removed some annoying warnings in classical_damage calculations in the case of hazard curves with PoEs == 1.

The engine logs a warning in case of a suspiciously large seed dependency in event-based/scenario calculations.

The engine raises an early error if the parameter soil_intensities is set with an amplification method which is not "convolution".

The engine raises an early error in case of zero probabilities in the hypocenter distribution or the nodal plane distribution in the XML source files.

We added a check on the vulnerability functions with the Beta distribution: the mean loss ratios cannot contain zeros unless the corresponding coefficients of variation are zeros too.

Now we perform the disaggregation checks before starting the classical part of the calculation, so that the user gets an early error in case of wrong parameters.

The engine warns the user if it discover a situation with zero losses corresponding to non-zero GMFs.

We now accept vulnerability functions for taxonomies missing in the exposure: such functions are just ignored. This is useful since it means that a vulnerability model file prepared for a full exposure can be used on a reduced exposure missing some taxonomy strings.

oq commands

We replaced the command oq workers inspect with oq workers status.

We renamed oq recompute_losses as oq reaggregate and made it to work properly.

We enhanced the command oq compare and extended it also to the avg_gmf outputs.

We improved a fixed a few oq plot subcommands.

We enhanced oq plot sources to plot point sources and to manage the internationa date line.

We fixed a bug in oq prepare_site_model whensites.csvis the same as thevs30.csv` file and there is a grid spacing parameter.

The command oq nrml_to has been documented.

WebUI/WebAPI/QGIS plugin

If the authentication is off now the WebUI shows the calculations of all users and not only the calculations of the current user.

We improved the submission of calculations to the WebAPI: now they can be run on a full cluster, serialize_jobs is honored and the log level is configurable with a variable log_level in the file openquake.cfg.

We updated the QGIS plugin to reflect the changes in the engine outputs.

Other

The flag --reuse-hazard has been replaced by a flag -reuse-input that allows the user to reuse only source models and exposures. This is safer than trying to reuse the GMFs, which should be done with the --hc option instead.

The num_cores parameter has been moved from the job.ini file to the openquake.cfg file and now it works as expected.

There was a lot of work on secondary perils, both on the hazard and on the risk side, but the feature is still not ready for primetime.

Packaging

We now have a universal installer working on Linux, Windows and Mac (see https://github.com/gem/oq-engine/blob/engine-3.11/doc/installing/universal.md).

The universal installer is now the only supported way to install the engine on Mac and generic Linux systems. It works by using a pre-installed Python, which can be Python 3.6.x, 3.7.x. or 3.8.x. Python 3.9 is not supported yet; if you have an older Python (≤3.5) you must install a newer, supported version of Python and only then proceed with installing the engine.

For Debian-based systems the universal installer works just fine, but we also provide packages that include their own Python (version 3.8).

For RedHat-based systems we also provides packages that include their own Python (version 3.6). Notice that due to the change of policy of RedHat about the CentOS operating system, it is not clear if we will keep supporting it with the packages, but the universal installer will work.

We upgraded h5py to version 2.10 (for performance improvements) and shapely to version 1.7.1 (to unofficially support macOS Big Sur). Notice that macOS Big Sur is still not officially supported since we cannot reliably run tests for the engine on Big Sur, given that GitHub's Continuous Integration system does not support it yet. But we know of several users for whom the engine works on Big Sur, via the universal installer. The latest generation of MacBooks based on the Apple M1 CPU architecture, i.e., the MacBook Air (M1, 2020), Mac mini (M1, 2020), and the MacBook Pro (13-inch, M1, 2020) are not officially supported.