Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scan classpath for EDD subclasses/dataset types #206

Merged

Conversation

srstsavage
Copy link
Contributor

@srstsavage srstsavage commented Sep 26, 2024

Description

This change uses reflection, specifically the ClassGraph classpath scanner, to discover concrete EDD subclasses and associated class metadata (fromXml methods and Sax handlers for XML deseralization) at runtime. The scan is performed once and stored in a static map for use in XML deserialization methods.

In addition to eliminating the need to register various EDD implementations in the ERDDAP codebase in EDD.fromXml and HandlerFactory.getHandlerFor, this change also allows loading of third party EDD implementations at runtime.

The reflection based EDD class discovery is enabled by the ERDDAP setting useEddReflection. If true, the class map built using reflection is used to discover classes/methods for XML deserialization. If false, legacy hardcoded approaches are used.

In local testing classpath scanning took about 1 second. This scan occurs only once at startup. Scanning is confined to EDD's package (gov.noaa.pfel.erddap.dataset) for performance reasons; scanning all packages took around 12 seconds. For this reason, all EDD implementations need to belong to the gov.noaa.pfel.erddap.dataset package to be detected.

This adds a dependency on classgraph, which is very small (~560k).

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist before requesting a review

  • I have performed a self-review of my code
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@srstsavage srstsavage marked this pull request as draft September 26, 2024 06:04
@srstsavage srstsavage force-pushed the load-edd-classes-with-reflection branch from b170d3f to 12c1c03 Compare September 26, 2024 06:08
@srstsavage
Copy link
Contributor Author

Need to do some testing tomorrow to verify that third-party EDD class discovery actually works, but here's the code for review.

@srstsavage
Copy link
Contributor Author

Ah, I missed the SaxHandler entirely. Looking at that now...

@ChrisJohnNOAA
Copy link
Contributor

Could we use an annotation instead of looking for a static method? I think that would simplify the check for in the reflection logic. It would also give an indication in the dataset file that the class is referenced by reflection (it can be difficult to differentiate dead code and code only referenced by reflection and I'd like to have some indication the code is not dead).

Also I have a minor refactoring of the sax handlers in this pull that you might want to take a look at: #205

@srstsavage
Copy link
Contributor Author

Before seeing your message I've been working on using an annotation on the EDD classes to indicate their Sax handler, so that's good confirmation. Are you saying we should use annotations for the non-sax parser fromXml approach as well? Basically have a method level @FromXml annotation that gets set on each EDD class' appropriate fromXml method?

@ChrisJohnNOAA
Copy link
Contributor

Yeah, I'd prefer to indicate code that's only getting called through reflection with an annotation to make that explicit in the code.

@srstsavage srstsavage force-pushed the load-edd-classes-with-reflection branch 2 times, most recently from e672ee3 to 51421e8 Compare September 27, 2024 07:23
@srstsavage
Copy link
Contributor Author

Ok, I reimplemented this, adding support for both fromXml and State sax parser discovery using reflections/annotations.

// appropriate one?
Constructor<?> constructor = eddClassInfo.getSaxHandlerClass().get().getConstructors()[0];

// Constructors for handlers don't have uniform signatures so we have to do some investigation
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invoking the constructor on the sax handler/State class is complex because the various State subclasses/Handlers don't have a uniform constructor signature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly there's essentially 2 different constructors for the dataset handlers. Many take 3 parameters (saxHandler, datasetId, completeState) and some take a 4th parameter (SaxParsingContext). It could make sense to have a dataset constructor param object that has all 4 of those and have all datasets take that object in their constructor.

Copy link
Contributor Author

@srstsavage srstsavage Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are currently three constructor signature variations with five total possible arguments, and the first three are always consistent.

$ git show main:WEB-INF/classes/gov/noaa/pfel/erddap/handlers/HandlerFactory.java | grep -o "saxHandler, datasetID,[^)]*" | sort | uniq -c
     10 saxHandler, datasetID, completeState
      9 saxHandler, datasetID, completeState, context
      3 saxHandler, datasetID, completeState, datasetType

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot the FromFilesHandler version. dataset type can be included in that param object.

@srstsavage
Copy link
Contributor Author

Also, I think it should be fairly straightforward to adapt this to #205, maybe just changing the usage of Class<State> to Class<BaseDatasetHandler> for clarity.

@srstsavage srstsavage marked this pull request as ready for review September 27, 2024 07:46
@srstsavage srstsavage marked this pull request as draft September 27, 2024 15:52
@ChrisJohnNOAA
Copy link
Contributor

ChrisJohnNOAA commented Sep 27, 2024

We should probably file an issue to update the GenerateDatasetsXml.java (https://github.com/ERDDAP/erddap/blob/main/WEB-INF/classes/gov/noaa/pfel/erddap/GenerateDatasetsXml.java#L187) with a similar reflection approach. I think that will be quite a bit more complicated due to the varying parameters though. Unless you wanted to tackle that here?

@srstsavage
Copy link
Contributor Author

Good call, let's handle GenerateDatasetsXml changes in a separate issue.

@srstsavage
Copy link
Contributor Author

srstsavage commented Sep 27, 2024

@ChrisJohnNOAA One thing I want to check on. With the next release are we going to set a hard requirement on Java 21+? Or continue to support both 17 and 21?

This page strongly suggests using Java 21 but mentions that both 17 and 21 are tested and supported.

https://github.com/ERDDAP/erddap/blob/main/DEPLOY_INSTALL.md

On the changes page there's no mention of Java 21. Java 17 is currently noted as being the required version there.

https://erddap.github.io/changes.html

I ask because some of the class casting used in the new reflections code works in 21 and fails in 17. If we want to continue to support 17 I think I can make the generics unspecific.

This change uses reflection, specifically the ClassGraph classpath
scanner, to discover concrete EDD subclasses and associated class
metadata (fromXml methods and Sax handlers for XML deseralization)
at runtime. The scan is performed once and stored
in a static map for use in XML deserialization methods.

In addition to eliminating the need to register various EDD implementations
in the ERDDAP codebase in EDD.fromXml and HandlerFactory.getHandlerFor,
this change also allows loading of third party EDD implementations at runtime.

The reflection based EDD class discovery is enabled by the ERDDAP setting
useEddReflection. If true, the class map built using reflection is used
to discover classes/methods for XML deserialization. If false, legacy
hardcoded approaches are used.

In local testing classpath scanning took about 1 second. This scan occurs
only once at startup. Scanning is confined to EDD's package
(gov.noaa.pfel.erddap.dataset) for performance reasons; scanning
all packages took around 12 seconds. For this reason, all EDD implementations
need to belong to the gov.noaa.pfel.erddap.dataset package to be detected.

This adds a dependency on classgraph, which is very small (~560k).
@srstsavage srstsavage force-pushed the load-edd-classes-with-reflection branch from 51421e8 to e4664e1 Compare September 29, 2024 06:04
@srstsavage srstsavage marked this pull request as ready for review September 29, 2024 06:49
@srstsavage
Copy link
Contributor Author

This seems to be working well. I made the reflection based discovery configurable with a useEddDiscovery setting, and the Java 17 issues mentioned above seem to be resolved.

<warSourceDirectory>${project.basedir}</warSourceDirectory>
<warSourceExcludes>**/*.java, **/*.javaINACTIVE, **/*.javaInactive, **/*.java_NOT_YET, **/*.javaNOT_FINISHED, **/*.javaOLD, content/**, data/**, development/**, download_cache/**, /src/test/**, target/**, src/**, WEB-INF/classes/gov/noaa/pfel/coastwatch/log.*, WEB-INF/lib/**, WEB-INF/NetCheck.*, WEB-INF/*.web.xml, WEB-INF/DoubleCenterGrids.sh, WEB-INF/FileVisitorDNLS.sh, WEB-INF/FindDuplicateTime.*, WEB-INF/GenerateO*, WEB-INF/GenerateT*, WEB-INF/GridSaveAs*, WEB-INF/incompleteMainCatalog.xml, WEB-INF/iobis.m, WEB-INF/obis.m, WEB-INF/MapViewer.*, WEB-INF/QN2005193*, WEB-INF/ValidateDataSetProperties*, WEB-INF/fonts/**, WEB-INF/temp/**, *.*, .settings/**, .github/**, .vscode/**</warSourceExcludes>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a source jar to the build so projects importing classes can examine source without using decompilers.

@@ -180,10 +180,25 @@
<artifactId>maven-war-plugin</artifactId>
<version>3.4.0</version>
<configuration>
<attachClasses>true</attachClasses>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build a jar containing ERDDAP classes for use in downstream projects.

@ChrisJohnNOAA
Copy link
Contributor

@ChrisJohnNOAA One thing I want to check on. With the next release are we going to set a hard requirement on Java 21+? Or continue to support both 17 and 21?

This page strongly suggests using Java 21 but mentions that both 17 and 21 are tested and supported.

https://github.com/ERDDAP/erddap/blob/main/DEPLOY_INSTALL.md

On the changes page there's no mention of Java 21. Java 17 is currently noted as being the required version there.

https://erddap.github.io/changes.html

I ask because some of the class casting used in the new reflections code works in 21 and fails in 17. If we want to continue to support 17 I think I can make the generics unspecific.

I believe last release I upgraded to 21 shortly before the release was done so I had done a lot of testing with both 17 and 21. It'd be good if ERDDAP can continue to work on 17, however in my opinion the most important thing is to be compatible with the latest LTS version (currently 21 which has been available for about a year now).

@ChrisJohnNOAA ChrisJohnNOAA merged commit 1381745 into ERDDAP:main Oct 4, 2024
@srstsavage srstsavage deleted the load-edd-classes-with-reflection branch October 4, 2024 05:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants