
[#398] update Delta Lake dependencies to 3.2.1 #412

Merged: 1 commit into dev on Oct 11, 2024

Conversation

ewilkins-csi (Contributor):

Version 3.2.1 added support for Spark 3.5.3. Note that the delta-core artifacts have been renamed to delta-spark.

@ewilkins-csi linked an issue on Oct 11, 2024 that may be closed by this pull request
@@ -115,20 +114,6 @@ public void generate(GenerationContext context) {
vc.put("versionNeo4j", config.getNeo4jVersion());
vc.put("versionAwsSdkBundle", config.getAwsSdkBundleVersion());

// Addresses a Spark bug in which the --packages argument was not respected in cluster mode (3.0.0 <= x < 3.4.0)
// We don't currently support Spark <3.x.x in aiSSEMBLE, but we include a check for it just in case we ever do.
ewilkins-csi (Contributor, Author):

I: We don't support changing the Spark version from the default aiSSEMBLE version (currently 3.5.2). It's not worth keeping the complexity of supporting older versions since we don't allow them. If we revisit this due to demand, the logic isn't too bad to re-introduce (and it would likely be a very small portion of the overall work needed to support multiple Spark versions).
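
For context, a minimal sketch of the kind of guard that was removed, assuming a config.getSparkVersion() accessor (an assumption; only the Neo4j and AWS SDK accessors appear in this diff) and using Maven's ComparableVersion for the comparison. This is illustrative, not the original implementation, and would sit inside the generate method shown above:

    import org.apache.maven.artifact.versioning.ComparableVersion;

    // Sketch only: Spark 3.0.0 <= x < 3.4.0 ignored --packages in cluster mode,
    // so package-based deps were only enabled outside that range.
    ComparableVersion spark = new ComparableVersion(config.getSparkVersion()); // accessor name is assumed
    boolean usePackageDeps = spark.compareTo(new ComparableVersion("3.0.0")) < 0
            || spark.compareTo(new ComparableVersion("3.4.0")) >= 0;
    vc.put("usePackageDeps", usePackageDeps);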

@@ -9,34 +9,6 @@ sparkApp:
#end
mainApplicationFile: "local:///opt/spark/jobs/pipelines/${mainApplicationFile}"
deps:
#if (!${usePackageDeps})
ewilkins-csi (Contributor, Author):

I: Because we pulled out the logic for supporting older versions of Spark, usePackageDeps can be assumed to be true.

Scenario: POM dependencies are migrated
Given a POM that references the delta-core package
When the 1.10.0 DeltaSpark POM migration executes
Then delta-core is updated to delta-spark and any version numbers are bumped up to 3.2.1
ewilkins-csi (Contributor, Author):

A:

Suggested change:
- Then delta-core is updated to delta-spark and any version numbers are bumped up to 3.2.1
+ Then delta-core is updated to delta-spark and the version is set to the build-parent property version.delta

It greatly simplifies the logic to just always set the version to ${version.delta}, and it should be safe since any project running these Baton migrations would be inheriting from build-parent.
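
For reference, a minimal sketch of what the migration boils down to, written as direct mutation of Maven's Dependency model rather than the textual PomModifications approach the migration actually uses; the method names are illustrative:

    import org.apache.maven.model.Dependency;

    // Matches the check below: any io.delta delta-core artifact, regardless of Scala suffix
    static boolean isDeltaCoreDependency(Dependency dep) {
        return dep.getGroupId().equals("io.delta") && dep.getArtifactId().startsWith("delta-core_");
    }

    // Illustrative only: rename the artifact and pin the version to the build-parent property
    static void migrateDependency(Dependency dep) {
        dep.setArtifactId(dep.getArtifactId().replace("delta-core_", "delta-spark_"));
        dep.setVersion("${version.delta}");
    }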

return dep.getGroupId().equals("io.delta") && dep.getArtifactId().startsWith("delta-core_");
}

private static PomModifications.Replacement replaceInTag(InputLocationTracker container, String tag, String contents) {
ewilkins-csi (Contributor, Author):

I: Good candidate to bake into the PomModifications class in Baton

@csun-cpointe (Contributor) left a comment:

LGTM

if (yaml.hasObject("spec")) {
    yaml = yaml.getObject("spec");
    if (yaml.hasObject("deps")) {
        yaml = yaml.getObject("deps");
csun-cpointe (Contributor):

Maybe create a utility function to get the object from a path, e.g. YamlUtils.getObjectFromPath("sparkApp.spec.deps")?

ewilkins-csi (Contributor, Author):

I just updated the YamlObject hasX/getX methods to accept full paths instead of single keys. This prevents the nesting situation here without cluttering the YamlObject API.
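
For illustration, a sketch of how the nested lookup above reads with path-accepting accessors, assuming they take a varargs path like the getValue(String... path) helper shown later in this diff:

    // Assumed varargs form, mirroring getValue(String... path):
    if (yaml.hasObject("spec", "deps")) {
        YamlObject deps = yaml.getObject("spec", "deps");
        // ... operate on the deps object ...
    }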

- com.amazonaws:aws-java-sdk-bundle:1.12.262
- org.neo4j:neo4j-connector-apache-spark_2.12:4.1.5_for_spark_3
- org.elasticsearch:elasticsearch-spark-30_2.12:8.9.0
- io.delta:delta-core_2.12:2.4.0 #comment
csun-cpointe (Contributor):

It's good that we tested with a comment 👍

@ewilkins-csi force-pushed the 398-j17-pipeline-persistence branch from 45fce50 to 40177b8 on October 11, 2024 at 19:03
return parent.containsKey(key) && type.isInstance(parent.get(key));
}

private <T> T getValue(String... path) {
ewilkins-csi (Contributor, Author):

I: An argument could be made that we only need these two type-parameterized methods and the hasObject/getObject methods, but it can be difficult to understand how to work with an API that only has type-parameterized methods.
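
As a sketch of the trade-off, dedicated accessors could stay as thin wrappers over the two generic helpers; the helper signatures here are assumptions inferred from the snippets in this diff, not the actual YamlObject source:

    // Hypothetical convenience accessors delegating to the generic helpers;
    // hasValue(Class, String...) and getValue(String...) signatures are assumed.
    public boolean hasString(String... path) {
        return hasValue(String.class, path);
    }

    public String getString(String... path) {
        return getValue(path);
    }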

throw new NoSuchElementException("Parent object does not exist at " + Arrays.toString(path));
}
String key = path[path.length - 1];
return (T) parent.get(key);
ewilkins-csi (Contributor, Author):

I: If we passed in a Class object for the type, we could provide a more helpful message when the type of the value doesn't match what's expected. E.g.:

    if (!type.isInstance(parent.get(key))) {
        throw new IllegalArgumentException("Value for [" + key + "] is not of type " + type.getSimpleName());
    }

However, I expect that in nearly every case the call site will contain the literal path to the value being accessed, so the couple of extra steps needed to check the source code for which value isn't of the expected type aren't a big deal, and skipping the check simplifies both the call site and the implementation of this method.

@ewilkins-csi merged commit a6e2043 into dev on Oct 11, 2024
@ewilkins-csi deleted the 398-j17-pipeline-persistence branch on October 11, 2024 at 19:24
Linked issue (may be closed by this pull request): Feature: Java 17 Upgrade Support Series - Pipeline persistence