Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge runners-core-construction into sdks-java-core #29924

Merged
merged 3 commits into from
Feb 15, 2024

Conversation

kennknowles
Copy link
Member

@kennknowles kennknowles commented Jan 4, 2024

Fixes #30169

This is a large-scale change to eliminate the toil of making Java transforms available as portable transforms. This brings the SDK in line with Python and Go, with the protobuf dependencies and core features being simply part of the main module.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

Copy link
Contributor

github-actions bot commented Jan 6, 2024

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@kennknowles
Copy link
Member Author

@chamikaramj @Abacn this one is now green, too. It includes #30235. But I think it may be better to merge one at a time to take it slow.

@kennknowles
Copy link
Member Author

The result of the classpath scanning on windows seems to be stable in red. I will try to investigate.

@kennknowles
Copy link
Member Author

It does not seem to upload a scan so I cannot get the details :-(

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

* limitations under the License.
*/

plugins { id 'org.apache.beam.module' }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yey :)

":runners:core-java",
":runners:local-java",
":runners:java-fn-execution",
":sdks:java:extensions:avro"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avro dependency is not needed anymore ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not!

@@ -67,7 +66,6 @@ dependencies {
dependOnProjects.each {
implementation project(it)
}
shadow project(path: ":sdks:java:core", configuration: "shadow")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. Why did direct-java needed to depend on "core" before ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it depends on core above, in the dependOnProjects section.

@@ -24,6 +24,19 @@ applyJavaNature(
'FileIO': 'https://github.com/typetools/checker-framework/issues/6388',
'MergingActiveWindowSetTest': 'https://github.com/typetools/checker-framework/issues/3776',
'WindowFnTestUtils': 'https://github.com/typetools/checker-framework/issues/3776',
'WindowIntoTranslation': 'https://github.com/typetools/checker-framework/issues/3791',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typetools/checker-framework#3791 has been fixed.

Should we reference a different bug here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a checkerframework upgrade but didn't update this list. Seeing if these no longer crash now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I had to add suppression to each file but they don't crash checkerframework.

@@ -77,13 +90,16 @@ dependencies {
shadow project(path: ":model:pipeline", configuration: "shadow")
shadow project(path: ":model:fn-execution", configuration: "shadow")
shadow project(path: ":model:job-management", configuration: "shadow")
shadow project(path: ":model:fn-execution", configuration: "shadow")
shadow project(path: ":sdks:java:transform-service:launcher") // not "shadow" configuration, which is an uberjar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the "shadow" configuration for other projects referenced above ?

Note that after https://github.com/apache/beam/pull/30190/files#diff-c85538d3151bb7ac1847f9b888558c84a82705624f6fc453648aba1fa135c6d8 this is not an uber jar anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I'll try it. We may not need the shadow configurations of the other projects either. It is confusing and changes based on whether the shadow plugin is used in a project.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. There is no shadow configuration for the launcher. But for the other ones, the non-shadow configuration is an uber jar (or at least it was). I think the solution will be to start at the leaves and remove extraneous use of shadow plugin. A big project and not as urgent I think.

implementation project(path: ":sdks:java:core", configuration: "shadow")
implementation project(path: ":sdks:java:transform-service:launcher")
implementation project(path: ":sdks:java:transform-service:launcher") // not "shadow" configuration, which is an uberjar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto regarding the inconsistency when depending with other projects vs this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It turns out launcher doesn't have a shadow configuration anyhow. This is turned on/off by whether the shadow plugin is active. Ideally we stop using it, because vendoring is the only shading we need, really.

@Abacn
Copy link
Contributor

Abacn commented Feb 14, 2024

What did core-construction-java originally do? I see it moved to "org.apache.beam.sdk.util.construction", Is it a utility? Thinking about this module or "org.apache.beam.sdk.construction" looks better. As a similar module "org.apache.beam.sdk.expansion" just has its top level namespace after sdk.

@kennknowles
Copy link
Member Author

What did core-construction-java originally do? I see it moved to "org.apache.beam.sdk.util.construction", Is it a utility? Thinking about this module or "org.apache.beam.sdk.construction" looks better. As a similar module "org.apache.beam.sdk.expansion" just has its top level namespace after sdk.

@Abacn the util folder in the SDK is "not for users" and has its javadoc hidden. It is kind of a hacky way but it has a long history.

The history of core construction is a few different things:

  1. runners/core-java was created for execution helpers for runners, so they could all build their workers.
  2. later we created transform overrides that are for runners, but only before job submission and I think that is when runners/core-construction-java was created. Specifically, things like DataflowRunner were allowed to depend on core-construction-java but they were not allowed to depend on core-java. Only the worker is allowed to depend on core-java.
  3. The really big reason was to keep proto and gRPC dependencies out of the core SDK. We wanted the core SDK to be "pure" for each language, and portability would be an optional plugin/add-on. Now we have a totally different philosophy where portability comes first and SDKs come second, and everything depends on proto so much there is no point.

I realize now that actually I could have just added PTransform.toProto to the Java SDK without doing this merge 😅 and just deleted the translation from core-construction-java.

But to summarize my intention with all these changes, it is to make it much much easier to add portability stuff to every transform in the Java SDK, but just implementing PTransform.getURN() and PTransform.toProto() basically. Get rid of all the @AutoService stuff and all the interfaces involved there.

@kennknowles
Copy link
Member Author

green!

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation. Just had a comment

After these refactors, good to add an item in CHANGES.md

@@ -77,13 +77,16 @@ dependencies {
shadow project(path: ":model:pipeline", configuration: "shadow")
shadow project(path: ":model:fn-execution", configuration: "shadow")
shadow project(path: ":model:job-management", configuration: "shadow")
shadow project(path: ":model:fn-execution", configuration: "shadow")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate ":model:fn-execution"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM.

@kennknowles
Copy link
Member Author

Added CHANGES commit to the PR and I'll merge them together once green again. (no code changes but it is nice to see green in the history)

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@kennknowles kennknowles merged commit 53c966d into apache:master Feb 15, 2024
38 checks passed
@kennknowles kennknowles deleted the runners-core-construction branch February 15, 2024 15:45
@@ -2748,11 +2746,9 @@ class BeamModulePlugin implements Plugin<Project> {
systemProperty "expansionPort", port
systemProperty "semiPersistDir", config.semiPersistDir
classpath = config.classpath + project.files(
project.project(":runners:core-construction-java").sourceSets.test.runtimeClasspath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need them, otherwise XVR javaUsingPython not run the tests, see #30353

@@ -67,6 +67,8 @@

* [Enrichment Transform](https://s.apache.org/enrichment-transform) along with GCP BigTable handler added to Python SDK ([#30001](https://github.com/apache/beam/pull/30001)).
* Allow writing clustered and not time partitioned BigQuery tables (Java) ([#30094](https://github.com/apache/beam/pull/30094)).
* Merged sdks/java/fn-execution and runners/core-construction-java into the main SDK. These artifacts were never meant for users, but noting
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunately an unpleasant breaking change. We don't have much context why core-construction-java was never meant for users since we use some of the classes. This change makes it impossible for us to rollout to our users.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More concretely we have a customized implementation of PipelineResourcesDetector, among a few other references to classes in core-construction-java.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will appreciate at least this was mentioned in the release notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Merge runners-core-construction into core Java SDK
4 participants