
[FEA] Rework the shim layer to robustly handle ABI and API incompatibilities across Spark releases #3223

Closed · 20 of 21 tasks
gerashegalov opened this issue Aug 12, 2021 · 5 comments · Fixed by #3365
Labels: epic (Issue that encompasses a significant feature or body of work) · feature request (New feature or request) · P0 (Must have for release) · Spark 3.1+ (Bugs only related to Spark 3.1 or higher) · Spark 3.2+


@gerashegalov (Collaborator) commented Aug 12, 2021

Is your feature request related to a problem? Please describe.

Apache Spark uses a semantic versioning scheme, major.feature.patch, e.g., 3.1.2.

  • Most of Spark's compatibility promises are moot for the plugin because our integration points live in developer-API and internal-API land, not in the public APIs that the versioning policy covers.
  • Even if the compatibility guarantees did apply, we would still need to deal with breaking changes because we support multiple feature release lines: 3.0.1+, 3.1.1+, and the upcoming 3.2.x. In practice, API- and ABI-level incompatibilities occur much more often in internal APIs.
  • Per the documentation and simple bytecode comparison, even where compatibility is maintained, it is often only API (source) compatibility requiring recompilation, not ABI (binary) compatibility. As a result, our current implementation, which relies on a single version-independent compilation against 3.0.1, runs into runtime failures on other Spark versions that are only detected by tests (see the sketch after this list).
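
To make the API-vs-ABI distinction concrete, here is a hypothetical Scala illustration (not actual Spark code): widening a method's return type is source compatible for callers typed against the supertype, yet binary incompatible because JVM method descriptors encode the return type.

```scala
import scala.collection.mutable.ArrayBuffer

// SparkOld and SparkNew stand in for the *same* internal Spark class in
// two consecutive feature releases (hypothetical, not real Spark code).
object SparkOld {
  // One release exposes a concrete collection type...
  def shufflePartitions: ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
}

object SparkNew {
  // ...and the next widens it to the Seq supertype.
  def shufflePartitions: scala.collection.Seq[Int] = Seq(1, 2, 3)
}

object PluginCode {
  // A caller typed against the Seq supertype compiles unchanged against
  // either signature: API (source) compatible. But JVM method descriptors
  // encode the return type, so plugin bytecode compiled when the method
  // returned ArrayBuffer throws NoSuchMethodError at run time against the
  // Seq version: ABI (binary) incompatible without recompilation.
  val parts: scala.collection.Seq[Int] = SparkOld.shufflePartitions
}
```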

Describe the solution you'd like
A solution that catches 99% of the issues at build time and is robust at run time.
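
One shape such a solution could take (a minimal sketch; the trait and class names here are hypothetical, not the plugin's actual API) is a version-agnostic shim trait compiled once, with one concrete implementation compiled against each supported Spark release and selected at run time from the Spark version string:

```scala
// Minimal sketch of the shim pattern; all names are hypothetical.
trait SparkShims {
  // Each supported Spark release provides its own implementation of the
  // version-sensitive calls behind this narrow, version-agnostic trait.
  def getShuffleManagerClass: String
}

object ShimLoader {
  // Pick the shim implementation compiled against the running Spark
  // version; anything unsupported fails fast instead of failing later
  // with NoSuchMethodError deep inside a query.
  def load(sparkVersion: String): SparkShims = {
    val shimClass = sparkVersion match {
      case v if v.startsWith("3.0.") => "com.example.shims.spark301.Shims"
      case v if v.startsWith("3.1.") => "com.example.shims.spark311.Shims"
      case v if v.startsWith("3.2.") => "com.example.shims.spark320.Shims"
      case v => throw new IllegalArgumentException(s"Unsupported Spark version: $v")
    }
    Class.forName(shimClass)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[SparkShims]
  }
}
```

Because each shim module is compiled against its own Spark dependency, signature drift in internal APIs surfaces as a build-time compilation error rather than a runtime NoSuchMethodError.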

Describe alternatives you've considered
Rely on the existing solution, which is based on test coverage, to identify the issues at run time.

Additional context
Prototype branches such as

Tasks:

These are optional or nice-to-have tasks:

  • Integration tests should only pick up one jar; the builds for the individual Spark jars would also need updating
  • Build scripts to build all of the shims easily
  • Move external-facing APIs into their own module (optional)
gerashegalov added the feature request (New feature or request) and ? - Needs Triage (Need team to review and classify) labels Aug 12, 2021
gerashegalov added this to the Aug 16 - Aug 27 milestone Aug 12, 2021
gerashegalov added the Spark 3.1+ (Bugs only related to Spark 3.1 or higher) and Spark 3.2+ labels and removed the ? - Needs Triage label Aug 12, 2021
gerashegalov added a commit that referenced this issue Sep 2, 2021
Signed-off-by: Gera Shegalov <gera@apache.org>

This PR contributes to #3223

- enables the Spark 3.2.0 build via -Dbuildver=320
- prepares deduplication of identical classes once all shims follow the Spark 3.2.0 pattern
- sets the stage for the parallel-world classloader
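
For context on the last bullet, a parallel-world classloader resolves shim classes from a version-specific directory prefix inside the dist jar, e.g. classes compiled for Spark 3.2.0 packaged under a spark320/ prefix. A minimal sketch, with hypothetical names and a simplified lookup:

```scala
import java.io.ByteArrayOutputStream

// Sketch of a classloader that defines classes from a version-specific
// "parallel world" inside the plugin jar, e.g. the bytes stored at
// spark320/com/example/Shims.class when running on Spark 3.2.0.
class ParallelWorldClassLoader(parent: ClassLoader, world: String)
    extends ClassLoader(parent) {

  override def findClass(name: String): Class[_] = {
    val path = s"$world/${name.replace('.', '/')}.class"
    val in = getParent.getResourceAsStream(path)
    if (in == null) {
      throw new ClassNotFoundException(name)
    }
    try {
      // Read the class bytes from the jar entry and define the class.
      val buf = new ByteArrayOutputStream()
      val chunk = new Array[Byte](4096)
      var n = in.read(chunk)
      while (n != -1) {
        buf.write(chunk, 0, n)
        n = in.read(chunk)
      }
      val bytes = buf.toByteArray
      defineClass(name, bytes, 0, bytes.length)
    } finally {
      in.close()
    }
  }
}
```

Each per-version build (driven by the -Dbuildver property above) would contribute one such world to a single dist jar, so one artifact can serve all supported Spark releases.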
gerashegalov linked a pull request Sep 2, 2021 that will close this issue
@tgravescs (Collaborator) commented

Reopening because this issue has other subtasks.

@abellina (Collaborator) commented

Should we add a subtask about testing with UCX for each Spark distro?

@gerashegalov (Collaborator, Author) commented

@abellina sure, go ahead re: testing with UCX for each Spark distro. We also have the optimization/fix you suggested offline, triggering the shuffle manager init on the driver earlier, which you can add as well.

@tgravescs (Collaborator) commented Sep 10, 2021

The udf-examples native build is passing, so marking that task off.

sameerz added the epic (Issue that encompasses a significant feature or body of work) label Sep 27, 2021
@tgravescs (Collaborator) commented

All required tasks are complete, so closing this.
