Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-30064] Move existing Hive connector code from Flink repo to dedicated Hive repo #5

Merged
merged 746 commits into from
Dec 8, 2023

Conversation

snuyanzin
Copy link
Contributor

Copy link

boring-cyborg bot commented Nov 16, 2023

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)

@@ -25,4 +25,4 @@ jobs:
compile_and_test:
uses: apache/flink-connector-shared-utils/.github/workflows/ci.yml@ci_utils
with:
flink_version: 1.16.0
flink_version: 1.19-SNAPSHOT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the externalization based on the code from the release-1.18 branch or master?

Copy link
Contributor Author

@snuyanzin snuyanzin Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial commit is based on master however let's see if there could be done support for 1.18.0
there are several things which are required to be workaround like FLINK-32620, FLINK-25593 and some others

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, with last 2 commits 1.18 is supported.

there could be 2 ways: support it for 1.18.0 first only and then apply similar approach to adapt updates from 1.19
or support it for both 1.19 and 1.18 like here

@snuyanzin
Copy link
Contributor Author

snuyanzin commented Nov 25, 2023

rebased since there were new commits in main flink master to hive-connector area

Copy link
Contributor

@RyanSkraba RyanSkraba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Sergey -- sorry this took so long to review! In principle, it looks pretty good, but I have a question about the migration of test utilities from

  • flink: ./flink-end-to-end-tests/flink-end-to-end-tests-common/src/main to
  • flink-connector-hive: /flink-connector-hive-e2e-tests/src/main/
  1. The migrated package is org.apache.flink.tests.util.flink but other connectors use the form org.apache.link.tests.util.kafka for example.
  2. Just for info, are these utilities unique to the flink Hive connector?

There's some weirdness going on with my local build, where I can't seem to get the docker images in the TestContainers to run, but it's hard to argue with the CI passing! I'm investigating the reason locally.

Thanks for this work!!
These files are necessary for the flink end-to-end test

@luoyuxia
Copy link
Contributor

luoyuxia commented Dec 2, 2023

Cool! @snuyanzin Thanks for the pr. Sorry for the late response for I'm busy with other things. I'll definitely have a look in next week.

@snuyanzin
Copy link
Contributor Author

snuyanzin commented Dec 3, 2023

rebased since there were new commits in main flink master to hive-connector area

also added archunit violations because of recent commit within FLINK-33637
@RyanSkraba moved test packages to flink.connectors.hive

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snuyanzin Thanks for the great work. Overall look good to me assuming ci test pass for Hive2 & Hive3.

@luoyuxia
Copy link
Contributor

luoyuxia commented Dec 6, 2023

@snuyanzin Hi, I also found some pages(https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/hive-compatibility/hive-dialect/overview/ , https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/hiveserver2/) in Flink still refer to Hive with internal link. Will we adjust them in the pr of removing Hive connector from flink repo?

@snuyanzin
Copy link
Contributor Author

snuyanzin commented Dec 6, 2023

Thanks for the feedback

Will we adjust them in the pr of removing Hive connector from flink repo?

yep, usually it's happening either with while removal of connector from main repo or together with the first oficial release of externalised connector.

Copy link
Contributor

@MartijnVisser MartijnVisser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snuyanzin Thanks for the update. LGTM. Btw, will we also enable test with hive3 nightly just like flink does? I want to make sure we can have a chance to test & verify it with hive3.

@MartijnVisser
Copy link
Contributor

I want to make sure we can have a chance to test & verify it with hive3.

Good point. I actually think we should consider running them both for each PR, that should be easier with Hive being externalized.

@snuyanzin
Copy link
Contributor Author

snuyanzin commented Dec 7, 2023

agree, makes sense
will try do that
however currently this is no supported by flink-connector-shared-utils ...

the only solution is to copy paste ci.yml and enable running tests against different hive versions

I will see if it could be supported on a generic way since same functionality would be nice to have for OpenSearch connector

@MartijnVisser
Copy link
Contributor

I will see if it could be supported on a generic way since same functionality would be nice to have for OpenSearch connector

We could consider adding optional_maven_profiles as a generic variable and pass that through (kind of like https://github.com/apache/flink-connector-shared-utils/blob/ci_utils/.github/workflows/ci.yml#L87-L89)

@snuyanzin
Copy link
Contributor Author

yep, that exactly the idea i'm testing right now =)
thanks for confirmation 👍

@snuyanzin
Copy link
Contributor Author

just another note, I noticed that hive3 is tested only for jdk8 in flink nightlies
I tried to run it with jdk 11 and e2e failed while for hive2 it is passing
Since it was not run with jdk11 i will add support only for jdk8 for hive3 as it is done in flink main repo
other jdk support could be done separately

@snuyanzin
Copy link
Contributor Author

snuyanzin commented Dec 7, 2023

Currently it works for hive2 and hive3 (without e2e for jdk11)
as

hive version flink 1.18 + jdk 8 flink 1.19-snapshot + jdk8 flink 1.18 + jdk 11 flink 1.19-snapshot + jdk11
2
3 no E2E no E2E

as mentioned above under flink main there is no E2E tests for jdk11 as well

The PR currently copies ci.yaml from flink-connector-shared-utils ...

at the same time there are 2 PRs for that repo to support customization of jdk version apache/flink-connector-shared-utils#24 and maven profiles apache/flink-connector-shared-utils#29 , these 2 features are used here. Once they are merged and a new release released ci.yml could be reused and the copy could be removed

@luoyuxia , @MartijnVisser please let me know whether you are ok with this or not

@luoyuxia
Copy link
Contributor

luoyuxia commented Dec 8, 2023

@snuyanzin Cool! Thanks for your work. It sounds good to me.

@MartijnVisser
Copy link
Contributor

please let me know whether you are ok with this or not

I'm reviewing the CI PRs; when they are merged and we can merge this, will we remove the Hive code from Flink's master branch?

@snuyanzin
Copy link
Contributor Author

snuyanzin commented Dec 8, 2023

I'm reviewing the CI PRs; when they are merged and we can merge this

thanks a lot for your review

will we remove the Hive code from Flink's master branch?

yes, that was the plan, the only question is whether we need a dedicated ML thread for that or since there is already existing jira issue for Hive externalisation, we could just continue with it?

Also I would guess we need to release this connector

JingGe and others added 20 commits December 8, 2023 15:59
…iguration parameters) to RichFunction#open(OpenContext openContext)

This closes #23058
…e push down contains none-existent partition (#23423)
…ban Junit4 for table-planner module

This closes #23791.
@snuyanzin snuyanzin force-pushed the flink30064 branch 2 times, most recently from 5be7963 to 124d9cf Compare December 8, 2023 18:07
@snuyanzin
Copy link
Contributor Author

After merging of mentioned PRs to flink-connector-shared-utils all related WAs are removed and ci_utils is reused.

Also there are a couple of commits to hive connector in main repo, one of which leaded to conflicts. Now they are integrated here as well

The PR also enables manual trigger of nightly however it will be possible to trigger this only after merge to main
I will trigger it on my own fork

I'm going to merge it soon after the build is green and nightly on my fork is also green

@snuyanzin
Copy link
Contributor Author

@snuyanzin snuyanzin merged commit 557a369 into apache:main Dec 8, 2023
9 checks passed
Copy link

boring-cyborg bot commented Dec 8, 2023

Awesome work, congrats on your first merged pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.