Upgrade to Python 3.12 #2072

Open · max-muoto wants to merge 23 commits into main from python-3.12
Conversation

max-muoto

Describe your changes

Upgrade base image to 3.12
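For readers who want Python 3.12 before this PR lands, here is a minimal local sketch of what the change amounts to. It assumes the docker-stacks-foundation image accepts a PYTHON_VERSION build arg and lives under images/ in a local clone, as the project's docs describe; names and paths may differ between releases:

    # Sketch only (not part of this PR): build the foundation image with Python 3.12
    # from a local clone of jupyter/docker-stacks. The PYTHON_VERSION build arg and
    # the images/ layout are assumptions based on the project's docs.
    docker build \
        --build-arg PYTHON_VERSION=3.12 \
        --tag docker-stacks-foundation:python-3.12 \
        ./images/docker-stacks-foundation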

@max-muoto (Author)

Haven't tested this yet, but it would be nice if we could start looking at moving the base image to Python 3.12.

.readthedocs.yaml (outdated review thread, resolved)
@mathbunnyru (Member)

It seems we depend on this PR: conda-forge/pycurl-feedstock#27

mathbunnyru marked this pull request as a draft on January 7, 2024.
@mathbunnyru (Member)

Waiting for the upstream packages to be fixed.
@max-muoto if you want to move our docker stacks to Python 3.12 faster, please consider helping the conda-forge contributors by sending PRs to update packages.

I marked this as a draft and subscribed to the relevant PR; I will re-run the build when the issue is resolved.

@consideRatio (Collaborator)

Issue for pycurl resolved upstream!

@mathbunnyru (Member)

@consideRatio thanks!

@max-muoto could you please resolve conflicts? Unfortunately, GitHub is not able to do it in the web browser.

@mathbunnyru (Member)

Something is wrong, as the change should have stayed small, just a few lines of code.

@max-muoto (Author)

Something is wrong, as the change should have stayed small, just a few lines of code.

Currently rebasing, should be fixed soon.

max-muoto force-pushed the python-3.12 branch 3 times, most recently from 2455425 to 9035f65 on March 10, 2024.
max-muoto marked this pull request as ready for review on March 10, 2024.
@max-muoto (Author)

Something is wrong, as the change should have stayed small, just a few lines of code.

Should be good to take a look at now.

@max-muoto (Author)

Seems we also need to upgrade Pandas, just went ahead and did that.

@@ -63,7 +63,7 @@ USER ${NB_UID}
 RUN mamba install --yes \
     'grpcio-status' \
     'grpcio' \
-    'pandas=2.0.3' \
+    'pandas=2.2.1' \
Member

Have you checked the comment above to make sure you’re using the proper version?

@max-muoto (Author) · Mar 10, 2024

They have pandas<=2.2.1 so we should be good here.

Member

You need to check the latest stable tag, not the current main branch.

Author

Ah, sorry. Looks like we'll need to wait for their next release, as this commit isn't included in the latest stable tag: ericm-db/spark@98ca3ea
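As a side note, one way to verify this kind of claim locally is to ask git which release tags already contain the commit. This is a rough sketch, assuming a local clone of apache/spark with tags fetched; the short hash is the one linked above, and the hash in apache/spark history may differ if the change was squashed on merge:

    # List release tags that already contain the pandas-pin commit
    # (empty output means no released tag includes it yet).
    git tag --contains 98ca3ea

    # Inspect the pandas pin that Spark declares at the latest stable tag
    # instead of on the main branch (v3.5.1 was the stable tag at the time).
    git show v3.5.1:python/setup.py | grep -i pandas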

Author

I think we'll need to wait on this as well, since we need at least Pandas 2.1.1 to ensure compatibility with 3.12.
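A quick way to sanity-check the pandas/Python 3.12 pairing before baking it into the image is a solver dry run. This is a minimal sketch, assuming mamba is available locally; the package pins mirror the hunk above:

    # Ask the solver whether the pinned pandas coexists with Python 3.12 without
    # actually creating the environment; a failed solve surfaces the conflict early.
    mamba create --dry-run --name py312-check \
        'python=3.12' 'pandas=2.2.1' 'grpcio' 'grpcio-status'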

Member

@bjornjorgensen could you please tell us when the Spark release will include this commit? (at least approximately)

Contributor

Hmm.. well, we are waiting for Hadoop 3.4.0 and a new Hive release. We haven't started any RC releases yet. I build and test my own JupyterLab image at https://github.com/bjornjorgensen/jupyter-spark-master-docker, and I did try Python 3.12, but it broke so much that I'm using Python 3.11, which Debian testing also uses.
And Spark 3.5.1 doesn't support Python 3.12; have a look at apache/spark#43922

@shreve · Jun 11, 2024

Could we just revert this particular change rather than waiting for the next release of Spark? This seems like an incredibly self-imposed blocker.

@flaviomartins · Sep 24, 2024

They have pinned version 2.2.2 apache/spark@7c639a1

@bjornjorgensen (Contributor)

Python 3.10 was released Oct 04, 2021; we updated to it in #1711 on May 28, 2022.
Python 3.11 was released Oct 24, 2022; we updated to it in #1844 on May 31, 2023.
I'm using Manjaro unstable, and even here it's Python 3.11.8.

@bjornjorgensen (Contributor)

[VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

Apache Spark 4.0.0 Release Plan

  1. After creating branch-3.5, set "4.0.0-SNAPSHOT" in master branch.

  2. Creating branch-4.0 on April 1st, 2024.

  3. Apache Spark 4.0.0 RC1 on May 1st, 2024.

  4. Apache Spark 4.0.0 Release in June, 2024.

@bjornjorgensen (Contributor)

There will be a test soon: "Re: [DISCUSS] Spark 4.0.0 release"

@benz0li (Contributor) · Jun 11, 2024

@shreve You may be interested in b-data's/my JupyterLab Python docker stack.
ℹ️ There are both Python 3.12 and Python 3.11 images available.

@mj0nez · Aug 30, 2024

Hey, any updates on this? :)

@mathbunnyru (Member)

Hey, any updates on this? :)

Unfortunately, this is still blocked - we're waiting for a new spark release.
@bjornjorgensen any updates here? :)

@bjornjorgensen (Contributor)

https://lists.apache.org/thread/ygtg445xg1pk8yw5brrv206y7lm8dxwj

@mathbunnyru (Member)

https://lists.apache.org/thread/ygtg445xg1pk8yw5brrv206y7lm8dxwj

Cool, thanks!
Hope it won't be long till spark 4.0 is finally released.

@benz0li (Contributor) · Oct 15, 2024

@shreve You may be interested in b-data's/my JupyterLab Python docker stack.
ℹ️ There are both Python 3.12 and Python 3.11 images available.

And now also Python 3.13.0 images – without numba, though.

@bjornjorgensen (Contributor)

[DISCUSS] Creating branch-4.0 and Feature Freeze for Apache Spark 4.0

Spark 4.0 will be out in 2025

@mathbunnyru (Member) · Oct 15, 2024

I think we have waited long enough for a new Spark release, and we should discuss switching to Python 3.12 before "everything is 100% ready".

Reasons:

  • Python 3.13 has just been released, and it's been over a year since the 3.12 release
  • There has been increasing demand for images with a newer Python
  • We are losing users who would like to use Python 3.12. People who want to use Spark at the same time are free to use slightly older images with Python 3.11 (see the example after this comment), so it's not totally breaking for them

@mathbunnyru @consideRatio @yuvipanda @manics let's vote on this one.
Please vote by October 22nd.
If some option has 3 votes before the deadline, I will accept it; if not, we'll go with the one that has the most votes (it should have at least 2 votes, though).

I think there are several solutions, and you can choose several of them (please select at least one).
Vote with 1️⃣, 2️⃣ and 3️⃣ in the comments.

1️⃣ Wait for Spark 4.0, then switch to Python 3.12
2️⃣ Switch to Python 3.12 now, but keep Spark 3.5 (with outdated pandas, probably; I have no idea how breaking that is for Spark users)
3️⃣ Switch to Python 3.12 and Spark 4 previews now (with quite modern pandas). This is breaking for users who don't use tagged images (and it's fine to break these users from time to time)
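For users in the situation described above, a hedged illustration of what "use slightly older images" means in practice; the registry path follows the project's quay.io naming, and the date tag is a placeholder rather than a specific recommendation:

    # Pin a dated tag that still ships Python 3.11 and Spark 3.5 instead of
    # following :latest (the tag shown is a placeholder; pick an actual dated
    # tag from the project's image listings).
    docker pull quay.io/jupyter/pyspark-notebook:2024-10-07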

@mathbunnyru (Member)

I'm voting for 3️⃣

I think with such a long release process it's ok to switch to preview versions.
I don't want to lose users because we're not using Python 3.12, so it's either 2️⃣ or 3️⃣.
I prefer 3️⃣ because new Spark builds probably work better with Python 3.12 (but I can't be 100% sure, since I haven't used Spark at all).
People who want to have Spark 3.5 are still gonna have previously built images.

@manics (Contributor) · Oct 15, 2024

I agree with 3️⃣. I don't think it's a policy we should generally adopt; rather, this is an exception due to how long it's taking to get a working combination. We can highlight the compromises made in the CHANGELOG.

@bjornjorgensen (Contributor) · Oct 15, 2024

Spark 4.0 preview2 does support Python 3.12, and there is now support for Python 3.13 in the master branch.
I vote for 3️⃣. But make sure that users understand that it is a preview, not the real 4.0.

I hope and guess that there will be more previews before the 4.0 final.

Spark 4.0 preview2 has support for pandas 2.2.2 (apache/spark#46009),
and the next preview will have support for pandas 2.2.3 (apache/spark#48269).

@mathbunnyru (Member)

I pushed a commit which makes the Spark scripts more robust. This change is non-breaking and keeps everything the same, but the scripts will also work better with newer Spark versions and preview versions: c4cb04e
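For illustration only (this is not the actual commit), the kind of robustness being described could amount to accepting both stable and preview version strings when deriving the download URL. A rough shell sketch with hypothetical names; the archive URL layout is assumed from existing Spark releases:

    # Hypothetical helper: validate a Spark version string, allowing both stable
    # releases ("3.5.3") and previews ("4.0.0-preview2"), then download the
    # distribution from the Apache archive.
    spark_version="${1:?usage: $0 <spark_version>}"
    if [[ ! "${spark_version}" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-preview[0-9]+)?$ ]]; then
        echo "Unexpected Spark version string: ${spark_version}" >&2
        exit 1
    fi
    spark_dist="spark-${spark_version}-bin-hadoop3"
    curl -fsSL "https://archive.apache.org/dist/spark/spark-${spark_version}/${spark_dist}.tgz" \
        -o "/tmp/${spark_dist}.tgz"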

@mathbunnyru (Member) · Oct 20, 2024

So, the release schedule:

  • Monday, October 21st: automatic build, Python 3.11, Spark v3
  • Tuesday, October 22nd: "Start using spark4-preview versions" #2159 will be merged, Python 3.11, Spark v4 preview
  • Wednesday, October 23rd: "Upgrade to Python 3.12" #2072 will be merged, Python 3.12, Spark v4 preview. When I merge the previous PR to the main branch, I will merge the latest changes here and update the changelog - this PR will become a really simple one.

It's better to merge the 2 PRs separately (and on different days) - this might be helpful if something goes wrong.
And October 22nd is the perfect day to merge because it will be exactly a week since I started the poll.

@mathbunnyru (Member)

All looks great so far - I merged the Spark v4 preview to main, merged changes from main to here, and updated the changelog and the list of old images. This PR now looks really simple (as it should be in most cases). I will merge this PR tomorrow.

9 participants