Release of DataFusion: 7.0.0 #1587

Closed
alamb opened this issue Jan 16, 2022 · 34 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Jan 16, 2022

I wonder if it is time to release a new version of datafusion to crates.io?

It would be great to crowdsource:

  1. Update readme / changelog
  2. Update version
  3. (maybe) a blog post?

I am happy to handle creating a release candidate / doing the official voting process.

@alamb alamb added the enhancement New feature or request label Jan 16, 2022
@xudong963
Member

Yeah, I think we could wait for the arrow2-related tickets to be merged into master?
BTW, I can help write a blog post!

@alamb
Contributor Author

alamb commented Jan 16, 2022

arrow2 may be a good driver.

I don't have a good sense of how many projects use datafusion from crates.io (aka what has been released) vs. how many use it via a GitHub SHA. IOx (my project) uses the SHA, but I realized that maybe others are waiting for an actual release.

@alamb alamb changed the title Proposal: new release of DataFusion? Proposal: new release of DataFusion: 7.0.0 Feb 3, 2022
@alamb
Contributor Author

alamb commented Feb 3, 2022

There has been all sorts of good stuff added to DataFusion since we last made a release. Since arrow (C++) just released 7.0.0, I was thinking we could do the same with DataFusion (as @pauldix says, the success of a project is predicated on 1. making sweet, sweet software and 2. telling people about it). We have done 1, and now we need to do a bit more of 2. ✍️

@matthewmturner
Contributor

Perhaps we can start crowdsourcing points for a post / blog in a Google doc. I've made one here (sorry, I haven't had a chance to add anything yet, but I'm trying to help as I can):

https://docs.google.com/document/d/17uB1GIN58xOehQP5XpJH8J3qA7KDS1WQJzyHzW7L1kU/edit?usp=sharing

@alamb
Contributor Author

alamb commented Feb 3, 2022

Thank you @matthewmturner

@houqp
Member

houqp commented Feb 3, 2022

I won't have time to drive the release this time, but I'm happy to help fix any issues with the existing release automation and to guide anyone through the process.

@alamb
Contributor Author

alamb commented Feb 3, 2022

I can drive the release

@xudong963
Member

I will finish issue #1400 before the release to give our users a clearer README.

@matthewmturner
Contributor

@alamb FYI, I went through and made some updates to the Google doc.

Next up I will focus on performance improvements / new features. If you have anything in particular you would like added, just mention it here or in the doc and I'll add it and look up the relevant issue / PR to link.

@alamb
Contributor Author

alamb commented Feb 8, 2022

Thanks @matthewmturner -- I'll try and give the doc another pass through later today

@matthewmturner
Contributor

I went through and made more updates and added some git stats.

@alamb let me know if there is anything in particular I can do to help lighten the load on your side. I'm happy to help more with releases in general (this one and future ones).

@alamb
Contributor Author

alamb commented Feb 8, 2022

Thank you very much @matthewmturner

Things that I think we need to do prior to release:

  • Make a PR to update the version in all the crates to 7.0.0
  • Create a PR to add a CHANGELOG (I had good luck with the GitHub changelog generator that @jorgecarleitao set up in arrow-rs once upon a time).
  • (nice to have): Ensure links in the docs are correct #1741
  • Take a pass through the docs site to refresh / improve it / make it "snazzy"

Making the PR to update the version would be sweet 👍
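Bumping the version in every crate is mostly mechanical. As a minimal sketch (the helper below is hypothetical, not the script the project actually uses), the `version` key of each crate's `[package]` table can be rewritten while leaving dependency versions such as `arrow` untouched. Intra-workspace dependencies that also pin a `version` would still need updating separately.

```rust
/// Hypothetical helper: set the `version` key of the `[package]` table in a
/// Cargo.toml string. Keys in other tables (e.g. `[dependencies]`) are untouched.
/// Note: the rewritten line loses any indentation or trailing comment.
fn bump_package_version(manifest: &str, new_version: &str) -> String {
    let mut in_package = false;
    let mut out: Vec<String> = Vec::new();
    for line in manifest.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with('[') {
            // Entering a new table; only `[package]` itself is rewritten.
            in_package = trimmed == "[package]";
        }
        let is_version_key = matches!(
            trimmed.split_once('='),
            Some((key, _)) if key.trim() == "version"
        );
        if in_package && is_version_key {
            out.push(format!("version = \"{}\"", new_version));
        } else {
            out.push(line.to_string());
        }
    }
    out.join("\n")
}

fn main() {
    // Illustrative manifest only; the versions shown are examples.
    let manifest = "[package]\nname = \"datafusion\"\nversion = \"6.0.0\"\n\n[dependencies]\narrow = { version = \"9.0.0\" }";
    println!("{}", bump_package_version(manifest, "7.0.0"));
}
```

Scoping the rewrite to `[package]` is the important design choice: a blind search-and-replace of the old version string could also hit an unrelated dependency that happens to share it.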

I think it is probably best if I take an initial swag at running the changelog generator, as it requires the ability to edit ticket titles / labels.
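Label-driven changelog generators, such as the GitHub changelog generator mentioned in the list above, roughly work by sorting each closed issue or merged PR into a section based on its labels. That is why titles and labels need curating before running one. A minimal sketch of that grouping step follows; the label names, section titles, and precedence are assumptions, and the PRs are made up.

```rust
use std::collections::BTreeMap;

/// A merged PR, reduced to what this sketch needs (hypothetical type).
struct Pr {
    number: u32,
    title: &'static str,
    labels: &'static [&'static str],
}

/// Choose a section from a PR's labels. Label names and precedence are assumed.
fn section_for(labels: &[&str]) -> &'static str {
    if labels.contains(&"api change") {
        "Breaking changes"
    } else if labels.contains(&"bug") {
        "Fixed bugs"
    } else if labels.contains(&"enhancement") {
        "Implemented enhancements"
    } else {
        "Merged pull requests"
    }
}

/// Render PRs grouped by section; BTreeMap keeps sections in alphabetical order.
fn render_changelog(prs: &[Pr]) -> String {
    let mut sections: BTreeMap<&str, Vec<String>> = BTreeMap::new();
    for pr in prs {
        sections
            .entry(section_for(pr.labels))
            .or_default()
            .push(format!("- {} #{}", pr.title, pr.number));
    }
    let mut out = String::new();
    for (name, entries) in &sections {
        out.push_str(&format!("**{}:**\n", name));
        for entry in entries {
            out.push_str(entry);
            out.push('\n');
        }
        out.push('\n');
    }
    out
}

fn main() {
    // Made-up PRs for illustration only.
    let prs = [
        Pr { number: 101, title: "Add an example feature", labels: &["enhancement"] },
        Pr { number: 102, title: "Fix an example bug", labels: &["bug"] },
        Pr { number: 103, title: "Tidy docs", labels: &[] },
    ];
    print!("{}", render_changelog(&prs));
}
```

A mislabeled or unlabeled PR simply falls into the catch-all section, which is why a pass over titles and labels before generating pays off.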

@alamb alamb changed the title Proposal: new release of DataFusion: 7.0.0 Release of DataFusion: 7.0.0 Feb 9, 2022
@alamb
Contributor Author

alamb commented Feb 10, 2022

Here is a proposed changelog. #1807

I guess it is probably time to cut a release branch. Does anyone know if we are waiting for anything else to be merged? @xudong963 @matthewmturner @Dandandan @houqp @jimexist @andygrove ?

@matthewmturner
Contributor

@HaoYang670 did you get a chance to work on #1741?

@alamb it's probably too late to have any impact on this release, but for my info: can we only update the user guide / website (https://arrow.apache.org/datafusion/) when we have a release? I figure any docs.rs update will have to be tied to a release.

Otherwise, okay for me.

@alamb
Contributor Author

alamb commented Feb 10, 2022

> can we only do updates to the user guide / website (https://arrow.apache.org/datafusion/) when we have a release?

No, we can update https://arrow.apache.org/datafusion/ (the hosted version of https://github.com/apache/arrow-datafusion/tree/master/docs) any time we want 👍

@matthewmturner
Contributor

@alamb great! thx

@xudong963
Member

> I guess it is probably time to cut a release branch. Does anyone know if we are waiting for anything else to be merged?

I took a quick look at our unmerged tickets and I don't think so. Thanks for your nice work, @alamb!

@HaoYang670
Contributor

HaoYang670 commented Feb 11, 2022 via email

@alamb
Contributor Author

alamb commented Feb 11, 2022

I am going to do a dry run today of publishing datafusion to crates.io (will use a 0.1.xx version to test)

@alamb
Contributor Author

alamb commented Feb 11, 2022

I tested publishing 7.0.0-alpha to crates.io using https://github.com/alamb/arrow-datafusion/tree/alamb/test_publish

and then going into each crate and running:

cd datafusion-common && cargo publish
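Testing the publish flow with a pre-release version such as `7.0.0-alpha` is fairly low-risk: under SemVer precedence a pre-release sorts before the corresponding release, and Cargo does not select pre-release versions for an ordinary requirement like `datafusion = "7.0.0"`. (Anything published to crates.io is permanent; it can be yanked but not deleted.) Below is a minimal, self-contained sketch of those two rules. It assumes a simplified version syntax (no build metadata, pre-release tags compared as plain strings) and a caret requirement with major version >= 1.

```rust
use std::cmp::Ordering;

/// Simplified SemVer version: MAJOR.MINOR.PATCH plus an optional `-pre` tag.
/// Build metadata is not handled in this sketch.
#[derive(Debug, PartialEq, Eq)]
struct Version {
    core: (u64, u64, u64),
    pre: Option<String>,
}

fn parse(s: &str) -> Option<Version> {
    let (numbers, pre) = match s.split_once('-') {
        Some((n, p)) => (n, Some(p.to_string())),
        None => (s, None),
    };
    let mut parts = numbers.split('.').map(|p| p.parse::<u64>().ok());
    let triple = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three numeric components
    }
    Some(Version { core: triple, pre })
}

/// SemVer precedence: compare the numeric core first; a pre-release sorts
/// before the release with the same core. Pre-release tags are compared as
/// plain strings here, a simplification of the full SemVer rules.
fn precedence(a: &Version, b: &Version) -> Ordering {
    a.core.cmp(&b.core).then_with(|| match (&a.pre, &b.pre) {
        (None, None) => Ordering::Equal,
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (Some(x), Some(y)) => x.cmp(y),
    })
}

/// Caret requirement check, assuming `req` has major >= 1 and no pre-release:
/// same major, not older than `req`, and never a pre-release.
fn caret_matches(req: &Version, v: &Version) -> bool {
    v.pre.is_none() && v.core.0 == req.core.0 && v.core >= req.core
}

fn main() {
    let alpha = parse("7.0.0-alpha").unwrap();
    let release = parse("7.0.0").unwrap();
    println!("7.0.0-alpha vs 7.0.0: {:?}", precedence(&alpha, &release));
    println!("\"7.0.0\" requirement matches 7.0.0-alpha: {}", caret_matches(&release, &alpha));
}
```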

One thing I noticed is that datafusion-cli depends on ballista, so without a ballista release we can't do a datafusion-cli release either 🤔 But maybe that is OK and we can do a datafusion-cli release later (or maybe even move datafusion-cli into the contrib repo, or perhaps make the ballista dependency optional).
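Because crates.io only accepts a crate once its dependencies are available there, crates have to be published in dependency order, and anything that depends (directly or transitively) on an unreleased crate is stuck. A minimal sketch of that planning step, using repeated passes over an assumed, simplified workspace graph: the crate list and edges below are illustrative, not read from the real manifests, and external dependencies such as arrow are omitted because they are already on crates.io.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Illustrative workspace graph: crate -> workspace crates it depends on.
/// These edges are assumptions for the sketch, not the real manifests.
fn example_graph() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("datafusion-common", vec![]),
        ("datafusion-expr", vec!["datafusion-common"]),
        ("datafusion", vec!["datafusion-common", "datafusion-expr"]),
        ("ballista", vec!["datafusion"]),
        ("datafusion-cli", vec!["datafusion", "ballista"]),
    ])
}

/// Returns (publish order, blocked crates). A crate is blocked if it is held
/// back this round or depends on a blocked crate. Crates left in neither set
/// would indicate a cycle or a dependency missing from the graph.
fn plan_publish(
    deps: &BTreeMap<&'static str, Vec<&'static str>>,
    held_back: &BTreeSet<&'static str>,
) -> (Vec<&'static str>, BTreeSet<&'static str>) {
    let mut order: Vec<&'static str> = Vec::new();
    let mut done: BTreeSet<&'static str> = BTreeSet::new();
    let mut blocked: BTreeSet<&'static str> = held_back.clone();
    loop {
        let mut progressed = false;
        for (&krate, krate_deps) in deps {
            if done.contains(krate) || blocked.contains(krate) {
                continue;
            }
            if krate_deps.iter().any(|d| blocked.contains(d)) {
                blocked.insert(krate);
                progressed = true;
            } else if krate_deps.iter().all(|d| done.contains(d)) {
                // All dependencies are already published: this crate can go next.
                order.push(krate);
                done.insert(krate);
                progressed = true;
            }
        }
        if !progressed {
            break;
        }
    }
    (order, blocked)
}

fn main() {
    // Suppose ballista is not being released in this round.
    let held_back = BTreeSet::from(["ballista"]);
    let (order, blocked) = plan_publish(&example_graph(), &held_back);
    println!("publish in order: {:?}", order);
    println!("blocked: {:?}", blocked);
}
```

With ballista held back, this sketch publishes datafusion-common, datafusion-expr, then datafusion, and reports datafusion-cli as blocked, which is the situation described in the comment.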

@matthewmturner
Contributor

I had thought about moving datafusion-cli to datafusion-contrib as well. This does seem generally aligned with moving things that aren't actually DataFusion out of the datafusion repo (e.g. datafusion-python). Making ballista optional also seems reasonable.

@jimexist
Member

Making the ballista dependency optional seems okay.
Having a CLI in the repo helps a lot with debugging, in my opinion.

@matthewmturner
Contributor

matthewmturner commented Feb 12, 2022

@jimexist That's a really good point - I have found it helpful as well. But couldn't we do something like the below in Cargo.toml in the datafusion-cli repo to achieve a similar experience? Not quite as convenient, but I think it's close.

[dependencies]
datafusion = { path = "../path/to/datafusion" }

@alamb
Contributor Author

alamb commented Feb 12, 2022

> Having a cli in repo helps a lot with debugging in my opinion

This is an excellent point @jimexist -- I use the cli extensively during debugging.

> But couldn't we do something like the below in cargo.toml from the datafusion-cli repo to achieve a similar experience? Not quite as convenient but I think it's close.

I think we could, @matthewmturner, but I think that would effectively require people to have the datafusion-cli repo checked out any time they want to build datafusion. So if it is required to build datafusion, the benefits of putting it in a separate repo seem pretty small 🤔

@alamb
Contributor Author

alamb commented Feb 12, 2022

Filed #1814 to see if we can solicit some more help for the doc site

@alamb
Contributor Author

alamb commented Feb 12, 2022

#1816 <-- for optional ballista feature in datafusion-cli
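Making a dependency optional means the code that uses it must be compiled conditionally. A minimal sketch of the pattern, assuming the usual Cargo convention that an optional dependency implicitly defines a feature of the same name; the function and the manifest path below are hypothetical and not taken from #1816. This only gates compilation; it does not by itself change what Cargo needs to resolve at publish time.

```rust
// Assumed Cargo.toml shape (hypothetical path, for illustration only):
//
//   [dependencies]
//   ballista = { path = "../ballista", optional = true }
//
// An optional dependency implicitly creates a feature named `ballista`,
// enabled with `cargo build --features ballista`.

/// Built only when the `ballista` feature is enabled.
#[cfg(feature = "ballista")]
fn execution_backend() -> &'static str {
    // Real code would construct a Ballista context here; omitted in this sketch.
    "ballista (distributed)"
}

/// Fallback used in the default build, with no ballista code compiled in.
#[cfg(not(feature = "ballista"))]
fn execution_backend() -> &'static str {
    "local DataFusion"
}

fn main() {
    println!("queries would run on: {}", execution_backend());
}
```

Keeping both variants behind the same function signature means the rest of the CLI does not need to know which backend was compiled in.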

@alamb
Contributor Author

alamb commented Feb 12, 2022

🤔 I don't think #1816 is sufficient to publish datafusion-cli to crates.io -- cargo still tries to resolve the dependencies of ballista (even though it is an optional dependency)

The only way I could get it to publish was to comment out the ballista dependency altogether (8750db3) 🤔

@alamb
Contributor Author

alamb commented Feb 12, 2022

Update here: I would like to wait for the arrow 9.0.0 release to be published (later today or tomorrow) and then update datafusion to use it: #1775

Then I'll try and make a release candidate tomorrow or Monday

It's going to be great!

@alamb
Contributor Author

alamb commented Feb 14, 2022

I am actively working to create a release candidate for datafusion 7.0.0

@alamb
Contributor Author

alamb commented Feb 14, 2022

@matthewmturner would you be willing to start a PR for a blog post: https://docs.google.com/document/d/17uB1GIN58xOehQP5XpJH8J3qA7KDS1WQJzyHzW7L1kU/edit?usp=sharing ?

The DataFusion 6.0.0 announcement is here for reference: apache/arrow-site#160

@matthewmturner
Contributor

> @matthewmturner would you be willing to start a PR for a blog post: https://docs.google.com/document/d/17uB1GIN58xOehQP5XpJH8J3qA7KDS1WQJzyHzW7L1kU/edit?usp=sharing ?

> The DataFusion 6.0.0 announcement is here for reference: apache/arrow-site#160

Sure - will do now!

@alamb
Contributor Author

alamb commented Feb 14, 2022

Official mailing list post with all the details is here: https://lists.apache.org/thread/t8381y8x1t452dvqr3y7h85q4dncvwrx

@xudong963
Member

xudong963 commented Feb 14, 2022 via email

@alamb
Contributor Author

alamb commented Feb 17, 2022

The release was approved and published 🎉

Mailing list thread is here: https://lists.apache.org/thread/hcpcf3shlt0l3fm2k313tq1tvrczlowf

The release is available here:
https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-7.0.0

I have also published it to crates.io here:
https://crates.io/crates/datafusion/7.0.0


6 participants