DataTree release blog post #708

Closed
TomNicholas opened this issue Sep 9, 2024 · 1 comment · Fixed by #736
Labels
blog Blog post

Comments

@TomNicholas
Member

TomNicholas commented Sep 9, 2024

We should write a blog post advertising the (upcoming) release of xarray-datatree in xarray main. (We're getting very close! - see pydata/xarray#8572 (comment))

This doesn't block our release of datatree; we can publish the post after quietly adding datatree into xarray main (a pseudo-staged rollout might be better anyway).

Content Ideas:

  • Motivate why users wanted a hierarchical structure (e.g. "Feature Request: Hierarchical storage and processing in xarray", pydata/xarray#4118, and "Dataset groups", pydata/xarray#1092)
  • Very brief explanation of the solution we have ended up with
    • Doesn't need to explain much about actually using datatree - that should be covered by pointing people to the docs.
  • Emphasise that this is a big deal
    • Arguably the single largest feature added to xarray in 10 years? (I think it is by LoC)
    • For a decade there have been 2+1 (public + private) xarray data structures, now there are 3+1. (DataArray, Dataset, DataTree, the semi-private one is Variable)
  • Mention the prototype in xarray-contrib/datatree repo
  • Story of development
    • Originally applied for CZI funding to do this but didn't get it
    • Prototyped by Tom in separate repository
    • Iterated there until it mostly solidified, people started using it quite a lot even though it was marked "experimental"
    • Sat there for ~2 years until NASA group came along
    • They were already using the experimental version but wanted (a) more guarantees of support and (b) more representation/integration of their staff with datatree project
    • Amazingly they already had permission to allocate developer time
    • Owen, Matt & Eni then worked on migrating datatree into xarray upstream, with supervision from Tom, Stephan, and Justus
    • Allowed us to reduce bus factor and sanity check approach
    • Also gave us a chance to make a big change to the design (especially coordinate inheritance)
    • Took a bit longer than anticipated but otherwise worked out quite well
    • Got 3 new xarray core developers now - so NASA has more explicit representation
    • Was a lot easier for xarray team not to have to write a proposal to get developer time
    • This approach could work again in future!
  • Implore people to try datatree out, but also to report bugs / suggestions as it's still being built up to its full potential.

I'm happy to write this post, unless anyone else particularly wants to.

cc the datatree migration team, i.e. @shoyer, @keewis, @owenlittlejohns, @eni-awowale, @flamingbear

also @briannapagan in case you want to add any perspective about telling the story of collaboration here

@TomNicholas
Member Author

DataTree has been released in xarray v2024.10.0! So I'll start on a short blog post next week.
