Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLD: Split EWAH bool array code #2711

Merged
merged 2 commits into from
May 15, 2023

Conversation

themousepotato
Copy link
Contributor

PR Summary

This PR removes the code of EWAH bool array from the yt repository. As suggested by @matthewturk, it is being split out as a new python package.

PR Checklist

  • Code passes flake8 checker
  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

@welcome
Copy link

welcome bot commented Jul 1, 2020

Hi! Welcome, and thanks for opening this pull request. We have some guidelines for new pull requests, and soon you'll hear back about the results of our tests and continuous integration checks. Thank you for your contribution!

@matthewturk
Copy link
Member

This is still a priority, and I'm going to help get this moving forward.

Base automatically changed from master to main January 20, 2021 15:27
@neutrinoceros
Copy link
Member

Hey, I think that it will be a bit easier to move this forward with yt-project/ewah_bool_utils#1 + a new pypi publication :)

@neutrinoceros
Copy link
Member

hey @themousepotato, there are a couple conflicts here you'll need to fix. I assume that the reason they're popping is that the files you're removing were recently updated. You should be able to fix this by rebasing your branch and rerun your git rm calls.

Also, it seems pretty likely to conflict with #3413 too. I assume you guys have a plan set up for this but I'll admit it's not obvious from the outside in what order, and which PRs should go in.

@matthewturk
Copy link
Member

Ah, good point. I think what we should try doing is transplanting #3413 over into the ewah repo. Then, release a new version, and I will update #3413 to fix the issues (I assume that the issues will be fixable, but, cross your fingers). Then we merge 3413, we rebase this one, and it can go in.

@themousepotato
Copy link
Contributor Author

@matthewturk I've created yt-project/ewah_bool_utils#3 🤞

@neutrinoceros
Copy link
Member

Looks like everything that needed to happen first was completed, so do you guys want to revive this at any point ?

@matthewturk
Copy link
Member

matthewturk commented Nov 18, 2021 via email

@neutrinoceros
Copy link
Member

Then maybe start with fixing the conflicts and removing the draft status when CI goes green :)

@neutrinoceros
Copy link
Member

@matthewturk An issue I see with the current state of this (putting merge conflicts aside) is that no wheels are currently being distributed for the separated package, which means that if yt starts depending on it, it will add a noticeable compilation overhead at install time to everyone downstream.
I think we'd need to distribute wheels on PyPI and binaries on conda-forge for this to work. I can help with both.

@neutrinoceros
Copy link
Member

I just released ewah_bool_utils 1.0.0 https://pypi.org/project/ewah-bool-utils/#history
I will now revive this PR with a clean rebase

@neutrinoceros
Copy link
Member

compilation is currently broken, here are a sample of the errors we're getting:

'ewah_bool_array' is not a type identifier
'ewah_bool_array/ewah_bool_iterator.pxd' not found
'ewah_bool_array/ewah_map.pxd' not found
'ewah_bool_array/sstream.pxd' not found
...

Those seem to indicate some linking problem, so I assume the bug is on yt's side.

@neutrinoceros neutrinoceros added this to the 4.2.0 milestone Dec 22, 2022
@neutrinoceros neutrinoceros added infrastructure Related to CI, versioning, websites, organizational issues, etc refactor improve readability, maintainability, modularity labels Dec 22, 2022
@neutrinoceros
Copy link
Member

Solving issues as I go, I will need to cut at least one bugfix release for ewah-bool-utils. The remaining issues are currently tracked in https://github.com/yt-project/ewah_bool_utils/milestone/1

@neutrinoceros neutrinoceros marked this pull request as draft December 22, 2022 16:52
@neutrinoceros neutrinoceros changed the title [WIP] Split EWAH bool array code BLD: Split EWAH bool array code Dec 22, 2022
@neutrinoceros
Copy link
Member

neutrinoceros commented Dec 22, 2022

Seems like I pushed the latest patch just a little too early and some jobs failed because they couldn't find the new release on PyPI, which hit the server only seconds before. Otherwise it looks like v1.0.1 is viable, so I'll just relaunch failed jobs manually:
@yt-fido test this please

edit: actually there's an dependency conflict remaining

The conflict is caused by:
[69](https://github.com/yt-project/yt/actions/runs/3759612731/jobs/6389384062#step:4:70)
          ewah-bool-utils 1.0.1 depends on numpy>=1.17.5
[70](https://github.com/yt-project/yt/actions/runs/3759612731/jobs/6389384062#step:4:71)
          oldest-supported-numpy 2022.11.19 depends on numpy==1.17.3; python_version == "3.8" and platform_machine not in "arm64|aarch64|s390x|loongarch64" and platform_python_implementation != "PyPy"

@neutrinoceros
Copy link
Member

@yt-fido test this please

@neutrinoceros
Copy link
Member

Looking stable now, I'll simplify the history. Last thing to do before we can open this for review is packaging the stable version of ewah_bool_utils for conda-forge

@@ -6,12 +6,12 @@
from glob import glob

import numpy as np
from ewah_bool_utils.ewah_bool_wrap import BoolArrayCollection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems worth noting that this is the one and only import that makes ewah_bool_array a runtime dependency. If we could somehow avoid it, it would only be a dependency at build time, which would be much more robust to potential ABI compat breakage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is that true? I think that even though we're building the cython files, when we do the cimport it will be using dlopen and make it a runtime dependency as well.

@neutrinoceros
Copy link
Member

Trying to get ewah-bool-utils on conda-forge at conda-forge/staged-recipes#21592

@neutrinoceros
Copy link
Member

@matthewturk FTR I'd be happy to have you as the reviewer but if you'd rather ask someone who hasn't had a hand in the project instead I'd be okay with it too.

@matthewturk
Copy link
Member

I will review this!

@matthewturk
Copy link
Member

So I think that this is good to go if the tests pass.

I don't think we can make it optional without a considerable amount of refactoring, so having it mandatory seems like the correct move. Honestly, I'm not sure how to test this other than using it and relying on existing sets of tests.

@neutrinoceros
Copy link
Member

Honestly, I'm not sure how to test this other than using it and relying on existing sets of tests.

Agreed. I think it's sufficient.

@neutrinoceros
Copy link
Member

@matthewturk when do you think we can merge this ?

@matthewturk
Copy link
Member

Has anybody given a shot at a full-on installation yet?

@neutrinoceros
Copy link
Member

Not that I know of. The good news is that it's pretty easy to revert if anything too bad happens

@neutrinoceros
Copy link
Member

Is it a requirement ? I think we better be explicit about it.

@neutrinoceros
Copy link
Member

rebased to resolve a conflict

Co-authored-by: Navaneeth Suresh <navaneeths1998@gmail.com>
@neutrinoceros
Copy link
Member

(resolved another conflict)

@neutrinoceros
Copy link
Member

@yt-fido test this please

@neutrinoceros
Copy link
Member

@matthewturk I'd really like to get this in for yt 4.2

@matthewturk
Copy link
Member

Yes, that's reasonable. How should we support you?

@neutrinoceros
Copy link
Member

merging this ? AFAIC it's been ready for a couple month

@matthewturk
Copy link
Member

Oh, hm. Well, alright! I think we'll need to notify people -- almost every time we split out a dependency, there's a bit of a bumpy road as people don't have it installed and just pull from git etc.

@neutrinoceros
Copy link
Member

That's fair. Do you want to do it, or should I ?

@matthewturk
Copy link
Member

I'm OK with either. What do you think the relative credit/blame ratio is going to be? :)

@neutrinoceros
Copy link
Member

Depends how you weight it, but I'd say about 1.
I think you're in a better position to advertise this change given your technical familiarity with it. I helped with the surgery but I don't actually know this code very well.

@matthewturk matthewturk enabled auto-merge May 15, 2023 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infrastructure Related to CI, versioning, websites, organizational issues, etc refactor improve readability, maintainability, modularity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants