Update TPCH #1094
(This will all be unnecessary when/if we get proper Spark working.)
Some of these were very very old.
@milesgranger I put the Spark stuff in here (sorry for mixing PRs). Things work locally, but I'm getting the following when running remotely:
If you're still around and have some time, can I ask you to try running?
Should be just
You'll need to
Oh, I'm maybe dumb. One sec.
Yeah, I was dumb. Still not working, but that particular issue is gone.
OK, things are working. I'll post results in the notebook PR shortly.
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
To run benchmarks today (@phofl was curious)
Oh, and for Spark you'll need https://github.com/coiled/platform/pull/3530
OK, I did enough TPCH work here that I decided to unwind the larger structural changes of this PR and save that for another day (tomorrow?). Changes in this PR now include:
I think that this is ready for review. I'm hopeful that, if people are fine ignoring TPCH stuff, it's easy to get in.
@fjetter I would like to get this in soon. It is a large change to TPCH and likely to contain conflicts. Should I just merge? I suspect that we'll need migrations. Can I ask you or someone on your team to help review quickly?
LGTM, for what it's worth. 👍
Maybe some others want to look? (@crusaderky @phofl)
Cool. I'm mostly curious if there is something that we have to do with database migrations, given that I've moved some of the tests around.
My understanding is that it can be a follow-up; I can try to figure that one out.
@milesgranger I trust you to follow up with the migrations. Let me know if you need help. #1099 (comment) suggests a starting point.
This moves the TPC-H benchmarks outside of the `tests/benchmarks` directory, and treats it as more of a standalone system.
Then, this also rewrites some of the relevant `conftest.py` file in a way that, I think, makes it simpler to manipulate (that might be subjective though).
Some notable changes:
- `local` and `scale` fixtures that control datasets and should eventually control clusters that are used
However, I suspect that this approach screws with existing tests because they import things from conftest, which is now at root level. Things will still need to be moved around to make this work. Mostly right now I'm looking for if this makes sense to do, or if there are major reasons why not to do it.
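For readers unfamiliar with this pattern, here is a rough sketch of how `local` and `scale` fixtures could select a dataset. It is not the code from this PR: the option names (`--local`, `--scale`), the supported scale values, the paths, and the `dataset_path` helper are all hypothetical. The functions are written without decorators so the sketch runs without pytest installed; in a real `conftest.py`, `local` and `scale` would each carry `@pytest.fixture(scope="session")`.

```python
# Sketch only: option names, scale values, and paths below are hypothetical,
# not taken from this PR.

SUPPORTED_SCALES = (10, 100, 1000)  # hypothetical TPC-H scale factors


def pytest_addoption(parser):
    """pytest hook that registers the command-line options."""
    parser.addoption(
        "--local",
        action="store_true",
        default=False,
        help="Read a local copy of the data instead of remote storage",
    )
    parser.addoption(
        "--scale",
        action="store",
        type=int,
        default=10,
        help="TPC-H scale factor to benchmark",
    )


# In conftest.py this would be decorated with @pytest.fixture(scope="session").
def local(request):
    """Whether the benchmarks should run against local data."""
    return request.config.getoption("--local")


# In conftest.py this would be decorated with @pytest.fixture(scope="session").
def scale(request):
    """The TPC-H scale factor requested on the command line."""
    return request.config.getoption("--scale")


def dataset_path(local, scale):
    """Map the (local, scale) pair to a dataset location (paths are made up)."""
    if scale not in SUPPORTED_SCALES:
        raise ValueError(f"Unsupported scale {scale}; expected one of {SUPPORTED_SCALES}")
    if local:
        return f"./tpch-data/scale-{scale}/"
    return f"s3://example-bucket/tpch/scale-{scale}/"
```

Under these assumptions, a run such as `pytest --local --scale 100` would resolve to the local scale-100 directory, and the same two values could later decide whether to start a local or remote cluster.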
To give more motivation, if TPC-H benchmarks work, people will want to come here and look at them. If they're highly intertwined with the existing benchmark machinery, then I think that they will be hard to understand and run. I would like to make the TPC-H benchmarks more accessible to people who are unfamiliar with this repository.