Join keys of 2 arrays #1475
Comments
This is an interesting question! In the future, please submit any usage questions to StackOverflow's jq tag. First, we want to go through each asset:
Then, for each asset, we want to go through the descriptions, and find the matching one(s) for the asset; the argument passed to
Finally, we build our desired objects out of our
All together now:
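A minimal sketch of those three steps, not necessarily the filter originally posted. The sample values (`a1`, `alpha`, and so on) are made up to match the shape described in the question, and `nested_join` is a hypothetical helper name:

```sh
# Made-up sample input: field names follow the question, values are hypothetical.
DATA='{
  "assets": [
    {"assetid": "a1", "classid": 1, "instanceid": 1},
    {"assetid": "a2", "classid": 1, "instanceid": 1},
    {"assetid": "a3", "classid": 2, "instanceid": 0},
    {"assetid": "a4", "classid": 3, "instanceid": 0}
  ],
  "descriptions": [
    {"classid": 1, "instanceid": 1, "name": "alpha"},
    {"classid": 2, "instanceid": 0, "name": "beta"},
    {"classid": 3, "instanceid": 0, "name": "gamma"}
  ]
}'

# Nested-loop join: every asset is compared against every description.
nested_join() {
  jq -c '
    .descriptions as $descs                # keep the descriptions available
    | .assets[] as $a                      # 1. go through each asset
    | $descs[]                             # 2. go through the descriptions
    | select(.classid == $a.classid and .instanceid == $a.instanceid)
    | {assetid: $a.assetid, name}          # 3. build the output object
  '
}

printf '%s\n' "$DATA" | nested_join
```

With this sample input the filter emits four objects: two named "alpha" and one each for "beta" and "gamma". It compares every asset with every description, so the cost grows with the product of the two array lengths.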
|
A way to make it more readable would be to define a
You can then use this like:
|
This is exactly what I was looking for. |
There's a
|
@pkoppstein The |
Here are filters for performing relational joins that may be suitable for your "standard library" and/or builtin.jq. Suggestions for improvement are welcome! These filters emit streams and are therefore named "joins"; this name will hopefully also avoid confusion and/or conflict with the existing filter: With the OP data in data.json, the filter:
produces:
The same result is produced by:
joins/1, joins/2, joins/4, joins/6
Notes:
[EDIT (12/13/2017): the core functionality is now provided by joins/6] |
These
which you can use like this:
To break this down:
|
Incidentally, I've a branch that adds varargs support to jq. That will allow us to have multi-column group- and sort-by functions, and multi-column indices that let |
@nicowilliams wrote:
And there's the main problem. As the OP mentioned, the output is supposed to have 4 results: two for "alpha" and one each for "beta" and "gamma". The other problem, to me at least, is the hairiness of the human interface to JOIN. My
|
@pkoppstein If you look at the data, and compute the desired join by hand, you don't get the example result given. The
EDIT: I assume the example desired result was just an example of the form of desired result, not the one that would necessarily result from the given example input data. |
@nicowilliams - With your JOIN and the following query (which corresponds with the one specified by the OP), the four "expected" results are obtained:
So at least we're on the same page now. But I would like to see a simpler interface ... p.s. You could change |
@pkoppstein No, I'll think about simplifying the interface. |
Yes, I opened it (#1508). I don't think it's relevant in this particular case. |
Well, OK, but it hardly makes a difference which I use. |
Can't wait! FYI, I've updated the posting that defines the family of SELECT (Table1|p1), (Table2|p2) |
@pkoppstein What I think I'll do is remove the SQL-ish functions in One of the issues we have right now is that linking jq programs is quadratic. Every additional builtin significantly slows down jq start up. I've one thing I could commit soon that will help this, but the real thing we must do is make linking linear, and I only have a half-baked attempt at that. |
I agree that JOIN can wait; I would like to see a decent IN (assuming there's no time to fix index/1); and certainly if One thought on Erik Brinkman's contributions: If you delete all the rand functions in builtin.jq but retain the core functionality, we can have our (random) cake and eat it (without quadratic concerns)! |
@pkoppstein Everything can wait. What I'd like to do is:
|
I just don't have time to fix linking now, and we really can't be making startup time much worse. |
@nicowilliams - The thing is that INDEX and especially IN have been quite widely "advertised" and used, so if you drop them both without a replacement, it will hurt. As already mentioned, IN is especially valuable in light of the performance issue with index/1. Since "master" has been able to bear the weight of the SQLish additions without (to my knowledge) anyone raising performance as an issue, it seems to me that it would be reasonable to:
That way, we'd be down a net of 2 builtins from where we are now in "master". But, yes, it's time for 1.6 ... so no need to respond to this posting. |
@pkoppstein - I like the simple interface! However, I have a need for LEFT JOIN. Could that be supported by your code easily? Here is what I have come up with so far, based on https://stackoverflow.com/questions/39830426/join-two-json-files-based-on-common-key-with-jq-utility-or-alternative-way-from:
|
[EDIT: left_joins/6 was corrected on Jan 21, 2018.] @caramdache - Here are lightly-tested “LEFT JOIN” filters. I've also revised
|
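For readers unfamiliar with the term: a LEFT JOIN keeps every left-hand row even when nothing on the right matches. The following is a minimal sketch of that behaviour only. It is not the `left_joins` family referred to above; `left_join_names` is a hypothetical helper, and the sample data is made up so that the last asset has no matching description:

```sh
# Made-up sample input: asset a3 deliberately has no matching description.
DATA='{
  "assets": [
    {"assetid": "a1", "classid": 1, "instanceid": 1},
    {"assetid": "a2", "classid": 2, "instanceid": 0},
    {"assetid": "a3", "classid": 9, "instanceid": 9}
  ],
  "descriptions": [
    {"classid": 1, "instanceid": 1, "name": "alpha"},
    {"classid": 2, "instanceid": 0, "name": "beta"}
  ]
}'

left_join_names() {
  jq -c '
    .descriptions as $descs
    | .assets[] as $a
    # collect the matches first so that "no match" can be detected
    | [ $descs[] | select(.classid == $a.classid and .instanceid == $a.instanceid) ] as $matches
    | if ($matches | length) == 0
      then {assetid: $a.assetid, name: null}          # unmatched left row is kept
      else $matches[] | {assetid: $a.assetid, name}   # one output per match
      end
  '
}

printf '%s\n' "$DATA" | left_join_names
```

Here `a1` and `a2` are paired with their names, while `a3` is still emitted with `name` set to `null` instead of being dropped, which is what distinguishes this from the inner join.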
Thanks @pkoppstein, I'll test this first thing on Monday! I have some questions in the meantime:
|
Regarding Regarding Regarding p1 and p2 - in your case, you will probably want to use |
@pkoppstein, I was thinking In |
@caramdache - please note that left_joins/6 has been corrected. The arity-6 filters are intended to be “low-level building blocks”(*) whereas all the others are intended to be more familiar - that’s why only the arity-4 filters have (*) Notably, the user might wish to define a transformation of the outputs, [$a, $b], that cannot be specified as a function of ($a + $b). |
@pkoppstein, this seems to work correctly, as far as I can see on a dataset of 1861 items. In actual usage, the interface feels a little awkward. I have 2 consecutive joins to perform. With my previous
Here, it seems that I have to BTW, I am not quite sure what changes you've made between your 2 edits. Based on what I can remember, the code looks the same. |
Thinking more about it, an even more natural syntax seems like:
|
@caramdache --
|
Maybe I've misunderstood, but it seems that the JOIN example above on the original data / original expected output is not correct, or was correct at the time but is not now correct with jq-1.6. That example is (approximately)
which with jq-1.6 produces:
but the original post expected:
I think the issue is that the INDEX expression 'loses' one of the two rows that have "classid":1 and "instanceid":1, as discussed here. I think if the index is built on descriptions rather than on assets we can produce the original expected results:
Am I missing something? Thanks, Andy |
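A hedged sketch of the effect described in this comment, assuming jq 1.6 or later (where `INDEX/2` is a builtin). The sample values, the `"classid/instanceid"` key format, and both helper names are illustrative assumptions, not the exact expressions used in the thread:

```sh
# Made-up sample input: assets a1 and a2 share the same classid/instanceid pair.
DATA='{
  "assets": [
    {"assetid": "a1", "classid": 1, "instanceid": 1},
    {"assetid": "a2", "classid": 1, "instanceid": 1},
    {"assetid": "a3", "classid": 2, "instanceid": 0},
    {"assetid": "a4", "classid": 3, "instanceid": 0}
  ],
  "descriptions": [
    {"classid": 1, "instanceid": 1, "name": "alpha"},
    {"classid": 2, "instanceid": 0, "name": "beta"},
    {"classid": 3, "instanceid": 0, "name": "gamma"}
  ]
}'

# Index keyed on the assets: a later row with the same key overwrites the earlier one.
index_on_assets() {
  jq -c 'INDEX(.assets[]; "\(.classid)/\(.instanceid)") | map_values(.assetid)'
}

# Index keyed on the descriptions, then one lookup per asset: no asset is lost.
join_via_description_index() {
  jq -c '
    INDEX(.descriptions[]; "\(.classid)/\(.instanceid)") as $idx
    | .assets[]
    | {assetid, name: $idx["\(.classid)/\(.instanceid)"].name}
  '
}

printf '%s\n' "$DATA" | index_on_assets
printf '%s\n' "$DATA" | join_via_description_index
```

With this input, the asset-keyed index has only three entries, because `a2` replaces `a1` under the shared key. Indexing the descriptions and iterating over the assets yields all four rows. This only holds as long as the descriptions themselves have unique keys.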
Hello, I have a JSON document with 2 arrays that share common ids. I'd like to join them, but I couldn't find a way to do it:
Having the above json as input, I want to get a "name" from "descriptions" for each asset in "assets" by joining classid and instanceid from both arrays.
So the desired output would be:
The equivalent in SQL would be something like:
select assets.assetsid, descriptions.name from descriptions, assets where descriptions.classid=assets.classid and descriptions.instanceid=assets.instanceid;