0.12: join with on= fails with TimeStamp and Int64 MultiIndex #5647
Can you put up a small reproducible example, if possible? |
I tried quickly but couldn't come up with one that exhibits the same behavior. Can I put the frames into an HDF5 file and somehow ship it to you guys? This would be the easiest way. |
Sure. Show the complete code to get them out, put the file on a Dropbox-like service, and post the link. |
https://www.dropbox.com/s/c0hq6uqn8y5vkbk/issue5647.h5
|
I think you want something like this
inner join
outer join I think is what you want (e.g. many-many)
|
Well, as you can see, the index in df2 is unique (at least it's supposed to be; by the way, is there a function that checks for that, similar to is_lexsorted?). So, the idea is to propagate unique values by (FactorDate, FundSecNo) from df2 into each (FactorDate, SourceFundSecNo) in df1. In other words, many-to-one. This is a pretty standard idea. I can work around that, sure, but it looks ugly and wastes CPU and memory. The problem is that it worked perfectly in 0.11 while it doesn't work in 0.12. Do you observe the same behavior? If so, why? Was 0.11 behavior wrong? join() is just syntactic sugar for merge(), right? Any toy example I tried to come up with in 0.12 worked as expected. |
The join default is 'left'; I think you just need 'outer' here. Not sure why it's different in 0.11; I don't think it changed. To check for uniqueness you can use df2.index.is_unique.
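For reference, a minimal sketch of the many-to-one case under discussion: a uniqueness check on df2's index followed by a left join with on=. The column names follow the thread, but the values are made up. As noted above, toy frames like these did not reproduce the failure, so this shows the expected behavior only.

```python
import pandas as pd

# Hypothetical toy data: column names follow the thread, values are invented.
dates = pd.to_datetime(["2013-01-31", "2013-02-28"])

# df1: possibly many rows per (FactorDate, SourceFundSecNo).
df1 = pd.DataFrame({
    "FactorDate": [dates[0], dates[0], dates[1], dates[1]],
    "SourceFundSecNo": [101, 101, 101, 202],
    "Weight": [0.4, 0.6, 1.0, 1.0],
})

# df2: exactly one row per (FactorDate, FundSecNo), held in a MultiIndex.
df2 = pd.DataFrame({
    "FactorDate": [dates[0], dates[1], dates[1]],
    "FundSecNo": [101, 101, 202],
    "Value": [10.0, 11.0, 20.0],
}).set_index(["FactorDate", "FundSecNo"])

# The many-to-one assumption: every key in df2 appears once.
assert df2.index.is_unique

# Left join: each df1 row looks up its (FactorDate, SourceFundSecNo) in df2's index.
result = df1.join(df2, on=["FactorDate", "SourceFundSecNo"], how="left")
print(result)
```

With a left join, every row of df1 is kept, and Value is filled in wherever the key pair exists in df2.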
|
At the risk of beating a dead horse, I do want 'left'! The fact that 'left' returns no matches at all while 'outer' returns all of them just doesn't make sense. The 0.11 behavior did make sense. How can we make sense of test2 below given test3?
|
Not sure. Maybe Andy can chime in, @hayd? |
So I think we can see this from the head of df1 and df2: the left join doesn't look right (it has only NaNs):
Clearly these shouldn't all be NaNs... |
I seem to be having the same problem with datetime and string (object). Here's my MRWE:
The problem is, I'd like to retain the index on dfb, so the best option for me would have been the one which doesn't work. As a workaround I can reset the index on dfb and set the index again after the merge:
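A minimal sketch of that workaround, assuming hypothetical frames dfa (keys as plain columns) and dfb (keys as a MultiIndex of datetime and string). The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical frames: dfa keeps the keys as columns, dfb keeps them in a MultiIndex.
dfa = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-01", "2014-01-02"]),
    "name": ["a", "b", "a"],
    "x": [1, 2, 3],
})
dfb = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-02"]),
    "name": ["a", "a"],
    "y": [10.0, 30.0],
}).set_index(["date", "name"])

keys = list(dfb.index.names)

# Reset dfb's index so the keys become ordinary columns, then merge on them.
merged = dfa.merge(dfb.reset_index(), on=keys, how="left")

# Restore the (date, name) index on the merged result.
merged = merged.set_index(keys)
print(merged)
```

The cost is an extra copy of dfb from reset_index() and a rebuild of the index afterwards.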
This seems to be related to the fact that it is a MultiIndex and that there is a datetime in the index. The join() method works just fine if the index is just a simple datetime and it also works with the MultiIndex if I replace the datetime column with int values.
|
I neglected to mention I'm using pandas 0.13.1 |
just use merge_asof here |
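A hedged sketch of what the merge_asof suggestion could look like, using hypothetical frames with made-up values. Note that pd.merge_asof matches each left row to the nearest earlier-or-equal key in the right frame instead of requiring exact equality, so it is not a drop-in replacement for an exact join. It also requires both frames to be sorted by the on= key.

```python
import pandas as pd

# Hypothetical frames; both must be sorted by the "on" key for merge_asof.
left = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
    "name": ["a", "a", "b"],
    "x": [1, 2, 3],
})
right = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-03"]),
    "name": ["a", "b"],
    "y": [10.0, 30.0],
})

# "by" requires an exact match on name; "on" takes the latest right-hand
# date that is less than or equal to the left-hand date (the default direction).
asof = pd.merge_asof(
    left.sort_values("date"),
    right.sort_values("date"),
    on="date",
    by="name",
)
print(asof)
```

Here the 2014-01-02 row for "a" picks up the 2014-01-01 value, which an exact join would have left as NaN.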
I encountered a very strange problem in 0.12 when a straightforward join fails while it works just fine in 0.11.
I can attach the frames, but I forgot how to do it :-)