Help with DataFrame as function argument or mask assignment #28591
Comments
OK, I found a way around this by exploring the first idea. Switching:
Is this a bug report? We recommend Stack Overflow for usage questions.
It's both =)
Perhaps I'm not using it the right way, but it seems odd to me that assigning a list to a masked DataFrame doesn't work, while assigning a Series is fine.
I've gone further with the given solution, and dataframe.loc[mask, out] = pd.Series(features.tolist()) seems to assign only the first row and ignores the mask...
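A likely explanation, offered as a sketch rather than a confirmed diagnosis: pandas aligns a Series by label on assignment, and a Series built from a plain list gets the default labels 0, 1, 2, ... instead of the labels of the masked rows. The snippet below imitates that alignment with reindex; the frame and the feature values are made up for illustration.

```python
import pandas as pd

# Made-up frame: row labels are 0, 1, 2.
df = pd.DataFrame({'Label': [0, 1, 0]})
mask = df['Label'] == 0                  # selects labels 0 and 2
target = df.index[mask.to_numpy()]       # labels of the masked rows
values = [[74.5], [92.5]]                # made-up features, one list per masked row

# A bare Series gets the default labels 0 and 1, so aligning it to the
# masked labels (0 and 2) only finds a match for label 0.
unlabeled = pd.Series(values)
aligned_unlabeled = unlabeled.reindex(target)
print(aligned_unlabeled)

# Giving the Series the masked labels makes every value line up.
labeled = pd.Series(values, index=target)
aligned_labeled = labeled.reindex(target)
print(aligned_labeled)
```

If this is what happens here, building the Series with the index of the masked rows should make the assignment hit every selected row.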
If reporting a bug, we would need a minimal, reproducible example of the buggy behavior. Feel free to reopen when you can post an example.
Problem description
Hi everyone,
I'm here because I want to achieve something, but I don't know the best way to do it. I browsed a lot of topics on Stack Overflow without finding a clue. Here is the thing: I want to write some utility functions for machine learning.
I want to write several functions that add new data to an existing DataFrame and handle some filtering of it, without any explicit data assignment.
Idea #1
# Method call
transform(inputs[mask], {'datum': 'Data'}, 'PCA', PCA())

# Method definition
def transform(dataframe, tags, out, model):
    # Check mandatory fields: tags must be a dict that contains every mandatory key
    mandatory = ['datum']
    if not isinstance(tags, dict) or not all(elem in tags for elem in mandatory):
        raise Exception(f'Not a dict or missing tag: {mandatory}.')
Idea #2
# Method call
transform(inputs, {'datum': 'Data'}, 'PCA', PCA(), mask)

# Method definition
def transform(dataframe, tags, out, model, mask=None):
    # Check mandatory fields: tags must be a dict that contains every mandatory key
    mandatory = ['datum']
    if not isinstance(tags, dict) or not all(elem in tags for elem in mandatory):
        raise Exception(f'Not a dict or missing tag: {mandatory}.')
Input
Row ; Data ; Label
0 ; [61953.017837947686, 9.505037089204054, 74.585... ] ;0
1 ; [80832.69302693632, 9.524642547991316, 83.9228... ] ;1
Expected Output
Row ; Data ; Label ; PCA
0 ; [61953.017837947686, 9.505037089204054, 74.585... ] ;0 ; [74.585... ]
1 ; [80832.69302693632, 9.524642547991316, 83.9228... ] ;1 ; [92.578... ]
I'm calling 'features = model.transform(dataframe.loc[mask, tags['datum']].to_numpy())' so that my data is processed as a matrix rather than row by row with the apply() method.
The first idea doesn't seem to work: it's not possible to pass a view as an argument and change its data from inside a function, because the DataFrame goes through __getitem__ when I pass it to the function and becomes a copy of the DataFrame.
The second idea doesn't seem to work either: 'dataframe.loc[mask, out] = features.tolist()' raises ValueError: 'Must have equal len keys and value when setting with an ndarray', as it seems to handle the mask element by element...
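To illustrate the point about copies, here is a minimal sketch, assuming a made-up frame and a hypothetical helper add_flag. Boolean indexing such as inputs[mask] returns a new DataFrame, so one option is to have the function return its result and to join it back by row label in the caller.

```python
import pandas as pd

# Made-up data standing in for the real inputs.
inputs = pd.DataFrame({'Data': [[1.0, 2.0], [3.0, 4.0]], 'Label': [0, 1]})
mask = inputs['Label'] == 1


def add_flag(frame):
    """Hypothetical helper: return a new frame with an extra column."""
    return frame.assign(Flag=True)


# inputs[mask] is a new object, so the helper never touches `inputs`.
subset = add_flag(inputs[mask])
untouched = 'Flag' not in inputs.columns
print(untouched)

# Writing the result back is an explicit step; pandas aligns on the row
# labels, so rows outside the mask are left as NaN.
inputs['Flag'] = subset['Flag']
print(inputs)
```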
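One possible way to make the second idea work, given only as a sketch under assumptions: wrap the transformed rows in a Series that carries the labels of the masked rows, so that pandas aligns by label instead of trying to fit a nested list into the selection. FirstColumn is a hypothetical stand-in for a fitted model such as PCA, each Data cell is assumed to hold a list of equal length, and the sample data is made up.

```python
import numpy as np
import pandas as pd


class FirstColumn:
    """Hypothetical stand-in for a fitted model: keeps only column 0."""

    def transform(self, X):
        return X[:, :1]


def transform(dataframe, tags, out, model, mask=None):
    if mask is None:
        # No mask given: select every row.
        mask = pd.Series(True, index=dataframe.index)
    selected = dataframe.loc[mask, tags['datum']]
    # Stack the per-row lists into a 2-D matrix for the model.
    matrix = np.array(selected.tolist())
    features = model.transform(matrix)
    # One list per selected row, labeled with that row's index.
    # Note: this (re)creates the whole column; rows outside the mask get NaN.
    dataframe[out] = pd.Series(features.tolist(), index=selected.index)
    return dataframe


df = pd.DataFrame({
    'Data': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    'Label': [0, 1, 0],
})
transform(df, {'datum': 'Data'}, 'PCA', FirstColumn(), df['Label'] == 0)
print(df)
```

Because the whole column is rebuilt, a version that must preserve earlier values in an existing output column would need to merge them explicitly.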
EDIT:
I'm doing it in two steps for now, which is not the most intuitive way...
https://stackoverflow.com/questions/58064179/pandas-masked-dataframe-assign-2d-array
I don't have any other ideas at this point...
If you have any advice, I would be thankful.
Best regards