Introduce shaped arg to execute_mdx_dataframe #893

Merged
merged 2 commits into master from feature/optimize-memory-in-dataframe-shaped on Apr 12, 2023

Conversation

MariusWirtz
Collaborator

@MariusWirtz MariusWirtz commented Apr 8, 2023

Inspired by
Kevin-Dekker@8ab9901#diff-5242c6c19161de7f529a4e5b0d0b33ca9c15f69be65fa5409881b3d8310ff655R2799-R2802

execute_mdx_dataframe and execute_view_dataframe can now be run with shaped=True to retrieve a result that is similar, but not fully equivalent, to that of execute_mdx_dataframe_shaped.

This allows retrieving a shaped data frame while specifying options that are not available in execute_mdx_dataframe_shaped. One example is use_iterative_json=True and use_blob=True, which reduce memory usage during the string-to-JSON/dict conversion that happens before the data frame is created.
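A minimal sketch of how a call with the new argument might look. The helper name fetch_shaped_dataframe and its low_memory flag are hypothetical and only serve as illustration; the cells argument is assumed to be a TM1py-style cell service that accepts the shaped and use_blob keyword arguments described in this PR.

```python
from typing import Any


def fetch_shaped_dataframe(cells: Any, mdx: str, low_memory: bool = True):
    """Request a shaped data frame from a TM1py-style cell service.

    Assumption: `cells` exposes execute_mdx_dataframe with the keyword
    arguments named in this PR (shaped, use_blob).
    """
    return cells.execute_mdx_dataframe(
        mdx=mdx,
        shaped=True,          # pivot the result the way the *_shaped function does
        use_blob=low_memory,  # memory-optimized retrieval path mentioned above
    )
```

With a live connection this would presumably be called as fetch_shaped_dataframe(tm1.cells, mdx), where tm1 is a connected TM1Service instance.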

Fixes #888

@Cubewise-gtejeda

The memory footprint was much lower when trying this, confirming previous tests.
Not sure if this is just a formatting issue, but when printing out the dataframe the formats of the numbers look different:

use_iterative_json=False:
image

use_iterative_json=True:
image

The shape and order seem to be identical, which is also expected.
However, there is an additional string appearing when using use_iterative_json=True:
image

For existing queries, I am worried that this might break backward compatibility. Could there be an option to suppress that string?

@MariusWirtz
Collaborator Author

Thanks for the review! Good catch on the index column. I will make sure we get rid of this incorrect column label.

Regarding the formatting, this is kinda expected. The data frame is built differently and now the type is inherited. In the old execute_view_dataframe_shaped function the type would always be a string. The new approach is more intelligent IMO.

The scientific notation is just a pandas display setting.
You should be able to switch to normal numbers like this:

import pandas as pd
pd.options.display.float_format = '{:.5f}'.format
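To illustrate both points on a small stand-in frame (the column names and values here are made up, not taken from the screenshots): the sketch below renders floats in fixed notation without changing the global option, and casts every column to strings for callers that relied on the old all-string output. The helper names render_fixed and as_strings are hypothetical.

```python
import pandas as pd


def render_fixed(df: pd.DataFrame, decimals: int = 5) -> str:
    """Render a frame with fixed-point floats, leaving global options untouched."""
    formatter = ("{:." + str(decimals) + "f}").format
    with pd.option_context("display.float_format", formatter):
        return df.to_string()


def as_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Cast every column to strings, mimicking the old all-string result."""
    return df.astype(str)


# Made-up data: one string column, one float column with very different magnitudes.
df = pd.DataFrame({"Region": ["DE", "FR"], "Revenue": [1.5e-07, 1234567.891]})
print(render_fixed(df))
print(as_strings(df).dtypes)
```

Using option_context keeps the format change local to one call, which may be preferable to setting pd.options globally in shared code.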

@Cubewise-gtejeda

Thanks for the update.
The datatype inheritance - although a good thing - might break compatibility with existing reports in Power BI, since users currently would change the datatype in the query. I am not sure if this would cause an error in the query, but cannot test this in the connector until the release.

Could this be a parameter as well?

Thoughts?

@MariusWirtz MariusWirtz force-pushed the feature/optimize-memory-in-dataframe-shaped branch from e3e4fb4 to dac6867 Compare April 12, 2023 10:43
@MariusWirtz
Collaborator Author

MariusWirtz commented Apr 12, 2023

> The datatype inheritance - although a good thing - might break compatibility with existing reports in Power BI, since users currently would change the datatype in the query. I am not sure if this would cause an error in the query, but cannot test this in the connector until the release.

It wouldn't break any existing applications, because it only applies when use_blob=True or use_iterative_json=True is passed.
I think we are on track to release to PyPI this week.

I can't reproduce the strange column header. Please post the code that produces the data frame.

@MariusWirtz MariusWirtz force-pushed the feature/optimize-memory-in-dataframe-shaped branch from dac6867 to e800c21 Compare April 12, 2023 13:15
@MariusWirtz
Collaborator Author

The unwanted column header has been removed.

@MariusWirtz MariusWirtz merged commit 59947a0 into master Apr 12, 2023
@MariusWirtz MariusWirtz deleted the feature/optimize-memory-in-dataframe-shaped branch October 15, 2024 10:12