Bugfix 412 replace numeric value in combustion #413

Merged: 4 commits, Jan 28, 2023


@nesnoj (Collaborator) commented Jan 20, 2023

Fixes #412

Here's the start. However, adding the column name as in #392 does not work, because some values appear to be lists (comma-separated IDs such as '2442, 2442'):

Table 'combustion_extended' is filled with data 'einheitenverbrennung' from the bulk download.
File 'EinheitenVerbrennung.xml' is parsed.
Data is cleansed.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/open-MaStR/open_mastr/mastr.py", line 227, in download
    write_mastr_xml_to_database(
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/open-MaStR/open_mastr/xml_download/utils_write_to_database.py", line 58, in write_mastr_xml_to_database
    df = cleanse_bulk_data(df, zipped_xml_file_path)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/open-MaStR/open_mastr/xml_download/utils_cleansing_bulk.py", line 14, in cleanse_bulk_data
    df = replace_mastr_katalogeintraege(
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/open-MaStR/open_mastr/xml_download/utils_cleansing_bulk.py", line 41, in replace_mastr_katalogeintraege
    df[column_name].astype("float").astype("Int64").map(katalogwerte)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 448, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 170, in astype_nansafe
    return arr.astype(dtype, copy=True)
ValueError: could not convert string to float: '2442, 2442'
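A minimal reproduction of this failure, assuming only pandas and NumPy are installed. The sample values are taken from the WeitereBrennstoffe column, and the cast chain mirrors the one on line 41 of `utils_cleansing_bulk.py` in the traceback; the helper name `reproduce` is hypothetical.

```python
import numpy as np
import pandas as pd


def reproduce() -> str:
    """Run the float -> Int64 cast on mixed sample values and return the error text."""
    # Mixed object column: float, numeric string, comma-separated ID list, NaN.
    s = pd.Series([2473.0, "2469", "2442, 2442", np.nan], dtype="object")
    try:
        # Same cast chain as in replace_mastr_katalogeintraege (minus the .map step).
        s.astype("float").astype("Int64")
    except ValueError as err:
        return str(err)
    return ""


print(reproduce())  # could not convert string to float: '2442, 2442'
```

Single-ID strings such as "2469" convert fine; only the comma-separated entries break the cast.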

Example numeric data from CSV:

units.WeitereBrennstoffe.unique()
array([nan, '2473', '2469', '2445', '2467', '2466', '2442, 2442', '2421',
       '2479', '2448', '2436', '2445, 2469, 2473', '2424', '2437', 2473.0,
       2416.0, 2469.0, 2442.0, 2467.0, 2445.0, 2472.0, 2448.0, 2477.0,
       2436.0, 2419.0, 2457.0, '2467, 2414', '2414', '2442', '2447',
       ...
       '2469, 2473, 2474, 2477', 3035.0, 2468.0], dtype=object)
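To check whether columns other than WeitereBrennstoffe hold such comma-separated ID lists, a sketch like the following could scan a DataFrame for string cells that contain a comma. The helper `list_valued_columns`, the column `SomeOtherColumn` and the sample rows are hypothetical; the sample only mimics the mix of NaN, floats, single-ID strings and list strings shown above.

```python
import numpy as np
import pandas as pd


def list_valued_columns(df: pd.DataFrame) -> dict:
    """Map each column name to the number of cells holding a comma-separated string."""
    counts = {}
    for col in df.columns:
        # Only string cells can contain a list such as "2442, 2442".
        is_list = df[col].map(lambda v: isinstance(v, str) and "," in v)
        n = int(is_list.sum())
        if n:
            counts[col] = n
    return counts


# Hypothetical sample; the real data comes from EinheitenVerbrennung.xml.
units = pd.DataFrame(
    {
        "WeitereBrennstoffe": [np.nan, "2473", "2442, 2442", 2473.0, "2445, 2469, 2473"],
        "SomeOtherColumn": ["2410", 2411.0, np.nan, "2412", "2413"],
    }
)
print(list_valued_columns(units))
```

Only columns with at least one list-valued cell appear in the result, so running it on the full table gives a quick overview of which catalog columns need list handling.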

Is there a function to handle list values within the strings? I suspect other columns show similar patterns. @FlorianK13
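One possible way to cope with list values, sketched under assumptions: a hypothetical `replace_ids_with_names` helper splits each string on commas, converts the parts to integer IDs and looks them up in a `katalogwerte`-style dict. The dict name is borrowed from the traceback, but the ID-to-name contents below are placeholders, not real MaStR catalog entries, and this is not necessarily the fix that was merged in this PR.

```python
import numpy as np
import pandas as pd


def replace_ids_with_names(series: pd.Series, katalogwerte: dict) -> pd.Series:
    """Replace catalog IDs with names, including comma-separated ID lists.

    Cells may be NaN, numbers (2473.0), numeric strings ("2473")
    or comma-separated strings ("2445, 2469, 2473").
    """

    def convert(value):
        if isinstance(value, str):
            # Split "2445, 2469, 2473" into [2445, 2469, 2473]; float() tolerates spaces.
            ids = [int(float(part)) for part in value.split(",") if part.strip()]
        elif pd.isna(value):
            return np.nan  # keep missing values missing
        else:
            ids = [int(value)]  # plain numbers such as 2473.0
        # IDs without a catalog entry are kept as their number instead of being dropped.
        return ", ".join(str(katalogwerte.get(i, i)) for i in ids)

    return series.map(convert)


# Placeholder names, not real MaStR catalog entries.
katalogwerte = {2442: "A", 2445: "B", 2469: "C", 2473: "D"}
values = pd.Series(
    [np.nan, "2473", "2442, 2442", 2473.0, "2445, 2469, 2473", "9999"], dtype="object"
)
print(replace_ids_with_names(values, katalogwerte).tolist())
```

Joining the names with ", " keeps the column as plain strings, which is easy to write to the database; returning Python lists instead would preserve the structure but needs a different column type. Duplicate IDs such as '2442, 2442' are kept as-is; deduplicating them would be a separate decision.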

@deniztepe deniztepe marked this pull request as ready for review January 27, 2023 14:23
@deniztepe deniztepe merged commit 0a3d86e into develop Jan 28, 2023
@deniztepe deniztepe deleted the bugfix-412-replace-numeric-value-in-combustion branch January 28, 2023 10:02
@nesnoj (Collaborator, Author) commented Jan 28, 2023

Thx!

Development

Successfully merging this pull request may close these issues.

Replace numeric value in column combustion.WeitereBrennstoffe with name
3 participants