
Version 0.7.0 breaks pyam test #50

Closed
phackstock opened this issue Feb 28, 2024 · 14 comments · Fixed by #53
@phackstock
Contributor

As I was working on pyam yesterday (IAMconsortium/pyam#818) I noticed that ixmp4 0.7.0 broke the test test_ixmp4_integration[test_df0] from tests/test_ixmp4.py (https://github.com/IAMconsortium/pyam/actions/runs/8067457546/job/22037916444) with the following error:

...
pyarrow/error.pxi:154: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowNotImplementedError: Function 'not_equal' has no kernel matching input types (string, double)

pyarrow/error.pxi:91: ArrowNotImplementedError

Looks like it's got something to do with pyarrow.
Reverting to ixmp4 version 0.6.0 fixed the test.
I just talked to @meksor and he said that you, @glatterf42, would be the best person to take a look.

FYI @danielhuppmann.

@glatterf42
Member

I'm not sure this is ixmp4's fault. My interpretation is that

/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/ixmp4/data/db/meta/repository.py:194: in bulk_upsert
    super().bulk_upsert(type_df)
/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/ixmp4/data/db/base.py:376: in bulk_upsert
    self.bulk_upsert_chunk(df)
/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/ixmp4/data/db/base.py:394: in bulk_upsert_chunk
    cond.append(df[col] != df[updated_col])

and

 E   pyarrow.lib.ArrowNotImplementedError: Function 'not_equal' has no kernel matching input types (string, double)

pyarrow/error.pxi:91: ArrowNotImplementedError

indicate that df[col] and df[updated_col] don't have compatible types. This could be on ixmp4 for trying to update the wrong column (which seems odd) or it could be on the pyam test setup for trying to pass an incompatible type.
I'll have to install pyam and run the tests myself to inspect test_df more closely, I think.

@glatterf42
Member

I've tracked the error down for the most part, but I'm not sure how to resolve it yet. As far as I can tell, the following is happening:
In self.backend.meta.bulk_upsert(df), we have this dataframe:

      key value  run__id
0  number     1        1
1  string   foo        1

This survives until null_cols = set(RunMetaEntry._column_map.values()) - set([col]):

       key value  run__id      type
0  number     1        1  Type.INT
1  string   foo        1  Type.STR

But then, we call bulk_upsert() individually for each type. For the first type, this is not an issue, but for the second type, we already have an existing_df looking like this:

    run__id     key type  value_int value_str value_float value_bool  id
0        1  number  INT          1      None        None       None   1

here:

df = self.merge_existing(df, existing_df)

And for some reason, the comparison then fails: we want to insert a value of type string (according to pandas and pyarrow), but existing_df already holds a value of type float64 in that column, presumably because None was converted to that type.
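That float64 plausibly comes from the merge itself: when a left/outer merge leaves some rows unmatched, pandas fills int64 columns with NaN and silently upcasts them to float64. A minimal sketch with hypothetical column names mirroring the ones above:

```python
import pandas as pd

# The unmatched "string" row gets NaN in value_int and id, so both int64
# columns are upcast to float64 in the merge result.
new = pd.DataFrame({"key": ["number", "string"]})
existing = pd.DataFrame({"key": ["number"], "value_int": [1], "id": [1]})
merged = new.merge(existing, on="key", how="left")
print(merged["value_int"].dtype)  # float64
```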

@meksor, @danielhuppmann, if you have experience with this or immediately know what to do, please jump in here. Otherwise, I'll find a fix tomorrow.

@phackstock
Contributor Author

Thanks a lot for the detective work @glatterf42.

@meksor
Contributor

meksor commented Feb 28, 2024

What /exactly/ is pyam trying to pass to ixmp4 as meta values?

@danielhuppmann
Member

A dataframe like this converted to a dict like this.

@meksor
Contributor

meksor commented Feb 28, 2024

OK so that becomes a dict like: {"model": "model_a", "scenario": "scen_a", "number": 1, "string": "foo"}
Inserting that into the ixmp4 tests, everything passes... ???

@danielhuppmann
Member

And the pyam-ixmp4-test passed last week - so it must be either the pandas-update yesterday or some ixmp4 change since v0.6...

@meksor
Contributor

meksor commented Feb 28, 2024

I'm running it locally with pandas 2.2.1 and the newest ixmp4 version...

@meksor
Contributor

meksor commented Feb 28, 2024

Ok, update: if I install pyarrow /alongside/ ixmp4, the tests in ixmp4 also fail... Seems pandas uses pyarrow if it's available, breaking this test....

@meksor
Contributor

meksor commented Feb 28, 2024

OK, so pandas version 1.5.3 still works. It seems pandas version >2 changes its behaviour if pyarrow is installed. Converting the columns to each other's types just yields another pyarrow error.
I would suggest reverting the update to pandas 2; it seems a bunch of stuff broke...

@danielhuppmann
Member

We bumped pyam to depend on pandas >= 2.0 a while ago to take advantage of the improved speed and API, so pinning pandas<2 isn't really an option...

@glatterf42
Member

I've added some auxiliary output to bulk_upsert_chunk() like this:

    def bulk_upsert_chunk(self, df: pd.DataFrame) -> None:
        columns = db.utils.get_columns(self.model_class)
        df = df[list(set(columns.keys()) & set(df.columns))]
        existing_df = self.tabulate_existing(df)
        print(f"df's dtypes: \n {df.dtypes}")
        if existing_df.empty:
            self.bulk_insert(df)
        else:
            df = self.merge_existing(df, existing_df)
            df["exists"] = np.where(pd.notnull(df["id"]), True, False)
            print(f"existing df \n {existing_df}")
            print(f"existing df's dtypes: \n {existing_df.dtypes}")
            print(f"new df's dtypes: \n {df.dtypes}")

And the corresponding output shows this:

df's dtypes: 
 value_bool     object
value_float    object
value_str      object
key            object
run__id         int64
value_int      object
type           object
dtype: object
existing df 
    run__id     key type  value_int value_str value_float value_bool  id
0        1  number  INT          1      None        None       None   1
existing df's dtypes: 
 run__id         int64
key            object
type           object
value_int       int64
value_str      object
value_float    object
value_bool     object
id              int64
dtype: object
new df's dtypes: 
 run__id                    int64
value_bool       string[pyarrow]
value_float      string[pyarrow]
value_str        string[pyarrow]
key              string[pyarrow]
value_int        string[pyarrow]
type             string[pyarrow]
type_y           string[pyarrow]
value_int_y              float64
value_str_y      string[pyarrow]
value_float_y    string[pyarrow]
value_bool_y     string[pyarrow]
id                       float64
exists                      bool
dtype: object

So it looks like one of these

            df = self.merge_existing(df, existing_df)
            df["exists"] = np.where(pd.notnull(df["id"]), True, False)

is responsible for the data conversion to unexpected types.

@glatterf42
Member

It's happening in df = self.merge_existing(df, existing_df) already, which makes me think this could be related to dask instead of pandas.

dask/dask#10631 might be related.
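If dask is indeed the culprit, one possible workaround (an assumption, not a confirmed fix: recent dask versions convert object columns to string[pyarrow] automatically when pyarrow is installed, controlled by a config flag) would be to disable that conversion:

```python
import dask

# Assumption: the "dataframe.convert-string" flag exists in the installed
# dask version; setting it to False keeps object columns as plain object
# dtype instead of converting them to string[pyarrow].
dask.config.set({"dataframe.convert-string": False})
```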

@glatterf42 glatterf42 changed the title ixmp 0.7.0 breaks pyam test Version 0.7.0 breaks pyam test Feb 29, 2024
@meksor
Contributor

meksor commented Feb 29, 2024
