generate_features(..., extend_features=...) InvalidIndexError: Reindexing only valid with uniquely valued Index objects #10

Open
sgbaird opened this issue Feb 3, 2022 · 2 comments

sgbaird commented Feb 3, 2022

Processing Input Data: 100%|██████████| 1794/1794 [00:00<00:00, 7378.49it/s]
	Featurizing Compositions...
Assigning Features...: 100%|██████████| 1778/1778 [00:00<00:00, 3426.03it/s]
NOTE: Your data contains formula with exotic elements. These were skipped.
	Creating Pandas Objects...

---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-45-22826a03d387> in <module>()
      1 from CBFV import composition
----> 2 X, y, formulae, skipped = composition.generate_features(df, extend_features="R")

4 frames
/usr/local/lib/python3.7/dist-packages/CBFV/composition.py in generate_features(df, elem_prop, drop_duplicates, extend_features, sum_feat, mini)
    307         extended = pd.DataFrame(extra_features, columns=features)
    308         extended = extended.set_index('formula', drop=True)
--> 309         X = pd.concat([X, extended], axis=1)
    310 
    311     # reset dataframe indices

/usr/local/lib/python3.7/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

/usr/local/lib/python3.7/dist-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    305     )
    306 
--> 307     return op.get_result()
    308 
    309 

/usr/local/lib/python3.7/dist-packages/pandas/core/reshape/concat.py in get_result(self)
    526                     obj_labels = obj.axes[1 - ax]
    527                     if not new_labels.equals(obj_labels):
--> 528                         indexers[ax] = obj_labels.get_indexer(new_labels)
    529 
    530                 mgrs_indexers.append((obj._mgr, indexers))

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   3440 
   3441         if not self._index_as_unique:
-> 3442             raise InvalidIndexError(self._requires_unique_msg)
   3443 
   3444         if not self._should_compare(target) and not is_interval_dtype(self.dtype):

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

sgbaird commented Feb 3, 2022

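This seems to be caused by repeated chemical formulas in the DataFrame. Per the traceback, `extended` is indexed by `formula` before `pd.concat([X, extended], axis=1)`, so repeated formulas give a non-unique index that pandas cannot reindex.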
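
A minimal sketch of that failure mode using pandas alone, without CBFV. The two small frames and their values are hypothetical stand-ins for `X` and `extended` from the traceback; the only assumption carried over is that both are indexed by formula and that one formula repeats.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames concatenated in the traceback.
# Both are indexed by formula, and "Al2O3" appears twice.
X = pd.DataFrame({"feat": [1.0, 2.0, 3.0]}, index=["Al2O3", "Al2O3", "SiO2"])
extended = pd.DataFrame({"R": [10, 20, 30]}, index=["SiO2", "Al2O3", "Al2O3"])

# Column-wise concat has to align rows by index label, which requires
# unique labels when the two indexes are not identical.
error_name = None
try:
    pd.concat([X, extended], axis=1)
except Exception as err:  # broad on purpose: only the error type is recorded
    error_name = type(err).__name__

print("index unique:", extended.index.is_unique)
print("error raised:", error_name)
```

Checking `df["formula"].is_unique` on the input before calling `generate_features(..., extend_features=...)` should show whether a given dataset is affected.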


sgbaird commented Feb 9, 2022

A workaround is to skip `extend_features` and instead call `generate_features` in a for loop over the other properties of interest, renaming each property's column to `target` in turn.

For example:

from CBFV.composition import generate_features

# Featurize once per property, treating each in turn as the "target" column.
ys = []
for name in ["property1", "property2", "property3"]:
    X, y, formulae, skipped = generate_features(df.rename(columns={name: "target"}))
    ys.append(y)
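
Another option, sketched here as an untested assumption rather than a confirmed fix, is to collapse repeated formulas before featurizing so that the formula index is unique. The example data is hypothetical, the `R` column stands in for the extra feature passed via `extend_features="R"`, and averaging repeated rows is only one possible aggregation that may not suit every dataset.

```python
import pandas as pd

# Hypothetical input with one repeated formula.
df = pd.DataFrame(
    {
        "formula": ["Al2O3", "Al2O3", "SiO2"],
        "target": [1.0, 3.0, 5.0],
        "R": [10.0, 20.0, 30.0],
    }
)

# Count rows whose formula already appeared earlier in the frame.
n_repeats = int(df["formula"].duplicated().sum())

# Average all numeric columns over rows that share a formula (assumption:
# averaging is an acceptable way to merge repeated measurements).
deduped = df.groupby("formula", as_index=False).mean()

print(n_repeats)
print(deduped)
```

The resulting `deduped` frame has one row per formula and could then be passed to `generate_features` in place of `df`.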
