-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding ignore_index to pandas_to_eland #154
Adding ignore_index to pandas_to_eland #154
Conversation
Parameter is useful when adding multiple pd.DataFrame's to the same index. Also, updated test module to pandas.testing for 1.0.x compliance,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be good to have, some comments for you
eland/utils.py
Outdated
if es_dropna: | ||
values = row[1].dropna().to_dict() | ||
else: | ||
values = row[1].to_dict() | ||
|
||
# Use integer as id field for repeatable results | ||
action = {"_index": es_dest_index, "_source": values, "_id": str(id)} | ||
if ignore_index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should change the name of this parameter to be more verbose but also clear on what "index" is being ignored since the concept appears in both pandas and ES.
Maybe ignore_pandas_index
or even flip the default and use use_pandas_index_as_id=True
?
eland/utils.py
Outdated
number of pandas.DataFrame rows to read before bulk index into Elasticsearch | ||
Number of pandas.DataFrame rows to read before bulk index into Elasticsearch | ||
ignore_index: bool, default 'False' | ||
Ignore pandas.DataFrame.index when indexing into Elasticsearch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the ?, document the behavior for the true and false case
@@ -65,6 +65,40 @@ def test_generate_es_mappings(self): | |||
|
|||
assert_pandas_eland_frame_equal(df, ed_df_head) | |||
|
|||
ES_TEST_CLIENT.indices.delete(index=index_name) | |||
|
|||
def test_pandas_to_eland_ignore_index(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unit test passes without the ignore_index=True
, should add a that fails if not using the functionality.
Maybe adding this assert is enough:
# Ensure that index is populated by ES.
assert not (df.index == pd_df.index).any()
pd_df = ed.eland_to_pandas(ed_df) | ||
|
||
# Compare values excluding index | ||
assert df.values.all() == pd_df.values.all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be (df.values == pd_df.values).all()
?
Many thanks! Great comments - resolved in new commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, will merge after CI passes
Parameter is useful when adding multiple pd.DataFrame's to
the same index.
Also, updated test module to pandas.testing for 1.0.x
compliance,