Adding ignore_index to pandas_to_eland #154

stevedodson · 2020-03-31T11:36:09Z

Parameter is useful when adding multiple pd.DataFrames to
the same index.

Also, updated test module to pandas.testing for 1.0.x
compliance.

sethmlarson

This would be good to have, some comments for you

sethmlarson · 2020-03-31T13:17:05Z

eland/utils.py

        if es_dropna:
            values = row[1].dropna().to_dict()
        else:
            values = row[1].to_dict()

-        # Use integer as id field for repeatable results
-        action = {"_index": es_dest_index, "_source": values, "_id": str(id)}
+        if ignore_index:


Maybe we should change the name of this parameter to be more verbose but also clear on what "index" is being ignored since the concept appears in both pandas and ES.

Maybe ignore_pandas_index or even flip the default and use use_pandas_index_as_id=True?
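
To make the naming question concrete, here is a minimal sketch of what the flag controls. It assumes that ignoring the pandas index simply means omitting `_id` from each bulk action so that Elasticsearch generates one itself. The helper `build_bulk_actions` and the `(index_label, values)` row format are hypothetical and used only for illustration; the flag name is borrowed from the suggestion above, not from the merged code.

```python
def build_bulk_actions(rows, es_dest_index, use_pandas_index_as_id=True):
    """Hypothetical helper: turn (index_label, values) pairs into bulk actions."""
    actions = []
    for index_label, values in rows:
        action = {"_index": es_dest_index, "_source": values}
        if use_pandas_index_as_id:
            # Repeatable ids, but rows from different frames can collide.
            action["_id"] = str(index_label)
        # Otherwise "_id" is left out (assumption: Elasticsearch then
        # auto-generates a unique id for each document).
        actions.append(action)
    return actions


# Two frames that both carry the default 0..n-1 pandas index.
first_frame = [(0, {"a": 1}), (1, {"a": 2})]
second_frame = [(0, {"a": 3}), (1, {"a": 4})]

with_ids = build_bulk_actions(first_frame + second_frame, "my-index")
without_ids = build_bulk_actions(
    first_frame + second_frame, "my-index", use_pandas_index_as_id=False
)
```

With the index used as `_id`, the second frame reuses ids "0" and "1", so indexing it into the same destination would overwrite the first frame's documents. That is the multiple-DataFrame case from the PR description.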

sethmlarson · 2020-03-31T13:17:42Z

eland/utils.py

-        number of pandas.DataFrame rows to read before bulk index into Elasticsearch
+        Number of pandas.DataFrame rows to read before bulk index into Elasticsearch
+    ignore_index: bool, default 'False'
+        Ignore pandas.DataFrame.index when indexing into Elasticsearch?


Drop the ?, document the behavior for the true and false case

sethmlarson · 2020-03-31T13:26:41Z

eland/tests/dataframe/test_utils_pytest.py

@@ -65,6 +65,40 @@ def test_generate_es_mappings(self):

        assert_pandas_eland_frame_equal(df, ed_df_head)

+        ES_TEST_CLIENT.indices.delete(index=index_name)
+
+    def test_pandas_to_eland_ignore_index(self):


This unit test passes without the ignore_index=True, should add an assertion that fails if not using the functionality.

Maybe adding this assert is enough:

# Ensure that index is populated by ES.
assert not (df.index == pd_df.index).any()

sethmlarson · 2020-03-31T13:28:21Z

eland/tests/dataframe/test_utils_pytest.py

+        pd_df = ed.eland_to_pandas(ed_df)
+
+        # Compare values excluding index
+        assert df.values.all() == pd_df.values.all()


Shouldn't this be (df.values == pd_df.values).all()?
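
A small sketch of why the two forms differ, written as a dependency-free analogue rather than with NumPy itself. `all_truthy` stands in for `ndarray.all()` and `elementwise_equal` for `(left == right).all()`; both helper names are hypothetical.

```python
def all_truthy(rows):
    # Stand-in for ndarray.all(): True when every element is truthy.
    return all(value for row in rows for value in row)


def elementwise_equal(left, right):
    # Stand-in for (left == right).all() on same-shaped 2-D data.
    return all(
        a == b
        for row_l, row_r in zip(left, right)
        for a, b in zip(row_l, row_r)
    )


original = [[1, 2], [3, 4]]
round_tripped = [[9, 8], [7, 6]]  # completely different values

# Mirrors `df.values.all() == pd_df.values.all()`: compares two booleans.
weak_check = all_truthy(original) == all_truthy(round_tripped)

# Mirrors `(df.values == pd_df.values).all()`: compares every element.
strict_check = elementwise_equal(original, round_tripped)
```

Here `weak_check` is true even though no value matches, because both sides reduce to "all elements are truthy" before they are compared. `strict_check` is false, which is the result the test needs.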

stevedodson · 2020-03-31T14:08:01Z

Many thanks! Great comments - resolved in new commit.

sethmlarson

LGTM, will merge after CI passes

Adding ignore_index to pandas_to_eland

3ac8a7a

stevedodson requested a review from sethmlarson March 31, 2020 11:36

sethmlarson suggested changes Mar 31, 2020

sethmlarson reviewed Mar 31, 2020

Resolving review comments

93959b8

sethmlarson approved these changes Mar 31, 2020

sethmlarson merged commit 71f2a3f into elastic:master Mar 31, 2020

stevedodson deleted the pandas_to_eland_ignore_index branch April 1, 2020 09:55
