added static typing to data_utils.py #662
@Sanketh7 looks like the branch needs updating as well.
Comments mainly around formatting, and +1 to @JGSweets' comment about not casting.
f08ecf5 to bc44f0d
Head branch was pushed to by a user without write access
one comment around the dict isinstance
@@ -90,7 +105,7 @@ def unicode_to_str(data, ignore_dicts=False):
     # if data is a dictionary
     if isinstance(data, dict) and not ignore_dicts:
         return {
-            unicode_to_str(key, ignore_dicts=True): unicode_to_str(
+            cast(str, unicode_to_str(key, ignore_dicts=True)): unicode_to_str(
there's no guarantee this is a string, this could be an int, right?
Removed the cast (not needed anymore after I changed JSONType).
awesome -- thx, @Sanketh7
is it removed @Sanketh7 ? looks like it is still there ...
Should be removed by 8de0cef
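For context on the concern above: `typing.cast` only informs the type checker and does nothing at runtime, so a non-string key passes through unchanged. The sketch below illustrates this with `passthrough_key`, a hypothetical stand-in for `unicode_to_str` applied to a key (an assumption for illustration, not the project's implementation).

```python
from typing import cast


def passthrough_key(key: object) -> object:
    """Hypothetical stand-in for unicode_to_str applied to a dict key."""
    return key


# A dict with one int key and one str key.
data = {1: "a", "b": "c"}

# cast(str, ...) silences the type checker but performs no conversion.
converted = {cast(str, passthrough_key(k)): v for k, v in data.items()}

# The int key is still an int after the "cast".
key_types = sorted(type(k).__name__ for k in converted)
print(key_types)
```

If string keys were actually required, an explicit `str(key)` conversion would be needed; whether that is desirable here is a separate design question.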
@@ -99,7 +114,11 @@ def unicode_to_str(data, ignore_dicts=False):
     return data


-def json_to_dataframe(json_lines, selected_columns=None, read_in_string=False):
+def json_to_dataframe(
+    json_lines: List[JSONType],
We should actually validate this. @taylorfturner, can df = pd.DataFrame(json_lines) take in any list of JSON?
It works assuming List[JSONType], but should it be Dict[str, List[JSONType]]?
from typing import Dict, List, Union

import pandas as pd

JSONType = Union[str, int, float, bool, None, List, Dict]

# One sample list per kind of JSON value allowed by JSONType.
test_iter_list: List[List[JSONType]] = []
test_iter_list.append(['test', 'test'])
test_iter_list.append([1, 2, 3])
test_iter_list.append([1.02, 2.02, 3.03])
test_iter_list.append([True, False])
test_iter_list.append([None, None])
test_iter_list.append([['test', 'test'], ['test', 'test']])
# Note: the duplicate key collapses to {'test': 'test'}.
test_iter_list.append([{'test': 'test', 'test': 'test'}])

# Each list should be accepted by the DataFrame constructor without raising.
for test_iter in test_iter_list:
    pd.DataFrame(test_iter)
@Sanketh7, I think the docstring for this on L126, :type json_lines: list(dict), needs updating.
This does work. So I think we are GTG
for test_iter in test_iter_list: json_to_dataframe(test_iter)
List[list] works
In [307]: data = data_generator("""[["test", "test"], ["test", "test"]]""".splitlines())
In [308]: read_json(data)
Out[308]: [[['test', 'test'], ['test', 'test']]]
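The session above can be approximated with the standard library alone. In this minimal sketch, `parse_json_lines` is a hypothetical helper (not the project's `read_json` or `data_generator`) that decodes each line as one JSON value, which shows why a line holding a list of lists comes back as a single-element list.

```python
import json
from typing import Any, Iterable, List


def parse_json_lines(lines: Iterable[str]) -> List[Any]:
    """Hypothetical helper: decode each non-empty line as one JSON value."""
    return [json.loads(line) for line in lines if line.strip()]


# Same input string as in the session above: one line, one JSON value.
raw = """[["test", "test"], ["test", "test"]]"""
parsed = parse_json_lines(raw.splitlines())
print(parsed)
```

Because the input has one line, the result has one element, and that element is itself a list of lists rather than a dict.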
Head branch was pushed to by a user without write access
#610