-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python 3.x support + PEP8 Cleanup #10
Conversation
EDF.py
Outdated
# meas_info should have ['record_length', 'magic', 'hour', 'subject_id', 'recording_id', 'n_records', 'month', 'subtype', 'second', 'nchan', 'data_size', 'data_offset', 'lowpass', 'year', 'highpass', 'day', 'minute'] | ||
# chan_info should have ['physical_min', 'transducers', 'physical_max', 'digital_max', 'ch_names', 'n_samps', 'units', 'digital_min'] | ||
#################################################################################################### | ||
return (str(buf) + ' ' * num) if (num >= 0) else buf[0:num] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have tests to show this, but this looks exactly like the tye of thing that will break in Python 3. In the padding case, a string is returned. In the trim case, it's unclear whether a string or bytes
are returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This syntax is globally accepted and in my system, it has not break in both py2.7 and py3.x yet. You may test it individually and let me know if it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the downsides to runtime typing. The syntax is valid, but you might still get an error depending on the types passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked the code and errors are unlikely. See,
padtrim()
is a helper function used to standardize header content to pass into writebyte()
. In Python 2.x, strings and bytes are synonymous unlike in python 3.x. If you check the logic of writebyte()
, it has error-handling support which will deal with both bytes and string formats i.e. output will be a bytes value. If some other format is returned (such as float or boolean), which is very unlikely logically, then we have to worry. I am not sure how I should write a unit test for least likely scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an advantageous to having consistent types internally. For example, this could be fixed by having the trim case also convert to str
. Alternatively, if doing the preferred way of dealing with str
in Python 3, we assume that everything internally is str
and only the actual I/O operations are dealing with bytes
and those are doing the conversion appropriately (then we don't need the conversion to str
in the pad case). writebyte
handles the output bit of that, and I think (but haven't checked) that all of the file.read
are explicitly converted to str
(or int
).
EDF.py
Outdated
self.init_properties() | ||
if fname: | ||
self.open(fname) | ||
|
||
def init_properties(self): | ||
self.fname = None | ||
self.meas_info = None | ||
self.chan_info = None | ||
self.calibrate = None | ||
self.offset = None | ||
self.n_records = 0 | ||
if fname: | ||
self.open(fname) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this change was necessary. If it were possible to initialize the properties elsewhere, this would make sense, but that isn't the case. Even then, init_properties
should be private: _init_properties
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function init_properties() contains a few value reset lines. These lines are actually called again in present line 118 when we are closing the EDF file. This function wrap only reduces the redundant lines and now the original call is covered in init()
I am explaining this after 10 months of original edit. If it sounds mistaken, let me know.
#################################################################################################### | ||
# the following are a number of helper functions to make the behaviour of this EDFReader | ||
# class more similar to https://bitbucket.org/cleemesser/python-edf/ | ||
#################################################################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't delete this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is covered in details in the added documentation -
Helper Methods: The following are a number of helper functions to make
the behaviour of this EDFReader class more similar to
https://bitbucket.org/cleemesser/python-edf/
get_signal_text_labels: Convert Signal Text Labels from unicode
to strings.
get_n_signals: Get Number of Channels.
get_signal_freqs: Get Signal Frequencies.
get_n_samples: Get Total Number of Samples.
read_signal: Reads Entire Signal Record and returns the
values as an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is value to having the documentation close to the relevant code.
EDF.py
Outdated
implementation details, check out https://www.edfplus.info/index.html | ||
""" | ||
__version__ = "0.4.0" | ||
__author__ = "https://github.com/robertoostenveld | https://github.com/bsandeepan95>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think traditionally this is name + email address. Note also the distinction between author, credits/contributora, and maintainera.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I don't know robert's mail id so I put it like this. Even in original code, he has not added his email id or anything as far as I have checked. I wished to do it properly using his name as author and ours as contributors but I don't have the necessary resources. Hence I had to wait till this code review.
@robertoostenveld please send me your email id so I can add it properly.
@palday you too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the output of git log
locally, you'll see committer email addresses. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@palday your email id is not there.
One thing I forgot: |
@palday I am new to this pull requests so I there are some changes that I wish to do right away and commit but it seems there are many such ones and github is providing me a "Commit Change" button. If I use that button, will it make numerous new comments for each change I save, or will it stage them together later when the pull is successful? Seeking your guidance. |
I think "Commit change" creates a new commit for every single change, but lots of small commits are fine -- especially because they'll all be squashed during the merge. Thanks for doing this: I know it's a lot of effort, and we do appreciate this. My review is critical, but it's meant to be constructive. |
EDF.py
Outdated
return (str(buf) + ' ' * num) if (num >= 0) else buf[0:num] | ||
|
||
|
||
def writebyte(file, content, encoding='utf8'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def writebyte(file, content, encoding='utf8'): | |
def writebyte(file, content, encoding='ascii'): |
writebyte
should default to using ASCII encoding because that's what the EDF spec actually requires.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember you asking me to do this a year ago. I forgot. Apologies. 😅
EDF.py
Outdated
try: | ||
# Py3 onwards bytes and strings are separate data format | ||
content = bytes(content, encoding) | ||
except TypeError: | ||
# Py2.7 Support | ||
pass | ||
except Exception as e: | ||
print(type(e)) | ||
print(str(e)) | ||
print( | ||
"If you see this message, please go to " + | ||
"https://github.com/bids-standard/pyedf/issues" + | ||
" and open an issue there regarding this. Thank you!") | ||
finally: | ||
file.write(content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try: | |
# Py3 onwards bytes and strings are separate data format | |
content = bytes(content, encoding) | |
except TypeError: | |
# Py2.7 Support | |
pass | |
except Exception as e: | |
print(type(e)) | |
print(str(e)) | |
print( | |
"If you see this message, please go to " + | |
"https://github.com/bids-standard/pyedf/issues" + | |
" and open an issue there regarding this. Thank you!") | |
finally: | |
file.write(content) | |
if type(content) != bytes: | |
content = content.encode(encoding) | |
file.write(content) |
(This works in Python 2.7 and Python 3 -- I haven't tested it in older Python 2, but am not interesting in supporting that!)
Also, note that the finally
clause here doesn't make much sense: if there has been a conversion error, then you don't want to try to write the bytes out to file. It would be better to simply leave out the extra except Exception as e
and finally
clauses and have file.write
be after the try
block. But in this case, there's an even easier solution (see suggestion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I implemented Error Handling block here, I used finally
clause to help people understand the code flow. Since there is only one line left to do after the block, and I know developers who can indent such lines one level inside the except
clause thinking it as a mistake on our end. I wish to avoid clearing up such confusion in the future if anyone faces this.
I added the except Exception as e
clause to ensure people know what to do if they face such error. I can always do away with this. This is only there for least expected scenarios.
bytes
is not a supported type in Python 2.7 so the if
clause will not work there. However, as we have already added the support in except TypeError
clause, how about we put the content = content.encode(encoding)
line in there? This will ensure that encoding is done correctly for Python 2.7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python 2.7.17 (default, Jul 20 2020, 15:37:01)
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "a"
>>> type(x)
<type 'str'>
>>> type(x) == bytes
True
>>> type(x) == str
True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. bytes
is a synonym for str
in Python 2.7.
I mean, if we are to do ascii
encoding either way, why not enforce the line in the TypeError
block, which will be triggered in Python 2.7 since default str
encoding scheme is utf-8
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I ran a bit of code for testing the error handling block in Python 2.7 and I feel the block is necessary. Run the following example -
# -*- coding: utf-8 -*-
enc = 'ascii'
x = 'µ Õ ß'
print x
try:
y = x.encode(enc)
except Exception as e:
print("Error Message: " + str(e))
print("Error Type: " + str(type(e)))
y = 'fail'
print y
print (x == y)
The output will be like the following -
Python 2.7.16 (default, Jul 13 2019, 16:01:51)
[GCC 8.3.0] on linux2
µ Õ ß
Error Message: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Error Type: <type 'exceptions.UnicodeDecodeError'>
fail
False
So errors will occur if there are non-ascii characters, and the code will fail. Now the question is if it is better to let the code fail, or if we should handle the error gracefully and display it to the console for the developer. Let me know your opinion about this @palday .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@palday waiting for your response.
Please do. I am grateful for your review. It is making me rethink some of my past coding choices and shaping my perspective for future work. I may sometimes come out difficult to communicate, but I am trying to improve my own bugs too here. 😄 Strict review is necessary. Else we get unexpected drivers issues that discourage me from using Ubuntu Linux in my AMD Vega Graphics Workstation. >__> |
@palday, check the new commits |
I need to write some tests to check this, but it's on my radar. |
@palday I know we agreed to add the latest documentations in a later PR, but will it help the tests if I add them in this PR? |
Having gone back through the history of this PR .... I'm somewhat surprised that it didn't take advantage of the work already done in #4. But we can't change the past. I think the best solution for now is to merge this into a There seem to be some merge conflicts (which I don't quite understand since |
Kindly review the code and let me know if the commits are okay. I will add the docstrings for functions and methods in a separate pull request as @palday asked me to do earlier.