Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.x support + PEP8 Cleanup #10

Merged
merged 11 commits into from
Mar 16, 2021
Merged

Conversation

bsandeepan
Copy link

@bsandeepan bsandeepan commented Jul 12, 2020

Kindly review the code and let me know if the commits are okay. I will add the docstrings for functions and methods in a separate pull request as @palday asked me to do earlier.

EDF.py Outdated Show resolved Hide resolved
EDF.py Outdated
# meas_info should have ['record_length', 'magic', 'hour', 'subject_id', 'recording_id', 'n_records', 'month', 'subtype', 'second', 'nchan', 'data_size', 'data_offset', 'lowpass', 'year', 'highpass', 'day', 'minute']
# chan_info should have ['physical_min', 'transducers', 'physical_max', 'digital_max', 'ch_names', 'n_samps', 'units', 'digital_min']
####################################################################################################
return (str(buf) + ' ' * num) if (num >= 0) else buf[0:num]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have tests to show this, but this looks exactly like the tye of thing that will break in Python 3. In the padding case, a string is returned. In the trim case, it's unclear whether a string or bytes are returned.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax is globally accepted and in my system, it has not break in both py2.7 and py3.x yet. You may test it individually and let me know if it does.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the downsides to runtime typing. The syntax is valid, but you might still get an error depending on the types passed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked the code and errors are unlikely. See,

padtrim() is a helper function used to standardize header content to pass into writebyte(). In Python 2.x, strings and bytes are synonymous unlike in python 3.x. If you check the logic of writebyte(), it has error-handling support which will deal with both bytes and string formats i.e. output will be a bytes value. If some other format is returned (such as float or boolean), which is very unlikely logically, then we have to worry. I am not sure how I should write a unit test for least likely scenario

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an advantageous to having consistent types internally. For example, this could be fixed by having the trim case also convert to str. Alternatively, if doing the preferred way of dealing with str in Python 3, we assume that everything internally is str and only the actual I/O operations are dealing with bytes and those are doing the conversion appropriately (then we don't need the conversion to str in the pad case). writebyte handles the output bit of that, and I think (but haven't checked) that all of the file.read are explicitly converted to str (or int).

EDF.py Outdated
Comment on lines 70 to 81
self.init_properties()
if fname:
self.open(fname)

def init_properties(self):
self.fname = None
self.meas_info = None
self.chan_info = None
self.calibrate = None
self.offset = None
self.n_records = 0
if fname:
self.open(fname)


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this change was necessary. If it were possible to initialize the properties elsewhere, this would make sense, but that isn't the case. Even then, init_properties should be private: _init_properties.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function init_properties() contains a few value reset lines. These lines are actually called again in present line 118 when we are closing the EDF file. This function wrap only reduces the redundant lines and now the original call is covered in init()

I am explaining this after 10 months of original edit. If it sounds mistaken, let me know.

EDF.py Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
Comment on lines -332 to -335
####################################################################################################
# the following are a number of helper functions to make the behaviour of this EDFReader
# class more similar to https://bitbucket.org/cleemesser/python-edf/
####################################################################################################
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't delete this

Copy link
Author

@bsandeepan bsandeepan Jul 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is covered in details in the added documentation -
Helper Methods: The following are a number of helper functions to make
the behaviour of this EDFReader class more similar to
https://bitbucket.org/cleemesser/python-edf/
get_signal_text_labels: Convert Signal Text Labels from unicode
to strings.
get_n_signals: Get Number of Channels.
get_signal_freqs: Get Signal Frequencies.
get_n_samples: Get Total Number of Samples.
read_signal: Reads Entire Signal Record and returns the
values as an array.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is value to having the documentation close to the relevant code.

EDF.py Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
EDF.py Outdated Show resolved Hide resolved
EDF.py Outdated
implementation details, check out https://www.edfplus.info/index.html
"""
__version__ = "0.4.0"
__author__ = "https://github.com/robertoostenveld | https://github.com/bsandeepan95>"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think traditionally this is name + email address. Note also the distinction between author, credits/contributora, and maintainera.

Copy link
Author

@bsandeepan bsandeepan Jul 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I don't know robert's mail id so I put it like this. Even in original code, he has not added his email id or anything as far as I have checked. I wished to do it properly using his name as author and ours as contributors but I don't have the necessary resources. Hence I had to wait till this code review.

@robertoostenveld please send me your email id so I can add it properly.
@palday you too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the output of git log locally, you'll see committer email addresses. 😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@palday your email id is not there.

@palday
Copy link
Collaborator

palday commented Jul 13, 2020

One thing I forgot: write_byte really should be write_bytes because (1) it's usually more than 1 byte of data being written and (2) it's a helper function to write things of the bytes type.

@bsandeepan
Copy link
Author

@palday I am new to this pull requests so I there are some changes that I wish to do right away and commit but it seems there are many such ones and github is providing me a "Commit Change" button. If I use that button, will it make numerous new comments for each change I save, or will it stage them together later when the pull is successful? Seeking your guidance.

@palday
Copy link
Collaborator

palday commented Jul 27, 2020

I think "Commit change" creates a new commit for every single change, but lots of small commits are fine -- especially because they'll all be squashed during the merge.

Thanks for doing this: I know it's a lot of effort, and we do appreciate this. My review is critical, but it's meant to be constructive.

EDF.py Outdated
return (str(buf) + ' ' * num) if (num >= 0) else buf[0:num]


def writebyte(file, content, encoding='utf8'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def writebyte(file, content, encoding='utf8'):
def writebyte(file, content, encoding='ascii'):

writebyte should default to using ASCII encoding because that's what the EDF spec actually requires.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you asking me to do this a year ago. I forgot. Apologies. 😅

EDF.py Outdated
Comment on lines 51 to 65
try:
# Py3 onwards bytes and strings are separate data format
content = bytes(content, encoding)
except TypeError:
# Py2.7 Support
pass
except Exception as e:
print(type(e))
print(str(e))
print(
"If you see this message, please go to " +
"https://github.com/bids-standard/pyedf/issues" +
" and open an issue there regarding this. Thank you!")
finally:
file.write(content)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try:
# Py3 onwards bytes and strings are separate data format
content = bytes(content, encoding)
except TypeError:
# Py2.7 Support
pass
except Exception as e:
print(type(e))
print(str(e))
print(
"If you see this message, please go to " +
"https://github.com/bids-standard/pyedf/issues" +
" and open an issue there regarding this. Thank you!")
finally:
file.write(content)
if type(content) != bytes:
content = content.encode(encoding)
file.write(content)

(This works in Python 2.7 and Python 3 -- I haven't tested it in older Python 2, but am not interesting in supporting that!)

Also, note that the finally clause here doesn't make much sense: if there has been a conversion error, then you don't want to try to write the bytes out to file. It would be better to simply leave out the extra except Exception as e and finally clauses and have file.write be after the try block. But in this case, there's an even easier solution (see suggestion).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I implemented Error Handling block here, I used finally clause to help people understand the code flow. Since there is only one line left to do after the block, and I know developers who can indent such lines one level inside the except clause thinking it as a mistake on our end. I wish to avoid clearing up such confusion in the future if anyone faces this.

I added the except Exception as e clause to ensure people know what to do if they face such error. I can always do away with this. This is only there for least expected scenarios.

bytes is not a supported type in Python 2.7 so the if clause will not work there. However, as we have already added the support in except TypeError clause, how about we put the content = content.encode(encoding) line in there? This will ensure that encoding is done correctly for Python 2.7.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 2.7.17 (default, Jul 20 2020, 15:37:01) 
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "a"
>>> type(x)
<type 'str'>
>>> type(x) == bytes
True
>>> type(x) == str
True

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. bytes is a synonym for str in Python 2.7.

I mean, if we are to do ascii encoding either way, why not enforce the line in the TypeError block, which will be triggered in Python 2.7 since default str encoding scheme is utf-8.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I ran a bit of code for testing the error handling block in Python 2.7 and I feel the block is necessary. Run the following example -

# -*- coding: utf-8 -*-
enc = 'ascii'
x = 'µ Õ ß'

print x

try:
    y = x.encode(enc)
except Exception as e:
    print("Error Message: " + str(e))
    print("Error Type: " + str(type(e)))
    y = 'fail'

print y
print (x == y)

The output will be like the following -

Python 2.7.16 (default, Jul 13 2019, 16:01:51)
[GCC 8.3.0] on linux2

µ Õ ß
Error Message: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Error Type: <type 'exceptions.UnicodeDecodeError'>
fail
False

So errors will occur if there are non-ascii characters, and the code will fail. Now the question is if it is better to let the code fail, or if we should handle the error gracefully and display it to the console for the developer. Let me know your opinion about this @palday .

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@palday waiting for your response.

@bsandeepan
Copy link
Author

bsandeepan commented Aug 6, 2020

I think "Commit change" creates a new commit for every single change, but lots of small commits are fine -- especially because they'll all be squashed during the merge.

Thanks for doing this: I know it's a lot of effort, and we do appreciate this. My review is critical, but it's meant to be constructive.

Please do. I am grateful for your review. It is making me rethink some of my past coding choices and shaping my perspective for future work. I may sometimes come out difficult to communicate, but I am trying to improve my own bugs too here. 😄

Strict review is necessary. Else we get unexpected drivers issues that discourage me from using Ubuntu Linux in my AMD Vega Graphics Workstation. >__>

@bsandeepan
Copy link
Author

@palday, check the new commits

@palday
Copy link
Collaborator

palday commented Feb 9, 2021

I need to write some tests to check this, but it's on my radar.

@bsandeepan
Copy link
Author

@palday I know we agreed to add the latest documentations in a later PR, but will it help the tests if I add them in this PR?

@palday palday changed the base branch from master to py3k March 16, 2021 14:37
@palday
Copy link
Collaborator

palday commented Mar 16, 2021

Having gone back through the history of this PR .... I'm somewhat surprised that it didn't take advantage of the work already done in #4. But we can't change the past.

I think the best solution for now is to merge this into a py3k branch and provide documentation for installing from that branch for Python 3 support. I'll discuss with @robertoostenveld how to best package up all the work from #4 and here together and hopefully get that back into the trunk branch and set up a PyPI release.

There seem to be some merge conflicts (which I don't quite understand since py3k currently is the same as master and master didn't have merge conflicts), but I'll have to look at those in a few hours and then I'll merge.

@palday palday merged commit 00eb640 into bids-standard:py3k Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants