Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surfer reader feature #169

Merged
merged 23 commits into from
Apr 26, 2019
Merged

Surfer reader feature #169

merged 23 commits into from
Apr 26, 2019

Conversation

jessepisel
Copy link
Contributor

ported the surfer reader from fatiando and changed the output to a xarray Data Array. Ran tests and formatted locally.

I ported the surfer reader feature from fatiando and changed the output from a dict to an xarray data array. Will you look through it to make sure it's what you had in mind for this feature. Do you want me to add examples to the docstring as well? Right now it just has the docstring from fatiando modified as appropriate.

Fixes #38

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst and verde/__init__.py.
  • Write detailed docstrings for all functions/classes/methods.
  • If adding new functionality, add an example to the docstring, gallery, and/or tutorials.

ported the surfer reader from fatiando and changed the output to a xarray Data Array. Need to nail down some tests and get everything formatted
updated index.rst and init files to include the load_surfer function
Copy link
Member

@leouieda leouieda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jessepisel thanks for implementing this! I have some suggestions and corrections to some styling of the docstring. We don't need examples here but some tests would be great. Take a look at fatiando/harmonica#11 and fatiando/harmonica#16 by @santisoler for inspiration. They implement and then refactor a function to load data from ICGEM and tests for the function.

verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
@leouieda
Copy link
Member

I'm going to fix the merge conflicts here so please git pull the new commit before making any changes.

@leouieda
Copy link
Member

@santisoler could you also take a look and see if you have anything to add or some suggestions?

@santisoler
Copy link
Member

Hi @jessepisel, nice work porting it. I think this kind of functions makes a lot easier the migration of our previous workflow on other software (like Surfer in this case) to fatiando. Also, it really helps to integrate ourselves among research groups that still works with this kind of data files.

I'll leave some comments on the code. Don't feel discouraged by the proposed changes, they are mainly on styling, which is kind of subjective thing that changes in time, so nothing to worry about.

verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
Changed the formatting and added a bit to the docstring. Updated the error block from assert to if then raise IOError. Improved readability for creating the DataArray, still need to read in the data with loadtxt as a check for data dimensions.
@jessepisel
Copy link
Contributor Author

@leouieda @santisoler Thank you both for the reviews, they were very helpful, I am glad to see I am on the right track! I implemented the easy formatting changes and added other suggested changes. I am going to start working on some tests for this and continue working on the numpy.loadtxt issues I have been having. Hopefully I will have a PR in the next few days.

Copy link
Member

@leouieda leouieda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jessepisel take your time. There is no rush for this feature.

I agree with @santisoler that this kind of thing helps people migrate. Thanks for taking the time to implement it!

For the tests, you might find it easier to test if you split this function into 2 or 3. I'd recommend putting the header parsing (lines 294-300) into a separate function that takes a list of strings and returns the converted values. This way, you can test this function by giving it strings instead of having to create test files for all cases (which is a bit of a pain). Another part that would fit well in a separate function is the validation of the data (lines 308-312).

Again, see the load_icgem_gdf function from harmonica for a reference.

verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
@leouieda
Copy link
Member

Also, it's good to run make format before committing to avoid errors in Travis CI due to code formatting. make check and make lint also give good warnings (usually) about possible flaws in the code.

leouieda and others added 9 commits February 20, 2019 08:32
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
this creates a fake header as a list then passes it to a helper function that is the first half of the load_surfer function. It returns the header information after parsing it and compares to the original list of strings. Definitely a WIP as I figure out how to write tests. Any feedback is more than welcome.
added a check to make sure the length of the data read to the field variable is the same as the dimensions read from the header. Also updated as recommended from the last commit
loads a fake header and a fake grid into stringIO and then validates that the header matches the dimensions of the data
verde/utils.py Outdated Show resolved Hide resolved
leouieda and others added 3 commits February 27, 2019 15:07
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
changed field length to field size
@jessepisel
Copy link
Contributor Author

I have some tests for the header parser and grid parser portions of the function in my forked version/surfer branch of verde. Do you want them in a new PR for them or how would you like them for review?

@santisoler
Copy link
Member

Hi @jessepisel. Any PR that wants to be merged into master must have 100% test coverage. So when any new enhancement is incorporated into the PR, it should be accompanied with its corresponding tests functions. This means that every test function for your new Surfer grid reader function must be included on this PR.

By the way, I've seen you added pieces of software belonging to the Surfer reader function into the test functions. Although this may be considered a good way to test the function itself (because this pieces of software are the same), it's a source of problems for the future. For example, if we make any change in the original function that involuntary includes a bug, the test function will not fail because it's testing old code, clearing the path for an undetected bug.

So the idea is to call the functions that wants to be tested by importing them first (as you would use them on your notebooks or scripts). See the load_icgem function from Harmonica and its corresponding test functions to get an idea of what I'm saying.

What @leouieda meant by separating the function is to create two functions in verde/utils.py: one called load_surfer() and the other one could be _read_surfer_header(). The idea is to leave the header reading task to the latter, while the first one only calls it to get the header information.

Something like this could be an example:

def load_surfer(fname, dtype="float64"):
    """
    Read data from a Surfer ASCII grid file.
    ...
    """
    with open(fname) as input_file:
        grid_id, ydims, xdims, south, north, west, east, dmin, dmax = _read_surfer_header(fname)
        field = np.fromiter(
            (float(s) for line in input_file for s in line.split()), dtype=dtype
        )
    ...
    return data


def _read_surfer_header(input_file):
    # DSAA is a Surfer ASCII GRD ID
    grd_id = input_file.readline().strip()
    # Read the number of columns (ny) and rows (nx)
    ydims, xdims = [int(s) for s in input_file.readline().split()]
    # Our x points North, so the first thing we read is y, not x.
    south, north = [float(s) for s in input_file.readline().split()]
    west, east = [float(s) for s in input_file.readline().split()]
    dmin, dmax = [float(s) for s in input_file.readline().split()]
    return grid_id, ydims, xdims, south, north, west, east, dmin, dmax

The same goes for the validation data lines, for example:

def _check_surfer_integrity(...):
    err_msg = "{} of data ({}) doesn't match one from file ({})."
    if field.size != ydims * xdims:
        raise IOError(err_msg.format("Dimensions", (ydims * xdims), len(field)))
    if not np.allclose(dmin, field.min()):
        raise IOError(err_msg.format("Min", dmin, field.min()))
    if not np.allclose(dmax, field.max()):
        raise IOError(err_msg.format("Max", dmax, field.max()))

We put an underscore before the function name to make clear that it's a private function, i.e. a function that is not supposed to be called by users and it's only used buy functions inside the module to perform internal tasks.

Hope my comment remove any doubts you have. Feel free to ask for more help without any problem!

@leouieda
Copy link
Member

leouieda commented Mar 2, 2019

@santisoler beat me to it 😉

The idea is that "adding tests later" tends to never happen. I know this from personal experience and it's the reason the testing situation in Fatiando was so inconsistent. So with Verde I decided to make this a priority from the start. If you need help setting this up, just let us know.

@jessepisel
Copy link
Contributor Author

Thanks for the detailed explanation @santisoler and @leouieda I will keep plugging away at the tests and see what I can come up with!

@leouieda
Copy link
Member

leouieda commented Mar 7, 2019

@jessepisel no problem, take your time. Don't hesitate to call on us if you want any help (we can write some code and push to this branch as well). Testing is hard and it takes time to get used to it.

@jessepisel
Copy link
Contributor Author

I updated the function and split it into several sub-functions. I need to reformat it with the new line lengths discussion that I saw this past week. I'll work on some tests this week and have them submitted by the weekend.

@leouieda
Copy link
Member

Thanks for the update @jessepisel! Let me know when you're ready and I can jump in to review the code.

added a helper function to create the field variable. Added unit tests for all functions, gladly take any help on this part. Also worked on fomatting for 80 characters for this function and tests.
@jessepisel
Copy link
Contributor Author

I think this is ready for a code review if you have the time to look through it.

@leouieda
Copy link
Member

Hi @jessepisel, I'm having a look at your code right now. I think I'll just make some commits with a few fixes right to this PR so we don't delay this much more. I want to have it in the next release.

@jessepisel
Copy link
Contributor Author

That sounds good to me. Let me know what I can do to help.

Moved the function to a new verde.io module to separate it from the misc
utilities.
Refactor to use shape, region, etc instead of individual values.
Add support for paths or open files (or file-like objects).
Use test fixtures and test with and without nans.
Test the full function as well as the individual parts.
Fix formatting and linting errors.
@leouieda
Copy link
Member

@jessepisel OK, I refactored the code to be a bit shorter and more concise. I also added a few test options (with/without nans, from StringIO or from file). The function now can take an open file or a file name. Please have a look and see what you think.

When coding for one of our projects, make sure you use make check, make lint, and make format often to catch some possible errors. I run these all the time and it really helps to find things like unused variables and imports, invalid syntax, etc.

@leouieda
Copy link
Member

@santisoler would you mind reviewing this? If you're happy with it feel free to squash and merge. When doing that, please edit the commit message to summarize what this PR is contributing.

@jessepisel
Copy link
Contributor Author

I like your changes, it's easier to read with the formatting and it is to the point. Thanks for making those changes to it!

@santisoler santisoler self-requested a review April 26, 2019 13:15
@santisoler
Copy link
Member

@santisoler would you mind reviewing this? If you're happy with it feel free to squash and merge. When doing that, please edit the commit message to summarize what this PR is contributing.

Ok @leouieda. I'm quite busy today, but I'll leave it for next week.

Nice work @jessepisel! I always learn a lot when someone else refactor my codes. It's a great opportunity to acquire more knowledge on coding and software development!

@santisoler santisoler merged commit 4af1b32 into fatiando:master Apr 26, 2019
@leouieda
Copy link
Member

🎉 🎉 @jessepisel thanks for the contribution! 🎉 🎉

I'll cut a new release as soon as #181 is merged so this will make it into 1.2.0.

Thanks @santisoler for reviewing and merging!

@jessepisel
Copy link
Contributor Author

Thanks for all the help @leouieda and @santisoler I learned a lot from this one. I appreciate all the help and guidance on putting this together!

@leouieda
Copy link
Member

👍 I hope to see new ones in the future 😉

@jessepisel jessepisel deleted the surfer_reader_feature branch April 29, 2019 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Function to read in a grid in Surfer format
3 participants