-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Surfer reader feature #169
Surfer reader feature #169
Conversation
ported the surfer reader from fatiando and changed the output to a xarray Data Array. Need to nail down some tests and get everything formatted
updated index.rst and init files to include the load_surfer function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jessepisel thanks for implementing this! I have some suggestions and corrections to some styling of the docstring. We don't need examples here but some tests would be great. Take a look at fatiando/harmonica#11 and fatiando/harmonica#16 by @santisoler for inspiration. They implement and then refactor a function to load data from ICGEM and tests for the function.
I'm going to fix the merge conflicts here so please |
@santisoler could you also take a look and see if you have anything to add or some suggestions? |
Hi @jessepisel, nice work porting it. I think this kind of functions makes a lot easier the migration of our previous workflow on other software (like Surfer in this case) to fatiando. Also, it really helps to integrate ourselves among research groups that still works with this kind of data files. I'll leave some comments on the code. Don't feel discouraged by the proposed changes, they are mainly on styling, which is kind of subjective thing that changes in time, so nothing to worry about. |
Changed the formatting and added a bit to the docstring. Updated the error block from assert to if then raise IOError. Improved readability for creating the DataArray, still need to read in the data with loadtxt as a check for data dimensions.
@leouieda @santisoler Thank you both for the reviews, they were very helpful, I am glad to see I am on the right track! I implemented the easy formatting changes and added other suggested changes. I am going to start working on some tests for this and continue working on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jessepisel take your time. There is no rush for this feature.
I agree with @santisoler that this kind of thing helps people migrate. Thanks for taking the time to implement it!
For the tests, you might find it easier to test if you split this function into 2 or 3. I'd recommend putting the header parsing (lines 294-300) into a separate function that takes a list of strings and returns the converted values. This way, you can test this function by giving it strings instead of having to create test files for all cases (which is a bit of a pain). Another part that would fit well in a separate function is the validation of the data (lines 308-312).
Again, see the load_icgem_gdf
function from harmonica for a reference.
Also, it's good to run |
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
this creates a fake header as a list then passes it to a helper function that is the first half of the load_surfer function. It returns the header information after parsing it and compares to the original list of strings. Definitely a WIP as I figure out how to write tests. Any feedback is more than welcome.
added a check to make sure the length of the data read to the field variable is the same as the dimensions read from the header. Also updated as recommended from the last commit
loads a fake header and a fake grid into stringIO and then validates that the header matches the dimensions of the data
Co-Authored-By: jessepisel <jessepisel@users.noreply.github.com>
changed field length to field size
…/verde into surfer_reader_feature
I have some tests for the header parser and grid parser portions of the function in my forked version/surfer branch of verde. Do you want them in a new PR for them or how would you like them for review? |
Hi @jessepisel. Any PR that wants to be merged into By the way, I've seen you added pieces of software belonging to the Surfer reader function into the test functions. Although this may be considered a good way to test the function itself (because this pieces of software are the same), it's a source of problems for the future. For example, if we make any change in the original function that involuntary includes a bug, the test function will not fail because it's testing old code, clearing the path for an undetected bug. So the idea is to call the functions that wants to be tested by importing them first (as you would use them on your notebooks or scripts). See the What @leouieda meant by separating the function is to create two functions in Something like this could be an example: def load_surfer(fname, dtype="float64"):
"""
Read data from a Surfer ASCII grid file.
...
"""
with open(fname) as input_file:
grid_id, ydims, xdims, south, north, west, east, dmin, dmax = _read_surfer_header(fname)
field = np.fromiter(
(float(s) for line in input_file for s in line.split()), dtype=dtype
)
...
return data
def _read_surfer_header(input_file):
# DSAA is a Surfer ASCII GRD ID
grd_id = input_file.readline().strip()
# Read the number of columns (ny) and rows (nx)
ydims, xdims = [int(s) for s in input_file.readline().split()]
# Our x points North, so the first thing we read is y, not x.
south, north = [float(s) for s in input_file.readline().split()]
west, east = [float(s) for s in input_file.readline().split()]
dmin, dmax = [float(s) for s in input_file.readline().split()]
return grid_id, ydims, xdims, south, north, west, east, dmin, dmax The same goes for the validation data lines, for example: def _check_surfer_integrity(...):
err_msg = "{} of data ({}) doesn't match one from file ({})."
if field.size != ydims * xdims:
raise IOError(err_msg.format("Dimensions", (ydims * xdims), len(field)))
if not np.allclose(dmin, field.min()):
raise IOError(err_msg.format("Min", dmin, field.min()))
if not np.allclose(dmax, field.max()):
raise IOError(err_msg.format("Max", dmax, field.max())) We put an underscore before the function name to make clear that it's a private function, i.e. a function that is not supposed to be called by users and it's only used buy functions inside the module to perform internal tasks. Hope my comment remove any doubts you have. Feel free to ask for more help without any problem! |
@santisoler beat me to it 😉 The idea is that "adding tests later" tends to never happen. I know this from personal experience and it's the reason the testing situation in Fatiando was so inconsistent. So with Verde I decided to make this a priority from the start. If you need help setting this up, just let us know. |
Thanks for the detailed explanation @santisoler and @leouieda I will keep plugging away at the tests and see what I can come up with! |
@jessepisel no problem, take your time. Don't hesitate to call on us if you want any help (we can write some code and push to this branch as well). Testing is hard and it takes time to get used to it. |
I updated the function and split it into several sub-functions. I need to reformat it with the new line lengths discussion that I saw this past week. I'll work on some tests this week and have them submitted by the weekend. |
Thanks for the update @jessepisel! Let me know when you're ready and I can jump in to review the code. |
added a helper function to create the field variable. Added unit tests for all functions, gladly take any help on this part. Also worked on fomatting for 80 characters for this function and tests.
I think this is ready for a code review if you have the time to look through it. |
Hi @jessepisel, I'm having a look at your code right now. I think I'll just make some commits with a few fixes right to this PR so we don't delay this much more. I want to have it in the next release. |
That sounds good to me. Let me know what I can do to help. |
Moved the function to a new verde.io module to separate it from the misc utilities. Refactor to use shape, region, etc instead of individual values. Add support for paths or open files (or file-like objects). Use test fixtures and test with and without nans. Test the full function as well as the individual parts. Fix formatting and linting errors.
@jessepisel OK, I refactored the code to be a bit shorter and more concise. I also added a few test options (with/without nans, from StringIO or from file). The function now can take an open file or a file name. Please have a look and see what you think. When coding for one of our projects, make sure you use |
@santisoler would you mind reviewing this? If you're happy with it feel free to squash and merge. When doing that, please edit the commit message to summarize what this PR is contributing. |
I like your changes, it's easier to read with the formatting and it is to the point. Thanks for making those changes to it! |
Ok @leouieda. I'm quite busy today, but I'll leave it for next week. Nice work @jessepisel! I always learn a lot when someone else refactor my codes. It's a great opportunity to acquire more knowledge on coding and software development! |
🎉 🎉 @jessepisel thanks for the contribution! 🎉 🎉 I'll cut a new release as soon as #181 is merged so this will make it into 1.2.0. Thanks @santisoler for reviewing and merging! |
Thanks for all the help @leouieda and @santisoler I learned a lot from this one. I appreciate all the help and guidance on putting this together! |
👍 I hope to see new ones in the future 😉 |
ported the surfer reader from fatiando and changed the output to a xarray Data Array. Ran tests and formatted locally.
I ported the surfer reader feature from fatiando and changed the output from a dict to an xarray data array. Will you look through it to make sure it's what you had in mind for this feature. Do you want me to add examples to the docstring as well? Right now it just has the docstring from fatiando modified as appropriate.
Fixes #38
Reminders
make format
andmake check
to make sure the code follows the style guide.doc/api/index.rst
andverde/__init__.py
.