YTEP-0036.rst: converting from nose to pytest #9
Conversation
… moving from stored arrays to stored array hashes.
Overall this is really nice, thank you for working on this. This will make it much easier for people to run answer tests locally. See my comment below about reducing boilerplate - I think we want to make it as easy as possible to write new answer tests and I think it's worthwhile to think about insulating test authors from having to know about how hashes should be computed (and possibly screwing that up).
One of the nice aspects of the old answer testing system - especially for the plot answer tests - is that we had rich comparisons between the old answers and the new answers. For plot answer tests we even use matplotlib's image comparison machinery to generate a diff image which people can use to help figure out what exactly is changing; this is very useful in debugging. For the GitHub CI, it also generates a little HTML document that you can download as a test artifact that includes the before, after, and diff image.
Is that maintained here? I would guess that since we'd only be storing the image hashes we wouldn't be able to do that anymore. It might be nice to think about ways to continue storing the images - we're already putting them in the repo as-is (although with this it might make sense to offload the data storage to a secondary repo that only stores the images so we don't continue increasing the size of the yt repository).
In general it would be nice to have more discussion here in this document about the plot answer tests, since those are a little special and also got a lot of love during Google Summer of Code 2018.
The before/after comparison issue (e.g. a test fails and a developer wants to quickly understand what exactly changed) is less of a concern for the field values tests because we never really did anything to make it easy for developers to see exactly what changed between the new and old answers. That said, it might be nice to think about ways to ergonomically tell developers how to investigate failing answer tests, e.g. by somehow printing out a test script as well as git hashes to run the test script on to see the before/after values and before/after hashes explicitly. I could imagine some hacky ways to do this (e.g. by inspecting the source code for the test functions) but don't really have any concrete ideas - just a thought!
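One way the "hacky" idea above could be sketched - recovering a failing test's source with ``inspect`` and pairing it with the two git revisions whose answers disagree. Everything here (the function name, the ``old_rev``/``new_rev`` parameters) is hypothetical, not part of the proposed framework:

```python
import inspect

def reproduction_recipe(test_func, old_rev, new_rev):
    """Build a shell-ready recipe for investigating a failing answer test.

    Minimal sketch only: recover the test's source with
    ``inspect.getsource`` and pair it with the two git revisions to compare.
    All names here are invented for illustration.
    """
    src = inspect.getsource(test_func)
    return (
        f"# Check out {old_rev} (old answer) and {new_rev} (new answer),\n"
        f"# run the snippet below at each revision, and diff the output:\n"
        f"{src}"
    )
```

A developer could then paste the printed snippet into a scratch script and run it at both revisions to see the before/after values and hashes explicitly.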
.. code-block:: bash

   $ pytest
Does this still only run the unit tests or does it run the answer tests too?
Ah I see below it doesn't. Maybe note that here explicitly?
Done.
source/YTEPs/YTEP-0036.rst
Outdated
for d in dobjs:
    hashes['field_values'][f][d] = utils.generate_hash(self.field_values_test(ds, f, d))
hashes = {'test_method1': hashes}
utils.handle_hashes(self.save_dir, self.answer_file, hashes, self.answer_store)
It seems like a lot of this code is going to get repeated across all the answer tests in all the frontends. Might it be possible for the ``AnswerTest`` class to handle allocating the ``hashes`` dict, hashing the data, and storing the hashes for each test?
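One way the boilerplate could be pulled into the base class, as the comment suggests - a hypothetical refactor, not the code proposed in the YTEP (the ``record``/``generate_hash`` method names are invented here):

```python
import hashlib

class AnswerTest:
    """Hypothetical refactor: the base class owns all hashing machinery."""

    def __init__(self):
        self.hashes = {}

    @staticmethod
    def generate_hash(data):
        # Hash the repr of the result; a real framework would hash raw bytes.
        return hashlib.md5(repr(data).encode("utf8")).hexdigest()

    def record(self, test_name, key, result):
        # Test authors only ever call record(); they never compute hashes.
        self.hashes.setdefault(test_name, {})[key] = self.generate_hash(result)

class TestFrontend(AnswerTest):
    def test_method1(self):
        for d in ("sphere", "all_data"):
            self.record("field_values", d, [1.0, 2.0])
```

With something like this, the per-frontend loops shown above collapse to a single ``record()`` call per result, insulating test authors from how hashes are computed.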
@jcoughlin11 I agree with what @ngoldbaum said about the image tests in particular, but I wonder if we could also extend this somewhat to all of the tests. As it stands, we are just storing the hashes, but most of the hashing is computed in a central location. Would it be possible to supply arrays to that central location, and have an option that stores the array values (even just in raw binary!) as well as checks the hashes? That might help preserve both options. It would also open up the opportunity for someone who is making changes and finding test errors to run the tests with that option (
(Of course this could then be extended to save actual image files, etc etc.)
I've been thinking a bit more about this (specifically related to the image diffs and how nice the GSoC work is that supports it). I agree that we should do our best to preserve that functionality, and ideally make sure the upload feature of the diff and the new data is preserved in CI as well. I'm not exactly sure how much more difficult it would be to supply arrays to the central location, but what if we had an option that, if a test fails on the hash, then generates the diff and the image files? We could minimize storage needs by only generating these diffs if the test fails, but still make it so we can quickly identify issues with new features. One downside would be that if this is written such that a test requires a rerun to generate the diffs, we could end up wasting a lot of resources running and re-running tests. But maybe that won't be an issue at all.
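A sketch of the "store raw data only on failure" option being discussed (every name here is hypothetical): the central comparison normally touches only hashes, but on a mismatch it writes the offending raw bytes to disk so a developer can diff actual values, or regenerate a plot image, without a rerun.

```python
import hashlib
from pathlib import Path

def check_answer(name, new_bytes, old_hash, store_on_failure=True, out_dir="."):
    """Compare a new result against its stored hash; keep raw data on mismatch.

    Illustrative only: names and signature are invented for this sketch.
    """
    new_hash = hashlib.md5(new_bytes).hexdigest()
    if new_hash == old_hash:
        return True
    if store_on_failure:
        # Raw binary dump; plot tests could instead save before/after/diff images.
        Path(out_dir, f"failed_{name}.bin").write_bytes(new_bytes)
    return False
```

This keeps the answer store hash-only in the common case while still preserving rich before/after comparisons when something actually breaks.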
One thing that hasn't come up yet that might be relevant for this discussion is that GitHub Actions are a thing now and they have generous free compute resources associated with them.
…ametrizing tests, no longer using OrderedDicts, and reworking the hashing fixture.
This is excellent work. There's no question that I'm on board with the whole project, but I listed a handful of questions I would like addressed, as well as a couple of minor suggestions for improvements. Keep it up!
The second proposal of this YTEP is a general reorganization and streamlining for yt's answer testing framework.

Currently, many answer tests in yt generate large arrays of data. In order to facilitate comparison, these arrays need to be saved. Unfortunately, their size necessitates that these answers be stored separately from the main yt code base and makes answer comparisons slow.
Synchronising pull-request merging with two repos instead of one also slows down the development itself and creates technical debt.
@jcoughlin11 this comment is still not marked "outdated". Maybe it simply doesn't fit the doc anymore, in which case you can simply resolve this.
Oh wait it still applies. You can reject it if you want, no hard feelings.
I added your suggestion as the third bullet point here. I can make it its own paragraph if you think that would fit better.
source/YTEPs/YTEP-0036.rst
Outdated
Currently, the answer tests are mostly implemented as yield tests using nose. Pytest no longer supports yield tests because they resulted in multiple test functions being generated (one for each parameter combination), and these generated test functions did not properly support fixtures, as described `here <https://docs.pytest.org/en/latest/deprecations.html#yield-tests>`_.

The first step in reorganizing the tests is, therefore, to remove all yield tests. Each function that previously yielded a test now properly calls a test function with the desired parameters, where assertions are made. This makes the tests easier to read, as most of the boilerplate associated with yielding is gone (e.g., changing the ``__name__`` attribute of the yielded test class).
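For concreteness, the migration looks roughly like this, using placeholder field names (the old style is shown only in a comment, since pytest can no longer collect it):

```python
import pytest

FIELDS = ["density", "temperature"]

# Old nose-style yield test (no longer supported by pytest): a generator
# yielding (callable, args) pairs, often with __name__ tweaked for output.
#
#     def test_fields():
#         for field in FIELDS:
#             yield check_field, field

# New style: one self-contained test per parameter, with working fixtures.
@pytest.mark.parametrize("field", FIELDS)
def test_field(field):
    assert field in FIELDS
```

With ``@pytest.mark.parametrize``, pytest still reports one result per parameter combination, but each generated test is a proper test function rather than a yielded tuple.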
Each function that previously yielded a test now properly calls a test function with the desired parameters
I'm confused; it sounds more like an intermediate step than a desirable final state: shouldn't test functions be self-contained? I believe that test generation is roughly equivalent to ``@pytest.mark.parametrize``; is there something I'm missing?
source/YTEPs/YTEP-0036.rst
Outdated
The first step in reorganizing the tests is, therefore, to remove all yield tests. Each function that previously yielded a test now properly calls a test function with the desired parameters, where assertions are made. This makes the tests easier to read, as most of the boilerplate associated with yielding is gone (e.g., changing the ``__name__`` attribute of the yielded test class).

The second step is to organize the answer tests into classes (e.g., all of the enzo answer tests go in a class called ``TestEnzo``). This not only provides a natural organizational structure within the code itself, but it also provides a simple way of ensuring that all related answer tests get saved to the same answer file. The class structure also facilitates easier running of specific tests, especially in the case where answer tests and unit tests share the same module.
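A minimal sketch of the proposed class structure (the marker and answer-file attribute names are invented here for illustration, not taken from the YTEP):

```python
import pytest

class AnswerTest:
    # Shared answer-test machinery lives on the base class.
    answer_file = None

    def field_values_test(self, value):
        return repr(value)

@pytest.mark.answer_test  # hypothetical marker for selecting answer tests
class TestEnzo(AnswerTest):
    # Every answer produced by this class goes into one answer file.
    answer_file = "enzo_answers.yaml"

    def test_toro1d(self):
        assert self.field_values_test("density") == "'density'"
```

Under this layout, ``pytest -k TestEnzo`` runs only the Enzo answer tests even when they share a module with unit tests, and a marker could select all answer tests at once.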
all related answer tests get saved to the same answer file

Why is this important?

The class structure also facilitates easier running of specific tests

I don't really get this last argument. pytest already offers several ways to select a subset of tests at runtime, based on names (``pytest -k``) or markers (``pytest -m``). Why do you think it's not enough?
source/YTEPs/YTEP-0036.rst
Outdated
The second step is to organize the answer tests into classes (e.g., all of the enzo answer tests go in a class called ``TestEnzo``). This not only provides a natural organizational structure within the code itself, but it also provides a simple way of ensuring that all related answer tests get saved to the same answer file. The class structure also facilitates easier running of specific tests, especially in the case where answer tests and unit tests share the same module.

As an extension of the above, each answer test (e.g., ``FieldValuesTest``) that was previously a class of its own is now a method (e.g., ``field_values_test``) of the base ``AnswerTest`` class, which resides in ``yt/utilities/answer_testing/framework.py`` and takes the place of the ``AnswerTestingTest`` class. The use of this base class provides a central location for any new answer tests that might be added in the future. Every class that performs answer testing (e.g. ``TestEnzo``) inherits from ``AnswerTest``, thereby giving it access to the methods of ``AnswerTest``.
I must be reading this wrong. Are you saying that those tests live in ``framework.py``?

Upon reading a third time, I think the source of my confusion is the phrasing of the first sentence, specifically:

each answer test (e.g., ``FieldValuesTest``)

I'm sure you mean something like "abstract answer test" here (I don't know of a better word, sorry). In any case there's a distinction between generic/abstract tests and actual test classes/functions, right? I think it should be clarified here and in the following.
source/YTEPs/YTEP-0036.rst
Outdated
# This is in yt/utilities/answer_testing/framework.py
class AnswerTest:
    # The other methods are here. New method goes below
    def my_new_answer_test(self, test_param1, test_param2):
What are the test discovery rules? I'm wondering if ``my_new_answer_test`` is actually a "valid" test name according to pytest's default rules.
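For reference, pytest's default collection pattern is ``test_*`` for functions and methods and ``Test*`` for classes (configurable via ``python_functions``/``python_classes``), so a base-class helper named ``my_new_answer_test`` would not be collected as a test itself - arguably the desired behavior, since it is meant to be called from real ``test_*`` methods. A small illustration (all names invented):

```python
class AnswerTest:
    def my_new_answer_test(self, data):
        # Not collected by pytest: the name does not match the default
        # ``test_*`` pattern, so it acts purely as a helper.
        return len(data)

class TestExample(AnswerTest):
    def test_something(self):
        # Collected: matches ``test_*`` inside a ``Test*`` class.
        assert self.my_new_answer_test([1, 2, 3]) == 3
```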
The part of the doc this comment applied to has been removed, but my question hasn't been answered (unless the answer is in the new version, but I'm not looking at it right now; in that case, please resolve this thread @jcoughlin11).
source/YTEPs/YTEP-0036.rst
Outdated
        # Do stuff
        return result
Any new frontend must have its tests registered and assigned an answer file. This should be done by updating ``yt_repo/tests/tests.yaml`` with: |
Hurray, that's a small doc PR we can do!
@neutrinoceros Thanks for the comments! I think I addressed all of them (let me know if that's not the case)! I also significantly reworked the document to reflect the infrastructure changes I made since this YTEP was first written, and removed a lot of the material that felt like a re-hashing of the pytest documentation.
I tried resolving as many threads as I could @jcoughlin11, but I don't have time to read the whole new version again right now, so I'll need to come back to this later. Thank you for simplifying the document; I look forward to proofreading again!
It looks even nicer than the last time I read it, well played!

I would like to add a small contribution that I think fits this very well and hasn't been discussed yet: we have a lot of docstrings embedding example snippets suited for doctest, but we currently have no CI for those. Coincidentally, it is extremely easy to make pytest run these along with normal tests (see https://docs.pytest.org/en/stable/doctest.html). I expect this is a third subproject that can fit this YTEP. It will likely not be straightforward because many tests may be a little rotten, but I'm sure it's worth doing. Can you add a short paragraph (2 to 3 sentences should be enough?) for this?

Otherwise, after my remaining points have been addressed I'll be happy to approve this!
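As a concrete illustration of the doctest suggestion (the function below is invented for the example), pytest picks up docstring examples like this when invoked with its ``--doctest-modules`` flag:

```python
import math

def sphere_area(r):
    """Return the surface area of a sphere of radius ``r``.

    The example below is executed by ``pytest --doctest-modules``:

    >>> round(sphere_area(1.0), 6)
    12.566371
    """
    return 4.0 * math.pi * r ** 2
```

Running ``pytest --doctest-modules yt/`` would then exercise every such snippet alongside the normal test suite.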
So I think this is good to go. I'm approving! Anyone want to object to merging?
I'm confused how all the unrelated YTEPs are touched in this PR? There should only be one file changed.
@jcoughlin11 if you're OK with me pushing to your branch I can fix what @munkm mentions, I think.
@munkm I think @jcoughlin11 fixed it with this
✋🏻 I would like my last suggestion answered, even if not implemented.
@neutrinoceros Are you referring to your comment about doctest integration? If so, I added a section to the YTEP called Doctest Integration that briefly covers this. Was there something else I was missing?
@jcoughlin11 sorry, my bad, I didn't see it (I was on my phone; it's harder to spot on a small screen). Cool! I figure we don't need to write down that the command line option
This PR covers the addition of YTEP-0036.rst. This new YTEP discusses changing yt's testing framework from nose to pytest, changing the testing architecture to remove yield tests, and using stored hashes of answer arrays instead of storing the entire answer array itself. In addition, it also details how to run the tests under pytest and how to write new answer tests in the new framework.