
YTEP-0036.rst: converting from nose to pytest #9

Merged (8 commits) on Mar 19, 2021

Conversation

@jcoughlin11

This PR covers the addition of YTEP-0036.rst. This new YTEP discusses changing yt's testing framework from nose to pytest, restructuring the tests to remove yield tests, and using stored hashes of answer arrays instead of storing the entire answer arrays themselves. It also details how to run the tests under pytest and how to write new answer tests in the new framework.

… moving from stored arrays to stored array hashes.

@ngoldbaum ngoldbaum left a comment


Overall this is really nice, thank you for working on this. This will make it much easier for people to run answer tests locally. See my comment below about reducing boilerplate: I think we want to make it as easy as possible to write new answer tests, and it's worthwhile to think about insulating test authors from having to know how hashes should be computed (and possibly getting that wrong).

One of the nice aspects of the old answer testing system, especially for the plot answer tests, is that we had rich comparisons between the old answers and the new answers. For plot answer tests we even use matplotlib's image comparison machinery to generate a diff image, which people can use to help figure out what exactly is changing; this is very useful for debugging. For the GitHub CI, it also generates a little HTML document that you can download as a test artifact that includes the before, after, and diff images.

Is that maintained here? I would guess that since we'd only be storing the image hashes we wouldn't be able to do that anymore. It might be nice to think about ways to continue storing the images - we're already putting them in the repo as-is (although with this it might make sense to offload the data storage to a secondary repo that only stores the images so we don't continue increasing the size of the yt repository).

In general it would be nice to have more discussion in this document about the plot answer tests, since those are a little special and also got a lot of love during Google Summer of Code 2018.

The before/after comparison issue (e.g. a test fails and a developer wants to quickly understand what exactly changed) is less of a concern for the field values tests because we never really did anything to make it easy for developers to see exactly what changed between the new and old answers. That said, it might be nice to think about ways to ergonomically tell developers how to investigate failing answer tests, e.g. by somehow printing out a test script as well as git hashes to run the test script on to see the before/after values and before/after hashes explicitly. I could imagine some hacky ways to do this (e.g. by inspecting the source code for the test functions) but don't really have any concrete ideas - just a thought!


.. code-block:: bash

$ pytest
Member

Does this still only run the unit tests or does it run the answer tests too?

Member

Ah I see below it doesn't. Maybe note that here explicitly?

Author

Done.

for d in dobjs:
    hashes['field_values'][f][d] = utils.generate_hash(self.field_values_test(ds, f, d))
hashes = {'test_method1': hashes}
utils.handle_hashes(self.save_dir, self.answer_file, hashes, self.answer_store)
Member

It seems like a lot of this code is going to get repeated across all the answer tests in all the frontends. Might it be possible for the AnswerTest class to handle allocating the hashes dict, hashing the data, and storing the hashes for each test?
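One way the base class could absorb that boilerplate (a sketch only; the method names are illustrative and md5 is just an example digest, not necessarily what yt uses):

```python
import hashlib

class AnswerTest:
    """Sketch: the base class owns the hashes dict, computes digests,
    and stores them, so frontend tests never touch hashlib directly."""

    def __init__(self):
        self.hashes = {}

    @staticmethod
    def generate_hash(data: bytes) -> str:
        # Example digest choice; any stable hash would do.
        return hashlib.md5(data).hexdigest()

    def record(self, test_name, key, data: bytes):
        # Allocate the nested dict entry and store the hash in one place.
        self.hashes.setdefault(test_name, {})[key] = self.generate_hash(data)
```

A frontend test would then reduce to something like `self.record('field_values', (f, d), raw_bytes)`, with saving handled once at teardown.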

@matthewturk

@jcoughlin11 I agree with what @ngoldbaum said about the image tests in particular, but I wonder if we could also extend this somewhat to all of the tests. As it stands, we are just storing the hashes, but most of the hashing is computed in a central location. Would it be possible to supply arrays to that central location, and have an option that stores the array values (even just in raw binary!) as well as checks the hashes? That might help preserve both options.

It would also open up the opportunity for someone who is making changes and finding test errors to run the tests with that option (--store-values or something) and then load them in to manually compare in detail against the new tests that are failing.

@matthewturk

(Of course this could then be extended to save actual image files, etc etc.)

@munkm

munkm commented Oct 10, 2019

I've been thinking a bit more about this (specifically related to the image diffs and how nice the GSoC work that supports them is). I agree that we should do our best to preserve that functionality, and ideally make sure the upload feature for the diff and the new data is preserved in CI as well. I'm not exactly sure how much more difficult it would be to supply arrays to the central location, but what if we had an option that, if a test fails on the hash, then generates the diff and the image files? We could minimize storage needs by only generating these diffs when a test fails, but still make it easy to quickly identify issues with new features.

One downside: if this is written such that a test requires a rerun to generate the diffs, we could end up wasting a lot of resources running and re-running tests. But maybe that won't be an issue at all.
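The fail-then-diff control flow could be sketched like this (hypothetical names; `make_diff` stands in for e.g. matplotlib's image comparison machinery):

```python
import hashlib

def check_image(stored_hash, new_png_bytes, make_diff):
    """Compare hashes first; only generate the expensive before/after
    diff artifacts when the comparison fails, so passing runs stay cheap."""
    if hashlib.md5(new_png_bytes).hexdigest() == stored_hash:
        return True
    make_diff(new_png_bytes)  # only runs when the test fails
    return False
```

Whether this needs a rerun depends on where the diff generation is hooked in; if it happens inside the same test invocation, as above, no rerun is required.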

@ngoldbaum

One thing that hasn't come up yet that might be relevant for this discussion: GitHub Actions are a thing now, and they have generous free compute resources associated with them.

…ametrizing tests, no longer using OrderedDicts, and reworking the hashing fixture.
@munkm munkm requested a review from neutrinoceros February 26, 2021 19:48

@neutrinoceros neutrinoceros left a comment


This is excellent work. There's no question that I'm on board with the whole project, but I listed a handful of questions I would like addressed, as well as a couple of minor suggestions for improvements. Keep it up!

source/YTEPs/YTEP-0036.rst (outdated thread, resolved)
The second proposal of this YTEP is a general reorganization and streamlining for yt's answer testing framework.

Currently, many answer tests in yt generate large arrays of data. In order to facilitate comparison, these arrays need to be saved. Unfortunately, their size necessitates that these answers be stored separately from the main yt code base and makes answer comparisons slow.

Member

Suggested change:
Synchronising pull-request merges across two repos instead of one also slows down development itself and creates technical debt.

Member

@jcoughlin11 this comment is still not marked "outdated". Maybe it simply doesn't fit the doc anymore, in which case you can simply resolve this.

Member

Oh wait it still applies. You can reject it if you want, no hard feelings.

Author

I added your suggestion as the third bullet point here. I can make it its own paragraph if you think that would fit better.


Currently, the answer tests are mostly implemented as yield tests using nose. Pytest no longer supports yield tests because they resulted in multiple test functions being generated (one for each parameter combination), and these generated test functions did not properly support fixtures, as described `here <https://docs.pytest.org/en/latest/deprecations.html#yield-tests>`_.

The first step in reorganizing the tests is, therefore, to remove all yield tests. Each function that previously yielded a test now properly calls a test function with the desired parameters, where assertions are made. This makes the tests easier to read, as most of the boilerplate associated with yielding is gone (e.g., changing the ``__name__`` attribute of the yielded test class).
Member

Each function that previously yielded a test now properly calls a test function with the desired parameters

I'm confused; this sounds more like an intermediate step than a desirable final state: shouldn't test functions be self-contained? I believe that test generation is roughly equivalent to @pytest.mark.parametrize; is there something I'm missing?
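For reference, the equivalence alluded to here can be sketched as follows (the field names and check function are illustrative):

```python
import pytest

def check_field(field):
    # Stand-in for the real assertion logic.
    assert isinstance(field, str)

# Old nose style, for contrast:
#
#     def test_fields():
#         for field in ("density", "temperature"):
#             yield check_field, field
#
# New style: pytest generates one properly named test per parameter
# combination, and fixtures work normally inside each one.
@pytest.mark.parametrize("field", ["density", "temperature"])
def test_field(field):
    check_field(field)
```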


The first step in reorganizing the tests is, therefore, to remove all yield tests. Each function that previously yielded a test now properly calls a test function with the desired parameters, where assertions are made. This makes the tests easier to read, as most of the boilerplate associated with yielding is gone (e.g., changing the ``__name__`` attribute of the yielded test class).

The second step is to organize the answer tests into classes (e.g., all of the enzo answer tests go in a class called ``TestEnzo``). This not only provides a natural organizational structure within the code itself, but it also provides a simple way of ensuring that all related answer tests get saved to the same answer file. The class structure also facilitates easier running of specific tests, especially in the case where answer tests and unit tests share the same module.
Member

all related answer tests get saved to the same answer file

Why is this important?

The class structure also facilitates easier running of specific tests

I don't really get this last argument. pytest already offers several ways to select a subset of tests at runtime, based on names (pytest -k) or markers (pytest -m). Why do you think that's not enough?
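For reference, the two selection mechanisms mentioned look like this (the `answer_test` marker name is hypothetical and would need to be registered in the project's pytest configuration):

```shell
# Name-based selection: run only tests whose id matches the expression
pytest -k "TestEnzo and field_values"

# Marker-based selection (marker name is hypothetical)
pytest -m answer_test
```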


The second step is to organize the answer tests into classes (e.g., all of the enzo answer tests go in a class called ``TestEnzo``). This not only provides a natural organizational structure within the code itself, but it also provides a simple way of ensuring that all related answer tests get saved to the same answer file. The class structure also facilitates easier running of specific tests, especially in the case where answer tests and unit tests share the same module.

As an extension of the above, each answer test (e.g., ``FieldValuesTest``) that was previously a class of its own is now a method (e.g., ``field_values_test``) of the base ``AnswerTest`` class, which resides in ``yt/utilities/answer_testing/framework.py`` and takes the place of the ``AnswerTestingTest`` class. The use of this base class provides a central location for any new answer tests that might be added in the future. Every class that performs answer testing (e.g. ``TestEnzo``) inherits from ``AnswerTest``, thereby giving it access to the methods of ``AnswerTest``.
Member

I must be reading this wrong. Are you saying that those tests live in framework.py?

Upon reading a third time, I think the source of my confusion is the phrasing of the first sentence, specifically:

each answer test (e.g., FieldValuesTest)

I'm sure you mean something like "abstract answer test" here (I don't know of a better word, sorry). In any case, there's a distinction between generic/abstract tests and actual test classes/functions, right?
I think it should be clarified here and in the following.
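A minimal sketch of the layout described in the quoted paragraph (the dataset name and method bodies are placeholders, not real yt code):

```python
class AnswerTest:
    """Base class holding the individual answer-test methods; per the
    YTEP it lives in yt/utilities/answer_testing/framework.py."""

    def field_values_test(self, ds, field, obj):
        # Placeholder: would compute and return the field values to hash.
        return (ds, field, obj)

class TestEnzo(AnswerTest):
    """Frontend test class; inherits every answer-test method."""

    def test_toro1d(self):
        result = self.field_values_test("toro1d_placeholder", "density", None)
        assert result is not None
```

The distinction the reviewer asks about is visible here: `field_values_test` is a generic helper on the base class, while `test_toro1d` is an actual collected test.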

source/YTEPs/YTEP-0036.rst (two outdated threads, resolved)
# This is in yt/utilities/answer_testing/framework.py
class AnswerTest:
    # The other methods are here. New method goes below.
    def my_new_answer_test(self, test_param1, test_param2):
Member

What are the test discovery rules? I'm wondering whether my_new_answer_test is actually a "valid" test name according to pytest's default rules.

Member

The part of the doc this comment applied to has been removed, but my question hasn't been answered (unless the answer is in the new version, which I'm not looking at right now; in that case, please resolve this thread @jcoughlin11).

        # Do stuff
        return result

Any new frontend must have its tests registered and assigned an answer file. This should be done by updating ``yt_repo/tests/tests.yaml`` with:
Member

Hurray, that's a small doc PR we can do !

source/YTEPs/YTEP-0036.rst (outdated thread, resolved)
@jcoughlin11

@neutrinoceros Thanks for the comments! I think I addressed all of them (let me know if that's not the case)! I also significantly reworked the document to reflect the infrastructure changes I made since this YTEP was first written, and removed a lot of the material that felt like a re-hashing of the pytest documentation.

@neutrinoceros

I tried resolving as many threads as I could, @jcoughlin11, but I don't have time to read the whole new version again right now, so I'll need to come back to this later. Thank you for simplifying the document; I look forward to proofreading it again!


@neutrinoceros neutrinoceros left a comment


It looks even nicer than the last time I read it, well played!
I would like to add a small contribution that I think fits this very well and hasn't been discussed yet:
we have a lot of docstrings embedding example snippets suited for doctest, but we currently have no CI for those. Coincidentally, it is extremely easy to make pytest run these alongside normal tests
(see https://docs.pytest.org/en/stable/doctest.html).
I expect this is a third subproject that can fit this YTEP. It will likely not be straightforward, because many tests may be a little rotten, but I'm sure it's worth doing.
Can you add a short paragraph (2 to 3 sentences should be enough?) for this?
Otherwise, after my remaining points have been addressed, I'll be happy to approve this!


source/YTEPs/YTEP-0036.rst (outdated thread, resolved)
@matthewturk

So I think this is good to go. I'm approving! Anyone want to object to merging?

@munkm

munkm commented Mar 16, 2021

I'm confused: why are all the unrelated YTEPs touched in this PR? There should only be one file changed.

@matthewturk

@jcoughlin11 if you're OK with me pushing to your branch I can fix what @munkm mentions, I think.

@matthewturk

@munkm I think @jcoughlin11 fixed it with this

@neutrinoceros

So I think this is good to go. I'm approving! Anyone want to object to merging?

✋🏻 I would like my last suggestion answered, even if not implemented.

@jcoughlin11

@neutrinoceros Are you referring to your comment about doctest integration? If so, I added a section to the ytep called Doctest Integration that briefly covers this. Was there something else I was missing?

@neutrinoceros

@jcoughlin11 Sorry, my bad, I didn't see it (I was on my phone; it's harder to spot on a small screen).

Cool! I figure we don't need to write down that the command-line option --doctest-glob can be "adopted" in a config file, since that's not specific to this particular option.
However, could you also mention (or link to) this: https://docs.pytest.org/en/stable/doctest.html?highlight=doctest#doctest-namespace-fixture ? It seems like a pretty big deal to me, since it allows skipping >>> import yt in doctest docstrings, which I remember adding in some places just to make doctest happy.
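For reference, the fixture pattern from the linked pytest docs, adapted to yt (a sketch; it assumes yt is importable in the test environment, and the fixture name is arbitrary):

```python
# conftest.py
import pytest

@pytest.fixture(autouse=True)
def add_yt(doctest_namespace):
    # Inject yt into every collected doctest's namespace, so docstrings
    # can drop the leading ">>> import yt".
    import yt
    doctest_namespace["yt"] = yt
```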

@Xarthisius Xarthisius merged commit 686afa8 into yt-project:master Mar 19, 2021