Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test randomness #125

Open
1 task
IonBazan opened this issue Dec 9, 2020 · 4 comments
Open
1 task

Test randomness #125

IonBazan opened this issue Dec 9, 2020 · 4 comments
Labels
bug Something isn't working pinned

Comments

@IonBazan
Copy link

IonBazan commented Dec 9, 2020

Summary

This topic was brought up by @krsriq in #82 and partially addressed in #90. This issue is to discuss possible solutions to make sure our tests are properly checking the library behavior.

Versions

Version
PHP ALL
fakerphp/faker main

Self-enclosed code snippet for reproduction

Tests are now seeded with 1 to assure test result consistency but that opens another potential issue:

public function testCreditCardTypeReturnsValidVendorName()
{
self::assertContains($this->faker->creditCardType, ['Visa', 'Visa Retired', 'MasterCard', 'American Express', 'Discover Card']);
}

In this code, $this->faker->creditCardType will always return MasterCard because generator is seeded with 1 before each test method execution so removing the last element in array will still make the test pass every time.

Possible solutions

While testing random data generation in a reproducible way is difficult, we should make sure our tests are working properly.

Retry tests

I have tried experimenting with PHPUnit --repeat 100 flag to repeat the test several times but setUp (and therefore seed(1)) is called before each repeat too.

Mark tests that require seeding

Another approach would be to introduce a custom @seed <int> annotation for test methods or classes that explicitly require seeding before test to get specific results:
https://github.com/FakerPHP/Faker/blob/main/test/Faker/Provider/ja_JP/InternetTest.php and https://github.com/FakerPHP/Faker/blob/main/test/Faker/Provider/uk_UA/PersonTest.php

This approach together with --repeat 100 flag in one of our test matrix should allow us to make sure all the tests are actually making sure that generated data is always correct.

Other notes

  • How to make sure that each element is returned at least once after N repetitions?
@IonBazan IonBazan mentioned this issue Dec 11, 2020
4 tasks
@pimjansen pimjansen added the bug Something isn't working label Dec 24, 2020
@stale
Copy link

stale bot commented Jan 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 1 week if no further activity occurs. Thank you for your contributions.

@pimjansen
Copy link

@localheinz also noticed we could use the unique generator to exhaust the resource pool. Problem is that this will consume a lot of RAM though.

Im not a fan of just looping hundreds of times. I fact we know for sure we can not guarantee this 100% at this point however also the risk of issues is pretty low right?

@localheinz
Copy link
Member

@IonBazan

Any test that tests that asserts that a provider, given a specific seed of the generator, returns a specific value is bound to fail when the randomization engine changes - see #691, for example.

The question is, what are we going to do when the randomization engine changes? Fix all the failing tests by adjusting the expectations? Seems painful.

For example, I do not see any value in asserting that $faker->email() returns a specific email address. I would not recommend to anyone using fakerphp/faker to rely on $faker->email() to return a specific value. All that should matter to them is that the returned value is semantically correct, that is, that the returned value is an email address.

On a separate note, the --repeat option has been removed from phpunit/phpunit:10.0.0.

@curry684
Copy link

curry684 commented Jan 10, 2024

You should differentiate between what you are testing for in a library like this. Because indeed:

For example, I do not see any value in asserting that $faker->email() returns a specific email address.

Not just no value, it's downright wrong to test like that, as you are testing something that, when it fails, does not imply anything is wrong with the code. If it returns a@example.org today and foo@bar.com tomorrow the code is still working fine - it returned an email address. The only proper test for the email function is $this->assertSame($email, filter_var($email, VALIDATE_EMAIL));

It does however make sense to test repeatable seeded determinism. email is NOT required to always return the same email address between runs, on different computers or operating systems. It is however required to return same email address when run with the same seed in the same run in the same environment. So yes, it does make sense to get a random seed value during testing and use that to test repeated calls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pinned
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants