Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix e164PhoneNumber to only generate valid phone numbers with valid country codes #264

Merged
merged 11 commits into from
Jan 27, 2021
Merged

Conversation

Gummibeer
Copy link

@Gummibeer Gummibeer commented Jan 13, 2021

What is the reason for this PR?

Right now the generated e164 phone numbers aren't valid so they can't be used in libraries like https://github.com/giggsey/libphonenumber-for-php .
This PR lists some valid formats and allows country providers to add their own formats. This way only valid phone numbers are generated.

  • A new feature
  • Fixed an issue (resolve #ID)

Author's checklist

Summary of changes

Add a list of valid e164 formats to generate valid phone numbers.

Review checklist

  • All checks have passed
  • Changes are approved by maintainer

@pimjansen
Copy link

Imo this can still generate invalid locale data? We now fixed a subset of country prefixes however the area codes are still country specific and not even mentioned.

There i think this should be a localized issue and fixed in a specific provider to match the codes for those countries.

Technically the original implementation matches the spec as far as i can see however if a external service will validate the country of area code it could return an invalid response.

@Gummibeer
Copy link
Author

@pimjansen only for countries with more rules/restrictions than the leading country calling code.
As the code adjustment allows to define specific e164 formats per country provider these specific cases can be added. I've put several (10.000) of the default generated numbers through the phone lib and all were accepted.

public function testE164PhoneNumberFormat()
{
    $number = $this->faker->e164PhoneNumber();
    self::assertMatchesRegularExpression('/^\+[1-9][0-9]{10,}$/', $number);

    PhoneNumberUtil::getInstance()->parse($number);
}
$ vendor/bin/simple-phpunit --filter testE164PhoneNumberFormat --repeat 10000

Time: 3.13 minutes, Memory: 30.00 MB

OK (10000 tests, 10000 assertions)

If wanted I can add the phone lib as a dev-dependency to make the test more solid - but I think as long as all providers are part of the same repo/package it would be too much to require external dependencies only to validate the generated data!?

@Gummibeer
Copy link
Author

Following several sources it's indirectly part of the spec to only use valid country codes as the first one to three digits are reserved for the country code - as it's explicitly called the country/calling code it's implicitly requiring a valid one.
https://www.itu.int/rec/T-REC-E.164-201011-I/en
https://www.twilio.com/docs/glossary/what-e164
https://en.wikipedia.org/wiki/E.164

@krsriq
Copy link

krsriq commented Jan 13, 2021

$ vendor/bin/simple-phpunit --filter testE164PhoneNumberFormat --repeat 10000

Faker will always generate the same phone number unless you comment out

$this->faker->seed(1);

...test is still green though 🙂️

/**
* @see https://github.com/giggsey/libphonenumber-for-php/blob/master/src/CountryCodeToRegionCodeMapForTesting.php
*/
protected static $e164Formats = [
Copy link

@krsriq krsriq Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why the calling codes for de_AT and de_CH are missing while de_DE and fr_FR are included?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default array is only copied from the linked source above.
There are even more countries missing but I think that this list is big enough for testing and every country provider can define the specific ones.

@pimjansen
Copy link

Following several sources it's indirectly part of the spec to only use valid country codes as the first one to three digits are reserved for the country code - as it's explicitly called the country/calling code it's implicitly requiring a valid one.
https://www.itu.int/rec/T-REC-E.164-201011-I/en
https://www.twilio.com/docs/glossary/what-e164
https://en.wikipedia.org/wiki/E.164

If we want to modify this i would like to see all possible variations on the country codes available here.

@Gummibeer
Copy link
Author

Okay, so will make this PR complete. Just wanted to get quick first feedback. 😊

@Gummibeer
Copy link
Author

Gummibeer commented Jan 13, 2021

Could be that I've gone a bit over the top but the generated numbers are okay now. 🙈😂
Disabled the mentioned "seed" line and added this assertion to all E164 tests

self::assertInstanceOf(
    \libphonenumber\PhoneNumber::class,
    \libphonenumber\PhoneNumberUtil::getInstance()->parse($number)
);

Result with 100.000 repetitions is that everything looks good. 🙂

$ vendor/bin/simple-phpunit --filter testE164PhoneNumberFormat --repeat 100000

OK (700000 tests, 1300000 assertions)

Copy link

@krsriq krsriq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now!
I cross-referenced the $e164Formats list with the Wikipedia entry for calling codes - there are 3 entries missing but they all are special cases with good reasons not to be included (such as +379 for the Vatican).

Note that the phone numbers generated with this PR still are not necessarily valid numbers, they just have a valid country prefix - as can be tested with the isValidNumber method of the linked library libphonenumber. This is still an improvement though.

@Gummibeer
Copy link
Author

I have no idea why the coverage test is failing as there's a test for every added/changed method. So in theory the percentage coverage should go up!? 🤔
But as it's the only failing test and so far I see every change is covered in a test I think it's fine to ignore this test.

@pimjansen
Copy link

No worries about the coverage. Its based on previous commit

59.05% (-0.01%) compared to 41f699e

As you can see is lower but fine.

{
$number = $this->faker->e164PhoneNumber();
self::assertMatchesRegularExpression('/^\+[0-9]{11,}$/', $number);
self::assertMatchesRegularExpression('/^\+[1-9][0-9]{10,}$/', $number);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krsriq wouldnt it be better here to run x number of iterations with unique? Since this way we actually just test a single number right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's wanted I can change this. It's the way it was tested before.
The "official" regex would be ^\+[1-9]\d{1,14}$ anyway, so if I change the tests I would also change the regex to match the e164 standard and not what we generate - as in theory numbers can be shorter than CC+10digits and also have an upper bound.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the regex: we could also change the generation logic to randomize the number of digits after CC.
But I haven't found a way to do so in an easy way as we would have to randomize the number of # like str_repeat('#', mt_rand(8, 12)) which doesn't work with the format is.
Somewhere I've seen that numerify() somewhen supported ! to put in a digit or nothing but this doesn't work anymore. 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the regex assertions to test for E164 standard and not for the current generator.
a159368
This way the test can remain the same if somewhen someone wants to improve the generator even more and/or randomize the number of digits after CC.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pimjansen I've checked some other tests and this is done in some other phone tests already for the "normal" phone number.
So I've added the for loops - 1000 runs for the large one to get a lot of different CC prefixes and 10 per country provider as these only have a single format.
ac76e29

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krsriq wouldnt it be better here to run x number of iterations with unique? Since this way we actually just test a single number right?

I don't think unique adds any value here, since it's already highly unlikely that a number with ten digits is generated twice (even with 1000 iterations).

for ($i = 0; $i < 1000; ++$i) {
$number = $this->faker->e164PhoneNumber();
self::assertMatchesRegularExpression('/^\+[1-9]\d{1,14}$/', $number);
self::assertLessThanOrEqual(16, strlen($number)); // +\d{2,15}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of having this comment here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explaining why 16 as the original E164 is maxed 15 but we have the plus-sign before.
I can remove it but I know how hard it is to understand these numbers sometime later

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about writing something like '+ sign and max. 15 digits'? Not having to parse a regex decreases the cognitive load required somewhat IMO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! 👍
f93d97b

@pimjansen pimjansen merged commit 14657fa into FakerPHP:main Jan 27, 2021
@pimjansen
Copy link

Thanks for contributing both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants