-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix e164PhoneNumber to only generate valid phone numbers with valid country codes #264
Conversation
Imo this can still generate invalid locale data? We now fixed a subset of country prefixes however the area codes are still country specific and not even mentioned. There i think this should be a localized issue and fixed in a specific provider to match the codes for those countries. Technically the original implementation matches the spec as far as i can see however if a external service will validate the country of area code it could return an invalid response. |
@pimjansen only for countries with more rules/restrictions than the leading country calling code. public function testE164PhoneNumberFormat()
{
$number = $this->faker->e164PhoneNumber();
self::assertMatchesRegularExpression('/^\+[1-9][0-9]{10,}$/', $number);
PhoneNumberUtil::getInstance()->parse($number);
}
If wanted I can add the phone lib as a dev-dependency to make the test more solid - but I think as long as all providers are part of the same repo/package it would be too much to require external dependencies only to validate the generated data!? |
Following several sources it's indirectly part of the spec to only use valid country codes as the first one to three digits are reserved for the country code - as it's explicitly called the country/calling code it's implicitly requiring a valid one. |
Faker will always generate the same phone number unless you comment out Line 18 in d74df14
...test is still green though 🙂️ |
/** | ||
* @see https://github.com/giggsey/libphonenumber-for-php/blob/master/src/CountryCodeToRegionCodeMapForTesting.php | ||
*/ | ||
protected static $e164Formats = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why the calling codes for de_AT and de_CH are missing while de_DE and fr_FR are included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default array is only copied from the linked source above.
There are even more countries missing but I think that this list is big enough for testing and every country provider can define the specific ones.
If we want to modify this i would like to see all possible variations on the country codes available here. |
Okay, so will make this PR complete. Just wanted to get quick first feedback. 😊 |
Could be that I've gone a bit over the top but the generated numbers are okay now. 🙈😂 self::assertInstanceOf(
\libphonenumber\PhoneNumber::class,
\libphonenumber\PhoneNumberUtil::getInstance()->parse($number)
); Result with 100.000 repetitions is that everything looks good. 🙂
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now!
I cross-referenced the $e164Formats
list with the Wikipedia entry for calling codes - there are 3 entries missing but they all are special cases with good reasons not to be included (such as +379 for the Vatican).
Note that the phone numbers generated with this PR still are not necessarily valid numbers, they just have a valid country prefix - as can be tested with the isValidNumber
method of the linked library libphonenumber. This is still an improvement though.
I have no idea why the coverage test is failing as there's a test for every added/changed method. So in theory the percentage coverage should go up!? 🤔 |
No worries about the coverage. Its based on previous commit 59.05% (-0.01%) compared to 41f699e As you can see is lower but fine. |
{ | ||
$number = $this->faker->e164PhoneNumber(); | ||
self::assertMatchesRegularExpression('/^\+[0-9]{11,}$/', $number); | ||
self::assertMatchesRegularExpression('/^\+[1-9][0-9]{10,}$/', $number); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krsriq wouldnt it be better here to run x number of iterations with unique? Since this way we actually just test a single number right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's wanted I can change this. It's the way it was tested before.
The "official" regex would be ^\+[1-9]\d{1,14}$
anyway, so if I change the tests I would also change the regex to match the e164 standard and not what we generate - as in theory numbers can be shorter than CC+10digits and also have an upper bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the regex: we could also change the generation logic to randomize the number of digits after CC.
But I haven't found a way to do so in an easy way as we would have to randomize the number of #
like str_repeat('#', mt_rand(8, 12))
which doesn't work with the format is.
Somewhere I've seen that numerify()
somewhen supported !
to put in a digit or nothing but this doesn't work anymore. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the regex assertions to test for E164 standard and not for the current generator.
a159368
This way the test can remain the same if somewhen someone wants to improve the generator even more and/or randomize the number of digits after CC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pimjansen I've checked some other tests and this is done in some other phone tests already for the "normal" phone number.
So I've added the for
loops - 1000 runs for the large one to get a lot of different CC prefixes and 10 per country provider as these only have a single format.
ac76e29
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krsriq wouldnt it be better here to run x number of iterations with unique? Since this way we actually just test a single number right?
I don't think unique adds any value here, since it's already highly unlikely that a number with ten digits is generated twice (even with 1000 iterations).
test for standard not for current generation
for ($i = 0; $i < 1000; ++$i) { | ||
$number = $this->faker->e164PhoneNumber(); | ||
self::assertMatchesRegularExpression('/^\+[1-9]\d{1,14}$/', $number); | ||
self::assertLessThanOrEqual(16, strlen($number)); // +\d{2,15} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of having this comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explaining why 16
as the original E164 is maxed 15 but we have the plus-sign before.
I can remove it but I know how hard it is to understand these numbers sometime later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about writing something like '+ sign and max. 15 digits'? Not having to parse a regex decreases the cognitive load required somewhat IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! 👍
f93d97b
Thanks for contributing both! |
What is the reason for this PR?
Right now the generated e164 phone numbers aren't valid so they can't be used in libraries like https://github.com/giggsey/libphonenumber-for-php .
This PR lists some valid formats and allows country providers to add their own formats. This way only valid phone numbers are generated.
Author's checklist
Summary of changes
Add a list of valid e164 formats to generate valid phone numbers.
Review checklist