In-development PHP library to aid in comparing strings for visual similarity. Aims to become more reductive and much more comprehensive than the official Unicode confusables list. Contribute a unicode block today!

dliebner/php-unhomoglyph


php-unhomoglyph

This is an incomplete project. Feel free to fork and develop.

Usage

require_once('Unhomoglyph.php');

Unhomoglyph::init('homoglyphCharmaps/extended.php');

$str1 = 'google.com';
$str2 = 'gοοgle.com'; // the o's are actually U+03BF (Greek small letter omicron)
if( $str1 !== $str2 && Unhomoglyph::skeleton($str1) === Unhomoglyph::skeleton($str2) ) {

	echo "WARNING! $str1 looks like $str2 but they are NOT the same";

}
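
To make the idea behind the skeleton comparison concrete, here is a minimal standalone sketch. It does not use this library: `$demoMap` and `demoSkeleton()` are hypothetical names invented for illustration, and the two-entry map stands in for the real extended charmap. It only assumes that a skeleton is built by replacing each lookalike character with its base character.

```php
<?php
// Hypothetical two-entry map: lookalike character => base character.
// This is NOT the library's extended charmap, only an illustration.
$demoMap = [
    "\u{03BF}" => 'o', // Greek small letter omicron
    "\u{0430}" => 'a', // Cyrillic small letter a
];

// Hypothetical helper: replace every mapped character with its base character.
function demoSkeleton(string $s, array $map): string
{
    return strtr($s, $map);
}

$str1 = 'google.com';
$str2 = "g\u{03BF}\u{03BF}gle.com"; // both o's are omicrons

var_dump($str1 === $str2); // bool(false): the raw strings differ
var_dump(demoSkeleton($str1, $demoMap) === demoSkeleton($str2, $demoMap)); // bool(true)
```

Two strings that differ byte for byte but share a skeleton are the ones worth warning about, which is what the check in the usage example does.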

The Character Map

The full lookalike/homoglyph character map is in homoglyphCharmaps/extended.php. It is organized by Unicode block, which helps if you only need to focus on certain ranges. TODO: use code generation to split the map into separate per-block files, so that only the needed blocks are included.
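
As a rough sketch of what restricting a map to one block could look like, the snippet below filters a flat map by code point range. Everything in it is an assumption made for illustration: `$demoMap`, `$demoBlocks`, and `demoFilterByBlock()` are hypothetical, and the actual layout of extended.php and of Unhomoglyph::$blockRanges may differ. It requires the mbstring extension for `mb_ord()`.

```php
<?php
// Hypothetical flat map: lookalike character => base character.
$demoMap = [
    "\u{03BF}" => 'o', // U+03BF, Greek and Coptic block
    "\u{03C1}" => 'p', // U+03C1, Greek and Coptic block
    "\u{0430}" => 'a', // U+0430, Cyrillic block
];

// Hypothetical block table: name => [first code point, last code point].
$demoBlocks = [
    'Greek and Coptic' => [0x0370, 0x03FF],
    'Cyrillic'         => [0x0400, 0x04FF],
];

// Keep only the entries whose key falls inside the given code point range.
function demoFilterByBlock(array $map, array $range): array
{
    [$first, $last] = $range;
    return array_filter(
        $map,
        function ($char) use ($first, $last) {
            $cp = mb_ord((string) $char, 'UTF-8');
            return $cp >= $first && $cp <= $last;
        },
        ARRAY_FILTER_USE_KEY
    );
}

$greekOnly = demoFilterByBlock($demoMap, $demoBlocks['Greek and Coptic']);
echo count($greekOnly), "\n"; // 2
```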

Donate a Unicode Block!

The character map was originally based on https://github.com/nodeca/unhomoglyph, which is in turn based on http://www.unicode.org/Public/security/latest/confusables.txt. That map left a lot to be desired, so I've gone through roughly the first 5,000 Unicode characters manually and updated the mapping. Unicode has over 150,000 characters at the time of this writing. See a Unicode block that needs improvement? Donate it with a pull request!

Tools

Unhomoglyph::renderCharmapTable() - Renders a table of character => mapped character
Unhomoglyph::exportUpdatedOrganizedCharmap() - Sorts and re-organizes extended charmap, generating updated return array
Unhomoglyph::exportInverseGroupedCharmap() - Generates an array of skeleton character => homoglyphs
Unhomoglyph::wikipediaUnicodeBlockTableParserApp() - Tool to generate Unhomoglyph::$blockRanges from Wikipedia
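
The "skeleton character => homoglyphs" shape mentioned above can be illustrated with a small standalone sketch. It is not the library's implementation: `$demoMap` and `demoInvertGrouped()` are hypothetical, and the snippet only assumes a flat lookalike => base map as input.

```php
<?php
// Hypothetical flat map: lookalike character => base character.
$demoMap = [
    "\u{03BF}" => 'o', // Greek small letter omicron
    "\u{043E}" => 'o', // Cyrillic small letter o
    "\u{0430}" => 'a', // Cyrillic small letter a
];

// Group every lookalike under the base character it maps to.
function demoInvertGrouped(array $map): array
{
    $grouped = [];
    foreach ($map as $char => $skeleton) {
        $grouped[$skeleton][] = (string) $char;
    }
    return $grouped;
}

$inverse = demoInvertGrouped($demoMap);
// $inverse['o'] holds the two lookalikes of "o"; $inverse['a'] holds one.
```

An inverse view like this is handy when reviewing a block, since all characters that collapse to the same base character appear side by side.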
