-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add is_potential_mixed_script_confusable_char
function
#13
Add is_potential_mixed_script_confusable_char
function
#13
Conversation
35d49f3
to
1fb9c04
Compare
1fb9c04
to
72cefff
Compare
There's a debug boolean value in the python script (search "debug = False", https://github.com/unicode-rs/unicode-security/pull/13/files#diff-c87c196441d88317a5ea3bf97e9fde0aR536), if anyone's curious why these codepoints are confusable, you can toggle it and regenerate the table file, which will include comments on why these code points are considered mixed script confusable. |
The python calculation part is a little complex there. So i'll leave a brief description. The main idea is creating equivalence classes from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really hard to review the python stuff, it would be nice if each function had a long comment explaining what its purpose is. currently i have to kinda figure that out from the code, and it's hard to review things when you don't know what they're doing
confusables.append((d_input, d_outputs)) | ||
|
||
return confusables | ||
|
||
def aliases(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps link to the original source in unicode-script so things can be manually updated if necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments below this line.
@Manishearth Will add some comments, thanks! |
Added some comments. Also cc @pyfisch here. |
dac6a41
to
a18eb9c
Compare
Adjusted the APIs and added a very simple test. |
is_potential_mixed_script_confusable_char
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, would appreciate docs on each function
def is_script_ignored_in_mixedscript(source): | ||
return source == 'Zinh' or source == 'Zyyy' or source == 'Zzzz' | ||
|
||
def process_mixedscript_single_to_multi(item_i, script_i, proto_lst, scripts): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this function trying to do? what does it operate on? where did those rules come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to explain in comments.
return True | ||
return False | ||
|
||
def load_potential_mixedscript_confusables(f, identifier_allowed, scripts): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what this function returns, and what each argument is, should be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i will add more comments.
Added more comments to document the details. |
…=Manishearth Implement mixed script confusable lint. This implements the mixed script confusable lint defined in RFC 2457. This is blocked on rust-lang#72069 and unicode-rs/unicode-security#13, and will need a Cargo.toml version bump after those are resolved. The lint message warning is sub-optimal for now. We'll need a mechanism to properly output `AugmentScriptSet` to screen, this is to be added in `unicode-security` crate. r? @Manishearth
…=Manishearth Implement mixed script confusable lint. This implements the mixed script confusable lint defined in RFC 2457. This is blocked on rust-lang#72069 and unicode-rs/unicode-security#13, and will need a Cargo.toml version bump after those are resolved. The lint message warning is sub-optimal for now. We'll need a mechanism to properly output `AugmentScriptSet` to screen, this is to be added in `unicode-security` crate. r? @Manishearth
…=Manishearth Implement mixed script confusable lint. This implements the mixed script confusable lint defined in RFC 2457. This is blocked on rust-lang#72069 and unicode-rs/unicode-security#13, and will need a Cargo.toml version bump after those are resolved. The lint message warning is sub-optimal for now. We'll need a mechanism to properly output `AugmentScriptSet` to screen, this is to be added in `unicode-security` crate. r? @Manishearth
This is a prototype of the data required from mixed_script_confusable lint of rustc. I'm not really sure whether we need to give more detailed data to rustc for diagnostics. (e.g. This code point is potentially confusable with which code point or which script?)Putting those aside, maybe we can have a early review first.
Implements
is_potential_mixed_script_confusable_char
function.A few other issues raised up during adding the actual lint - #15 #16