Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hg: add email validation for hg users (bug 1691635) #217

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

grottohub
Copy link
Contributor

while moz-phab will hopefully catch invalid emails before they make it to lando now, this PR adds an extra safeguard against incorrectly configured mercurial users' commits landing in a repo

Bug 1691635 - Lando landed a commit without a valid email address

@grottohub grottohub requested a review from zzzeid September 15, 2022 20:28
@grottohub grottohub marked this pull request as ready for review September 15, 2022 20:28
landoapi/hg.py Outdated
@@ -519,3 +523,9 @@ def read_checkout_file(self, path: str) -> str:

with checkout_file_path.open() as f:
return f.read()

@staticmethod
def extract_email_from_username(username: Union[str, bytes]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
def extract_email_from_username(username: Union[str, bytes]) -> str:
def extract_email_from_username(username: str | bytes) -> str:

landoapi/hg.py Outdated
if not user:
raise ValueError("Missing `User` header!")

email = self.extract_email_from_username(user)
if not is_valid_email(email):
raise ValueError("Invalid email configured for Mercurial user!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be helpful to include the email address here in the output.

landoapi/hg.py Outdated
def extract_email_from_username(username: Union[str, bytes]) -> str:
"""Extracts an email from a Mercurial username, if it exists.
Not guaranteed to return a valid email, make sure to validate."""
return str(username).split("<").pop().replace(">", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a regular expression. It would be good to add a unit test for this method as well.

@@ -20,3 +20,11 @@ def test_convertion_failure_string(id):
def test_convertion_failure_integer():
with pytest.raises(TypeError):
revision_id_to_int(123)


def test_is_valid_email():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More test cases can be added here (for example test <test> and test <test@test>. And perhaps a few malformed ones.

@@ -6,6 +6,7 @@
from connexion import ProblemException

REVISION_ID_RE = re.compile(r"^D(?P<id>[1-9][0-9]*)$")
VALID_EMAIL_RE = re.compile(r"[^@ \t\r\n]+@[^@ \t\r\n]+\.[^@ \t\r\n]+")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a misnomer since it could catch invalid email addresses too. Maybe ACCEPTED_EMAIL_RE is a better name. There probably is a more suitable regex that is available on the internet.

An example of an invalid email that will pass the above: -@...



def is_valid_email(email: str) -> bool:
"""Given a string, determines if it is a valid email."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably put the regex pattern here, and describe in more detail in the docstring what it's doing.

landoapi/hg.py Outdated
Comment on lines 529 to 530
"""Extracts an email from a Mercurial username, if it exists.
Not guaranteed to return a valid email, make sure to validate."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See pep 257.

Suggested change
"""Extracts an email from a Mercurial username, if it exists.
Not guaranteed to return a valid email, make sure to validate."""
"""Extracts an email from a Mercurial username, if it exists.
Not guaranteed to return a valid email, make sure to validate."""

@grottohub grottohub requested a review from zzzeid September 28, 2022 17:15
Copy link
Contributor

@zzzeid zzzeid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought I would leave the comments here explicitly (though we chatted about them a while ago over Zoom). Main comment is re: the use of replace.

def extract_email_from_username(username: str | bytes) -> str:
"""Extracts an email from a Mercurial username, if it exists.

Not guaranteed to return a valid email, make sure to validate."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Not guaranteed to return a valid email, make sure to validate."""
Not guaranteed to return a valid email, make sure to validate.
"""

Comment on lines +536 to +538
email = search(r"<.*?>", str(username))
if email:
return email.group(0).replace("<", "").replace(">", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified/cleaned up using grouping.

Comment on lines +46 to +47
assert True not in [is_valid_email(value) for value in invalid_emails]
assert False not in [is_valid_email(value) for value in valid_emails]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: more of a personal preference than anything else, but could be a little more human readable.

Suggested change
assert True not in [is_valid_email(value) for value in invalid_emails]
assert False not in [is_valid_email(value) for value in valid_emails]
assert not any(is_valid_email(value) for value in invalid_emails)
assert all(is_valid_email(value) for value in valid_emails)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants