[2.7] bpo-31677: Backport regex used to match encoded-word strings #7856

Closed
wants to merge 5 commits into from
Changes from 3 commits
5 changes: 2 additions & 3 deletions Lib/email/header.py
@@ -35,12 +35,11 @@
   =\?                   # literal =?
   (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
   \?                    # literal ?
-  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
Member

I'm OK with this part, but...

+  (?P<encoding>[qQbB]) # either a "q" or a "b", case insensitive
   \?                    # literal ?
   (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
   \?=                   # literal ?=
-  (?=[ \t]|$)           # whitespace or the end of the string
Member

I'm not familiar with this part. The changes to the tests are OK.

Author

Ah, sorry, this PR is from a while back. This lookahead is the real issue here, but that was hidden when I copied over the regex used in Python 3.x. I've reverted the unrelated noise so it's clear that this line is the actual change.

-  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
+  ''', re.VERBOSE | re.MULTILINE)

# Field name regexp, including trailing colon, but not separating whitespace,
# according to RFC 2822. Character range is from tilde to exclamation mark.
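
To see what dropping the trailing lookahead changes in practice, here is a minimal sketch that compiles both versions of the pattern from this hunk and runs them over the header string used in the tests. The names `old_ecre`, `new_ecre`, and `header` are made up for this illustration and are not taken from `Lib/email/header.py`.

```python
import re

# Pattern before this change: the trailing lookahead only accepts an
# encoded word that is followed by a space, a tab, or the end of a line.
old_ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  (?=[ \t]|$)           # whitespace or the end of the string
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)

# Pattern after this change: no lookahead, and the case of the encoding
# letter is handled in the character class instead of with re.IGNORECASE.
new_ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qQbB])  # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.MULTILINE)

# Encoded words run straight into the surrounding text, with no whitespace.
header = 'Sm=?ISO-8859-1?B?9g==?=rg=?ISO-8859-1?B?5Q==?=sbord'

old_matches = [m.group('encoded') for m in old_ecre.finditer(header)]
new_matches = [m.group('encoded') for m in new_ecre.finditer(header)]

print(old_matches)
print(new_matches)
```

With the lookahead in place, neither encoded word in this string is recognized, because each closing `?=` is followed directly by a letter. Without it, both base64 payloads are found.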
6 changes: 4 additions & 2 deletions Lib/email/test/test_email.py
@@ -1649,10 +1649,12 @@ def test_whitespace_eater_unicode_2(self):
         hu = make_header(dh).__unicode__()
         eq(hu, u'The quick brown fox jumped over the lazy dog')
 
-    def test_rfc2047_without_whitespace(self):
+    def test_rfc2047_missing_whitespace(self):
         s = 'Sm=?ISO-8859-1?B?9g==?=rg=?ISO-8859-1?B?5Q==?=sbord'
         dh = decode_header(s)
-        self.assertEqual(dh, [(s, None)])
+        self.assertEqual(dh, [(b'Sm', None), (b'\xf6', 'iso-8859-1'),
+                              (b'rg', None), (b'\xe5', 'iso-8859-1'),
+                              (b'sbord', None)])
 
     def test_rfc2047_with_whitespace(self):
         s = 'Sm =?ISO-8859-1?B?9g==?= rg =?ISO-8859-1?B?5Q==?= sbord'
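
The expectation in the updated test can be checked by hand. The sketch below assumes it is run on Python 3, where `email.header.decode_header` already uses the regex that this PR backports. The variable names and the join step are illustrative only and are not part of the test suite.

```python
from email.header import decode_header

# Same header as in test_rfc2047_missing_whitespace: the encoded words
# are not separated from the surrounding text by whitespace.
missing = 'Sm=?ISO-8859-1?B?9g==?=rg=?ISO-8859-1?B?5Q==?=sbord'

# Each part is a (bytes, charset) pair; charset is None for plain text.
parts = decode_header(missing)

# Decode every chunk with its own charset, treating unencoded chunks as
# ASCII (an assumption that holds for this sample), and join the result.
text = ''.join(
    chunk.decode(charset or 'ascii') if isinstance(chunk, bytes) else chunk
    for chunk, charset in parts
)

print(parts)
print(text)
```

The five parts match the tuples asserted in the test, and joining them gives back the single word the header was meant to carry.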
4 changes: 3 additions & 1 deletion Lib/email/test/test_email_renamed.py
@@ -1586,7 +1586,9 @@ def test_whitespace_eater_unicode_2(self):
     def test_rfc2047_missing_whitespace(self):
         s = 'Sm=?ISO-8859-1?B?9g==?=rg=?ISO-8859-1?B?5Q==?=sbord'
         dh = decode_header(s)
-        self.assertEqual(dh, [(s, None)])
+        self.assertEqual(dh, [(b'Sm', None), (b'\xf6', 'iso-8859-1'),
+                              (b'rg', None), (b'\xe5', 'iso-8859-1'),
+                              (b'sbord', None)])
 
     def test_rfc2047_with_whitespace(self):
         s = 'Sm =?ISO-8859-1?B?9g==?= rg =?ISO-8859-1?B?5Q==?= sbord'