`gsub!': invalid byte sequence in UTF-8 (ArgumentError) #224

Closed
chahn opened this issue Sep 28, 2021 · 6 comments · Fixed by #226

chahn commented Sep 28, 2021

While parsing some emails, the following error occurs with Sisimai 4.25.11 and 5-beta2 on Ruby 3.0.2p107:

/home/chahn/bounce/vendor/bundle/ruby/3.0.0/gems/sisimai-4.25.11/lib/sisimai/mime.rb:331:in `gsub!': invalid byte sequence in UTF-8 (ArgumentError)
        from /home/chahn/bounce/vendor/bundle/ruby/3.0.0/gems/sisimai-4.25.11/lib/sisimai/mime.rb:331:in `breaksup'
        from /home/chahn/bounce/vendor/bundle/ruby/3.0.0/gems/sisimai-4.25.11/lib/sisimai/mime.rb:439:in `makeflat'
        from /home/chahn/bounce/vendor/bundle/ruby/3.0.0/gems/sisimai-4.25.11/lib/sisimai/message.rb:267:in `parse'
        from /home/chahn/bounce/vendor/bundle/ruby/3.0.0/gems/sisimai-4.25.11/lib/sisimai/message.rb:80:in `initialize'
        from /home/chahn/bounce/vendor/bundle/ruby/3.0.0/gems/sisimai-4.25.11/lib/sisimai.rb:34:in `new'
        from /home/chahn/bounce/vendor/bundle/ruby/3.0.0/gems/sisimai-4.25.11/lib/sisimai.rb:34:in `make'
        from import.rb:38:in `<main>'

This issue seems closely related to #137, but here the cause appears to be a Base64-encoded part of the email body.
The part's content type is text/plain; charset=ISO-8859-1 (not UTF-8, as gsub! seems to expect). The encoded content contains a non-ASCII character.
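
The failure can be sketched in isolation with the Base64 payload from the email in this report. This is only a minimal reproduction under one assumption: that the decoded bytes are treated as UTF-8 by the time the regex runs. It does not use Sisimai, and the variable names are illustrative.

```ruby
require 'base64'

# Base64 payload copied from the text/plain part of the email in this report.
payload = 'ICBhdWZnZWb8aHJ0DQoNCg=='

# Base64.decode64 returns the raw bytes as a binary (ASCII-8BIT) string.
raw = Base64.decode64(payload)

# Assumption: the bytes get tagged as UTF-8 before any pattern matching.
# The byte 0xFC ("ü" in ISO-8859-1) is never valid in UTF-8.
mislabeled = raw.dup.force_encoding(Encoding::UTF_8)
puts mislabeled.valid_encoding? # false

# Matching a regex against a string with broken encoding raises ArgumentError.
error =
  begin
    mislabeled.gsub(/\r\n/, "\n")
    nil
  rescue ArgumentError => e
    e
  end
puts error.message if error

# Tagging the bytes with the declared charset and transcoding gives valid UTF-8.
text = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts text.valid_encoding? # true
```

The transcoded string can then be passed to gsub without raising.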

You can reproduce the error with the following email:

From MAILER-DAEMON  Mon Sep 20 19:33:02 2021
Return-Path: <>
X-Original-To: postmaster@blackhole.our-host.net
Delivered-To: blackhole@localhost
Received: from mx2.itnetwork.net (mx2.itnetwork.net [123.123.123.202])
	by play.thmthlayer.com (Postfix) with ESMTPS id 94598E117B
	for <postmaster@blackhole.our-host.net>; Mon, 20 Sep 2021 19:33:02 +0000 (UTC)
Received: from 10.9.37.234 ([10.9.34.13]) by mx2.itnetwork.net with ESMTP id rTR3cr9xLoMiR5vX for <postmaster@blackhole.our-host.net>; Mon, 20 Sep 2021 21:33:01 +0200 (CEST)
Date: Mon, 20 Sep 2021 21:32:59 +0200 (GMT+02:00)
From: Postmaster@bit-onbreeeck.org
To: MyName <message@gelaneeiuet.org>
Subject: foobar
Mime-Version: 1.0
Message-ID: <OFB3228DED.DDDCAC8E-ONC1258756.006B648D-C1258756.006B6493@bit-onbreeeck.org>
Content-Type: multipart/report; report-type=delivery-status; boundary="==IFJRGLKFGIR7891042UHRUHIHD"

--==IFJRGLKFGIR7891042UHRUHIHD
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: base64

ICBhdWZnZWb8aHJ0DQoNCg==

--==IFJRGLKFGIR7891042UHRUHIHD
Content-Type: message/delivery-status

Reporting-MTA: dns;10.9.37.234

Final-Recipient: rfc822;jane.doe@some-domain.net
Action: failed
Status: 5.0.0
Diagnostic-Code: The email account that you tried to reach does not exist.

--==IFJRGLKFGIR7891042UHRUHIHD--
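
One defensive way to handle a part like the first one in this email is to honor the charset declared in its Content-Type header before running any regex over the decoded text. The sketch below is not Sisimai's code and not necessarily how #226 fixed the problem; `charset_encoding` and `decode_part` are hypothetical helper names, and falling back to UTF-8 for a missing or unknown charset is an assumption.

```ruby
require 'base64'

# Hypothetical helper: map the charset parameter of a Content-Type value to
# a Ruby Encoding, falling back to UTF-8 when it is missing or unknown.
def charset_encoding(content_type)
  name = content_type[/charset=["']?([^"';\s]+)/i, 1]
  (name && Encoding.find(name)) || Encoding::UTF_8
rescue ArgumentError
  # Encoding.find raises ArgumentError for names Ruby does not know.
  Encoding::UTF_8
end

# Hypothetical helper: decode a Base64 body and return valid UTF-8 text.
def decode_part(content_type, body)
  raw = Base64.decode64(body)
  source = charset_encoding(content_type)
  raw.force_encoding(source)

  if source == Encoding::UTF_8
    # Already labeled UTF-8: only replace bytes that are not valid.
    raw.scrub('?')
  else
    raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?')
  end
end

text = decode_part('text/plain; charset=ISO-8859-1', 'ICBhdWZnZWb8aHJ0DQoNCg==')
puts text.valid_encoding?            # true
puts text.gsub(/\r\n/, "\n").strip   # aufgeführt
```

Replacing undecodable bytes with "?" keeps parsing going at the cost of losing those characters. A production version would also need to handle charsets for which Ruby has no converter.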

Thanks a lot and kind regards from Bavaria,

@azumakuniyuki
Member

Thanks for the detailed bug report. I'll try to fix this issue within a few days.

Thanks :-)

azumakuniyuki added a commit that referenced this issue Sep 28, 2021
@azumakuniyuki
Member

The fixed code returns the following result:

% ruby -Ilib -rsisimai -e 'puts Sisimai.dump($*.shift)' ./issue-224.eml | jq
[
  {
    "catch": "",
    "token": "d4a360a80e6125e2ca458d36514b54ba2ff5a156",
    "lhost": "",
    "rhost": "10.9.37.234",
    "alias": "",
    "listid": "",
    "reason": "filtered",
    "action": "failed",
    "origin": "./issue-224.eml",
    "subject": "",
    "messageid": "",
    "replycode": "",
    "smtpagent": "RFC3464",
    "softbounce": 1,
    "smtpcommand": "",
    "destination": "some-domain.net",
    "senderdomain": "gelaneeiuet.org",
    "feedbacktype": "",
    "diagnosticcode": "The email account that you tried to reach does not exist.",
    "diagnostictype": "",
    "deliverystatus": "5.0.0",
    "timezoneoffset": "+0200",
    "addresser": "message@gelaneeiuet.org",
    "recipient": "jane.doe@some-domain.net",
    "timestamp": 1632166379
  }
]

This bug should now be fixed.

chahn commented Sep 28, 2021

Thanks a lot @azumakuniyuki, your fix is working for me as well. Great work! :-)

Kind regards from Bavaria,

@azumakuniyuki
Member

@chahn Thanks for the quick response. Would you mind if we added the email you posted in this issue to the set-of-emails repository?

Best regards, :-)

chahn commented Sep 28, 2021

@azumakuniyuki Yes, of course you can use this email. The headers and content have already been anonymized.

@azumakuniyuki
Member

@chahn Thank you so much. I'll add the file and commit it soon.

Thanks again :-)
