Slow regex in Sisimai/Reason/UserUnknown.pm #411

Closed
azumakuniyuki opened this issue Sep 25, 2020 · 2 comments

azumakuniyuki (Member) commented Sep 25, 2020

Sisimai::Reason::UserUnknown->match is very slow, or hangs, when its first argument is a very long string. This issue was reported by Martin Kluge, who also provided the following test scripts and teststring.txt. Thanks!

$ cat ./teststring.txt | perl ./old.pl
Matching
^C <- HANGS!

$ cat ./teststring.txt | perl ./new.pl
Matching
0

old.pl

#!/usr/bin/perl
use v5.10;
my $str = '';
while(<STDIN>) { chomp; $str .= $_; }

print "Matching\n";
print match(0, $str);

sub match {
    my $class = shift;
    my $argv1 = shift // return undef;

    state $regex = qr{(?>
         .+[ ]user[ ]unknown
        |[#]5[.]1[.]1[ ]bad[ ]address
        |[<].+[>][ ]not[ ]found
        |[<].+[@].+[>][.][.][.][ ]blocked[ ]by[ ]
        |5[.]0[.]0[.][ ]mail[ ]rejected[.]
        |5[.]1[.]0[ ]address[ ]rejected[.]
        |adresse[ ]d[ ]au[ ]moins[ ]un[ ]destinataire[ ]invalide.+[a-z]{3}.+(?:416|418)
        |address[ ](?:does[ ]not[ ]exist|unknown)
        |archived[ ]recipient
        |bad[-_ \t]recipient
        |can[']t[ ]accept[ ]user
        |destination[ ](?:
             addresses[ ]were[ ]unknown
            |server[ ]rejected[ ]recipients
            )
        |email[ ]address[ ](?:does[ ]not[ ]exist|could[ ]not[ ]be[ ]found)
        |invalid[ ](?:
             address
            |mailbox:
            |mailbox[ ]path|recipient
            )
        |is[ ]not[ ](?:
             a[ ]known[ ]user
            |a[ ]valid[ ]mailbox
            |an[ ]active[ ]address[ ]at[ ]this[ ]host
            )
        |mailbox[ ](?:
             .+[ ]does[ ]not[ ]exist
            |.+[@].+[ ]unavailable
            |does[ ]not[ ]exist
            |invalid
            |is[ ](?:inactive|unavailable)
            |not[ ](?:present|found)
            |unavailable
            )
        |no[ ](?:
             [ ].+[ ]in[ ]name[ ]directory
            |account[ ]by[ ]that[ ]name[ ]here
            |existe[ ](?:dicha[ ]persona|ese[ ]usuario[ ])
            |mail[ ]box[ ]available[ ]for[ ]this[ ]user
            |mailbox[ ](?:
                 by[ ]that[ ]name[ ]is[ ]currently[ ]available
                |found
                )
            |matches[ ]to[ ]nameserver[ ]query
            |such[ ](?:
                 address[ ]here
                |mailbox
                |person[ ]at[ ]this[ ]address
                |recipient
                |user(?:[ ]here)?
                )
            |thank[ ]you[ ]rejected:[ ]account[ ]unavailable:
            |valid[ ]recipients[,][ ]bye    # Microsoft
            )
        |non[- ]?existent[ ]user
        |not[ ](?:
             a[ ]valid[ ](?:recipient|user[ ]here)
            |a[ ]local[ ]address
            |email[ ]addresses
            )
        |rcpt[ ][<].+[>][ ]does[ ]not[ ]exist
        |recipient[ ]address[ ]rejected[.][ ][(]in[ ]reply[ ]to[ ]rcpt[ ]to[ ]command[)]
        |rece?ipient[ ](?:
             .+[ ]was[ ]not[ ]found[ ]in
            |address[ ]rejected:[ ](?:
                 access[ ]denied
                |invalid[ ]user
                |user[ ].+[ ]does[ ]not[ ]exist
                |user[ ]unknown[ ]in[ ].+[ ]table
                |unknown[ ]user
                )
            |does[ ]not[ ]exist(?:[ ]on[ ]this[ ]system)?
            |is[ ]not[ ]local
            |not[ ](?:exist|found|ok)
            |unknown
            )
        |requested[ ]action[ ]not[ ]taken:[ ]mailbox[ ]unavailable
        |resolver[.]adr[.]recip(?:ient)notfound # Microsoft
        |said:[ ]550[-[ ]]5[.]1[.]1[ ].+[ ]user[ ]unknown[ ]
        |smtp[ ]error[ ]from[ ]remote[ ]mail[ ]server[ ]after[ ]end[ ]of[ ]data:[ ]553.+does[ ]not[ ]exist
        |sorry,[ ](?:
             user[ ]unknown
            |badrcptto
            |no[ ]mailbox[ ]here[ ]by[ ]that[ ]name
            )
        |the[ ](?:
             email[ ]account[ ]that[ ]you[ ]tried[ ]to[ ]reach[ ]does[ ]not[ ]exist
            |following[ ]recipients[ ]was[ ]undeliverable
            |user[']s[ ]email[ ]name[ ]is[ ]not[ ]found
            )
        |there[ ]is[ ]no[ ]one[ ]at[ ]this[ ]address
        |this[ ](?:
             address[ ]no[ ]longer[ ]accepts[ ]mail
            |email[ ]address[ ]is[ ]wrong[ ]or[ ]no[ ]longer[ ]valid
            |spectator[ ]does[ ]not[ ]exist
            |user[ ]doesn[']?t[ ]have[ ]a[ ].+[ ]account
            )
        |unknown[ ](?:
             e[-]?mail[ ]address
            |local[- ]part
            |mailbox
            |recipient
            |user
            )
        |user[ ](?:
             .+[ ]was[ ]not[ ]found
            |.+[ ]does[ ]not[ ]exist
            |does[ ]not[ ]exist
            |missing[ ]home[ ]directory
            |not[ ](?:active|exist|found|known)
            |unknown
            )
        |vdeliver:[ ]invalid[ ]or[ ]unknown[ ]virtual[ ]user
        |your[ ]envelope[ ]recipient[ ]is[ ]in[ ]my[ ]badrcptto[ ]list
        )
    }x;
    return 1 if $argv1 =~ $regex;
    return 0;
}

new.pl

#!/usr/bin/perl
use v5.10;
my $str = '';
while(<STDIN>) { chomp; $str .= $_; }

print "Matching\n";
print match(0, $str);

sub match {
    # Try to match that the given text and regular expressions
    # @param    [String] argv1  String to be matched with regular expressions
    # @return   [Integer]       0: Did not match
    #                           1: Matched
    # @since v4.0.0
    my $class = shift;
    my $argv1 = shift // return undef;

    state $regex = qr{(
         [ ]user[ ]unknown
        |[#]5[.]1[.]1[ ]bad[ ]address
        |[<][^>]+[>][ ]not[ ]found
        |[<][^@]+[@][^>]+[>][.][.][.][ ]blocked[ ]by[ ]
        |5[.]0[.]0[.][ ]mail[ ]rejected[.]
        |5[.]1[.]0[ ]address[ ]rejected[.]
        |adresse[ ]d[ ]au[ ]moins[ ]un[ ]destinataire[ ]invalide.+[a-z]{3}.+(?:416|418)
        |address[ ](does[ ]not[ ]exist|unknown)
        |archived[ ]recipient
        |bad[-_ \t]recipient
        |can[']t[ ]accept[ ]user
        |destination[ ](
             addresses[ ]were[ ]unknown
            |server[ ]rejected[ ]recipients
            )
        |email[ ]address[ ](does[ ]not[ ]exist|could[ ]not[ ]be[ ]found)
        |invalid[ ](
             address
            |mailbox:
            |mailbox[ ]path|recipient
            )
        |is[ ]not[ ](
             a[ ]known[ ]user
            |a[ ]valid[ ]mailbox
            |an[ ]active[ ]address[ ]at[ ]this[ ]host
            )
        |mailbox[ ](
             [ ]does[ ]not[ ]exist
            |[^@]+[@][^ ]+[ ]unavailable
            |does[ ]not[ ]exist
            |invalid
            |is[ ](inactive|unavailable)
            |not[ ](present|found)
            |unavailable
            )
        |no[ ](
             [ ]in[ ]name[ ]directory
            |account[ ]by[ ]that[ ]name[ ]here
            |existe[ ](dicha[ ]persona|ese[ ]usuario[ ])
            |mail[ ]box[ ]available[ ]for[ ]this[ ]user
            |mailbox[ ](
                 by[ ]that[ ]name[ ]is[ ]currently[ ]available
                |found
                )
            |matches[ ]to[ ]nameserver[ ]query
            |such[ ](
                 address[ ]here
                |mailbox
                |person[ ]at[ ]this[ ]address
                |recipient
                |user([ ]here)?
                )
            |thank[ ]you[ ]rejected:[ ]account[ ]unavailable:
            |valid[ ]recipients[,][ ]bye    # Microsoft
            )
        |non[- ]?existent[ ]user
        |not[ ](
             a[ ]valid[ ](recipient|user[ ]here)
            |a[ ]local[ ]address
            |email[ ]addresses
            )
        |rcpt[ ][<][^>]+[>][ ]does[ ]not[ ]exist
        |recipient[ ]address[ ]rejected[.][ ][(]in[ ]reply[ ]to[ ]rcpt[ ]to[ ]command[)]
        |rece?ipient[ ](
             .+[ ]was[ ]not[ ]found[ ]in
            |address[ ]rejected:[ ](
                 access[ ]denied
                |invalid[ ]user
                |user[ ][^ ]+[ ]does[ ]not[ ]exist
                |user[ ]unknown[ ]in[ ][^ ]+[ ]table
                |unknown[ ]user
                )
            |does[ ]not[ ]exist([ ]on[ ]this[ ]system)?
            |is[ ]not[ ]local
            |not[ ](exist|found|ok)
            |unknown
            )
        |requested[ ]action[ ]not[ ]taken:[ ]mailbox[ ]unavailable
        |resolver[.]adr[.]recip(ient)notfound # Microsoft
        |said:[ ]550[-[ ]]5[.]1[.]1[ ].+[ ]user[ ]unknown[ ]
        |smtp[ ]error[ ]from[ ]remote[ ]mail[ ]server[ ]after[ ]end[ ]of[ ]data:[ ]553.+does[ ]not[ ]exist
        |sorry,[ ](
             user[ ]unknown
            |badrcptto
            |no[ ]mailbox[ ]here[ ]by[ ]that[ ]name
            )
        |the[ ](
             email[ ]account[ ]that[ ]you[ ]tried[ ]to[ ]reach[ ]does[ ]not[ ]exist
            |following[ ]recipients[ ]was[ ]undeliverable
            |user[']s[ ]email[ ]name[ ]is[ ]not[ ]found
            )
        |there[ ]is[ ]no[ ]one[ ]at[ ]this[ ]address
        |this[ ](
             address[ ]no[ ]longer[ ]accepts[ ]mail
            |email[ ]address[ ]is[ ]wrong[ ]or[ ]no[ ]longer[ ]valid
            |spectator[ ]does[ ]not[ ]exist
            |user[ ]doesn[']?t[ ]have[ ]a[ ].+[ ]account
            )
        |unknown[ ](
             e[-]?mail[ ]address
            |local[- ]part
            |mailbox
            |recipient
            |user
            )
        |user[ ](
             [ ]was[ ]not[ ]found
            |[ ]does[ ]not[ ]exist
            |does[ ]not[ ]exist
            |missing[ ]home[ ]directory
            |not[ ](active|exist|found|known)
            |unknown
            )
        |vdeliver:[ ]invalid[ ]or[ ]unknown[ ]virtual[ ]user
        |your[ ]envelope[ ]recipient[ ]is[ ]in[ ]my[ ]badrcptto[ ]list
        )
    }x;
    return 1 if $argv1 =~ $regex;
    return 0;
}

teststring.txt
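The slowdown can be reproduced with a single alternative taken from old.pl. The script below is a minimal sketch and not part of Sisimai: the input length, the pattern names and the timed_match helper are hypothetical. It times one .+-led alternative on its own, the same alternative inside an alternation, and the rewritten form from new.pl. We expect the standalone pattern to fail fast, because Perl's optimizer can look for the required literal " user unknown" before trying a match, and the alternation to slow down roughly quadratically as the input grows; actual timings depend on the Perl build. The atomic group (?>...) in old.pl does not help here, since all the backtracking inside .+ happens before the group is ever exited.

```perl
#!/usr/bin/perl
# Sketch: why a leading ".+" hurts inside a large alternation.
use strict;
use warnings;
use v5.10;
use Time::HiRes qw(time);

# Hypothetical input length; pass a larger number to widen the gap.
my $length = shift(@ARGV) // 10_000;
my $long   = 'x' x $length;    # a long line with no bounce phrase in it

my %patterns = (
    # Alone, Perl can first search for the required literal " user unknown"
    # and give up immediately when it is absent.
    standalone  => qr{.+[ ]user[ ]unknown}x,
    # Inside an alternation there is no single required literal, so ".+"
    # is retried from every start position: roughly quadratic work.
    alternation => qr{(?>.+[ ]user[ ]unknown|[#]5[.]1[.]1[ ]bad[ ]address)}x,
    # The new.pl form: without the leading ".+", each start position
    # fails almost immediately.
    rewritten   => qr{([ ]user[ ]unknown|[#]5[.]1[.]1[ ]bad[ ]address)}x,
);

# Returns (1 or 0 for match, elapsed seconds).
sub timed_match {
    my ($re, $text) = @_;
    my $t0  = time();
    my $hit = $text =~ $re ? 1 : 0;
    return ($hit, time() - $t0);
}

for my $name (qw(standalone alternation rewritten)) {
    my ($hit, $secs) = timed_match($patterns{$name}, $long);
    printf "%-12s matched=%d %.4fs\n", $name, $hit, $secs;
}
```

Running it with increasing lengths (for example 10000, then 40000) should show the alternation time growing much faster than the other two.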

azumakuniyuki (Member, Author) commented:

--- old.pl	2020-09-25 22:27:07.000000000 +0900
+++ new.pl	2020-09-25 23:05:45.000000000 +0900
@@ -22,118 +22,118 @@
     my $class = shift;
     my $argv1 = shift // return undef;
 
-    state $regex = qr{(?>
-         .+[ ]user[ ]unknown
+    state $regex = qr{(
+         [ ]user[ ]unknown
         |[#]5[.]1[.]1[ ]bad[ ]address
-        |[<].+[>][ ]not[ ]found
-        |[<].+[@].+[>][.][.][.][ ]blocked[ ]by[ ]
+        |[<][^>]+[>][ ]not[ ]found
+        |[<][^@]+[@][^>]+[>][.][.][.][ ]blocked[ ]by[ ]
         |5[.]0[.]0[.][ ]mail[ ]rejected[.]
         |5[.]1[.]0[ ]address[ ]rejected[.]
         |adresse[ ]d[ ]au[ ]moins[ ]un[ ]destinataire[ ]invalide.+[a-z]{3}.+(?:416|418)
-        |address[ ](?:does[ ]not[ ]exist|unknown)
+        |address[ ](does[ ]not[ ]exist|unknown)
         |archived[ ]recipient
         |bad[-_ \t]recipient
         |can[']t[ ]accept[ ]user
-        |destination[ ](?:
+        |destination[ ](
              addresses[ ]were[ ]unknown
             |server[ ]rejected[ ]recipients
             )
-        |email[ ]address[ ](?:does[ ]not[ ]exist|could[ ]not[ ]be[ ]found)
-        |invalid[ ](?:
+        |email[ ]address[ ](does[ ]not[ ]exist|could[ ]not[ ]be[ ]found)
+        |invalid[ ](
              address
             |mailbox:
             |mailbox[ ]path|recipient
             )
-        |is[ ]not[ ](?:
+        |is[ ]not[ ](
              a[ ]known[ ]user
             |a[ ]valid[ ]mailbox
             |an[ ]active[ ]address[ ]at[ ]this[ ]host
             )
-        |mailbox[ ](?:
-             .+[ ]does[ ]not[ ]exist
-            |.+[@].+[ ]unavailable
+        |mailbox[ ](
+             [ ]does[ ]not[ ]exist
+            |[^@]+[@][^ ]+[ ]unavailable
             |does[ ]not[ ]exist
             |invalid
-            |is[ ](?:inactive|unavailable)
-            |not[ ](?:present|found)
+            |is[ ](inactive|unavailable)
+            |not[ ](present|found)
             |unavailable
             )
-        |no[ ](?:
-             [ ].+[ ]in[ ]name[ ]directory
+        |no[ ](
+             [ ]in[ ]name[ ]directory
             |account[ ]by[ ]that[ ]name[ ]here
-            |existe[ ](?:dicha[ ]persona|ese[ ]usuario[ ])
+            |existe[ ](dicha[ ]persona|ese[ ]usuario[ ])
             |mail[ ]box[ ]available[ ]for[ ]this[ ]user
-            |mailbox[ ](?:
+            |mailbox[ ](
                  by[ ]that[ ]name[ ]is[ ]currently[ ]available
                 |found
                 )
             |matches[ ]to[ ]nameserver[ ]query
-            |such[ ](?:
+            |such[ ](
                  address[ ]here
                 |mailbox
                 |person[ ]at[ ]this[ ]address
                 |recipient
-                |user(?:[ ]here)?
+                |user([ ]here)?
                 )
             |thank[ ]you[ ]rejected:[ ]account[ ]unavailable:
             |valid[ ]recipients[,][ ]bye    # Microsoft
             )
         |non[- ]?existent[ ]user
-        |not[ ](?:
-             a[ ]valid[ ](?:recipient|user[ ]here)
+        |not[ ](
+             a[ ]valid[ ](recipient|user[ ]here)
             |a[ ]local[ ]address
             |email[ ]addresses
             )
-        |rcpt[ ][<].+[>][ ]does[ ]not[ ]exist
+        |rcpt[ ][<][^>]+[>][ ]does[ ]not[ ]exist
         |recipient[ ]address[ ]rejected[.][ ][(]in[ ]reply[ ]to[ ]rcpt[ ]to[ ]command[)]
-        |rece?ipient[ ](?:
+        |rece?ipient[ ](
              .+[ ]was[ ]not[ ]found[ ]in
-            |address[ ]rejected:[ ](?:
+            |address[ ]rejected:[ ](
                  access[ ]denied
                 |invalid[ ]user
-                |user[ ].+[ ]does[ ]not[ ]exist
-                |user[ ]unknown[ ]in[ ].+[ ]table
+                |user[ ][^ ]+[ ]does[ ]not[ ]exist
+                |user[ ]unknown[ ]in[ ][^ ]+[ ]table
                 |unknown[ ]user
                 )
-            |does[ ]not[ ]exist(?:[ ]on[ ]this[ ]system)?
+            |does[ ]not[ ]exist([ ]on[ ]this[ ]system)?
             |is[ ]not[ ]local
-            |not[ ](?:exist|found|ok)
+            |not[ ](exist|found|ok)
             |unknown
             )
         |requested[ ]action[ ]not[ ]taken:[ ]mailbox[ ]unavailable
-        |resolver[.]adr[.]recip(?:ient)notfound # Microsoft
+        |resolver[.]adr[.]recip(ient)notfound # Microsoft
         |said:[ ]550[-[ ]]5[.]1[.]1[ ].+[ ]user[ ]unknown[ ]
         |smtp[ ]error[ ]from[ ]remote[ ]mail[ ]server[ ]after[ ]end[ ]of[ ]data:[ ]553.+does[ ]not[ ]exist
-        |sorry,[ ](?:
+        |sorry,[ ](
              user[ ]unknown
             |badrcptto
             |no[ ]mailbox[ ]here[ ]by[ ]that[ ]name
             )
-        |the[ ](?:
+        |the[ ](
              email[ ]account[ ]that[ ]you[ ]tried[ ]to[ ]reach[ ]does[ ]not[ ]exist
             |following[ ]recipients[ ]was[ ]undeliverable
             |user[']s[ ]email[ ]name[ ]is[ ]not[ ]found
             )
         |there[ ]is[ ]no[ ]one[ ]at[ ]this[ ]address
-        |this[ ](?:
+        |this[ ](
              address[ ]no[ ]longer[ ]accepts[ ]mail
             |email[ ]address[ ]is[ ]wrong[ ]or[ ]no[ ]longer[ ]valid
             |spectator[ ]does[ ]not[ ]exist
             |user[ ]doesn[']?t[ ]have[ ]a[ ].+[ ]account
             )
-        |unknown[ ](?:
+        |unknown[ ](
              e[-]?mail[ ]address
             |local[- ]part
             |mailbox
             |recipient
             |user
             )
-        |user[ ](?:
-             .+[ ]was[ ]not[ ]found
-            |.+[ ]does[ ]not[ ]exist
+        |user[ ](
+             [ ]was[ ]not[ ]found
+            |[ ]does[ ]not[ ]exist
             |does[ ]not[ ]exist
             |missing[ ]home[ ]directory
-            |not[ ](?:active|exist|found|known)
+            |not[ ](active|exist|found|known)
             |unknown
             )
         |vdeliver:[ ]invalid[ ]or[ ]unknown[ ]virtual[ ]user

azumakuniyuki changed the title from "Bug in the regular expression at Sisimai::Reason::UserUnknown" to "Slow regex in Sisimai/Reason/UserUnknown.pm" on Sep 26, 2020

azumakuniyuki (Member, Author) commented Sep 26, 2020

  • Do not use many .+ quantifiers in a large regular expression; where possible, bound them with a negated character class such as [^>]+ instead.
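The advice above can be sketched with the bracketed-address alternatives that new.pl rewrote using negated character classes. A class like [^>]+ cannot run past the closing >, so the engine has far fewer positions to backtrack to. The sample lines and the hits helper are hypothetical. The two forms are not strictly equivalent: the last sample puts a > inside the brackets, which the .+ form accepts and the negated-class form rejects.

```perl
#!/usr/bin/perl
# Sketch: ".+" vs. negated character classes, using alternatives from
# old.pl (first of each pair) and new.pl (second of each pair).
use strict;
use warnings;

my @pairs = (
    [ qr{[<].+[>][ ]not[ ]found}x,
      qr{[<][^>]+[>][ ]not[ ]found}x ],
    [ qr{[<].+[@].+[>][.][.][.][ ]blocked[ ]by[ ]}x,
      qr{[<][^@]+[@][^>]+[>][.][.][.][ ]blocked[ ]by[ ]}x ],
    [ qr{rcpt[ ][<].+[>][ ]does[ ]not[ ]exist}x,
      qr{rcpt[ ][<][^>]+[>][ ]does[ ]not[ ]exist}x ],
);

# Hypothetical input lines. Sample i (0..2) is meant to hit pair i.
my @samples = (
    '<kijitora@example.jp> not found',
    '<kijitora@example.jp>... blocked by ldap:',
    'rcpt <kijitora@example.jp> does not exist',
    'x' x 10_000,                   # long line: no pair should match
    'rcpt <a>b> does not exist',    # ".+" matches, "[^>]+" does not
);

# For one line, return [old_hit, new_hit] for every pair.
sub hits {
    my ($text) = @_;
    my @result;
    for my $pair (@pairs) {
        my ($dotplus, $negated) = @$pair;
        push @result, [ ($text =~ $dotplus ? 1 : 0), ($text =~ $negated ? 1 : 0) ];
    }
    return @result;
}

for my $text (@samples) {
    my @h = hits($text);
    printf "%-42.42s %s\n", $text, join(' ', map { join('/', @$_) } @h);
}
```

Each output column shows old/new for one pair, which makes it easy to spot inputs where a rewrite changes the result.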
