Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lookup performance (again) #150

Closed
wants to merge 4 commits into from
Closed

Conversation

corny
Copy link
Contributor

@corny corny commented Jun 13, 2020

regression by c92c566
closes #137

@corny corny force-pushed the performance-fix branch 4 times, most recently from 31ff504 to 994ba6f Compare June 13, 2020 10:58
lib/valid_email2.rb Outdated Show resolved Hide resolved
list = ValidEmail2.disposable_emails
address = ValidEmail2::Address.new("test@example.com")

expect { address.send(:domain_is_in?, list) }.to perform_under(0.0001).sample(10).times
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't assume the performance of the testing system like this. If I run the tests on a potato (or any other system with high load, such as CI systems), this will fail.

It would be better to just test whether the returned value from .whitelist / .blacklist is to be_kind_of Set. You can then assume log access time on these objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the test to do a blackbox test and to check the performance of disposable_domain? directly. There are many things that can be fucked up, because the method wants to do other stuff in the future, like happened in the past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check to test the kind of the list.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check the performance of disposable_domain? directly

By setting a hard limit on 0.1ms this won't get you anywhere because you're measuring execution time. Have a look at perform_logarithmic to check complexity.

@corny corny force-pushed the performance-fix branch 2 times, most recently from bd6545d to 6d7b91c Compare June 13, 2020 11:13
@corny corny force-pushed the performance-fix branch from 4954cc7 to a72e4cf Compare June 13, 2020 11:27
@corny corny force-pushed the performance-fix branch from a72e4cf to 8d3b362 Compare June 13, 2020 11:29
Copy link
Owner

@micke micke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm truly sorry for this regression and i really appreciate your contributions!
I have some suggestions, and i would also suggest you are a bit kinder when working in open-source projects :)

Comment on lines -44 to -45
# Address may not contain a dot directly before @
address.address !~ /\.@/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your reason for removing this check? It was introduced in #136

@valid ||= begin
return false if @parse_error
return @valid if @valid != nil
return false if @parse_error
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return false if @parse_error
return false if @parse_error

Comment on lines 35 to 43
# Domain needs to have at least one dot
domain =~ /\./ &&
domain.include?('.') &&
# Domain may not have two consecutive dots
domain !~ /\.{2,}/ &&
!domain.include?('..') &&
# Domain may not start with a dot
domain !~ /^\./ &&
!domain.start_with?('.') &&
# Domain may not start with a dash
domain !~ /^-/ &&
!domain.start_with?('-') &&
# Domain name may not end with a dash
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change, but what do you think about removing these comments? They become a bit superfluous.

@@ -24,25 +24,24 @@ def initialize(address)
end

def valid?
@valid ||= begin
return false if @parse_error
return @valid if @valid != nil
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return @valid if @valid != nil
return @valid unless @valid.nil?

@whitelist ||= load_if_exists(WHITELIST_FILE)
end

private
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private does not make class methods private in this context. It needs to be in a class << self or you can use private_class_method

Comment on lines +34 to +39
result = Set.new
File.open(path, "r").each_line do |line|
line.strip!
result << line if line.present?
end
result
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result = Set.new
File.open(path, "r").each_line do |line|
line.strip!
result << line if line.present?
end
result
File.open(path, "r").each_line.each_with_object(Set.new) do |domain, set|
set << domain.tap(&:chomp!)
end
Calculating -------------------------------------
               corny     4.858M memsize (     0.000  retained)
                        67.367k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
               micke     4.859M memsize (     0.000  retained)
                        67.367k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
              user     system      total        real
corny:   23.706184   0.328159  24.034343 ( 24.114417)
micke:   16.692115   0.341454  17.033569 ( 17.130394)

require "spec_helper"

describe "Performance testing" do

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

# check lookup timing
expect { address.disposable_domain? }.to perform_under(0.0001).sample(10).times
end

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change


let(:disposable_domain) { ValidEmail2.disposable_emails.first }

it "lookup timing" do
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it "lookup timing" do
it "has acceptable lookup performance" do

Comment on lines +29 to +30
# This method MUST return a Set, otherwise the
# performance will be as bad as hell!
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# This method MUST return a Set, otherwise the
# performance will be as bad as hell!

This is clear because of the specs.

@micke micke mentioned this pull request Aug 10, 2020
@micke micke closed this in #161 Aug 10, 2020
@micke
Copy link
Owner

micke commented Aug 10, 2020

Thanks @corny and @dmke for this PR, since i didn't hear from you in a while and i want to fix this performance issue i "hijacked" your PR and fixed the last remaining issues. Hope you don't mind!

@micke
Copy link
Owner

micke commented Aug 10, 2020

I just pushed version 3.3.1 to rubygems with this regression fixed!

@corny
Copy link
Contributor Author

corny commented Aug 10, 2020

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants