link_thumbnailer takes too much time to load #57

Closed
mihir-kumar-thakur opened this issue May 12, 2015 · 18 comments

Comments

@mihir-kumar-thakur

It takes too much time to load and sometimes gives a timeout error.

@gottfrois
Owner

Please give me a URL as an example; otherwise I won't be able to help you.

@mihir-kumar-thakur
Author

It is on localhost; I am developing a Rails app.

@mihir-kumar-thakur
Author

Where is the code base for your demo app?

@mihir-kumar-thakur
Author

Here is my code in the view file:

                <% @posts.each do |post| %>
                <div class="col-md-4 column_for_the_thumbnail">
                    <div class="thumbnail">
                        <%# This scrapes post.post_link on every render, so each post triggers at least one HTTP request to the linked page. %>
                        <% object = '' %>
                        <% if post.post_link.present? %>
                            <% image = LinkThumbnailer.generate(post.post_link, verify_ssl: false, image_limit: 1, attributes: [:images], description_min_length: 0, image_stats: false).images.first %>
                            <%# Guard against pages without images, where .first returns nil. %>
                            <% object = image.src.to_s if image %>
                        <% end %>
                        <%= image_tag object, class: "img-thumbnail", id: "image_thumb", style: "width:300px;height:200px", alt: "Image Not Found!" %>
                        <div class="caption">
                            <h5>
                                <%= post.title[0..35] %>...
                            </h5>
                            <span>
                                <%= link_to "Show", post_path(post), class: "btn btn-primary" %>
                            </span>
                        </div>
                    </div>
                </div>
                <% end %>

I am loading @posts from the model and generating a thumbnail from the URL stored in each post.

@gottfrois
Owner

Sounds great, but give me an example of a URL to scrape :)

Here is the demo: https://github.com/gottfrois/link_thumbnailer_demo
It uses the API: https://github.com/gottfrois/link_thumbnailer_api

@aruprakshit
Contributor

@gottfrois Yes, it is a little bit slow. I tried it, and it took 15.31 seconds to load.
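A timing like the 15.31 seconds above can be reproduced with Ruby's standard `Benchmark.realtime`, which returns elapsed wall-clock seconds. In this sketch `slow_scrape` is a hypothetical stand-in that only sleeps; in the app you would put the real `LinkThumbnailer.generate(url)` call inside the block instead.

```ruby
require 'benchmark'

# Hypothetical stand-in for a slow scrape such as LinkThumbnailer.generate(url).
# It just sleeps to simulate network latency.
def slow_scrape(url)
  sleep 0.2
  "thumbnail for #{url}"
end

# Benchmark.realtime yields to the block and returns the elapsed time as a Float.
elapsed = Benchmark.realtime { slow_scrape('https://example.com/page') }
puts format('Scrape took %.2f seconds', elapsed)
```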

@aruprakshit
Contributor

@MihirKumarThakur The source code has been added to the README file.

@gottfrois
Owner

It seems to be pretty slow due to image stats. You can disable them if you don't care about image size and type but only care about the URLs:

LinkThumbnailer.generate('https://pragprog.com/book/mskanban/real-world-kanban', image_stats: false)

Also, the verify_ssl option is true by default; you might want to disable it as well.

@aruprakshit
Contributor

@gottfrois Yes, I tried verify_ssl: false and nothing improved. I also tried image_stats: false, which made the scraping very fast, but the image returned is different from the one I got when image_stats was set to true.

@gottfrois
Owner

Hmm, yes, that's because the gem can no longer sort the images by size, so the images are returned in the order they appear on the scraped page.
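The effect described above can be sketched in plain Ruby. This is not the gem's code: `Image` and `pick_images` are hypothetical names, and ranking by pixel area is an assumption about what "sorting by size" means. With stats the largest image comes first; without them the first image on the page wins, which would explain why the thumbnail changed.

```ruby
# Hypothetical image record: src URL plus width/height in pixels.
# Width/height would be nil when no stats were fetched.
Image = Struct.new(:src, :width, :height) do
  def area
    width && height ? width * height : nil
  end
end

# With stats, rank images largest-first; without them, keep page order.
def pick_images(images, with_stats:)
  return images unless with_stats
  # sort_by is not stable, so the original index breaks ties deterministically.
  images.each_with_index
        .sort_by { |img, i| [-(img.area || 0), i] }
        .map(&:first)
end

page = [
  Image.new('logo.png', 120, 40),
  Image.new('hero.jpg', 1200, 630),
  Image.new('icon.gif', 16, 16)
]

pick_images(page, with_stats: true).first.src  # => "hero.jpg"
pick_images(page, with_stats: false).first.src # => "logo.png"
```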

@aruprakshit
Contributor

@gottfrois I'll look into the code and see if I can add any value there :)

@gottfrois
Owner

Feel free to do so, but unfortunately I don't see a way around this. Image sizes are gathered over HTTP requests, which is what makes it slow. Bypassing this drastically reduces the scraping time, but then you can't compare images by size anymore :(

A solution, though not an easy one to implement, would be to fetch image sizes in parallel, for example with the typhoeus gem, making concurrent HTTP requests instead of one at a time.
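The difference between one-at-a-time and concurrent requests can be sketched without any gems. This example swaps in plain Ruby threads for typhoeus (whose `Typhoeus::Hydra` queues requests and runs them concurrently), and `fetch_size` is a hypothetical stand-in that sleeps instead of making a real HTTP request.

```ruby
require 'benchmark'

# Hypothetical stand-in for an HTTP request that fetches one image's size.
# sleep simulates network latency; the "size" is just a placeholder value.
def fetch_size(url)
  sleep 0.1
  [url, url.length]
end

urls = (1..5).map { |n| "https://example.com/img#{n}.png" }
sequential = concurrent = nil

# One at a time: total time is roughly the sum of all latencies.
sequential_time = Benchmark.realtime do
  sequential = urls.map { |u| fetch_size(u) }
end

# One thread per URL: the waits overlap, so total time is roughly the slowest
# single request. Thread#value joins the thread and returns its block's result.
concurrent_time = Benchmark.realtime do
  concurrent = urls.map { |u| Thread.new { fetch_size(u) } }.map(&:value)
end

puts format('sequential: %.2fs, concurrent: %.2fs', sequential_time, concurrent_time)
```

With five simulated 0.1 s requests, the sequential run should take roughly 0.5 s and the threaded run roughly 0.1 s. For many real URLs, a bounded number of concurrent requests is safer than one thread per URL.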

@aruprakshit
Contributor

@gottfrois +1

@gottfrois
Owner

Just to let you know, guys: I am working on a solution for this. Stay tuned.

@gottfrois gottfrois added this to the v3.0.2 milestone Jul 8, 2015
@gottfrois
Owner

I replaced the FastImage gem with my own version, called ImageInfo, which can fetch images concurrently. In my own benchmark, a page that used to take about 4.5 seconds to load now takes less than a second.

The fix is available now in v3.0.2.

Can you guys try it out and let me know if it improves your use cases? Thx
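Part of why size lookups can stay reasonably cheap is that libraries such as FastImage fetch as little of the image as they need instead of downloading the whole file. The thread does not describe ImageInfo's internals, so this is only a sketch of the general idea, limited to PNG: the width and height sit in the IHDR chunk within the first 24 bytes. `png_dimensions` is a hypothetical helper, not part of either gem.

```ruby
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Returns [width, height] from the first 24 bytes of a PNG, or nil if the
# bytes do not look like a PNG header. Other formats are out of scope here.
def png_dimensions(bytes)
  bytes = bytes.b
  return nil if bytes.bytesize < 24
  return nil unless bytes[0, 8] == PNG_SIGNATURE && bytes[12, 4] == 'IHDR'
  # Width and height are 32-bit big-endian unsigned integers at offsets 16 and 20.
  bytes[16, 8].unpack('NN')
end

# Build a fake 24-byte header: signature, IHDR length (13), type, width, height.
header = PNG_SIGNATURE + [13].pack('N') + 'IHDR' + [640, 480].pack('NN')
png_dimensions(header) # => [640, 480]
```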

@aruprakshit
Contributor

@gottfrois I have updated the gem. For some links it hangs and does not return any result.

Started GET "/posts/link_thumbnailer?url=http%3A%2F%2Fwww.nolo.com%2Flegal-encyclopedia%2Fdivorce-do-you-need-lawyer-29502.html" for 127.0.0.1 at 2015-08-12 12:27:20 +0530
Processing by PostsController#link_thumbnailer as */*
  Parameters: {"url"=>"http://www.nolo.com/legal-encyclopedia/divorce-do-you-need-lawyer-29502.html"}
  Account Load (0.5ms)  SELECT  "accounts".* FROM "accounts" WHERE "accounts"."deleted_at" IS NULL AND "accounts"."id" = $1  ORDER BY "accounts"."id" ASC LIMIT 1  [["id", 2]]
ETHON: Libcurl initialized
ETHON: started MULTI

@gottfrois
Owner

Thanks, I will take a look. Can you please create a new issue for this one?

@aruprakshit
Contributor

@gottfrois Sure, I will.


3 participants