set max sessions across subdomains #124

Open
varnit opened this issue Dec 17, 2014 · 13 comments

Comments

@varnit

varnit commented Dec 17, 2014

Hi,
Is it possible to set the max sessions for a domain and have it work across all its subdomains? For example if I set the following:

ibrowse:set_max_sessions("hotmail.com", 443, 100)

I would want a maximum of 100 connections shared by hotmail.com and all its subdomains (m.hotmail.com, bay01.hotmail.com, etc.).

Is this possible today?

@cmullaparthi
Owner

Hi Varnit

Unfortunately no, but it is easy enough to do. I will look into it this weekend, as I will be refactoring ibrowse a bit to integrate other pull requests.


@varnit
Author

varnit commented Dec 17, 2014

OK, thanks! Let me know if you need help with anything.

@VitoVan

VitoVan commented Jun 17, 2016

@cmullaparthi
I assume this was not done that weekend?

@cmullaparthi
Owner

I'm afraid not :-) I take it this is important for you?

@VitoVan

VitoVan commented Jun 17, 2016

@cmullaparthi
It is kind of important. Forgive my poor English; let me tell a story.

I got a bunch of URLs from my boss, like this:

http://www.example0.com/foo/bar
http://test.example0.com/foo/bar
http://foo.example0.com/foo/bar
http://bar.example0.com/foo/bar
http://www.example1.com/foo/bar
http://test.example2.com/foo/bar
http://foo.example3.com/foo/bar
http://bar.example1.com/foo/bar
...

Then I got a configuration file from my boss like this:

example0.com --> concurrent: 1
bar.example1.com --> concurrent: 2
bar.example2.com --> concurrent: 3

Then, when I request the URLs above, I need to limit their concurrency according to this configuration.

In my boss's opinion, the configuration file reads like this:

example0.com of course means *.example0.com and example0.com.

I can't tell my boss that ibrowse does not have that kind of configuration, so I have to handle this in my application.

The other thing is that the URLs my boss gives me change dynamically. So I can't tell my boss: "Give me all your URLs, and let me generate an appropriate configuration file for you." I think my boss would reply: "No, programmer, I won't. I'll add a URL to the list whenever I want. This is easy, handle it."

So, once my program has started running, my boss may come to my desk, give me another URL, and say: "Add it to the list." Then I will do as my boss said.

For now, here is my solution:

  1. I get a URL, http://test.example0.com/foo/bar, that needs to be handled
  2. I extract the host from the URL: test.example0.com
  3. I match the host test.example0.com against the configuration file, using ends_with
  4. The match is example0.com --> concurrent: 1
  5. I call :ibrowse.set_max_sessions("test.example0.com", 80, 1)
  6. I think it's done
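
Steps 2 to 4 can be sketched in Erlang roughly as follows. This is only an illustration: the module name domain_limits, the function lookup/2, and the {Domain, Max} list format are assumptions, not part of ibrowse, and hosts are assumed to be lowercase strings already. Unlike a plain ends_with, it only matches on a label boundary, so "notexample0.com" does not fall under "example0.com".

```erlang
-module(domain_limits).
-export([lookup/2]).

%% Config is a list of {Domain, MaxSessions} pairs, e.g.
%% [{"example0.com", 1}, {"bar.example1.com", 2}].  (Hypothetical format.)
%% Returns {ok, Domain, Max} for the most specific matching entry, or none.
lookup(Host, Config) ->
    Matches = [{D, M} || {D, M} <- Config, covers(D, Host)],
    %% Prefer the longest (most specific) configured domain.
    Longest = fun({A, _}, {B, _}) -> length(A) >= length(B) end,
    case lists:sort(Longest, Matches) of
        [{Domain, Max} | _] -> {ok, Domain, Max};
        []                  -> none
    end.

%% A domain covers itself and any host that ends with "." ++ Domain.
covers(Domain, Domain) -> true;
covers(Domain, Host)   -> lists:suffix("." ++ Domain, Host).
```

With the result in hand, step 5 stays as described: pass the matched limit to set_max_sessions for the concrete host and port.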

Then if I get more URLs like:

http://test.example0.com/foo/bar1
http://test.example0.com/foo/bar2
http://test.example0.com/foo/bar3
http://test.example0.com/foo/bar4

the steps above are processed again, because I was lazy and didn't write code to store the configuration and check whether the domain is already configured.

Well, end of story.

I'm not quite sure whether it is the right solution, but it seems to work.

BUT: I would love to remove the code I wrote to match subdomains immediately, if ibrowse had this feature.

@cmullaparthi
Owner

I loved this story :-)

There are a couple of complications with this:

  • One or more of your subdomains may be unreachable because there are lots of requests to another subdomain
  • Load balancing will be a more expensive operation because it has to make sure that the limit is enforced while routing requests correctly to each subdomain.

Are you happy with both these limitations? If so I will go ahead and implement it.

@VitoVan

VitoVan commented Jun 21, 2016

@cmullaparthi Thanks for your reply ~

One or more of your subdomains may be unreachable because there are lots of requests to another subdomain

  • If a subdomain is unreachable because of the server's bandwidth or capacity, then that's fine; we limit max_sessions on the root domain for a reason.
  • If a subdomain is unreachable because ibrowse returns retry_later, then that is also reasonable; it is exactly what we want.

Load balancing will be a more expensive operation because it has to make sure that the limit is enforced while routing requests correctly to each subdomain.

Expensive is a relative word.

Yesterday I refactored my code to limit requests better. I now use poolboy to set up an ibrowse pool for every root domain. Every time I get a URL, I check whether a pool for its root domain exists; if it does, I use it, otherwise I create a new pool for that root domain.

If what you are going to implement is not more expensive than my approach, I think it is worth a try.

Thank you.

@cmullaparthi
Owner

Okay, good. No, the solution will be cheaper than using an external pooling mechanism. I'll create a branch with the proposed changes so you can try.

@VitoVan

VitoVan commented Jun 21, 2016

@cmullaparthi Thanks, you are so nice!

@cmullaparthi
Owner

I've pushed some changes to the issue_124 branch. See 3fc7e78

Usage:

$ erl -pa ebin
Erlang/OTP 18 [erts-7.3] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.3  (abort with ^G)
1> application:ensure_all_started(ibrowse).
{ok,[ibrowse]}

2>
f(), 
ibrowse:set_max_sessions("google.com", 80, 1), %% Set the LB config for the root domain

Res_1 = ibrowse:send_req("http://www.google.com", [], get, [], 
                         [{use_subdomain_lb_config, {"google.com", 80}}]), %% New option

io:format("Res_1: ~p~n", [Res_1]), 

ibrowse:show_dest_status(), 

Res_2 = ibrowse:send_req("http://m.google.com", [], get, [], 
                         [{use_subdomain_lb_config, {"google.com", 80}}]),  %% New option

io:format("Res_2: ~p~n", [Res_2]), 

ibrowse:show_dest_status().
  • Result of the first request - succeeds as expected
Res_1: {ok,"302",
           [{"Cache-Control","private"},
            {"Content-Type","text/html; charset=UTF-8"},
            {"Location",
             "http://www.google.co.uk/?gfe_rd=cr&ei=GBZpV-W9IYHS8AeEya-oAg"},
            {"Content-Length","261"},
            {"Date","Tue, 21 Jun 2016 10:25:28 GMT"}],
           "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF=\"http://www.google.co.uk/?gfe_rd=cr&amp;ei=GBZpV-W9IYHS8AeEya-oAg\">here</A>.\r\n</BODY></HTML>\r\n"}
  • Internal ibrowse LB status. 1 connection to www.google.com and the same load balancer PID for all subdomains.
Server:port                              | ETS   | Num conns  | LB Pid
================================================================================
                       www.google.com:80 | 20500 | 1          | <0.41.0>
                           google.com:80 | 16403 | 0          | <0.41.0>
  • Result of the second request. It fails because we set max_sessions to 1 and that single session is taken up by the connection to www.google.com, so this request to m.google.com is rejected.
Res_2: {error,retry_later}
  • And the internal LB status. 1 connection to www.google.com and the same load balancer PID for all subdomains.
Server:port                              | ETS   | Num conns  | LB Pid
================================================================================
                       www.google.com:80 | 20500 | 1          | <0.41.0>
                           google.com:80 | 16403 | 0          | <0.41.0>
                         m.google.com:80 | 32791 | 0          | <0.41.0>

The same test succeeds if you set max_sessions to 2.

$ erl -pa ebin
Erlang/OTP 18 [erts-7.3] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.3  (abort with ^G)
1> application:ensure_all_started(ibrowse).
{ok,[ibrowse]}
2> 
f(), 
ibrowse:set_max_sessions("google.com", 80, 2), 
Res_1 = ibrowse:send_req("http://www.google.com", [], get, [], 
                         [{use_subdomain_lb_config, {"google.com", 80}}]), %% New option

io:format("Res_1: ~p~n", [Res_1]), 

ibrowse:show_dest_status(), 

Res_2 = ibrowse:send_req("http://m.google.com", [], get, [], 
                         [{use_subdomain_lb_config, {"google.com", 80}}]),  %% New option

io:format("Res_2: ~p~n", [Res_2]), 

ibrowse:show_dest_status().
Res_1: {ok,"302",
           [{"Cache-Control","private"},
            {"Content-Type","text/html; charset=UTF-8"},
            {"Location",
             "http://www.google.co.uk/?gfe_rd=cr&ei=dBlpV-mXDpPS8AfI1IFY"},
            {"Content-Length","259"},
            {"Date","Tue, 21 Jun 2016 10:39:48 GMT"}],
           "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF=\"http://www.google.co.uk/?gfe_rd=cr&amp;ei=dBlpV-mXDpPS8AfI1IFY\">here</A>.\r\n</BODY></HTML>\r\n"}
Server:port                              | ETS   | Num conns  | LB Pid
================================================================================
                       www.google.com:80 | 20500 | 1          | <0.41.0>
                           google.com:80 | 16403 | 0          | <0.41.0>
Res_2: {ok,"302",
           [{"Location","http://www.google.com/mobile/other/"},
            {"Cache-Control","private"},
            {"Content-Type","text/html; charset=UTF-8"},
            {"X-Content-Type-Options","nosniff"},
            {"Date","Tue, 21 Jun 2016 10:39:48 GMT"},
            {"Server","sffe"},
            {"Content-Length","232"},
            {"X-XSS-Protection","1; mode=block"}],
           "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF=\"http://www.google.com/mobile/other/\">here</A>.\r\n</BODY></HTML>\r\n"}
Server:port                              | ETS   | Num conns  | LB Pid
================================================================================
                       www.google.com:80 | 20500 | 1          | <0.41.0>
                           google.com:80 | 16403 | 0          | <0.41.0>
                         m.google.com:80 | 32791 | 1          | <0.41.0>

VitoVan added a commit to VitoVan/httpotion that referenced this issue Jun 22, 2016
@VitoVan

VitoVan commented Jun 22, 2016

@cmullaparthi Awesome! Trying...

@VitoVan

VitoVan commented Jun 22, 2016

When I use this feature, it seems... well, a little tricky?

  1. Got a limitation like this: "example.com" -> 2
  2. Received a URL like this: http://test.example.com
  3. Got the root domain of http://test.example.com, which is example.com
  4. Sent the request with the new option:
ibrowse:send_req("http://test.example.com", [], get, [],
                 [{use_subdomain_lb_config, {"example.com", 80}}])

Suddenly I realized something. My boss said: "The server example.com is weak, so we won't send more than 2 requests at the same time."

When my boss said this, the meaning seemed to include: "I don't know what a port is, and I don't care what 443 or 80 or even 8080 mean. They are just web pages. Go get them, with no more than 2 requests at the same time."

So now I think it may be better to meet these demands in my application instead of in ibrowse. What do you think? @cmullaparthi
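
One way to enforce a limit per root domain regardless of port inside the application is a shared counter. The sketch below is an assumption-laden illustration, not something ibrowse provides: the module domain_gate, its table name, and its functions are all hypothetical. It relies only on ets:update_counter/3,4 from the standard library (the four-argument form needs OTP 18 or later) and mirrors the retry_later error that ibrowse returns.

```erlang
-module(domain_gate).
-export([init/0, acquire/2, release/1]).

-define(TAB, domain_gate_counters).

%% Create the public counter table; call once from a long-lived process,
%% because the table disappears when its owner exits.
init() ->
    ets:new(?TAB, [named_table, public, set]),
    ok.

%% Try to take one slot for RootDomain, whatever port the request uses.
%% Returns ok, or {error, retry_later} when Max requests are in flight.
acquire(RootDomain, Max) ->
    %% Atomically increment, inserting {RootDomain, 0} first if absent.
    case ets:update_counter(?TAB, RootDomain, {2, 1}, {RootDomain, 0}) of
        N when N =< Max ->
            ok;
        _ ->
            %% Over the limit: undo our increment and refuse.
            release(RootDomain),
            {error, retry_later}
    end.

%% Give the slot back; call exactly once per successful acquire/2.
release(RootDomain) ->
    ets:update_counter(?TAB, RootDomain, {2, -1}),
    ok.
```

A caller would wrap each request as acquire, send, release (ideally in a try ... after so the slot is returned on errors). Under contention the increment-then-undo step can briefly refuse a request that would have fit, but it never admits more than Max.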

@cmullaparthi
Owner

Yeah, it's not particularly elegant, but I feel that is the nature of the problem. If you know that you are always going to shape traffic by the first-level subdomain, your code could, I suppose, be simpler using this feature?

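%% The #url{} record comes from ibrowse.hrl, so the module needs:
%% -include_lib("ibrowse/include/ibrowse.hrl").
invoke_ibrowse(Url, Headers, Payload, Method, Options) ->
    #url{host = Host, port = Port} = ibrowse_lib:parse_url(Url),
    Host_tokens = string:tokens(Host, "."),
    %% Keep the last two labels, e.g. "test.example.com" -> "example.com".
    Drop = max(0, length(Host_tokens) - 2),
    LB_shaping_domain = string:join(lists:nthtail(Drop, Host_tokens), "."),
    ibrowse:send_req(Url, Headers, Method, Payload,
                     [{use_subdomain_lb_config, {LB_shaping_domain, Port}} | Options]).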

I suppose the above is more bearable than having to maintain your own pooling mechanism?
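
The domain-derivation step in the function above can be pulled out into a pure function, which makes its edge cases easy to check without any network access. This is a sketch only; the module lb_domain and the function shaping_domain/1 are hypothetical names. Note the assumption it bakes in: "the last two labels" is not a real registrable-domain lookup, so a host under a multi-label suffix such as www.google.co.uk collapses to "co.uk".

```erlang
-module(lb_domain).
-export([shaping_domain/1]).

%% Return the last two dot-separated labels of Host, or Host itself
%% when it has two labels or fewer (e.g. "localhost").
shaping_domain(Host) ->
    Labels = string:tokens(Host, "."),
    Drop = max(0, length(Labels) - 2),
    string:join(lists:nthtail(Drop, Labels), ".").
```

If hosts under such suffixes matter, the shaping domain would have to come from configuration (or a public-suffix list) rather than from label counting.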
