[leo_gateway] Reduce network traffic between leo_gateway and leo_storage #325

Closed
mocchira opened this issue Mar 6, 2015 · 9 comments

Comments

@mocchira
Member

mocchira commented Mar 6, 2015

Problem

When

  • A cache is EXPIRED
  • Its size is very large
  • There are lots of concurrent requests trying to retrieve the EXPIRED object

What happens

  • A massive number of RPCs occur between leo_gateway and leo_storage, which causes many timeouts

Solution

There is some room to reduce network traffic between leo_gateway and leo_storage(s) by tracking ongoing requests.

For example,
if another request is already retrieving the same key, the object is larger than a specified threshold, and the built-in cache mechanism is enabled, then
wait for that request to finish and, once it has,
serve the object from the cache on leo_gateway.
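
The idea above can be sketched as request coalescing. This is a minimal Python illustration, not the leo_gateway implementation (which is Erlang): `CoalescingFetcher` and `backend_get` are hypothetical names, `backend_get` stands in for the RPC to leo_storage, and the size threshold and cache-enabled checks are omitted for brevity.

```python
import threading


class CoalescingFetcher:
    """One caller per key hits the backend; concurrent callers wait for the cache."""

    def __init__(self, backend_get):
        self._backend_get = backend_get  # hypothetical stand-in for the storage RPC
        self._lock = threading.Lock()
        self._cache = {}                 # key -> object body
        self._in_flight = {}             # key -> Event set when the leader finishes

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            done = self._in_flight.get(key)
            leader = done is None
            if leader:
                done = threading.Event()
                self._in_flight[key] = done

        if not leader:
            # Another request is already fetching this key: wait, then use the cache.
            done.wait()
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
            # The leader failed, so fall back to fetching directly.
            return self._backend_get(key)

        try:
            value = self._backend_get(key)
            with self._lock:
                self._cache[key] = value
            return value
        finally:
            with self._lock:
                del self._in_flight[key]
            done.set()
```

With this shape, N concurrent requests for the same expired key result in one backend fetch instead of N, at the cost of the waiters' latency being tied to the leader's fetch.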

@mocchira
Member Author

mocchira commented Sep 2, 2015

I'll take over this issue and implement it based on https://github.com/leo-project/leo_gateway/pull/20/files and https://github.com/leo-project/leo_tran

mocchira added a commit to leo-project/leo_gateway that referenced this issue Sep 4, 2015
mocchira added a commit that referenced this issue Sep 4, 2015
mocchira added a commit to leo-project/leo_storage that referenced this issue Sep 8, 2015
@mocchira mocchira closed this as completed Sep 8, 2015
@windkit
Contributor

windkit commented Nov 10, 2015

@mocchira This fix seems to block other requests until the whole object has been sent to the client. It also causes part of the problem in Issue #429.

The code path is:
leo_large_object_get_handler:handle_call({get... -> leo_tran:run -> leo_tran_serializable_cntnr:run -> gen_server:call

Indeed, it would guarantee that only one 'thread' retrieves the object from leo_storage. However, this imposes a big performance hit on the threads waiting for it. I think a better way is to allow the waiting threads to read from the disk cache whenever data is available. That's what I want to achieve in the PR.
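
As a rough illustration of waiters reading data as soon as it is available, here is a minimal Python sketch. It is an assumption-laden model, not the leo_gateway code: `GrowingObject` is a hypothetical in-memory stand-in for a disk cache entry that one writer fills chunk by chunk while several readers follow along.

```python
import threading


class GrowingObject:
    """A cache entry that readers can consume while the writer is still filling it."""

    def __init__(self):
        self._cond = threading.Condition()
        self._chunks = []
        self._complete = False

    def append(self, chunk):
        # Called by the single writer each time a chunk arrives from storage.
        with self._cond:
            self._chunks.append(chunk)
            self._cond.notify_all()

    def finish(self):
        # Called by the writer once the whole object has been stored.
        with self._cond:
            self._complete = True
            self._cond.notify_all()

    def read(self):
        """Yield chunks in order, blocking only when caught up with the writer."""
        index = 0
        while True:
            with self._cond:
                while index >= len(self._chunks) and not self._complete:
                    self._cond.wait()
                if index >= len(self._chunks):
                    return
                chunk = self._chunks[index]
            index += 1
            yield chunk
```

Each reader can start sending the first chunk to its client while later chunks are still being fetched, rather than waiting for the whole object.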

@mocchira
Member Author

> I think a better way to do so is to allow those waiting thread to retrieve from disk cache whenever there is data available. And that's what I want to achieve in the PR.

@windkit It currently works as you described above.

@windkit
Contributor

windkit commented Nov 10, 2015

@mocchira Oh, I misunderstood the flow, sorry about that (I missed the gen_server:cast).

I got this error in my testing, though. It seems to happen when no disk cache is configured?

[E] gateway_0@127.0.0.1 2015-11-10 09:46:33.617538 +0900    1447116393  null:null   0   gen_server <0.1874.0> terminated with reason: bad argument in call to erlang:phash2(<<"test/testfile">>, 0) in leo_cache_api:put_begin_tran/2 line 254
[E] gateway_0@127.0.0.1 2015-11-10 09:46:33.617842 +0900    1447116393  null:null   0   ["CRASH REPORT ",[80,114,111,99,101,115,115,32,"<0.1874.0>",32,119,105,116,104,32,"1",32,110,101,105,103,104,98,111,117,114,115,32,"exited",32,119,105,116,104,32,114,101,97,115,111,110,58,32,[["bad argument in call to ",["erlang",58,"phash2",40,["<<","\"test/testfile\"",">>"],44,32,"0",41]," in ",[["leo_cache_api",58,"put_begin_tran",47,"2"],[32,108,105,110,101,32,"254"]]]," in ",[["gen_server",58,"terminate",47,"7"],[32,108,105,110,101,32,"804"]]]]]
[E] gateway_0@127.0.0.1 2015-11-10 09:46:33.618125 +0900    1447116393  null:null   0   Ranch listener leo_gateway_s3_api had connection process started with cowboy_protocol:start_link/4 at <0.1872.0> exit with reason: {badarg,[{erlang,phash2,[<<"test/testfile">>,0],[]},{leo_cache_api,put_begin_tran,2,[{file,"src/leo_cache_api.erl"},{line,254}]},{leo_large_object_get_handler,put_begin_tran_with_retry,1,[{file,"src/leo_large_object_get_handler.erl"},{line,288}]},{leo_large_object_get_handler,handle_call,3,[{file,"src/leo_large_object_get_handler.erl"},{line,123}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,607}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,639}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}

@mocchira
Member Author

@windkit
https://github.com/leo-project/leo_cache/blob/develop/include/leo_cache.hrl#L145

The error occurred at the line above.
It seems that the value of cache_workers is not set properly.

> when no disk cache is configured

That might be the cause.
Please check the cache configuration.
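
For context, erlang:phash2/2 requires a range of at least 1, so the `phash2(<<"test/testfile">>, 0)` call in the trace fails with badarg, which fits a worker count of 0. A minimal Python sketch of guarding that case follows. It is only an illustration under assumptions: `pick_worker` is a hypothetical helper, `zlib.crc32` stands in for phash2, and returning `None` to mean "skip the cache" is one possible design, not the fix adopted by the project.

```python
import zlib


def pick_worker(key: bytes, workers: int):
    """Map a key to a worker index, or return None when no cache workers exist."""
    if workers < 1:
        # No workers configured (e.g. cache disabled): the caller should bypass the cache
        # instead of hashing into an empty range.
        return None
    return zlib.crc32(key) % workers
```

A caller would check for `None` and go straight to storage, so a missing cache configuration degrades to uncached reads rather than crashing the request handler.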

@windkit
Contributor

windkit commented Nov 10, 2015

@mocchira Yep, I tested with the disk cache configured and it works as expected :) The client can read the partial disk cache.

But should we fix this, since a user may not use the disk cache?

@mocchira
Member Author

> But should we fix this as user may not use disk cache?

@windkit Yep. Please file it as a separate issue.

@windkit
Contributor

windkit commented Nov 10, 2015

Filed as #433

@mocchira
Member Author

@windkit thanks!
