
acquire_token not thread-safe #123

Closed
ali-cetin-4ss opened this issue Feb 28, 2018 · 12 comments · Fixed by #128

Comments

@ali-cetin-4ss

The acquire_token method is not thread-safe.

Example:

import adal

context = adal.AuthenticationContext(authority, api_version=None)

context.acquire_token_with_username_password(resource, username, password, client_id)

# Executing the call below in parallel once the cached token has expired, e.g. with
# concurrent.futures.ThreadPoolExecutor, will cause a KeyError.
context.acquire_token(resource, username, client_id)

# The issue seems to be a dict.pop operation during the cache update, which itself is not thread-safe.
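A minimal sketch of the parallel part (hypothetical worker and iteration counts; assumes the context set up above and a cached token that has already expired):

import concurrent.futures

def fetch_token():
    # Every call goes through the shared in-memory token cache; once the token is
    # stale, concurrent refreshes race on the cache's dict.pop.
    return context.acquire_token(resource, username, client_id)

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch_token) for _ in range(100)]
    for future in concurrent.futures.as_completed(futures):
        future.result()  # re-raises the KeyError from a losing thread, if any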
@rayluo
Collaborator

rayluo commented Feb 28, 2018

Thanks for the report. We will take a look into this.

@rayluo
Collaborator

rayluo commented Mar 2, 2018

Hi @ace-e4s, when triaging issues we normally want the error/exception trace to debug against. Since we don't have one this time, we managed to come up with a tentative fix based on limited information, but we can't know for sure that it addresses your scenario.

You can grab it with pip install git+https://github.com/AzureAD/azure-activedirectory-library-for-python.git@threadsafe-cache-pop and then test. Please do report back whether it eliminates those (rare?) errors; otherwise the tentative fix may not be merged into a future release.

@ali-cetin-4ss
Author

I'll try to do some testing later today and let you know.

@rayluo
Collaborator

rayluo commented Mar 5, 2018

@ace-e4s thanks for your prompt response and willingness to help! We are planning a new release soon, so we would like to have this fix ready before then.

@ali-cetin-4ss
Author

Btw, I did some testing today, attempting to isolate and consistently reproduce the issue. I will do some more tomorrow and get back to you.

@rayluo
Collaborator

rayluo commented Mar 7, 2018

@ace-e4s FYI: there is an upcoming release 0.5.1 at PR 127, which does NOT include this pending fix (yet?).

@ali-cetin-4ss
Author

ali-cetin-4ss commented Mar 7, 2018

Hi,
I had trouble reproducing the issue, but I finally managed :)

I'm running v0.5.0 of adal, btw.

Case 1: our own code that uses adal, run in parallel with memory_profiler enabled:

Traceback (most recent call last):
  File "C:\Anaconda2\envs\adal_test\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Anaconda2\envs\adal_test\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\memory_profiler.py", line 1228, in <module>
    exec_with_profiler(script_filename, prof, args.backend, script_args)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\memory_profiler.py", line 1123, in exec_with_profiler
    execfile(filename, ns, ns)
  File "gla_upload_adal.py", line 75, in <module>
    for series_tag, series_id in response:
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\concurrent\futures\_base.py", line 641, in result_iterator
    yield fs.pop().result()
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\concurrent\futures\_base.py", line 455, in result
    return self.__get_result()
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\concurrent\futures\thread.py", line 63, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\memory_profiler.py", line 1072, in wrapper
    val = prof(func)(*args, **kwargs)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\memory_profiler.py", line 659, in f
    return func(*args, **kwds)
  File "gla_upload_adal.py", line 38, in call_reservoir
    response = client.create(series)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\client.py", line 103, in create
    file_id = self._storage.put(series)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\storage\storage.py", line 74, in put
    upload_params = self._files_api.upload(self.token)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\storage\storage.py", line 58, in token
    return self._authenticator.token
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\authenticate.py", line 79, in token
    return self._token()
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\authenticate.py", line 83, in _token
    self.params.resource, self.username, self.params.client_id)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\authentication_context.py", line 131, in acquire_token
    return self._acquire_token(token_func)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\authentication_context.py", line 114, in _acquire_token
    return token_func(self)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\authentication_context.py", line 129, in token_func
    return token_request.get_token_from_cache_with_refresh(user_id)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\token_request.py", line 349, in get_token_from_cache_with_refresh
    return self._find_token_from_cache()
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\token_request.py", line 128, in _find_token_from_cache
    return self._cache_driver.find(cache_query)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 199, in find
    is_resource_tenant_specific)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 177, in _refresh_entry_if_necessary
    return self._refresh_expired_entry(entry)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 155, in _refresh_expired_entry
    self._replace_entry(entry, new_entry)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 149, in _replace_entry
    self.remove(entry_to_replace)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 205, in remove
    self._cache.remove([entry])
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\token_cache.py", line 84, in remove
    self._cache.pop(key)
KeyError: <adal.token_cache.TokenCacheKey object at 0x0000000025482A20>

@rayluo
Collaborator

rayluo commented Mar 7, 2018

@ace-e4s Thanks! Good to see the trace! So the tentative fix I mentioned 5 days ago was indeed targeting the same line.

By saying "I'm running v0.5.0", I assume you mean the official v0.5.0 (installed by pip install adal), don't you? Now would you mind to test out the tentative fix by grabbing it by pip install git+https://github.com/AzureAD/azure-activedirectory-library-for-python.git@threadsafe-cache-pop (which still has a version number 0.5.0 though)?

@ali-cetin-4ss
Author

ali-cetin-4ss commented Mar 8, 2018

@rayluo: After some extensive testing, it appears that the issue is related to the use of memory_profiler. I'm not able to reproduce the issue without memory profiling turned on. How and why that can be the case, I'm not sure.

Based on that, it seems like your code was OK all along. I see no reason to merge the proposed branch into trunk.

Anyway, thanks for your quick response :) Cheers 🥇

@rayluo
Collaborator

rayluo commented Mar 8, 2018

Here comes an even quicker response :)

@ace-e4s: Thanks for the new information, which actually explains it. While I have not personally used memory_profiler before, profilers in general introduce overhead, so a latent race condition would likely surface more often under that extra load.

If you don't mind, I would suggest grabbing the aforementioned tentative fix and rerunning your test under the SAME memory_profiler setup to see if the error goes away. If so, it might still be worth merging. What do you think?

@ali-cetin-4ss
Author

ali-cetin-4ss commented Mar 9, 2018

I have now tested the fix, and it seems to remedy the situation as far as I can see.

That said, the fix addresses the symptom. It "acknowledges" that the lock you have in place may have been lifted for some reason, and then tries to remedy the situation by having the pop return a default of None.

Furthermore, a side effect of the fix may be that it also masks potential exceptions/issues you would like to bubble up to the surface.
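For reference, my reading of that pattern, sketched in simplified form (not the actual patch; helper names are hypothetical):

# Simplified sketch of the described behaviour, not the actual adal diff:
# tolerate a key that another thread already removed, instead of letting
# dict.pop raise KeyError.
def remove(self, entries):
    with self._lock:                       # per-operation lock inside the token cache
        for entry in entries:
            key = _entry_to_key(entry)     # hypothetical helper
            self._cache.pop(key, None)     # default None: a missing key is ignored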

What I don't understand is why the lock appears to be lifted in the first place?! It doesn't make any sense. Maybe this is just an undesired side effect (bug?) of the memory_profiler?

The call is yours at the end of the day. What do you think?

@rayluo
Collaborator

rayluo commented Mar 9, 2018

Here is the reason. The lock is inside the lower-level cache.remove(), and that lock has NOT been "lifted". However, the higher-level _refresh_entry_if_necessary() uses the pattern "search for the cached token; oh, it is stale, let's remove it", and that search-and-then-delete logic is not inside a lock. I do have something in mind and plan to rewrite this cache logic completely in the future, but for now the proposed patch is good enough. And no, it won't mask any other exceptions.
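To illustrate, a simplified sketch of the flow described above (not the actual adal source; names abbreviated):

# Both threads pass the "is it stale?" check before either removes the entry,
# so the slower thread later pops a key that is already gone.
def _refresh_entry_if_necessary(self, entry):
    if _is_expired(entry):                         # Thread A and Thread B both see "stale"
        new_entry = self._refresh_expired_entry(entry)
        self._replace_entry(entry, new_entry)      # remove(old entry) + add(new entry)
        return new_entry
    return entry

# cache.remove() does take a lock, but by the time Thread B acquires it, Thread A
# has already popped the key; hence the KeyError unless the pop is defaulted (the
# tentative fix) or the whole find-refresh-replace sequence runs under one lock.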

I'll include that patch in the upcoming release.
