
acquire_token not thread-safe #123

Closed
ali-cetin-4ss opened this issue Feb 28, 2018 · 12 comments · Fixed by #128

Comments

@ali-cetin-4ss

The acquire_token method is not thread-safe.

Example:

import adal

context = adal.AuthenticationContext(authority, api_version=None)

context.acquire_token_with_username_password(resource, username, password, client_id)

# Executing the call below in parallel once the cached token has expired, e.g. with
# concurrent.futures.ThreadPoolExecutor, will cause a KeyError.
context.acquire_token(resource, username, client_id)

# The issue seems to be a dict.pop operation during the cache update, which itself is not thread-safe.
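A minimal sketch of the parallel part (hypothetical worker and iteration counts; assumes the context set up above and a cached token that has already expired):

import concurrent.futures

def fetch_token():
    # Every call goes through the shared in-memory token cache; once the token is
    # stale, concurrent refreshes race on the cache's dict.pop.
    return context.acquire_token(resource, username, client_id)

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch_token) for _ in range(100)]
    for future in concurrent.futures.as_completed(futures):
        future.result()  # re-raises the KeyError from a losing thread, if any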
@rayluo
Collaborator

rayluo commented Feb 28, 2018

Thanks for the report. We will take a look into this.

@rayluo
Collaborator

rayluo commented Mar 2, 2018

Hi @ace-e4s, when triaging issues we normally want the error/exception trace to debug against. Since we don't have one this time, we managed to come up with a tentative fix based on limited information, but we can't know for sure that it addresses your scenario.

You can grab it with pip install git+https://github.com/AzureAD/azure-activedirectory-library-for-python.git@threadsafe-cache-pop and then test. Please do report back whether it eliminates those (rare?) errors; otherwise the tentative fix may not be merged into a future release.

@ali-cetin-4ss
Author

I'll try to do some testing later today and let you know.

@rayluo
Collaborator

rayluo commented Mar 5, 2018

@ace-e4s thanks for your prompt response and willingness to help! We are planning a new release soon, so we would like to have this fix ready before then.

@ali-cetin-4ss
Author

Btw, I did some testing today, attempting to isolate and consistently reproduce the issue. I will do some more tomorrow and get back to you.

@rayluo
Collaborator

rayluo commented Mar 7, 2018

@ace-e4s FYI: there is an upcoming release 0.5.1 at PR 127, which does NOT include this pending fix (yet?).

@ali-cetin-4ss
Author

ali-cetin-4ss commented Mar 7, 2018

Hi,
I had trouble reproducing the issue, but I finally managed :)

I'm running v0.5.0 of adal, btw.

Case 1: our own code that uses adal, run in parallel with memory_profiler enabled:

Traceback (most recent call last):
  File "C:\Anaconda2\envs\adal_test\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Anaconda2\envs\adal_test\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\memory_profiler.py", line 1228, in <module>
    exec_with_profiler(script_filename, prof, args.backend, script_args)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\memory_profiler.py", line 1123, in exec_with_profiler
    execfile(filename, ns, ns)
  File "gla_upload_adal.py", line 75, in <module>
    for series_tag, series_id in response:
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\concurrent\futures\_base.py", line 641, in result_iterator
    yield fs.pop().result()
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\concurrent\futures\_base.py", line 455, in result
    return self.__get_result()
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\concurrent\futures\thread.py", line 63, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\memory_profiler.py", line 1072, in wrapper
    val = prof(func)(*args, **kwargs)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\memory_profiler.py", line 659, in f
    return func(*args, **kwds)
  File "gla_upload_adal.py", line 38, in call_reservoir
    response = client.create(series)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\client.py", line 103, in create
    file_id = self._storage.put(series)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\storage\storage.py", line 74, in put
    upload_params = self._files_api.upload(self.token)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\storage\storage.py", line 58, in token
    return self._authenticator.token
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\authenticate.py", line 79, in token
    return self._token()
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\datareservoirio\authenticate.py", line 83, in _token
    self.params.resource, self.username, self.params.client_id)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\authentication_context.py", line 131, in acquire_token
    return self._acquire_token(token_func)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\authentication_context.py", line 114, in _acquire_token
    return token_func(self)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\authentication_context.py", line 129, in token_func
    return token_request.get_token_from_cache_with_refresh(user_id)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\token_request.py", line 349, in get_token_from_cache_with_refresh
    return self._find_token_from_cache()
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\token_request.py", line 128, in _find_token_from_cache
    return self._cache_driver.find(cache_query)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 199, in find
    is_resource_tenant_specific)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 177, in _refresh_entry_if_necessary
    return self._refresh_expired_entry(entry)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 155, in _refresh_expired_entry
    self._replace_entry(entry, new_entry)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 149, in _replace_entry
    self.remove(entry_to_replace)
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\cache_driver.py", line 205, in remove
    self._cache.remove([entry])
  File "C:\Anaconda2\envs\adal_test\lib\site-packages\adal\token_cache.py", line 84, in remove
    self._cache.pop(key)
KeyError: <adal.token_cache.TokenCacheKey object at 0x0000000025482A20>

@rayluo
Collaborator

rayluo commented Mar 7, 2018

@ace-e4s Thanks! Good to see the trace! So the tentative fix I mentioned 5 days ago was indeed targeting the same line.

By saying "I'm running v0.5.0", I assume you mean the official v0.5.0 (installed by pip install adal), don't you? Now would you mind to test out the tentative fix by grabbing it by pip install git+https://github.com/AzureAD/azure-activedirectory-library-for-python.git@threadsafe-cache-pop (which still has a version number 0.5.0 though)?

@ali-cetin-4ss
Author

ali-cetin-4ss commented Mar 8, 2018

@rayluo: After some extensive testing, it appears that the issue is related to the use of memory_profiler. I'm not able to reproduce the issue without memory profiling turned on. How and why that can be the case, I'm not sure.

Based on that, it seems like your code was OK all along. I see no reason to merge the proposed branch into trunk.

Anyway, thanks for your quick response :) Cheers 🥇

@rayluo
Collaborator

rayluo commented Mar 8, 2018

Here comes an even quicker response :)

@ace-e4s: Thanks for the new information, which actually explains it. While I have not personally used memory_profiler before, profilers in general introduce overhead, so a latent race condition would likely surface more often under that extra load.

If you don't mind, I would suggest grabbing the aforementioned tentative fix and rerunning your test under the SAME memory_profiler setup to see if the error goes away. If so, it might still be worth merging. What do you think?

@ali-cetin-4ss
Author

ali-cetin-4ss commented Mar 9, 2018

I have now tested the fix, and it seems to remedy the situation as far as I can see.

That said, the fix addresses the symptom. It "acknowledges" that the lock you have in place may have been lifted for some reason, and then tries to remedy the situation by having the pop return a default of None.

Furthermore, a side effect of the fix may be that it also masks potential exceptions/issues you would like to bubble up to the surface.
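For reference, my reading of that pattern, sketched in simplified form (not the actual patch; helper names are hypothetical):

# Simplified sketch of the described behaviour, not the actual adal diff:
# tolerate a key that another thread already removed, instead of letting
# dict.pop raise KeyError.
def remove(self, entries):
    with self._lock:                       # per-operation lock inside the token cache
        for entry in entries:
            key = _entry_to_key(entry)     # hypothetical helper
            self._cache.pop(key, None)     # default None: a missing key is ignored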

What I don't understand is why the lock appears to be lifted in the first place?! It doesn't make any sense. Maybe this is just an undesired side effect (bug?) of the memory_profiler?

The call is yours at the end of the day. What do you think?

@rayluo
Collaborator

rayluo commented Mar 9, 2018

Here is the reason. The lock is inside the lower-level cache.remove(), and that lock has NOT been "lifted". However, the higher-level _refresh_entry_if_necessary() uses the pattern "search for the cached token; oh, it is stale, let's remove it", and that search-and-then-delete logic is not inside a lock. I do have something in mind and plan to rewrite this cache logic completely in the future, but for now the proposed patch is good enough. And no, it won't mask any other exceptions.
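To illustrate, a simplified sketch of the flow described above (not the actual adal source; names abbreviated):

# Both threads pass the "is it stale?" check before either removes the entry,
# so the slower thread later pops a key that is already gone.
def _refresh_entry_if_necessary(self, entry):
    if _is_expired(entry):                         # Thread A and Thread B both see "stale"
        new_entry = self._refresh_expired_entry(entry)
        self._replace_entry(entry, new_entry)      # remove(old entry) + add(new entry)
        return new_entry
    return entry

# cache.remove() does take a lock, but by the time Thread B acquires it, Thread A
# has already popped the key; hence the KeyError unless the pop is defaulted (the
# tentative fix) or the whole find-refresh-replace sequence runs under one lock.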

I'll include that patch in the upcoming release.
