Remove race condition in credentials.py #643
@martindurant here is a branch with your proposed fix. The tests here pass, but I do see weird behavior when running an actual workload. I double checked and the failure appears by simply bumping the version of gcsfs to this branch.
The call comes from |
I applied the fix in my run (writing hundreds of Parquet files with ~1b rows in total), and it works 🙂 Before the fix, the job consistently hung after 1h.
Does this cause the workflow to fail? Are you running multiple threads too? Missing credentials should cause PermissionError, not FileNotFound, and the traceback indicates this is coming from the directory listing cache. Is this happening while reading?
Yes, this causes a failure. We're not running with multiple threads; it's the first (reading) call in the process, which uses Edit: I am disabling caching and double-checking; this might be an unrelated issue on my side where I don't invalidate caches properly. It's just weird that it didn't appear before.
So I understand clearly: |
@martindurant I did notice that pre-patch I accidentally ran So, as to this bugfix, I think we're good, opening the PR for review. The following is therefore unrelated to the PR, but just to answer your question, @martindurant:
What I do exactly (in pseudocode) is:
So, if the cache persists somehow between instantiations of |
If this is from a different instance/process/machine, then it is to be expected that the directory cache of the instance above has become stale, so |
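To make the stale-listing scenario concrete, here is a minimal, self-contained sketch. It does not use gcsfs or fsspec at all: `ToyStore` and `ToyFileSystem` are hypothetical stand-ins invented for illustration, and only the method name `invalidate_cache` mirrors the real fsspec call. It shows how a cached directory listing can produce a `FileNotFoundError` for a file that another writer has since created.

```python
class ToyStore:
    """Hypothetical stand-in for a remote bucket: directory -> set of file names."""

    def __init__(self):
        self.dirs = {}

    def put(self, directory, name):
        self.dirs.setdefault(directory, set()).add(name)


class ToyFileSystem:
    """Hypothetical client that caches directory listings (not the gcsfs code)."""

    def __init__(self, store):
        self.store = store
        self.dircache = {}

    def ls(self, directory):
        # Only hit the "remote" store when there is no cached listing.
        if directory not in self.dircache:
            self.dircache[directory] = sorted(self.store.dirs.get(directory, set()))
        return self.dircache[directory]

    def open(self, directory, name):
        # Existence is decided from the (possibly stale) cached listing.
        if name not in self.ls(directory):
            raise FileNotFoundError(f"{directory}/{name}")
        return f"contents of {directory}/{name}"

    def invalidate_cache(self):
        self.dircache.clear()


store = ToyStore()
store.put("bucket/data", "part-0.parquet")

reader = ToyFileSystem(store)
reader.ls("bucket/data")  # the listing is now cached on this instance

# A different instance/process/machine writes a new file.
store.put("bucket/data", "part-1.parquet")

try:
    reader.open("bucket/data", "part-1.parquet")
    stale_error = False
except FileNotFoundError:
    stale_error = True  # the cached listing does not know about the new file

reader.invalidate_cache()
fresh = reader.open("bucket/data", "part-1.parquet")  # succeeds after invalidation
```

In this toy model, clearing the cache before reading is what makes the new file visible; whether that is the right remedy for a given workload is a separate question.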
That's very useful information @martindurant, thank you very much! I was trying to find something along these lines in the documentation, but failed to. If this is not in there yet, would it make sense to add it?
It is behaviour inherited from fsspec (see https://filesystem-spec.readthedocs.io/en/latest/features.html#instance-caching). I totally agree that the whole project and set of repos could do with a docs reorganisation, but I think it's beyond my personal effort level right now.
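As a rough illustration of why a listing cache can appear to persist between instantiations, the following sketch models instance caching with a metaclass. This is a simplified assumption about the general idea, not fsspec's actual implementation: `CachedMeta` and `ToyFS` are hypothetical names, and only the `skip_instance_cache` keyword is borrowed from fsspec's documented option.

```python
class CachedMeta(type):
    """Hypothetical metaclass: constructing with the same arguments returns the same object."""

    def __init__(cls, *args, **kwargs):
        super().__init__(*args, **kwargs)
        cls._instances = {}

    def __call__(cls, *args, skip_instance_cache=False, **kwargs):
        if skip_instance_cache:
            # Bypass the cache and build a brand-new instance.
            return super().__call__(*args, **kwargs)
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cls._instances:
            cls._instances[key] = super().__call__(*args, **kwargs)
        return cls._instances[key]


class ToyFS(metaclass=CachedMeta):
    """Hypothetical filesystem class carrying a per-instance listing cache."""

    def __init__(self, project=None):
        self.project = project
        self.dircache = {}


a = ToyFS(project="demo")
a.dircache["bucket"] = ["old-listing"]

b = ToyFS(project="demo")  # same arguments: the cached instance is returned
c = ToyFS(project="demo", skip_instance_cache=True)  # fresh instance, empty cache
```

Here `b` is the same object as `a`, so it also sees the old listing, while `c` starts with an empty `dircache`.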
See fsspec/filesystem_spec#565 (comment)