Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase timeout on HTTP requests and use a variable #409

Closed
paddyroddy opened this issue Apr 4, 2024 · 14 comments · Fixed by #418
Closed

Increase timeout on HTTP requests and use a variable #409

paddyroddy opened this issue Apr 4, 2024 · 14 comments · Fixed by #418
Labels
enhancement Idea or request for a new feature

Comments

@paddyroddy
Copy link

Prior to the zenodo upgrade last year I had no issues with downloading my zenodo dataset. It has since been flaky and I sometimes get an

error like this

Traceback (most recent call last):
  File "/home/runner/work/sleplet/sleplet/examples/arbitrary/africa/slepian_reconstruction_africa.py", line 29, in <module>
    main()
  File "/home/runner/work/sleplet/sleplet/examples/arbitrary/africa/slepian_reconstruction_africa.py", line 11, in main
    slepian = sleplet.slepian_methods.choose_slepian_method(L, region)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/sleplet/sleplet/src/sleplet/slepian_methods.py", line 62, in choose_slepian_method
    return sleplet.slepian.slepian_arbitrary.SlepianArbitrary(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/pydantic/_internal/_dataclasses.py", line 135, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/home/runner/work/sleplet/sleplet/src/sleplet/slepian/slepian_arbitrary.py", line 46, in __post_init__
    super().__post_init__()
  File "/home/runner/work/sleplet/sleplet/src/sleplet/slepian/slepian_functions.py", line 55, in __post_init__
    self.mask = self._create_mask()
                ^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/sleplet/sleplet/src/sleplet/slepian/slepian_arbitrary.py", line 55, in _create_mask
    return sleplet._mask_methods.create_mask_region(self._resolution, self.region)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/sleplet/sleplet/src/sleplet/_mask_methods.py", line 46, in create_mask_region
    mask = _load_mask(L, name)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/sleplet/sleplet/src/sleplet/_mask_methods.py", line 72, in _load_mask
    mask = sleplet._data.setup_pooch.find_on_pooch_then_local(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/sleplet/sleplet/src/sleplet/_data/setup_pooch.py", line 49, in wrapper
    _POOCH.load_registry_from_doi()
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/pooch/core.py", line 704, in load_registry_from_doi
    return repository.populate_registry(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/pooch/downloaders.py", line 901, in populate_registry
    for filedata in self.api_response["files"]:
                    ^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/pooch/downloaders.py", line 801, in api_response
    self._api_response = requests.get(
                         ^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='zenodo.org', port=443): Read timed out. (read timeout=5)

The relevant code is something like this

_POOCH = None
_ZENODO_DATA_DOI = "10.5281/zenodo.7767698"
if _POOCH is None:
    _POOCH = pooch.create(
        path=pooch.os_cache("sleplet"),
        base_url=f"doi:{_ZENODO_DATA_DOI}/",
        registry=None,
    )
    _POOCH.load_registry_from_doi()

Is it possible to increase the timeout of load_registry_from_doi?

@leouieda
Copy link
Member

Hi @paddyroddy thanks for reporting this! We've had timeouts on HTTP requests hardcoded from #329 but I think the value we used was too strict (5s). It would be good to change all of these to use a package variable and set it to a larger number (30-60s maybe).

I'm updating the title of this issue to match. I won't have the time to do this soon but if anyone else is interested, then please be my guest 🙂

@leouieda leouieda changed the title Is it possible to increase the timeout of load_registry_from_doi? Increase timeout on HTTP requests and use a variable Apr 15, 2024
@leouieda leouieda added the enhancement Idea or request for a new feature label Apr 15, 2024
@dokempf
Copy link
Contributor

dokempf commented Apr 16, 2024

Another idea for implementation of this one would be to not make the timeout parameter configurable, but instead make the entire requests.Session parameter configurable. It is the one that controls the timeout, but also many more. This could save us work on a lot of future feature requests that relate to how HTTP requests are executed.

@paddyroddy
Copy link
Author

paddyroddy commented Apr 24, 2024

@leouieda can you expand on what you mean by a "package variable"? I'm annoyed with my tests constantly failing, so pretty keen to get this fixed

@paddyroddy
Copy link
Author

I've had a look into the requests.Session approach, but it's kind of tricky with how downloaders.py is currently written. My thought was you could replace **kwargs with a session object. But then requests are made elsewhere in doi_to_url, ZenodoRepository.api_response, FigshareRepository.api_response, DataverseRepository._get_api_response - where a separate requests.Session object would have to be passed. I think instead I will try to set a global timeout variable for the package.

@paddyroddy
Copy link
Author

I'm still unsure about a "package variable". I've noticed several functions/classes have a **kwargs option to pass to requests, which would include timeout (or at least you'd expect that). So would it make much sense to have **kwargs as well as timeout?

For now, I'm going to try the retry_if_failed option to pooch.create and hope that that works well enough for me.

paddyroddy added a commit to astro-informatics/sleplet that referenced this issue Jun 3, 2024
@paddyroddy
Copy link
Author

retry_if_failed still seems to fail with the timeout error. Any help on this issue would be great.

@leouieda
Copy link
Member

leouieda commented Jun 5, 2024

Hi @paddyroddy sorry for the delay. I meant having a global variable in a module somewhere that is used throughout the package. We can set this to something relatively high to help with flaky connections. But being a global also means it can be patched in an emergency.

@leouieda
Copy link
Member

leouieda commented Jun 5, 2024

I agree that refactoring how we pass the Sessions will be a much larger problem. Ideally, we'd have foreseen this issue when we first wrote all that code. But doing this now doesn't seem like a good solution. I'd rather wait until there is a good use case for configuring the entire Session to motivate putting in the work.

@paddyroddy
Copy link
Author

I agree that refactoring how we pass the Sessions will be a much larger problem. Ideally, we'd have foreseen this issue when we first wrote all that code. But doing this now doesn't seem like a good solution. I'd rather wait until there is a good use case for configuring the entire Session to motivate putting in the work.

Agreed. I think Session seems like a nice way to go in future, but non-trivial to implement without breaking current stuff. Think this compromise works well, thanks!

@leouieda
Copy link
Member

leouieda commented Jun 6, 2024

@paddyroddy v1.8.2 is now on PyPI and should be on conda-forge by tomorrow at the latest. Hope this helps!

@paddyroddy
Copy link
Author

@leouieda I've only done a few tests runs of my CI but looks like it's done the trick! Thanks 😄

@leouieda
Copy link
Member

leouieda commented Jun 7, 2024

@paddyroddy great to know! Feel free to open another issue or reopen this one if it comes back.

@paddyroddy
Copy link
Author

Does seem like I occasionally hit that timeout, damn Zenodo - it didn't use to be a problem

@leouieda
Copy link
Member

😢 yeah, they've been a bit slow recently. If it's sample data for a package, I tend to mirror it on GitHub and make CI download from there instead to avoid overloading them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Idea or request for a new feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants