Constant timeouts after multiple calls with async #769

Closed · 1 task done

Inkorak opened this issue Nov 10, 2023 · 68 comments · Fixed by Datura-ai/cortex.t#13
Labels
bug Something isn't working

Comments

@Inkorak

Inkorak commented Nov 10, 2023

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

Constant timeouts occur after multiple asynchronous calls. This was discovered while using the LlamaIndex framework: when calls are made through the openai-python client wrapped with async, constant timeouts begin. Without async, or with async on an older version such as 0.28, there are no problems.

To Reproduce

Make several calls in a row, for example to embeddings, wrapped with async.

Code snippets

No response

OS

ubuntu

Python version

Python 3.11.4

Library version

v1.2.0 and newer

@Inkorak Inkorak added the bug Something isn't working label Nov 10, 2023
@RobertCraigie
Collaborator

Hi @Inkorak, I can't reproduce the issue you're seeing. Can you share a code snippet?

This snippet passes for me:

import anyio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def main() -> None:
    for _ in range(10):
        await client.embeddings.create(input="Hello world!", model="text-embedding-ada-002")

anyio.run(main)

@ashwinsr

I can confirm this issue is affecting us as well. We recently upgraded from 0.28 to 1.2.3, and 12 hours later the timeouts began.

@RobertCraigie
Collaborator

@ashwinsr can you share any more details?

Is this only happening when the client has been in use for a prolonged period of time?

@ashwinsr

ashwinsr commented Nov 11, 2023

I'm trying really hard to build a minimal failing example, but I haven't gotten one yet. Basically we have a FastAPI server that uses the Async OpenAI client with streaming responses. After a while of running, the vast majority of calls to await client.chat.completions.create will give us timeouts.

We are currently on 1.2.3.

Any suggestions on what we can do to troubleshoot this / help you fix? This is a P0 for us right now.
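For context, the shape of our setup is roughly this (a minimal sketch with illustrative names and a hypothetical endpoint, not our actual code):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()  # one shared client for the whole app

@app.get("/chat")
async def chat(prompt: str) -> StreamingResponse:
    async def generate():
        # Streamed chat completion; chunks are forwarded to the caller as they arrive.
        stream = await client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        async for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content

    return StreamingResponse(generate(), media_type="text/plain")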

@RobertCraigie
Collaborator

RobertCraigie commented Nov 11, 2023

Are you seeing connection pool timeouts or is it a request timeout?

@ashwinsr

We are seeing pool timeouts and some request timeouts. Give me a second and I'll pull some more specific logs for you.

@GCODIN

This comment has been minimized.

@RobertCraigie
Collaborator

Okay, there was a bug reported recently with streaming responses not being closed correctly. But I did manage to reproduce that and push a fix so I'm surprised you're still seeing connection pool timeouts: #763

Do you have a lot of concurrent requests happening at once?

@ashwinsr

@RobertCraigie We are seeing

  1. Some httpcore.PoolTimeout's
  2. Some regular timeouts (although these are possibly FastAPI timing out while waiting on OpenAI; apologies, something just logged the timeout error. We will improve our logging here.)

Thoughts?

@ashwinsr

Not that many concurrent requests (think <20 at a time)

@ashwinsr

Here's one traceback:

File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1299, in _request
response = await self._client.send(request, auth=self.custom_auth, stream=stream)
File "/usr/local/lib/python3.10/site-packages/sentry_sdk/integrations/httpx.py", line 137, in send
rv = await real_send(self, request, **kwargs)
File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1620, in send
response = await self._send_handling_auth(
File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1648, in _send_handling_auth
response = await self._send_handling_redirects(
File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1685, in _send_handling_redirects
response = await self._send_single_request(request)
File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1722, in _send_single_request
response = await transport.handle_async_request(request)
File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 352, in handle_async_request
with map_httpcore_exceptions():
File "/usr/local/lib/python3.10/contextlib.py", line 153, in exit
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
raise mapped_exc(message) from exc

@RobertCraigie
Collaborator

Okay, thanks. Do you have debug logging enabled?

If you could share debug logs for openai, httpx & httpcore it would be incredibly helpful.
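For reference, a minimal way to turn those logs on (standard library logging; the openai logger can also be enabled by setting the OPENAI_LOG environment variable to debug):

import logging

logging.basicConfig(level=logging.DEBUG)
for name in ("openai", "httpx", "httpcore"):
    # Make sure the SDK and its HTTP stack emit wire-level detail.
    logging.getLogger(name).setLevel(logging.DEBUG)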

@ashwinsr

Also some regular timeout errors:

File "open_ai.py", line 111, in get_function_chat_completion
response = await client.chat.completions.create(
File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1191, in create
return await self._post(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1480, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1275, in request
return await self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1331, in _request
return await self._retry_request(options, cast_to, retries, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1362, in _retry_request
return await self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1331, in _request
return await self._retry_request(options, cast_to, retries, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1362, in _retry_request
return await self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1332, in _request
raise APITimeoutError(request=request) from err
openai.APITimeoutError: Request timed out.

@ashwinsr

@RobertCraigie unfortunately we don't have debug logging enabled already, and turning it on now might not help much because we're likely going to have to downgrade to the old version until we can get this figured out (we can't just let our production traffic fail...)

@RobertCraigie
Collaborator

@ashwinsr okay, no worries. I would suggest trying to explicitly close stream responses (see the issue linked earlier for an example) before downgrading, if you can. I'll try to figure out what's happening.
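Something along these lines (a sketch only; it assumes the stream object exposes the underlying httpx response as .response):

import anyio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def main() -> None:
    stream = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    try:
        async for chunk in stream:
            ...  # consume chunks as usual
    finally:
        # Release the pooled connection even if iteration is abandoned
        # or an exception is raised mid-stream.
        await stream.response.aclose()

anyio.run(main)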

@ashwinsr

Got it, I'll do that. What would the code snippet be to close the connection for your embedding example at the top of the page?

@RobertCraigie
Collaborator

RobertCraigie commented Nov 11, 2023

Unfortunately you'd likely have to update your code to use raw responses: https://github.com/openai/openai-python?tab=readme-ov-file#accessing-raw-response-data-eg-headers. I would be very surprised if standard requests are a cause of this issue, and it would help narrow this down if you left them as-is for now, but I totally understand if you'd rather explicitly close responses there as well.

Also just to be clear, you definitely shouldn't have to explicitly close responses, I just suggested it as a temporary workaround so you don't have to downgrade.
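For completeness, the raw-response route would look roughly like this (a sketch only; raw.http_response is assumed here to be the underlying httpx.Response):

import anyio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def main() -> None:
    raw = await client.embeddings.with_raw_response.create(
        input="Hello world!",
        model="text-embedding-ada-002",
    )
    try:
        embedding = raw.parse()  # parsed response object, as with a normal call
    finally:
        # Explicitly release the connection back to the pool
        # (assumed attribute; see the lead-in above).
        await raw.http_response.aclose()

anyio.run(main)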

@ashwinsr

Alright @RobertCraigie, we turned on debug logging and left the code as-is. I'll update here with the next set of failed logs to see if we can find the root cause.

@Liu-Da

Liu-Da commented Nov 12, 2023

Same situation.

@wistanch

Running into the same issue, had to swap the library out for direct calls to OpenAI API with aiohttp.

@RobertCraigie
Collaborator

RobertCraigie commented Nov 12, 2023

Thank you all for the additional confirmations. Can you share any more details about your setup? The following would be most helpful:

  • Python version
  • Framework / app setup (e.g. FastAPI, Flask, etc.)
  • openai-python version
  • httpx version
  • httpcore version

If anyone can share a reproduction that would also be incredibly helpful.
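For example, a quick script to collect those versions:

import sys

import httpcore
import httpx
import openai

# Paste this output into your report.
print("Python:", sys.version)
print("openai:", openai.__version__)
print("httpx:", httpx.__version__)
print("httpcore:", httpcore.__version__)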

@RobertCraigie
Collaborator

Also any examples of code using the openai package would be helpful.

@RobertCraigie
Collaborator

Additionally, we did recently fix a bug related to this so please ensure you're on the latest version! v1.2.3

@RobertCraigie
Collaborator

RobertCraigie commented Nov 13, 2023

Update: we've been able to reproduce the httpx.ReadTimeout issue but I have not been able to reproduce the pool timeout issue.

I have been able to reproduce the httpx.ReadTimeout issue while making raw requests using httpx directly so this may not be an issue with the SDK itself.

This issue may be related: encode/httpx#1171

The underlying error I get is this:

Traceback (most recent call last):
  File "/Users/robert/stainless/stainless/dist/openai-python/.venv/lib/python3.9/site-packages/anyio/streams/tls.py", line 131, in _call_sslobject_method
    result = func(*args)
  File "/Users/robert/.rye/py/cpython@3.9.18/install/lib/python3.9/ssl.py", line 889, in read
    v = self._sslobj.read(len)
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2633)

@RobertCraigie
Collaborator

I've pushed a fix for the httpx.ReadTimeout issue I managed to reproduce, this will be included in the next release: #804

@RobertCraigie
Collaborator

A fix has been released in v1.2.4! Please let us know if that fixes the issue for you.

@agronholm

Why are you involving threads there if you're already using async?

@makaralaszlo

makaralaszlo commented Nov 20, 2023

This is from an example of an extensive monolith application, where async functions are used for the I/O tasks; CPU-heavy workloads that the async tasks do not need to wait on are sent out to a new thread. The same occurs if we use asyncio's run_in_executor() function.

Using OpenAI 0.28.1, it worked perfectly before.

@agronholm

The biggest problem with stucking_example.py is that it's creating multiple event loops. I cannot fathom how it ever worked before, but that must've been by chance. I suggest you use the synchronous API instead.
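To make the failure mode concrete, a condensed, hypothetical version of that pattern looks like this (not the original stucking_example.py):

import asyncio
import threading

from openai import AsyncOpenAI

client = AsyncOpenAI()  # a single client shared across threads

def worker() -> None:
    # asyncio.run() creates a brand-new event loop in each thread, so the
    # client's connection pool ends up being used from multiple loops.
    asyncio.run(
        client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "hi"}],
        )
    )

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()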

@makaralaszlo

makaralaszlo commented Nov 20, 2023

Yeah, you are right, @agronholm: the main problem in our case is not a library problem, it just used to work before. But others in this thread may have made the same mistake, so for them it would be worth checking whether the same conditions apply.

In the example, one event loop is created per thread. These functions are executed on separate threads using the AsyncThreadingHelper class, and if both threads try to access the same instance of OpenAIAdapter concurrently, it can result in race conditions. That is the problem in our case, and it can be solved either by creating a separate OpenAIAdapter instance for each thread or by adding a thread-safe mechanism to synchronize access to the shared resource, as you mentioned.

@Inkorak, if you are referring to this library https://github.com/run-llama/llama_index, there might be the same problem there, since it uses async and threading at the same time. (I didn't dive deep into the code, but it is worth checking.)

@rattrayalex
Collaborator

@makaralaszlo does your code work if you use the synchronous OpenAI client instead of AsyncOpenAI? If so, it's unlikely to be the same problem as this issue.

@zhu

zhu commented Nov 21, 2023

The fix in v1.2.3 only works when no exception is raised.
You should catch exceptions and call self.response.aclose() to release the connection.
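In other words, something like this (a sketch; it assumes the stream exposes the underlying httpx response as .response):

import anyio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def stream_once(prompt: str) -> None:
    stream = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    try:
        async for chunk in stream:
            ...  # normal processing
    except Exception:
        # Release the connection on error paths too, not just on clean completion.
        await stream.response.aclose()
        raise

anyio.run(stream_once, "Hello")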

@zhu

zhu commented Nov 22, 2023

Also, upgrading anyio to v4.0.0 may help.

@agronholm

> Also, upgrading anyio to v4.0.0 may help.

With what, exactly?

@zhu

zhu commented Nov 23, 2023

> With what, exactly?

agronholm/anyio#534

It seems some bugs were fixed in 4.0.0.

@billwang233

I used openai==1.3.5 to run model gpt-3.5-turbo-0613 and the new model gpt-3.5-turbo-1106 in parallel, but I got different test results.

With parallelism set to 20, gpt-3.5-turbo-1106 hits read timeouts several times, while gpt-3.5-turbo-0613 does not. Even with parallelism set to 50, gpt-3.5-turbo-0613 still has not hit a single read timeout.

My guess: is this a problem with model gpt-3.5-turbo-1106?

@billwang233

By the way, I tested with the sync client.
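Roughly like this (an illustrative sketch of the test harness, not the exact script):

from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()
PARALLELISM = 20  # also tried 50 for gpt-3.5-turbo-0613

def one_call(i: int) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",  # or "gpt-3.5-turbo-0613"
        messages=[{"role": "user", "content": f"request {i}"}],
    )
    return resp.choices[0].message.content

with ThreadPoolExecutor(max_workers=PARALLELISM) as pool:
    results = list(pool.map(one_call, range(PARALLELISM)))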

@agronholm

> agronholm/anyio#534
>
> It seems some bugs were fixed in 4.0.0.

Unless I'm badly mistaken, none of the fixes in v4.0.0 have any bearing on this particular issue. As for @makaralaszlo's example, they were having trouble due to using the async client wrong, and upgrading AnyIO will not change that.

@zhu

zhu commented Nov 24, 2023

> Unless I'm badly mistaken, none of the fixes in v4.0.0 have any bearing on this particular issue.

You're right. I can still reproduce this bug with anyio 4.0.0.
I also found that for some prompts, gpt-3.5-turbo can take more than 150s to generate a single chunk. Retrying does not fix it.

@krrishdholakia

krrishdholakia commented Nov 30, 2023

Did retrying the request fix it for anyone? I'm running a load test and seeing this issue.

Script: https://github.com/BerriAI/litellm/blob/8c1a9f1c4eeba21cd535e45cf8c7600b98635fce/litellm/tests/test_profiling_router.py#L4

@linchpinlin

Same issue here.
I am using AsyncAzureOpenAI. The rough logic is: pd.Series.map(lambda x: asyncio.run(gpt_4(x))).
The gpt_4 function calls the GPT-4 API about 3-6 times.
After the first row of the Series is processed successfully, the first call to the GPT-4 API from the second row reports a Connection Error. I have a retry mechanism, but experiments show that after the first Connection Error, all subsequent requests time out and none of them succeed. The Azure control panel (metrics) also shows no successful requests.
When I switched to AzureOpenAI, all the problems disappeared.
openai==1.3.6
Single process
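For clarity, a condensed, hypothetical version of that pattern (credentials and endpoint are assumed to come from the usual Azure environment variables; the model argument is the deployment name):

import asyncio

import pandas as pd
from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI()  # created once, then reused from many event loops

async def gpt_4(text: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4",  # Azure deployment name
        messages=[{"role": "user", "content": text}],
    )
    return resp.choices[0].message.content

series = pd.Series(["first row", "second row"])
# Each row spins up (and tears down) a fresh event loop via asyncio.run(),
# while the client and its connection pool are shared across all of them,
# which may run into the same multiple-event-loop pitfall discussed above.
out = series.map(lambda x: asyncio.run(gpt_4(x)))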

@rattrayalex
Collaborator

I'm pleased to report this bug has been fixed in the API! Connections should no longer time out while waiting to send response headers. 🎉

Anyone who has downgraded to v0.28.1 for this reason should be able to upgrade back to the latest version.

@rattrayalex rattrayalex unpinned this issue Dec 4, 2023
@antont

antont commented Dec 4, 2023

> I'm pleased to report this bug has been fixed in the API! Connections should no longer time out while waiting to send response headers. 🎉

Is it this commit? 7aad340

@RobertCraigie
Collaborator

@antont no, this bug was an API level issue and the OpenAI team managed to figure out the underlying cause in their server.

Users reporting that downgrading the SDK version fixed the issue was a red herring; we were able to reproduce the issue in a myriad of different situations: using aiohttp (what the v0 SDK uses), using anyio directly instead of httpx, and using separate languages like Rust and Node.js.

That commit does fix a separate bug where, if we retried requests, we never closed the original request, which leads to a memory leak and eventually makes the client unusable once the connection limit is reached.

@jamesev15

@RobertCraigie, I understand that the error occurred on OpenAI's servers. Does that mean I have to contact Azure to apply the same fix on their servers? I have a GPT deployment on my Azure subscription.

@matteo-giacomazzi

Hello,

I'm experiencing this strange behavior when I use AsyncOpenAI:

2023-12-13 19:52:40 - DEBUG - receive_response_headers.started request=<Request [b'POST']>
2023-12-13 19:52:41 - DEBUG - receive_response_headers.failed exception=CancelledError()

It appears when I call the completions.create method, sometimes on the very first call, sometimes on subsequent ones.
This raises a couple of issues:

  1. the request is aborted
  2. no exception is thrown; the program hangs waiting for a create call that never returns.

I've just upgraded all my packages, so there should be no available fix that I'm missing. Is this somehow related to the problem in this thread?

Thank you,
Matteo

@rattrayalex
Collaborator

@matteo-giacomazzi that sounds like an unrelated issue and should be addressed separately (though my suspicion is that you may be closing the client, or the response, unintentionally).

@matteo-giacomazzi

> @matteo-giacomazzi that sounds like an unrelated issue and should be addressed separately (though my suspicion is that you may be closing the client, or the response, unintentionally).

I think you're right. The problem appears only when I use the API together with MattermostDriver (the idea is to make the bot accessible via Mattermost), so I guess the source of the problem is there, as I'm unable to reproduce the behavior in a dedicated process that doesn't use any other asyncio features.

Thank you!
