
azure.monitor.opentelemetry - monitor - exporter - configure_azure_monitor hangs #33441

Closed
mathrb opened this issue Dec 8, 2023 · 8 comments · Fixed by open-telemetry/opentelemetry-python-contrib#2119
Labels: Client, customer-reported, Monitor - Exporter, needs-team-attention, question, Service Attention

Comments

mathrb commented Dec 8, 2023

  • Package name: azure.monitor.opentelemetry
  • Package version: 1.1.1
  • Operating system: Ubuntu 22.04.3
  • Python version: 3.10.12

Describe the bug
configure_azure_monitor never returns, and exception messages like the following are displayed:
Exception in detector <opentelemetry.resource.detector.azure.vm.AzureVMResourceDetector object at 0x7f9698b12170>, ignoring

To Reproduce
Steps to reproduce the behavior:

  1. Create the following script:
from azure.monitor.opentelemetry import configure_azure_monitor

configure_azure_monitor(connection_string="<<appinsight connection string>>")
print("done")
  2. Run the script: "done" is never printed, and the following message is displayed every ~5 minutes:
    Exception in detector <opentelemetry.resource.detector.azure.vm.AzureVMResourceDetector object at 0x7f9698b3d5a0>, ignoring

Here are the endpoints in the connection string:
IngestionEndpoint=https://francecentral-1.in.applicationinsights.azure.com/;LiveEndpoint=https://francecentral.livediagnostics.monitor.azure.com/

Expected behavior
Configuration completes normally and "done" is printed.

github-actions bot added the Client, customer-reported, Monitor - Exporter, needs-team-triage, and question labels (Dec 8, 2023)
mathrb changed the title from "azure.monitor.opentelemetry - configure_azure_monitor hangs" to "azure.monitor.opentelemetry - monitor - exporter - configure_azure_monitor hangs" (Dec 8, 2023)
xiangyan99 added the Service Attention label and removed the needs-team-triage label (Dec 8, 2023)
github-actions bot added the needs-team-attention label (Dec 8, 2023)
github-actions bot commented Dec 8, 2023

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jeremydvoss @lzchen.

@Anakin100100

I'm experiencing the same issue

@Anakin100100

@xiangyan99 is anybody from Microsoft working on this?

@Anakin100100

@mathrb does this affect every version or just some?

jeremydvoss (Member) commented Dec 12, 2023

  1. Is this happening locally, on App Service, or in another environment?
  2. You can disable the VM resource detector using an OpenTelemetry environment variable; see the README's Usage section for details (a sketch is included at the end of this comment). Let me know if the function hangs even without the resource detector.

I am unable to reproduce this issue. That script works fine for me locally. As the message says, whatever error is taking place is ignored and should not cause an issue. You can see that code here. My guess is there is something else causing OpenTelemetry setup to repeat.

Are you using any OpenTelemetry environment variables to configure?
What version of opentelemetry-sdk/api do you have installed?
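
A minimal sketch of that suggestion, assuming the OTEL_EXPERIMENTAL_RESOURCE_DETECTORS variable mentioned later in this thread is the one the README refers to, and that the SDK reads it when configure_azure_monitor builds the Resource:

import os

# Restrict resource detection to the built-in OTel detector so the Azure VM
# detector (and its slow metadata request) never runs. Assumes the variable
# is read at Resource creation time.
os.environ["OTEL_EXPERIMENTAL_RESOURCE_DETECTORS"] = "otel"

from azure.monitor.opentelemetry import configure_azure_monitor

configure_azure_monitor(connection_string="<<appinsight connection string>>")
print("done")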

mathrb (Author) commented Dec 13, 2023

Hello @jeremydvoss
This happens locally. When I deploy a really simple function app that calls configure_azure_monitor, it simply does not start, and there are no logs, since we don't get any logs during cold start.
Locally, I don't use any environment variables with the above script. In the function app, only the Application Insights connection string is set; everything else is unrelated to OpenTelemetry.
Here are all versions installed:

opentelemetry-api==1.21.0
opentelemetry-instrumentation==0.42b0
opentelemetry-instrumentation-asgi==0.42b0
opentelemetry-instrumentation-dbapi==0.42b0
opentelemetry-instrumentation-django==0.42b0
opentelemetry-instrumentation-fastapi==0.42b0
opentelemetry-instrumentation-flask==0.42b0
opentelemetry-instrumentation-psycopg2==0.42b0
opentelemetry-instrumentation-requests==0.42b0
opentelemetry-instrumentation-urllib==0.42b0
opentelemetry-instrumentation-urllib3==0.42b0
opentelemetry-instrumentation-wsgi==0.42b0
opentelemetry-resource-detector-azure==0.1.0
opentelemetry-sdk==1.21.0
opentelemetry-semantic-conventions==0.42b0
opentelemetry-util-http==0.42b0

johnliu55-msft commented Jan 8, 2024

I also encountered this issue, and I think the root cause is how OpenTelemetry handles resource detection: it uses a ThreadPoolExecutor with future.result(timeout) to aggregate data from multiple detectors. Those futures keep running even after the timeout has elapsed and been caught around future.result(), so the context created by with ThreadPoolExecutor does not exit until all the futures have completed (a minimal sketch of this behavior follows this paragraph).
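
A minimal, self-contained sketch of that standard-library behavior (not the actual OTel SDK code); slow_detector here is just a stand-in for a detector whose HTTP request takes minutes to fail:

import concurrent.futures
import time

def slow_detector():
    # Stand-in for a resource detector whose metadata request takes minutes to fail.
    time.sleep(180)
    return {"cloud.provider": "azure"}

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(slow_detector)
    try:
        # Gives up waiting after 5 seconds, but does NOT cancel the running thread.
        future.result(timeout=5)
    except concurrent.futures.TimeoutError:
        print("detector timed out, ignoring")
# Exiting the with-block calls shutdown(wait=True), so this line is only
# reached once slow_detector() finishes (~3 minutes later).
print("done")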

The Azure VM resource detector tries to retrieve data from the endpoint http://169.254.169.254/metadata/instance/compute?api-version=2021-12-13&format=json, and in my environment this takes 2-3 minutes to fail, which blocks the whole OpenTelemetry configuration.

The workaround for me for now is setting the environment variable OTEL_EXPERIMENTAL_RESOURCE_DETECTORS to otel to override the default:

export OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=otel

I tried setting it to an empty string, but that breaks the code. After reading the code here, I think it's safe to set the value to otel for now, since OpenTelemetry adds it if it is missing.

jeremydvoss (Member)

The issue stems from an unclear timeout in the OTel SDK. My fix will be in the next release. In order not to trigger the 5-second timeout, the VM resource detector now sets its own timeout to 4 seconds (a rough sketch of the idea follows). Please update to opentelemetry-resource-detector-azure==0.1.3.
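
An illustrative sketch only, not the actual detector code: the idea is to bound the IMDS request itself so it fails well before the SDK's 5-second detector timeout, instead of leaving a worker thread running for minutes. The function name probe_vm_metadata and the Metadata request header are assumptions for illustration:

from urllib import error, request

_VM_METADATA_URL = (
    "http://169.254.169.254/metadata/instance/compute"
    "?api-version=2021-12-13&format=json"
)

def probe_vm_metadata(timeout_s: float = 4.0):
    # IMDS normally requires the "Metadata: True" header (assumption, per Azure docs).
    req = request.Request(_VM_METADATA_URL, headers={"Metadata": "True"})
    try:
        with request.urlopen(req, timeout=timeout_s) as resp:
            return resp.read()
    except (error.URLError, OSError):
        # Not on an Azure VM, or IMDS unreachable: fail fast instead of hanging.
        return None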

@github-actions github-actions bot locked and limited conversation to collaborators Apr 28, 2024