Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datadog v2 API fails when using formulas in query #2813

Closed
2 tasks done
tukak opened this issue May 26, 2023 · 13 comments · Fixed by #2997
Closed
2 tasks done

Datadog v2 API fails when using formulas in query #2813

tukak opened this issue May 26, 2023 · 13 comments · Fixed by #2997
Assignees
Labels
analysis Related to Analysis CRD bug Something isn't working

Comments

@tukak
Copy link
Contributor

tukak commented May 26, 2023

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

Using formulas in AnalysisTemplate with Datadog v2 API fails with API error

received non 2xx response code: 400 {"errors":["Functions are not supported in metrics data source field. Use the formulas payload."]}

To Reproduce

Use a query with formula in Datadog provider for AnalysisTemplate and set apiRevision to v2. The query works with v1.

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: rollout-test-success-rate
spec:
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: default(result, 0) < 10
      provider:
        datadog:
          apiVersion: v2
          interval: 1m
          query: |
            sum:trace.rack.request.hits{env:staging AND service:service_name AND http.status_code:5*}.as_count() /
            sum:trace.rack.request.hits{env:staging AND service:service_name AND http.status_code:2*}.as_count() * 100

Expected behavior

It should work with v2 as well, or the note in https://argo-rollouts.readthedocs.io/en/stable/analysis/datadog/ should warn about using formulas (now it says that "If you switch to v2, you will not need to change any other field aside from apiVersion.")

Version

1.5.1

Logs

time="2023-05-26T10:02:02Z" level=info msg="Taking 1 Measurement(s)..." analysisrun=rollout-test-6b986bcd5b-7.1 namespace=default
time="2023-05-26T10:02:02Z" level=info msg="Measurement Completed. Result: Error" analysisrun=rollout-test-6b986bcd5b-7.1 metric=success-rate namespace=default
time="2023-05-26T10:02:02Z" level=warning msg="Measurement had error: received non 2xx response code: 400 {\"errors\":[\"Functions are not supported in metrics data source field. Use the formulas payload.\"]}" analysisrun=rollout-test-6b986bcd5b-7.1 metric=success-rate namespace=default

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@tukak tukak added the bug Something isn't working label May 26, 2023
@tukak tukak changed the title Datadog V2 API fails when using formulas in query Datadog v2 API fails when using formulas in query May 26, 2023
@alexef
Copy link
Member

alexef commented Jun 2, 2023

Formulas also need separate named queries, so in your case it could look like this:

   provider:
        datadog:
          apiVersion: v2
          interval: 1m
          queries:
            a: sum:trace.rack.request.hits{env:staging AND service:service_name AND http.status_code:5*}.as_count()
            b: sum:trace.rack.request.hits{env:staging AND service:service_name AND http.status_code:2*}.as_count() * 100
          formula: a/b

this is NOT supported yet by argo-rollouts, I will update the docs accordingly.

@alexef
Copy link
Member

alexef commented Jun 2, 2023

created: #2819

zachaller pushed a commit that referenced this issue Jun 6, 2023
Update datadog.md - clarify formulas

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>
@kostis-codefresh kostis-codefresh added the analysis Related to Analysis CRD label Jun 7, 2023
@tico24 tico24 assigned tico24 and unassigned tico24 Jul 13, 2023
@meeech
Copy link
Contributor

meeech commented Jul 13, 2023

I'm interested in grabbing this one if no one has yet

@joey100
Copy link

joey100 commented Jul 17, 2023

Formulas also need separate named queries, so in your case it could look like this:

   provider:
        datadog:
          apiVersion: v2
          interval: 1m
          queries:
            a: sum:trace.rack.request.hits{env:staging AND service:service_name AND http.status_code:5*}.as_count()
            b: sum:trace.rack.request.hits{env:staging AND service:service_name AND http.status_code:2*}.as_count() * 100
          formula: a/b

this is NOT supported yet by argo-rollouts, I will update the docs accordingly.

May I know any plan to support this, when will we be able to use this? This is important if we switch to api version v2, there are lots of functions and formulas will be used in many datadog metrics queries. Otherwise we won't be able to use api version v2.

@meeech
Copy link
Contributor

meeech commented Jul 17, 2023

@joey100 just picked up the ticket the other day, so I can't give you a timeframe. that being said, one reason I picked up the ticket is we rely on formulas as well and would like to move to v2.
Will add details once i have some to share

@Sineaggi
Copy link
Contributor

Related, would it also make sense to use the datadog-api-client-go library to make our calls to Datadog?

@meeech
Copy link
Contributor

meeech commented Aug 6, 2023

@Sineaggi I'll look into that as well and see if it makes sense

@meeech
Copy link
Contributor

meeech commented Aug 29, 2023

Adding more context: I will follow up on this with datadog.

The reason we think v1 is deprecated is based on @alexef expeirence
(from slack)
"I don’t have a link, we were getting 429 throttled by the API, and when asking for a rate limit increase, we were told to switch to v2 as they can increase the rate there, but not on v1"
"in communication to datadog support they referred to v1 as the “legacy one”, that’s all I have"

I will follow up with DD, since when working with their go lib, I get warning "2023/08/29 13:21:49 WARNING: Using unstable operation 'v2.QueryTimeseriesData'" so I want to get all this info sorted so we have clear guidance.

@lxsmrnv
Copy link

lxsmrnv commented Oct 11, 2023

What is the status of this issue?

@meeech
Copy link
Contributor

meeech commented Oct 12, 2023

@asmyrnov360 PR is ready and itching to be reviewed :D

zachaller added a commit that referenced this issue Oct 14, 2023
* Add note in CONTRIBUTING.md that I would have found useful.

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* Fix gen-openapi to be more portable - make sure it includes the GOPATH in the call.

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* Update docs.

* Expand working with v2 information
* Contacted Datadog to get latest info re: v1 deprecation, api limits.
* Add tips about rate limits, using helm for templates
* Add more example templates

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* Update Datadog Analysis Type

* Make Query omitempty since now possible it won't exist
* Add some descriptions
* Add new properties we need for v2

Queries: We can pass in key:query for queries
Formula: Makes formulas using the keys from queries

* Defaults!
Use annotations to declare defaults for some fields. This lets us remove some guard rails from the code itself

Interval: 5m - Move this from code to here
ApiVersion: v1 - Move this from code to here

* Enums!
Much like defaults, having enums lets us make assumptions about the incoming metric so we dont need as many guardrails.

ApiVersion: Enum to restrict to v1 or v2
Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* Output of make codegen

Everything looks ok.

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* Pass in metric to provider factory. Validate metric.

Validating the metric on initialization, rather than spread out throughout. You get earlier feedback if you have a bad metric defined. (Not perfect, but there's limitations with our annotation generator for the rules in the crd. eg: If we could use oneOf, we wouldn't need a lot of this validation)

We check all the mutually exclusive props.
The props where one requires another.
We don't have to check for defaults and set them anymore, since they are guaranteed by the crd.

rules:

- ensure we have only query OR queries
- restrict v1 to query only
- make sure you only provide a formula with queries
- make sure multiple queries are accompanied by a formula

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* Remove DefaultApiVersion, remove impossible  AnalysisPhaseError

ApiVersion is guaranteed to have value, and the enum ensures its v1/v2 when user provided.

Updated v1 tests to reflect some of these new realities

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* extract urlBuilding from run

run was getting a bit long according to the checks

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* Remove some unnecessary stuff for interval.

It is a straightline to initialize since default is set to 5m for incoming metrics where it is not set.

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* Update createRequest

Split into createRequest v1/v2
v1 : pretty much unchanged. just extracted
v2: support for v2/query/scalar

We don't need all the timeseries. I did some testing fetching both scalar and timeseries, and they pretty much lined up.

Also confirmed with DD: From support ticket with DD: "...I have also tested the scalar api endpoint with the last aggregator as well as the timeseries api endpoint. They do indeed return the same values when retrieving the values via the api endpoints. Observing the time it takes to retrieve the values, they remain relatively the same..."

re: query + v2: Keep backwards compat. if we get in a query, we turn it into a queries object to pass on to requestv2 queries into the QueriesPayload.
Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* Handle v2 scalar responses

* update the datadogResponseV2 for scalar values
* handle no results so it has parity with v1 - empty will now usually result in `[]` unless something goes very wrong on dd side

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* update v1 no data tests to better reflect reality

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* update v2 tests

* add some new test cases
* update mock server to handle queries / formulas validation
* update no data tests to reflect reality
* stop all values being the same. it makes it difficult to know find which test case failed. move meaning from comments into the metric name.
* stop using deprecated ioutil

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>

* re-codegen

Signed-off-by: zachaller <zachaller@users.noreply.github.com>

* fix lint

Signed-off-by: zachaller <zachaller@users.noreply.github.com>

---------

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>
Signed-off-by: zachaller <zachaller@users.noreply.github.com>
Co-authored-by: zachaller <zachaller@users.noreply.github.com>
@dieend
Copy link

dieend commented Feb 13, 2024

Hi, when will this change included in a release?

We're eager to use it and would love it in the next available version, otherwise we have to fork the repo and build our own image

@meeech
Copy link
Contributor

meeech commented Feb 13, 2024

It should go out with 1.7

@lxsmrnv
Copy link

lxsmrnv commented May 28, 2024

Any news when this is expected to be available?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analysis Related to Analysis CRD bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants