Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure Pipelines Parent Scaler not scaling (2.10) #4387

Closed
Tracked by #4374
Eldarrin opened this issue Mar 16, 2023 · 28 comments
Closed
Tracked by #4374

Azure Pipelines Parent Scaler not scaling (2.10) #4387

Eldarrin opened this issue Mar 16, 2023 · 28 comments
Labels
bug Something isn't working

Comments

@Eldarrin
Copy link
Contributor

Eldarrin commented Mar 16, 2023

Report

Just upgraded from 2.9 to 2.10 and scaling has stopped working. Will work it out and update but putting a pin here for now.

Expected Behavior

Pipelines to Scale

Actual Behavior

Pipelines don't scale

Steps to Reproduce the Problem

  1. Setup AzDO Scaler using parent method on keda v2.9.1
  2. Check scaledjobs scale based on parent
  3. Upgrade keda to v2.10.0
  4. Jobs using parent method will not scale

Logs from KEDA operator

Logs have no difference between version

KEDA Version

2.10.0

Kubernetes Version

1.25.5

Platform

Microsoft Azure

Scaler Details

azure-pipelines

Anything else?

When calling */jobrequests?top=nnn returns a different api result to */jobrequests. For instance, different number of returns and no matchedAgent field. But json is sufficiently similar to not fail marshalling.

@Eldarrin Eldarrin added the bug Something isn't working label Mar 16, 2023
@JorTurFer
Copy link
Member

Hi,
This is weird because we have the scaler covered with e2e tests. Maybe is a not covered case? How are you using it? Maybe we could add the use case to the e2e test to cover better the scaler

@JorTurFer JorTurFer moved this from To Triage to To Do in Roadmap - KEDA Core Mar 17, 2023
@Eldarrin
Copy link
Contributor Author

Eldarrin commented Mar 17, 2023

Well, I know what it is, I just don't know why it is :) It's adding $top to the url query. Will investigate further as it only seems to be affected the matchedAgents field which is used by parent method only

Its an interesting one. If you have access to AzDO can you just execute the jobrequests api with and without the $top query. Even the count changes (and not in line with top). I'd be interested if you see it too. Ideally if you can have a job pending with no agent to run it also shows the matchedAgent difference.

@Eldarrin
Copy link
Contributor Author

@JorTurFer not sure what we want to do here. It seems to me like the quick and dirty fix is to exclude top when using parent metadata and add an e2e test to it. Would need to dig into API versions I suspect to work out a true fix,

Any thoughts? Basically without the patch 2.10 is unusable if using parent

@shayki5
Copy link
Contributor

shayki5 commented Mar 19, 2023

Hi, I added the top the API in order to solve another bug...
It looks like you're right, in this case the matchedAgents doesn't exist in the response (Microsoft 😡).
@JorTurFer Should I just update it what @Eldarrin suggested? to exclude top when using parent?

@Eldarrin
Copy link
Contributor Author

I'm writing a bunch of e2e tests at the moment to handle the more extreme cases using demands and parent type scalers. We should probably aim to coincide with that

@shayki5
Copy link
Contributor

shayki5 commented Mar 19, 2023

Ok great, so can you add the condition?

@Eldarrin
Copy link
Contributor Author

Ok great, so can you add the condition?

Sure, I can put the hotfix in. @JorTurFer are you ok with that as an interim?

@tomkerkhove
Copy link
Member

Related: #4374

@EugeneLugovtsov
Copy link
Contributor

@Eldarrin @shayki5 hi, I have the following error. can it be related?

image
image

@Eldarrin
Copy link
Contributor Author

What meta type are you using? demands, parent or none? I haven't done any volume testing on it myself as I use parent and it won't load any jobs with that

@EugeneLugovtsov
Copy link
Contributor

I use demands type

@EugeneLugovtsov
Copy link
Contributor

In one agent pool, I have several agent types. so I use the introduced in the 2.10 feature requireAllDemands: "true"

@EugeneLugovtsov
Copy link
Contributor

for some reason there is no Agent.Version demand for yaml pipelines (_apis/distributedtask/pools/{agent-pool}/jobrequests response). this demandis presents only in releases.

But it's weird, it stopped working a few hours ago, it worked like a charm before

@Eldarrin
Copy link
Contributor Author

I wonder if it’s the top query flipping the response like with parent. I’ll look further into it tomorrow

@JorTurFer
Copy link
Member

Let me know @Eldarrin if you need anything from our side to enable the new e2e scenarios

@JorTurFer
Copy link
Member

Sure, I can put the hotfix in. @JorTurFer are you ok with that as an interim?

I think so, the top is a new feature for huge cases, but if the API isn't compatible we cannot do anything... Ideally, we could check the API and somehow we could solve this gap, but IDK if we can do it. @tomkerkhove @v-shenoy , could you ask internally if there is any way to achieve this?

@Eldarrin
Copy link
Contributor Author

The current e2e only deals with the base scale scenario so I’m rewriting it to a more comprehensive suite. The problem is around ADO slowness when testing /sigh . I’m hoping to get the new suite in the next day of so and retest against 2.9 and 2.10 to check for other scenarios that may need remediation. The only additions should be another pipeline and env which I’ll stick into test-tools and a new image type for the ado agent.

@shayki5
Copy link
Contributor

shayki5 commented Mar 21, 2023

@JorTurFer If the $top cause issues, maybe we can disable it by default and only if explicitly we pass a parameter what we wand it it will be added to the api call?

@JorTurFer
Copy link
Member

@JorTurFer If the $top cause issues, maybe we can disable it by default and only if explicitly we pass a parameter what we wand it it will be added to the api call?

I understood that @Eldarrin wanted to add the $top in case of not using demands. Let's give some time 😄

@Eldarrin
Copy link
Contributor Author

I've got a commit in, needs more work which I'll do tomorrow, but at least you have a start point if I drop off the face of the earth :) luckily, the tests proved it was only parent which is affected

@Eldarrin
Copy link
Contributor Author

Should be ready to run e2e on azure_pipelines and azure_pipelines_adv. Just need to mod your ado setup as per test-tools and build an adv agent or just use mine for now.

@JorTurFer JorTurFer mentioned this issue Mar 24, 2023
17 tasks
JorTurFer added a commit that referenced this issue Mar 28, 2023
#4387) (#4401)

Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Co-authored-by: Andy Ward <mortx@toothless.eldarrin.io>
@github-project-automation github-project-automation bot moved this from To Do to Ready To Ship in Roadmap - KEDA Core Mar 28, 2023
JorTurFer added a commit to JorTurFer/keda that referenced this issue Apr 13, 2023
kedacore#4387) (kedacore#4401)

Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Co-authored-by: Andy Ward <mortx@toothless.eldarrin.io>
JorTurFer added a commit to JorTurFer/keda that referenced this issue Apr 13, 2023
kedacore#4387) (kedacore#4401)

Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Co-authored-by: Andy Ward <mortx@toothless.eldarrin.io>
Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>
JorTurFer added a commit that referenced this issue Apr 13, 2023
…<Jorge_turrado@hotmail.es> Co-authored-by: Andy Ward <mortx@toothless.eldarrin.io> Co-authored-by: Tom Kerkhove <kerkhove.tom@gmail.com> Co-authored-by: Zbynek Roubalik <zroubalik@gmail.com> Co-authored-by: Eldarrin <32762846+Eldarrin@users.noreply.github.com> Co-authored-by: Eugene Lugovtsov <34510252+EugeneLugovtsov@users.noreply.github.com> fix(aws-sqs): Respect `scaleOnInFlight` value (#4358) fix odd number of arguments passed as key-value pairs for logging (#4369) fix: Azure Pipelines Scaler uses correct endpoint when demands are set  (#4387) (#4401) fix: respect all required demands in azure pipeline scaler (#4405) fix: Allow to remove the finalizer even if the ScaledObject isn't valid (#4397)

* Drop a transitive dependency on bou.ke/monkey (#4366)

Signed-off-by: Zbynek Roubalik <zroubalik@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>

* fix(aws-sqs): Respect `scaleOnInFlight` value (#4358)

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>

* chore: update supported versions in the welcome message (#4360)

Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>

* fix odd number of arguments passed as key-value pairs for logging (#4369)

Signed-off-by: Zbynek Roubalik <zroubalik@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>

* fix: Azure Pipelines Scaler uses correct endpoint when demands are set  (#4387) (#4401)

Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Co-authored-by: Andy Ward <mortx@toothless.eldarrin.io>
Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>

* fix: respect all required demands in azure pipeline scaler (#4405)

Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>

* fix: Allow to remove the finalizer even if the ScaledObject isn't valid (#4397)

Co-authored-by: Tom Kerkhove <kerkhove.tom@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>

* update changelog

Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>

---------

Signed-off-by: Zbynek Roubalik <zroubalik@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@scrm.lidl>
Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
Co-authored-by: Zbynek Roubalik <zroubalik@gmail.com>
Co-authored-by: Eldarrin <32762846+Eldarrin@users.noreply.github.com>
Co-authored-by: Andy Ward <mortx@toothless.eldarrin.io>
Co-authored-by: Eugene Lugovtsov <34510252+EugeneLugovtsov@users.noreply.github.com>
Co-authored-by: Tom Kerkhove <kerkhove.tom@gmail.com>
@JorTurFer
Copy link
Member

JorTurFer commented May 31, 2023

Hi @Eldarrin
One question about this, I have noticed that we generate this image e2e/images/azure-pipelines/advagent as part of kedacore/test-tools#118 but we are not using it at all.
Can we delete it?

@Eldarrin
Copy link
Contributor Author

Eldarrin commented Jun 2, 2023

HI @JorTurFer
Actually we should be using it, when I set it up I set it to use my image of this on docker:

This should be set to use your version of the image.

@JorTurFer
Copy link
Member

Would you open a PR with the change? (as you have done all the work and maybe you want to get the commit ownership)
I can open it if you don't have time :)

@Eldarrin
Copy link
Contributor Author

Eldarrin commented Jun 2, 2023

Submitted #4641 :)

@JorTurFer
Copy link
Member

Thanks a lot! 🙇

@Eldarrin
Copy link
Contributor Author

Eldarrin commented Jun 2, 2023

No problem, time for a beer 🍺

@zroubalik
Copy link
Member

No problem, time for a beer 🍺

Well deserved, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

No branches or pull requests

6 participants