Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute: ResourceSkus_List returns 30K+ results in one API call #20470

Open
erezrokah opened this issue Mar 26, 2023 · 14 comments
Open

Compute: ResourceSkus_List returns 30K+ results in one API call #20470

erezrokah opened this issue Mar 26, 2023 · 14 comments
Assignees
Labels
Compute customer-reported Issues that are reported by GitHub users external to the Azure organization. Mgmt This issue is related to a management-plane library. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team. Service This issue points to a problem in the service.

Comments

@erezrokah
Copy link
Contributor

erezrokah commented Mar 26, 2023

Hello 👋

This is an investigation request, as I'm not sure this falls under a bug or a feature.

Problem

I'm an engineer on the CloudQuery team, and we've been getting reports on high memory usage when using the Azure Go SDK github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v4 module, specifically the ResourceSKUsClient client.

What I think I know

After doing some investigation and profiling, I discovered the following:

  1. That specific client, while having a paginating API, returns all items in a single response (seems like the REST API works like that so not specific to the Go SDK). It's common to have around 30,000 items returned by that API, hence a spike in memory usage when retrieving it.

  2. The Azure Go SDK recursively implements a custom UnmarshalJSON method for each struct that's a part of an API client response. See example:


The custom UnmarshalJSON first converts each object into a map of string->byte[] , and then iterates over the map keys and unmarshalls each key value pair into the relevant type.

I think that since each API struct already has JSON tags, for example:

Value []*ResourceSKU `json:"value,omitempty"`

having those custom UnmarshalJSON is not needed and creates additional redundant allocations.
It seems the built in Go JSON decoder knows how to unmarshall without allocating additional memory.

I tried commenting out the UnmarshalJSON methods for the SKUs structs and it seems to improve memory and CPU (since the GC is doing less work) usage quite a bit (I can share before/after pprof images if needed).

What I'm requesting

  1. Feedback on this issue
  2. Explanation on the need to have both custom UnmarshalJSON and JSON tags
  3. If everything makes sense, removal of the custom UnmarshalJSON methods. I can even try to submit a PR for it 😅

Other context

The code that creates the custom UnmarshalJSON is in another repository, see here https://github.com/Azure/autorest.go/blob/db51bb93bef7af3b3a96e0e0550533a50fe5d29a/src/generator/models.ts#L289

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Mar 26, 2023
@ghost
Copy link

ghost commented Mar 26, 2023

Hi @erezrokah. Thank you for your feedback and we will look into it soon. Meanwhile, feel free to share your experience using the Azure SDK in this survey.

@jhendrixMSFT jhendrixMSFT added pillar-performance The issue is related to performance, one of our core engineering pillars. and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Mar 27, 2023
@jhendrixMSFT jhendrixMSFT self-assigned this Mar 27, 2023
@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Mar 27, 2023
@jhendrixMSFT
Copy link
Member

Thanks for reporting this. We've been aware of perf problems with the current implementation, see also #19356.

The API returning 30K+ results in one API call, even though the API is marked as paginated, appears to be a bug. We're following up internally with the compute folks to get clarity on the behavior.

You are correct that the JSON tags are unused. Removing them has been on the TODO list, maybe it's time to clean that up to remove any confusion.

We generate custom marshallers/unmarshallers to handle data not supported by the standard library including RFC1128 time formats and polymorphic values just to name a few. Granted, not all data structures contain such values. However, we realized early on that this might change over time, and in order to avoid breaking changes, we decided to always emit them (there is some debate about whether or not removing a custom JSON marshaller/unmarshaller constitutes a breaking change).

@erezrokah
Copy link
Contributor Author

Thanks for the quick response @jhendrixMSFT 🚀

Glad to know you folks are on top on things.
How about we keep this issue open until the SKUs bug is fixed, and then the memory/performance issue can be covered by #19356?

@jhendrixMSFT
Copy link
Member

Sounds great. I'll reply when I have an update on the 30K+ results.

@jhendrixMSFT jhendrixMSFT added Compute Service Attention Workflow: This issue is responsible by Azure service team. Mgmt This issue is related to a management-plane library. Service This issue points to a problem in the service. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team pillar-performance The issue is related to performance, one of our core engineering pillars. labels Mar 28, 2023
@ghost
Copy link

ghost commented Mar 28, 2023

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Drewm3, @TravisCragg-MSFT, @nikhilpatel909, @sandeepraichura, @hilaryw29, @GabstaMSFT, @ramankumarlive, @ushnaarshadkhan.

Issue Details

Hello 👋

This is an investigation request, as I'm not sure this falls under a bug or a feature.

Problem

I'm an engineer on the CloudQuery team, and we've been getting reports on high memory usage when using the Azure Go SDK github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v4 module, specifically the ResourceSKUsClient client.

What I think I know

After doing some investigation and profiling, I discovered the following:

  1. That specific client, while having a paginating API, returns all items in a single response (seems like the REST API works like that so not specific to the Go SDK). It's common to have around 30,000 items returned by that API, hence a spike in memory usage when retrieving it.

  2. The Azure Go SDK recursively implements a custom UnmarshalJSON method for each struct that's a part of an API client response. See example:


The custom UnmarshalJSON first converts each object into a map of string->byte[] , and then iterates over the map keys and unmarshalls each key value pair into the relevant type.

I think that since each API struct already has JSON tags, for example:

Value []*ResourceSKU `json:"value,omitempty"`

having those custom UnmarshalJSON is not needed and creates additional redundant allocations.
It seems the built in Go JSON decoder knows how to unmarshall without allocating additional memory.

I tried commenting out the UnmarshalJSON methods for the SKUs structs and it seems to improve memory and CPU (since the GC is doing less work) usage quite a bit (I can share before/after pprof images if needed).

What I'm requesting

  1. Feedback on this issue
  2. Explanation on the need to have both custom UnmarshalJSON and JSON tags
  3. If everything makes sense, removal of the custom UnmarshalJSON methods. I can even try to submit a PR for it 😅

Other context

The code that creates the custom UnmarshalJSON is in another repository, see here https://github.com/Azure/autorest.go/blob/db51bb93bef7af3b3a96e0e0550533a50fe5d29a/src/generator/models.ts#L289

Author: erezrokah
Assignees: jhendrixMSFT
Labels:

Compute, Service Attention, Mgmt, customer-reported, Service

Milestone: -

@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Mar 28, 2023
@jhendrixMSFT
Copy link
Member

jhendrixMSFT commented Mar 28, 2023

For CRP folks, the operation in question is this one: https://github.com/Azure/azure-rest-api-specs/blob/main/specification/compute/resource-manager/Microsoft.Compute/Skus/stable/2021-07-01/skus.json#L43

Seeing as how the operation is marked as pageable, shouldn't the result set be spread out over multiple pages?

@jhendrixMSFT jhendrixMSFT changed the title Investigation request: High memory usage and how the SDK handles JSON unmarshalling Compute: ResourceSkus_List returns 30K+ results in one API call Mar 28, 2023
@jhendrixMSFT jhendrixMSFT removed their assignment Mar 28, 2023
@TravisCragg-MSFT TravisCragg-MSFT self-assigned this Mar 30, 2023
@ArthurMa1978 ArthurMa1978 added the question The issue doesn't require a change to the product in order to be resolved. Most issues start as that label May 22, 2023
@tadelesh tadelesh added needs-author-feedback Workflow: More information is needed from author to address the issue. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Dec 26, 2023
Copy link

Hi @erezrokah. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@erezrokah
Copy link
Contributor Author

Hi @erezrokah. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

Dear bot, I'm not sure which information is missing. To my understanding this issue awaits a response from the CRP folks

@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Dec 26, 2023
@tadelesh
Copy link
Member

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Drewm3, @TravisCragg-MSFT, @nikhilpatel909, @sandeepraichura, @hilaryw29, @GabstaMSFT, @ramankumarlive, @ushnaarshadkhan.

Issue Details

kindly ping again.

@erezrokah
Copy link
Contributor Author

kindly ping again.

Another ping

@TravisCragg-MSFT
Copy link
Member

@erezrokah We are aware of the shortcomings of this API and are currently in the process of revamping it. We will be allowing KQL queries to be submitted with the request to help return only the data needed. We expect this to become fully available sometime next year.

@erezrokah
Copy link
Contributor Author

erezrokah commented Oct 15, 2024

Hi 👋 So the original issue I reported was about memory issues due to how the Azure SDK serializes and de-serializes JSON. I renamed this one, since the memory issues were supposed to be covered by #19356 which is now closed as stale and locked for comments.

What should we do about the JSON marshallings memory issues? Open a new issue?

@TravisCragg-MSFT
Copy link
Member

@jhendrixMSFT Can you weigh in on the current status for JSON marshallings memory issues? Would it be best to open a new issue or use this one to track it going forward?

@jhendrixMSFT
Copy link
Member

Thanks all for bringing this to my attention. I've reopened the issue and labeled it so the bot won't close it again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compute customer-reported Issues that are reported by GitHub users external to the Azure organization. Mgmt This issue is related to a management-plane library. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team. Service This issue points to a problem in the service.
Projects
None yet
Development

No branches or pull requests

5 participants