Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize XmlTextEncoder.Write #61773

Merged
merged 10 commits into from
Dec 7, 2021

Conversation

kronic
Copy link
Contributor

@kronic kronic commented Nov 18, 2021

No description provided.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Nov 18, 2021
@ghost
Copy link

ghost commented Nov 18, 2021

Tagging subscribers to this area: @dotnet/area-system-xml
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: kronic
Assignees: -
Labels:

area-System.Xml, community-contribution

Milestone: -

@kronic
Copy link
Contributor Author

kronic commented Nov 18, 2021

@eiriktsarpalis done

@kronic kronic changed the title Optimize XmlTextEncoder.WriteStringFragment Optimize XmlTextEncoder.Write Nov 18, 2021
Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jeffhandley
Copy link
Member

Hi @kronic -- thanks for this and all of your other recent contributions. As you know, XML is an area our team has not invested in for several years; your work all has the spirit of modernizing the code a bit, and presumably improving performance.

There are a few things that would help us greatly with reviewing your PRs:

  1. Understanding your motivations for making these changes -- are these optimizations being made in "hot path" scenarios in your applications, for instance?
  2. Awareness of what kind of testing you're doing with your changes -- each of these changes has a potential to cause subtle, unintended behavior changes, and the more changes we make, the more that risk adds up
  3. Knowing if you're conducting performance testing to validate these optimizations are accruing benefit in scenarios that affect you
  4. Visibility into what other changes you're planning to make after these PRs

If you could summarize all of this as a new issue, that would be best for us. The title could capture the motivation, and the description could include a task list that includes all of the changes you've already submitted PRs for, as well as any other changes you're planning to make. The description could also describe the scenarios where these optimizations are proving valuable to you.

Before we proceed with merging more PRs, I'd like for us to review this information holistically. That will allow us to better calculate the risk/reward involved. For the time being, I'm going to mark this and your other open PRs as "needs more info" and I'll link the other PRs to this comment.

Thanks again for your efforts!
Jeff Handley

@Wraith2
Copy link
Contributor

Wraith2 commented Nov 19, 2021

There are still large "legacy" codebases that make use of xml and improving perf for them in a community contribution seems like a reasonable thing to do since it isn't a customer focus issue from the MS side. As you've noted Xml hasn't really seen any sustained effort because it's not the web development flavour of the decade anymore.

In the specific case of #61774 which improves performance by using a stackalloc avoiding an allocation there should be no observable change unless you're using a profiler. That would fit into the reasoning of making the function better. Why would this need further explanation? Are community contributions in unpopular areas not welcome?

@jeffhandley
Copy link
Member

The contributions are certainly welcomed; we're grateful for them for sure! And while the majority of the effort is certainly on the contributor's (@kronic's) end, it's still nonzero for the team too--reviewing, testing, and accruing even minor risks to regressions add up in aggregate. Thus, for our own planning purposes, we need to gauge how much effort is worthwhile.

That's where having the context and motivations (ideally in measurable forms) will help us out. We can assess how far into these contributions we are and how much more is coming; that allows us to budget time for the reviews, testing, etc. appropriately. Getting the context I requested will help us avoid the situation of community contributions continuing to roll in while we fall behind reviewing them, and the PRs ultimately lose momentum, go stale, start having merge conflicts, and time is wasted. The better we can all have our expectations set, the less likely that outcome is.

@kronic
Copy link
Contributor Author

kronic commented Nov 19, 2021

@jeffhandley
Hello, my programs process hundreds of millions of xml every day.
I understand very well that you will not be investing in this area for the foreseeable future.
I also understand the risk of recourse, and that you bear the cost of considering pr, so I try to keep the maximum 1-5 open pr.

  1. The motive is simple to speed up all this, and bring the code into a more readable form for its further optimization.
  2. Testing is performed when performance changes. When you refactor your testing, you don't need to change the behavior. I'm ready to add tests to avoid regressions like #51736 I've added 30+ tests.
  3. No
  4. I plan to bring the code base into a more readable form, and speed up everything in my power.
    List of changes I made prs

How many pr are you willing to accept per month?

@eiriktsarpalis
Copy link
Member

How many pr are you willing to accept per month?

I don't think this is an issue of volume of concurrently open PRs. This is more about defining a mission statement so we can better understand the motivation and risk-to-benefit ratio. Making the codebase "more readable" or "more modern" does not meet the bar in its own right. And any claims to improving performance should be backed by benchmarks demonstrating significant improvement in common scenaria (and does not regress other scenaria).

I wouldn't want to discourage small community PRs that make small improvements. However layering a large number of small PRs would make it difficult for us to diagnose any potential regressions that get picked up when we ship .NET 7. Going forward I would recommend taking the following steps:

  • Create an issue detailing (at a high level) what improvements you would like to make and why. We can then have a conversation about what changes we want to support and which we think are not worth your investment.
  • Submit large PRs that address specific high-level concerns in System.Xml project-wide. These should come with descriptions detailing background and motivation. If the motivation is improving performance, it should come with benchmark results demonstrating the improvement in common workflows. I would also recommend running the standard xml suite in dotnet/performance to ensure that there are no unintended performance regressions. For your reference, here is an example of such a PR.

@eiriktsarpalis eiriktsarpalis added the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 3, 2021
@krwq
Copy link
Member

krwq commented Dec 7, 2021

@kronic I'll merge this as is because this change is fairly isolated and it does simple improvement. Since there have been some concerns on the volume of the PRs please create an uber issue (i.e. XML performance improvements) with types of changes you're planning to make (i.e. allocation improvements, perhaps split those into specific buckets you're searching for and the general approach you're taking) etc and let's discuss that before proceeding further. It would be good to have some kind of plan at least so that if issues are bucketized and we find a regression we can fix it for all similar changes. It would be good to have some kind of perf measurements on what numbers we're trying to get down

@krwq krwq merged commit fa5d7ff into dotnet:main Dec 7, 2021
@kronic kronic deleted the Optimize_XmlTextEncoder.WriteStringFragment branch December 9, 2021 14:36
@ghost ghost locked as resolved and limited conversation to collaborators Jan 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Xml community-contribution Indicates that the PR has been added by a community member needs-author-action An issue or pull request that requires more info or actions from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants