Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: refactor value manipulations #375

Closed
wants to merge 28 commits into from

Conversation

GalacticHypernova
Copy link
Contributor

Types of changes

  • Bug fix (a non-breaking change which fixes an issue)
  • New feature (a non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Enhancement (a non-breaking change which improves functionality like performance)

Description

Having many manipulations (especially when unnecessary) can be really detrimental to performance. This PR aims to refactor, wherever possible, said iterations.

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes (if not applicable, please state why)

Not necessary as it doesn't affect functionality.

Copy link

vercel bot commented Feb 13, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
nuxt-security ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 21, 2024 6:45pm

@GalacticHypernova GalacticHypernova marked this pull request as draft February 13, 2024 17:54
@GalacticHypernova
Copy link
Contributor Author

Kept the old code commented for ease of revert in case necessary.

@GalacticHypernova
Copy link
Contributor Author

I will change all the reduce, just doing it to test.

@GalacticHypernova GalacticHypernova marked this pull request as ready for review February 13, 2024 19:15
@GalacticHypernova
Copy link
Contributor Author

GalacticHypernova commented Feb 14, 2024

I will also fix the error of course, it's a bit odd CI results appear so late.

@vejja
Copy link
Collaborator

vejja commented Feb 14, 2024

I will also fix the error of course, it's a bit off CI results appear so late.

Thanks ! CI is not triggered automatically on patch branches, we have to launch it manually.
As a preliminary debugging step you can yarn test locally to check your changes against the test suite.

@GalacticHypernova
Copy link
Contributor Author

Ah, that makes sense

@GalacticHypernova GalacticHypernova marked this pull request as draft February 14, 2024 21:30
@GalacticHypernova
Copy link
Contributor Author

GalacticHypernova commented Feb 20, 2024

@GalacticHypernova CI Output Capture d’écran 2024-02-14 à 10 45 43 Let me know if you can see the details or if you need a copy of the logs

I can see them from the actions tab, thanks!

@Baroshem
Copy link
Owner

Baroshem commented Feb 21, 2024

@GalacticHypernova

I really appreciate your work buddy!

Do you have these benchmarks somewhere to compare performance between the main branch and yours?

I am thinking if it will actually make much difference. I know that in general classic for loops are faster than built in array methods but the difference can be observed mainly when working with datasets that contain milions of rows.

The optimizations made by @vejja for the cheerio made sense because they were traversing through the generated HTML many times which was decreasing the performance but in the case of your PR it may not even be noticable. Unless there is a benchmark that proves I am wrong :)

@GalacticHypernova
Copy link
Contributor Author

@Baroshem
Thanks, I'm always glad to help!

I have indeed checked every update I pushed with multiple benchmarks, both on node and on browsers. You're correct it may not be that significant in the short term compared to the cheerio update, but it's an opotimization that will add up at runtime in the long run, benefitting not only the module, but also the users' applications in terms of performance. I can attach some benchmark results if you'd like to see the difference it made.

@Baroshem
Copy link
Owner

Yes please do so!

We will take a look at it with @vejja :)

@GalacticHypernova
Copy link
Contributor Author

Sure thing!
I'll attach them when I get the chance, as I'm not on my PC currently and that's where all my benchmarks are.

@GalacticHypernova
Copy link
Contributor Author

GalacticHypernova commented Mar 21, 2024

@Baroshem Heya! I really apologize for the long disappearance, got caught up with personal life and haven't had much time to work on this, I have the benchmark results here:
image
image

Here's the benchmark code (ran separately to avoid any sort of engine level caching):

const bigObject = {}
for(let i = 0; i < 1; i++){
	bigObject[i] = i+1
}
console.time("bigObjectAsArray")
for (let i = 0; i < 1; i++) {
	const bigObjectAsArray = Object.entries(bigObject).map(([key, value]) => [value, key])
}
console.timeEnd("bigObjectAsArray")



console.time("bigObjectAsObject")
for (let i = 0; i < 1; i++) {
	const bigObjectAsArray = []
	for(const key in bigObject){
		bigObjectAsArray.push([bigObject[key], key])
	}
}
console.timeEnd("bigObjectAsObject")

As you can see, there is a very clear performance difference even on a single iteration in a single test loops. This differs further as the objects become larger and ran through more iterations.

@GalacticHypernova
Copy link
Contributor Author

I will also refactor this PR in a new one that I will open soon, because I clearly messed something up right here.

@vejja
Copy link
Collaborator

vejja commented Mar 22, 2024

Hi @GalacticHypernova

To be very honest (and I'm not sure you'll like it that much), I am not convinced that we should sacrifice code stability for marginal optimization at this stage of the project.

There are many core features in the module that are probably going to change significantly (I'm thinking integrating runtime hook events within the core radix router), and I think it would be safer to stabilize these parts of the code before undergoing loop optimization. Although you're right on single line performance timings, we don't loop through that many objects and our objects are not large at all, so overall the impact is not even going to be noticeable.

However, there is one key part of the code where we would really need help in performance/optimization, which is the Cheerio parser. The performance gains are likely to be huge, so if you would accept to have a look at it I think it would be truly fantastic.

Would you have a look at issue #341 ? OP reports that response time went from 14ms to 200-500ms. So if we were able to bring it back down into the 20-50ms range, that would be an amazing 10x performance gain ! In addition we also got reports of CPU/memory crashes in Cloudflare Workers.

We know exactly what the issue is: Cheerio is very very slow. And we also know what the solution is: use regexes instead of Cheerio. This analysis is based on the fact that OP is comparing Nuxt-Security v0.13 (which used regexes) with Nuxt-Security v1.0 (which uses Cheerio).

Would you by any chance happen to be comfortable with regexes? Let me know your feedback - sorry to be a bit directive here but I think your focus on performance would be highly beneficial to the module.
Cheers

@Baroshem
Copy link
Owner

Hey @GalacticHypernova

First of all, thanks for the amazing work! You really did a lot of work to benchmark the performance.

But I have to agree on @vejja on this one. I would prefer not to change so many things in the code to achieve such small performance improvement. It is noticable but changing so much can result in unwanted bugs that I would prefer to avoid at all.

But as Vejja mentioned, there are for sure opportunities to significantly improve the prrformance without actually rewriting the code of the module like the example with cheerio. If you would like to cintribute, I am sire this is a great place to start. It would also allow to remove dependency from the module.

This would help the module a lot!

@GalacticHypernova
Copy link
Contributor Author

GalacticHypernova commented Mar 23, 2024

@vejja @Baroshem
First things first, thanks for the feedback!
While I would have to disagree about the bug aspect due to the fact tests could be made to avoid regression, I do understand your point. I agree this has less immediate impact on performance (though it does add up at runtime in the long run) and therefore less immediate priority, so I'm fine with postponing this until the major issues are taken care of. I'll gladly try to help with cheerio.

@vejja
Copy link
Collaborator

vejja commented May 10, 2024

Closing in favor of #404 by @GalacticHypernova

@vejja vejja closed this May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants