Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: only run generate code once #3436

Merged
merged 2 commits into from
Dec 18, 2024
Merged

Conversation

Brookke
Copy link
Contributor

@Brookke Brookke commented Dec 14, 2024

in issue #2526 it's noted that codegen.GenerateCode is ran twice theres a vague PR that reverted this duplication #1100, but I am not clear why.

In my testing using a particularly heavy schema file I found the following

# Build of the master branch
/Users/brooke/Developer/gqlgen/gqlgen  34.45s user 4.15s system 145% cpu 26.486 total

# Build of this PR
/Users/brooke/Developer/gqlgen/gqlgen  26.82s user 3.92s system 151% cpu 20.223 total

I have:

  • Added tests covering the bug / feature (see testing)
  • Updated any relevant documentation (see docs)

closes: #2526

@coveralls
Copy link

coveralls commented Dec 15, 2024

Coverage Status

coverage: 73.903% (+0.02%) from 73.884%
when pulling 5c610d3 on Brookke:address-2526
into e92956a on 99designs:master.

@Brookke Brookke marked this pull request as draft December 15, 2024 18:36
@Brookke
Copy link
Contributor Author

Brookke commented Dec 15, 2024

Converted to draft while I investigate the CI errors

@Brookke
Copy link
Contributor Author

Brookke commented Dec 15, 2024

Hi @StevenACoffman I may need some pointers on this, the diff is all in the federated example. It's added obj fedruntime.Entity to a bunch of entity resolvers. Has this highlighted a previously unknown bug where the second run was overwriting things, or was the old behaviour correct?

@StevenACoffman StevenACoffman added the help wanted Extra attention is needed label Dec 16, 2024
@Brookke
Copy link
Contributor Author

Brookke commented Dec 17, 2024

Hi @StevenACoffman I may need some pointers on this, the diff is all in the federated example. It's added obj fedruntime.Entity to a bunch of entity resolvers. Has this highlighted a previously unknown bug where the second run was overwriting things, or was the old behaviour correct?

I believe the issue was due to the order of running things. The plugins modify the data that gets passed to the main generateCode function. So by reordering things, the codegen now matches the examples

@Brookke Brookke marked this pull request as ready for review December 17, 2024 10:31
@StevenACoffman
Copy link
Collaborator

Hey, sorry I've been slow to respond to this. I really appreciate you taking this on! This looks perfectly sound to me, but I would like to get some more people to try this out in case there's a reason I'm not seeing why something would need to run both before and after (more like a middleware). After all Hyrum's Law may apply:

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

Hyrum Wright, 2012

@Brookke
Copy link
Contributor Author

Brookke commented Dec 17, 2024

No problem at all @StevenACoffman. Thanks 🙏

@vikstrous
Copy link
Contributor

vikstrous commented Dec 18, 2024

This won't cause an issue for dataloadgen. dataloadgen exists only at run time, not generation time. It's not a plugin

I got some data points for our biggest graph, though, to show that it helps us a lot too:

Version Run 1 (seconds) Run 2 (seconds)
gqlgen v0.17.60 46.81 48.23
this PR 35.52 30.46

@StevenACoffman
Copy link
Collaborator

StevenACoffman commented Dec 18, 2024

Ok, I'm going to merge this 🤞 , but I'm also going to hold off on a new release for a bit in case there's some late feedback and we need to alter things. BTW, this might help with #3414

Unfortunately, very few people pay attention to changes until after a release and they try to put things into their own prod environments.

@StevenACoffman StevenACoffman merged commit fbe0246 into 99designs:master Dec 18, 2024
17 checks passed
@golanglemonade
Copy link

@StevenACoffman Look good on https://github.com/theopenlane/gqlgen-plugins against our main schema. Appreciate you checking!

@Warashi
Copy link

Warashi commented Dec 19, 2024

@StevenACoffman
Thank you for mentioning!
I ran the compgen's test with the gqlgen's master branch, and it seems good.

@Brookke
Copy link
Contributor Author

Brookke commented Dec 22, 2024

@vikstrous is that schema available publicly? I'm looking into if there are any other obvious improvements and would like more schemas to test against.

So far I've found that this prune function takes a significant amount of time on our schema.

@vikstrous
Copy link
Contributor

No, sorry, it's a private codebase. I've noticed that autobind is one source of slowness because of the size of the repo. There are a lot of dependencies, so loading the packages can take time

@StevenACoffman
Copy link
Collaborator

You might look at #3372

@Brookke Sidenote: I'm not sure if it's helpful, but I shared with you a private repo with just Khan Academy's gqlgen configs, service schemas, and federation composed supergraph. We're a non-profit, so we don't have trade secrets, and we only accept the queries from our safelist, so I'm not super concerned with sharing. However, it would be better if it wasn't passed around too much, since that kind of thing invites attackers.

The really huge gqlgen schemas I'm aware of like ones the reddit folks and uber folks use are not public.

Honestly, one of the things that gqlgen is really missing is a set of realistic benchmarks and profiling to get a sense of where are the performance bottlenecks (during codegen but also in runtime), so this would be hugely valuable work. Performance optimizations can make the code harder to maintain, so we only want to do it where it makes a big difference, and right now we don't know where that would be.

@Brookke
Copy link
Contributor Author

Brookke commented Dec 22, 2024

Thanks for sharing @StevenACoffman. My initial findings centre around the Prune function, basically it calls imports.Process on all the generated files. generated.go can get pretty large which mean it takes a long time. Taking out the call to Prune significantly increases the speed.

This suggestion might help #2308

Alternatively, I was wondering if a refactor could move the call to imports.Process to the end of the codegen and then run it in parallel across all the files it gets called on. Would potentially be really useful in conjunction with layout: follow-schema

@StevenACoffman
Copy link
Collaborator

A PR for either/both would be very welcome!

If there's a breaking change necessary in the name of progress, we should put a config option and let people opt-in (or at least opt-out).

For instance, if the generated files were not formatted, people would be annoyed as their linters would complain (even though they can exclude generated files, or add a pre-lint format step after generation), even though it would be much faster.

@StevenACoffman
Copy link
Collaborator

If you see any other userful performance improvement suggestions in issues and want to call them out to me here, I can tag them as performance for others.

I prioritize paying attention to PRs with my volunteer free time, but I often have stretches of time where I have to ignore issues (like camping) and then can never get enough time to circle back to look at them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why is GenerateCode called twice?
6 participants