Large entities causing memory overhead #2348

Closed
leesiongchan opened this issue Oct 5, 2017 · 5 comments

leesiongchan commented Oct 5, 2017

Recently I tried to create a custom source plugin to fetch data from my API, and everything worked great until I changed it to recursively fetch data from every page. The array keeps growing and memory climbs past 1 GB, and once it's ready to createNode, memory keeps increasing until the app crashes. So my question is: do we really have to preload everything?

If so, how can I improve performance and efficiency? Or is there any way to dynamically fetch only the necessary data based on the request?
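
For reference, the plugin follows roughly this pattern (a minimal sketch only; the API URL, response shape, and Product node type are placeholders, not my actual code):

```js
// gatsby-node.js — minimal sketch of a paginated source plugin (Gatsby v1 API).
// The endpoint, response shape, and node type below are placeholders.
const fetch = require('node-fetch')
const crypto = require('crypto')

exports.sourceNodes = async ({ boundActionCreators }) => {
  const { createNode } = boundActionCreators

  let page = 1
  let items = []

  // Fetch every page up front — this is where the array (and memory) keeps growing.
  while (true) {
    const res = await fetch(`https://api.example.com/products?page=${page}`)
    const data = await res.json()
    if (data.length === 0) break
    items = items.concat(data)
    page += 1
  }

  // Then turn the whole in-memory array into Gatsby nodes.
  items.forEach(item => {
    createNode({
      ...item,
      id: `product-${item.id}`,
      parent: null,
      children: [],
      internal: {
        type: 'Product',
        contentDigest: crypto
          .createHash('md5')
          .update(JSON.stringify(item))
          .digest('hex'),
      },
    })
  })
}
```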

leesiongchan commented Oct 5, 2017

Do you have any timeline in mind for live source fetching, so Gatsby becomes a general application generator instead of static only? Our application is more like an ecommerce site, so I think we might not be able to use Gatsby for this case. But I really love Gatsby's concept; it's really beautiful.

KyleAMathews (Contributor) commented

There's probably some low-hanging fruit for increasing efficiency — improving Gatsby's scalability will be a focus towards the latter part of this year and next year.

Currently though, it sounds like you're just running into Node's built-in memory limits. If you run Gatsby like `node --max_old_space_size=4096 ./node_modules/.bin/gatsby` you'll have a lot more memory to work with.
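
For example, you could wire the flag into your package.json scripts; the 4096 (MB) here is just an example value, use whatever your machine allows:

```json
{
  "scripts": {
    "build": "node --max_old_space_size=4096 ./node_modules/.bin/gatsby build"
  }
}
```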

Preloading is by far the simplest way to do things and arguably the best, as development and builds are much faster when data is local and it's easy for Gatsby to autogenerate the GraphQL schema. There are harder approaches that avoid making the data local, but that's not something that's been explored much.

jasonphillips (Contributor) commented

On a related point, is there no standardized way for a plugin (a source plugin, I suppose) to extend the GraphQL schema / resolvers directly, without simply adding preloaded nodes to the tree?

In other words, a way to provide custom GraphQL resolve logic for part of the schema, but where it would still be executed and cached at build time, not as some kind of live query.

KyleAMathews (Contributor) commented

There is https://www.gatsbyjs.org/docs/node-apis/#setFieldsOnGraphQLNodeType

It's generally suggested you use this only for adding fields that you want to have arguments (e.g. the "excerpt" field on "MarkdownRemark" lets you pass in a pruneLength variable to control the creation of the excerpt) or when you want to do custom processing (e.g. transformer-remark lets you create image thumbnails using GraphQL).
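
For example, something along these lines (a minimal sketch using a hypothetical Product node type with a description field, not a real transformer):

```js
// gatsby-node.js — rough sketch of setFieldsOnGraphQLNodeType.
// Assumes a hypothetical `Product` node type with a `description` field.
const { GraphQLString, GraphQLInt } = require('graphql')

exports.setFieldsOnGraphQLNodeType = ({ type }) => {
  if (type.name !== 'Product') {
    return {}
  }

  return {
    // Adds a `shortDescription(pruneLength: Int)` field to Product nodes,
    // resolved (and cached) at build time like any other field.
    shortDescription: {
      type: GraphQLString,
      args: {
        pruneLength: { type: GraphQLInt, defaultValue: 140 },
      },
      resolve: (node, { pruneLength }) =>
        (node.description || '').slice(0, pruneLength),
    },
  }
}
```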

I think the right solution to this problem of "too much data" is a way to pull data fetching and schema creation into another process with a DB backing the data, instead of everything being in memory. Watch this space :-) we're working on a hosted version of this. That way there's essentially no limit to the amount of data Gatsby can handle.

KyleAMathews (Contributor) commented

Hey, closing out old issues. Please re-open if you have additional questions, thanks!

Also, check out v2! We've vastly reduced memory usage and build times in general.
