large data sets: 1.x has issues that 0.11.1, 0.12.1 do not #2226
Comments
Wow, that definitely wins for one of the larger sites/datasets I've seen in Eleventy! You mentioned…

npm info @11ty/eleventy time --json | grep -Ev "(canary|beta)" | tail -5

"0.11.1": "2020-10-22T18:40:22.846Z",
"0.12.0": "2021-03-19T19:24:27.860Z",
"0.12.1": "2021-03-19T19:55:13.306Z",
"1.0.0": "2022-01-08T20:27:32.789Z",
@pdehaan trying that now 👍 p.s. the data isn't exactly confidential, it's just more of a "I don't wanna have to spam up the git repo" thing :P
Oh, no worries. I totally don't want to download 2.7 GB of data unless… nope, I just really don't want to download roughly 1989 floppy disks' worth of data. Although now I kind of want to add a "kb_to_floppy_disk" custom filter in Eleventy and represent all file sizes in relation to how many 3.5" floppy disks would be needed.
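A filter like that would only take a few lines; purely as a hypothetical sketch (nothing like this ships with Eleventy, and the names are made up):

```js
// .eleventy.js: hypothetical "kb_to_floppy_disk" filter, for illustration only
module.exports = function (eleventyConfig) {
  // a 3.5" high-density floppy holds roughly 1.44 MB, i.e. about 1440 KB
  const FLOPPY_KB = 1440;

  eleventyConfig.addFilter("kb_to_floppy_disk", (sizeInKb) => {
    const disks = Math.ceil(sizeInKb / FLOPPY_KB);
    return `${disks} x 3.5" floppy disk${disks === 1 ? "" : "s"}`;
  });
};
```

A template could then render something like {{ someSizeInKb | kb_to_floppy_disk }}.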
It's the subtitles and video pages for about 5.9k YouTube videos. (not sure how I've got 1k more transcriptions than I have clips 🤷‍♂️)

That completes as expected, although I haven't diffed the output to see if there are any changes/bugs etc.
So, I think you're saying:

> Doubting this has already been fixed in 1.0.1-canary builds, but if you were looking to try the sharpest of cutting edge builds, you could try npm i @11ty/eleventy@canary. 🔪

npm info @11ty/eleventy dist-tags --json

{
  "latest": "1.0.0",
  "beta": "1.0.0-beta.10",
  "canary": "1.0.1-canary.3"
}
@pdehaan the problematic JSON source is only 2.7 MB gzipped (in case one wanted to produce a bare-minimum reproducible case), although I suspect one could bulk-generate random test data for an array of objects with this structure & it'd do the trick:

{
"id": "yt-0pKBBrBp9tM",
"url": "https:\/\/youtu.be\/0pKBBrBp9tM",
"date": "2022-02-15",
"dateTitle": "February 15th, 2022 Livestream",
"title": "State of Dave",
"description": "00:00 Intro\n00:11 Presentation on Update 6\n01:23 Just simmering\n02:04 Recapping last week\n02:24 Hot Potato Save File\n04:53 Outro\n05:26 One more thing!",
"topics": [
"PLbjDnnBIxiEo8RlgfifC8OhLmJl8SgpJE"
],
"other_parts": false,
"is_replaced": false,
"is_duplicate": false,
"has_duplicates": false,
"seealsos": false,
"transcript": [
/*
this is an array of strings that could technically be structured objects, but are generally only strings
ranging from single words up to full paragraphs, with this example having about 5-7 KB of strings in total
*/
],
"like_count": 7,
"video_object": {
"@context": "https:\/\/schema.org",
"@type": "VideoObject",
"name": "State of Dave",
"description": "00:00 Intro\n00:11 Presentation on Update 6\n01:23 Just simmering\n02:04 Recapping last week\n02:24 Hot Potato Save File\n04:53 Outro\n05:26 One more thing!",
"thumbnailUrl": "https:\/\/img.youtube.com\/vi\/BBrBp9tM\/hqdefault.jpg",
"contentUrl": "https:\/\/youtu.be\/0pKBBrBp9tM",
"url": [
"https:\/\/youtu.be\/0pKBBrBp9tM",
"https:\/\/archive.satisfactory.video\/transcriptions\/yt-0pKBBrBp9tM\/"
],
"uploadDate": "2022-02-15"
}
}

p.s. this is the template that's in use, in case it's a combination of size-of-data as well as the template, rather than just size-of-data: https://github.com/Satisfactory-Clips-Archive/Media-Search-Archive/blob/d5040ac3a42f8eca9517931812892d493b81d326/11ty/pages/transcriptions.njk
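As a very rough sketch of that bulk-generation idea (everything here is invented: the script name, output path, entry count, and filler strings are not taken from the real data set), something like this would produce a similarly shaped global data file:

```js
// generate-fixture.mjs: hypothetical generator for data shaped like the example above
import { mkdirSync, writeFileSync } from "node:fs";

const COUNT = 6900; // roughly the page count discussed in this issue

const entries = Array.from({ length: COUNT }, (_, i) => {
  const id = `yt-${String(i).padStart(11, "0")}`;
  const url = `https://youtu.be/${id}`;
  return {
    id,
    url,
    date: "2022-02-15",
    dateTitle: "February 15th, 2022 Livestream",
    title: `Video ${i}`,
    description: "00:00 Intro\n05:26 One more thing!",
    topics: ["PLbjDnnBIxiEo8RlgfifC8OhLmJl8SgpJE"],
    other_parts: false,
    is_replaced: false,
    is_duplicate: false,
    has_duplicates: false,
    seealsos: false,
    // roughly 5-7 KB of strings per entry, mirroring the real transcripts
    transcript: Array.from({ length: 40 }, () =>
      "lorem ipsum dolor sit amet ".repeat(6).trim()
    ),
    like_count: 7,
    video_object: {
      "@context": "https://schema.org",
      "@type": "VideoObject",
      name: `Video ${i}`,
      contentUrl: url,
      url: [url],
      uploadDate: "2022-02-15",
    },
  };
});

mkdirSync("./src/_data", { recursive: true });
writeFileSync("./src/_data/transcriptions.json", JSON.stringify(entries));
```

Pointing a paginated template at the resulting file should approximate the shape of the real data, if not its exact volume.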
@pdehaan working on an isolated test case, have managed to trigger the bug in 0.12, going to check at what point 0.12 succeeds where 1.0 fails.
@pdehaan isolated test case currently fails on 0.11, 0.12, and 1.0 at about 21,980 entries: https://github.com/SignpostMarv/11ty-eleventy-issue-2226

usage: …

The data & templates aren't as complex as those in the media-search-archive repo; will give a second pass at making this more complex if it's not useful enough to let you experiment with avoiding the heap-out-of-memory issue.
test.json.gz
@pdehaan including the markdown repo as a source across all three versions definitely suggests it's either templating- or data-related, rather than input-related, as all three versions can handle 7k of just straight-up markdown files. Will amend further in the near future and keep you apprised.
@pdehaan bit of a delay with further investigation; have started converting the runtime-generated data to pre-baked data. It looks like having the 131k-line JSON data file in-memory causes the problems.
@pdehaan have updated the test-case repo that fails on 1.0 with 9k entries (…)
I'm hitting this problem as well. I have a site (only about 1,600 pages) that builds fine with Eleventy 0.12.0, but when I upgraded to 1.0.0 I get out of memory errors. I've got a global data file (JS) that pulls data from a database (about 660 rows of data) and uses pagination to create one page for each entry from the database. If I shut the database off so that those pages don't get built, the build runs fine with 1.0.0. I can work around the issue by increasing Node's max memory thus:
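The usual shape of that workaround (the 8192 here is only an illustrative limit, not necessarily the value that was actually used) is roughly:

```sh
# Raise V8's old-space heap limit (value in MB) for the Eleventy build
NODE_OPTIONS="--max-old-space-size=8192" npx @11ty/eleventy
```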
Not sure what happened with 1.0.0 that increased the memory usage this much (with pagination, or global data?) but it'd be great to get it back down.
/summon @zachleat

Thanks @SignpostMarv, I'll try fetching the ZIP file from #2226 (comment) and see if it will build on my laptop locally (disclaimer: it's a higher-end M1 MacBook Pro, so results may differ).

@esheehan-gsl How complex is your content from your database? (Is it Liquid or Markdown? etc.)
There are quite a few fields coming from the database, probably over 30. Some of it is HTML, some of it is just metadata (paths to video files, categories) that gets rendered on the page. If it helps, it's used to build these pages: https://sos.noaa.gov/catalog/datasets/721/
@pdehaan to clarify, the zip file isn't needed as the problem is replicable at a lower volume of generated pages (9k + supplementary data) rather than the zip file's higher volume (21.9k w/ no supplementary data).
I created https://github.com/pdehaan/11ty-lorem which can generate 20k pages (in ~21s).
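For anyone skimming: the general shape of these reproduction repos is a paginated template over one large global data array. A generic sketch (not necessarily how 11ty-lorem is actually built; the lorem data file name and the id/title/body fields are assumed):

```njk
---
pagination:
  data: lorem
  size: 1
  alias: entry
permalink: "pages/{{ entry.id }}/index.html"
---
<h1>{{ entry.title }}</h1>
<p>{{ entry.body }}</p>
```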
@pdehaan could you now grab the supplementary data file from my test repo (or generate something similar) and see how much lower you have to drop the page count?
Howdy y'all, there are a few issues to organize here so just to keep things concise I am opening #2360 to coordinate this one. Please follow along there!
@zachleat tracking updates specific to the test repo here, rather than on new/open tickets:

- 80000: …
- 40000: as above, except for: …
What are the success conditions here? Is 80K the goal?
@zachleat was basing the test cases on your Google spreadsheet; one assumes if it succeeds at 40k it'll succeed at the other sizes you found. p.s. I'm not sure if the 80k "too many open files" thing should be counted as a new issue or a won't-fix?
Success @ 50k, 55k, 59k, 59.9k, 59.92k, 59.925k, 59.928k, 59.929k; too many open files @ 60k, 59.99k, 59.95k, 59.93k.

A couple things that I'm noticing:
Is there any progress here? I also have a bigger JSON source (5.6 MB with 270k rows) that made circa 17k pages. On my local setup, I can build it with --max_old_space_size in ~5 minutes, but on Netlify, it breaks with the heap limit. On another topic: do you have any tips on importing this amount without breaking? Is an external database a better idea? Thank you!
The most terrible option would be to duplicate templates & split the data up.
Yeah, that is something that came to my mind, too, but it will kill the pagination and the collection as a whole. It would be cool if we could break these files into smaller pieces and source them under the same collection or something similar. For some reason, I could build it on Netlify without the error (maybe it needed time for the NODE_OPTIONS, or it had a better day, I am not sure, unfortunately), but it's still complicated to plan around knowing this problem. And my demo is quite plain, almost only data, with the biggest extra being an image-inliner (SVG) shortcode for the icons. Thank you for the feedback. I'll update if there's anything worthwhile.
If you're referring to pagination links, one assumes that if you're taking steps to have data automatically split, you can have pagination values automatically generated "correctly"?
Breaking the source file beforehand could work for me if I could handle it as one collection at import. Still, much more editorial work to manage, but at least no hacking at the template level. For the pagination (to connect two sources): I think you can offset the second source's pagination, but then you have two unrelated data groups with more administration and hacky solutions.
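For what it's worth, one way to "source them under the same collection" would be a JS global data file that stitches pre-split chunks back into a single array. A sketch, assuming the chunks are written to a hypothetical data-chunks/ folder kept outside _data so Eleventy doesn't also ingest them individually:

```js
// src/_data/configs.js (hypothetical): merge pre-split JSON chunks back into one
// array, so templates keep paginating a single `configs` data set
const fs = require("node:fs");
const path = require("node:path");

module.exports = () => {
  const dir = path.resolve(__dirname, "../../data-chunks");
  return fs
    .readdirSync(dir)
    .filter((file) => file.endsWith(".json"))
    .sort()
    .flatMap((file) => JSON.parse(fs.readFileSync(path.join(dir, file), "utf8")));
};
```

This doesn't reduce peak memory on its own, since the whole array still ends up in memory, but it keeps pagination and the collection intact while the source stays split into manageable files.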
I've yet to revisit upgrades on mine since migrating away from the mixed markdown + JSON sources to eliminate the markdown source 🤔
I'm facing a similar issue. I have a 1.9 GB JSON file (src/_data/configs.json) containing an array of 591,494 objects.

---
pagination:
data: configs
size: 1
alias: config
permalink: "{{ config.permalink }}"
eleventyComputed: {
title: "{{ config.data.symbol }}"
}
---
Hello {{ config.data.symbol }}

Unfortunately, it doesn't generate any HTML output. I reduced the size of the …

Operating system: macOS Ventura, M1 Pro, 16 GB
Anyone landing here in 2024:

If using WebC: …

And/or: …

I really didn't want to bump up RAM; we're still just in the 1,000s of assets range (with relatively chunky objects). Switched to v3 and haven't bumped into RAM issues since. Also much faster: …

After also removing the nested …
Hey @d3v1an7, for me it is still present, but somehow Netlify pushes it through (19k pages), although the live output will be a bit buggy. On local, I use a different data set with fewer records. It is good news that v3 could solve it; I plan to migrate in the future. Thanks for the update!
Confirmed that https://github.com/SignpostMarv/11ty-eleventy-issue-2226 worked up to 160k files on 3.0.0. |
Describe the bug
I have a semi-open site generator project that squishes gigabytes of data sources down to about 6.9k pages to be processed by Eleventy in two ways:
0.11.1 handles the 6.9k documents & 15.6 MB JSON file without issue; 1.0.0 falls over in a similar fashion to that described in #695.
To Reproduce
The site generator is semi-open in that the source is available at https://github.com/Satisfactory-Clips-Archive/Media-Search-Archive, but it's not feasible to stash 2.7 GB+ of source data into the repo, so the repro steps aren't readily reproducible by anyone that doesn't have the data set.
While the method mentioned in #695 of specifying --max-old-space-size does move the goalposts somewhat, it still falls over with 8 GB assigned.

Steps to reproduce the behaviour:
npm run build
or
./node_modules/.bin/eleventy --config=./.eleventy.pages.js
Expected behaviour
1.x to handle 6.9k Markdown documents or 6.9k JSON data file entries as reliably as 0.11.x does.
Screenshots
Environment:
Additional context