Proposal: Solving the waterfall problem with depcache #209
Comments
As far as I can tell this sounds like quite a good idea 🤗 There is a small typo in the example though, and it got me confused for a sec 🙈

```js
"/main.js": ["dep"],
// should be
"/main.js": ["dep1"],
```
We used a technique like this at Google, and have recently moved away from it. If you visit https://store.google.com/us/ and view source, you can see it; I used this store as an example here because it's an old enough site that it's still using the old technique.

We moved to a system that requires a bit more server-side logic, where (following your initial example) the client makes a request like "I want lazy-feature, and I have already downloaded main and dep1". The server has a module map and can use that to send both lazy-feature and the dependency dep2 to the client in one response. Critically, note that in this exchange we never needed to send the client any info about dep2 up front: the client tells the server what it knows, and the server holds the module map. A dumber client requires less up-front data and logic, both of which help page load time.

I don't say this to kill your idea. I agree that the waterfall problem for fetching is a real problem, and your solution is similar to one that worked at Google for a long time. I just thought you might find the additional info interesting, and maybe it can help spark some more ideas.
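A minimal sketch of the server-side exchange described above, under stated assumptions: `moduleMap` and `resolveDelta` are invented names, and the module graph here just mirrors the thread's running example.

```javascript
// Hypothetical server-side module map: each module's direct dependencies.
// (Illustrative only; not the actual Google system.)
const moduleMap = {
  'main': ['dep1'],
  'lazy-feature': ['dep1', 'dep2'],
  'dep1': [],
  'dep2': [],
};

// Given what the client wants and what it already has, compute the
// minimal set of modules to send, dependencies first.
function resolveDelta(want, have) {
  const need = [];
  const seen = new Set(have);
  const visit = (mod) => {
    if (seen.has(mod)) return;
    seen.add(mod);
    for (const dep of moduleMap[mod] || []) visit(dep);
    need.push(mod); // dependencies before the module itself
  };
  for (const mod of want) visit(mod);
  return need;
}

// "I want lazy-feature, and I have already downloaded main and dep1":
console.log(resolveDelta(['lazy-feature'], ['main', 'dep1']));
// → ['dep2', 'lazy-feature']
```

The client never needs dep2's existence sent up front; the server derives it from the map in one round trip.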
Over time, as the user downloads more modules, we do end up downloading the whole module graph at some point, once we exceed the URL limit for listing the modules already downloaded.
Interesting! I'd like to clarify a couple of points:
@evmar thanks for the interesting feedback!
I understand internal Google code development workflows are of course a little more intricate, but I think it is important to point out how module merging optimizations, now that they are well defined and well established, change the calculation. The production module graph should have far fewer module nodes (at least an order of magnitude) than the development module graph, effectively capturing the natural module combinations that would form under your got-want-need scheme, played out across all clients.
I do think it will be worth supporting lazy loading of import maps at some point. If we split up the page load into:
An optimization, if lazy loading of import maps is supported, would be to separate import map (1) from the import map(s) for (2), such that (1) is the initial import map, and then after the page load, a new import map for (2) (or a few different ones) gets lazily loaded in line with its priority. This way, the initial page load is not slowed down by the growth of the mappings for lazy loads on the page. This is starting to get out of scope for this thread, but handling how
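The split described above could be sketched like this (a hypothetical illustration: browsers don't currently support lazily added import maps, and all names and URLs here are invented):

```javascript
// Minimal initial import map, covering only the startup graph (1).
const initialMap = {
  imports: { main: '/main.js', dep1: '/lib/dep1.js' }
};

// Separate map for lazily loaded features (2), fetched after page load
// so its mappings never weigh on the critical path.
const lazyMap = {
  imports: { 'lazy-feature': '/lazy-feature.js', dep2: '/lib/dep2.js' }
};

// After load, the lazy mappings are merged into the active resolution
// scope (assuming the import map extension under discussion).
function mergeImportMaps(base, extra) {
  return { imports: { ...base.imports, ...extra.imports } };
}
```

Resolution during the initial load only ever consults the small `initialMap`, which is the point of the optimization.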
One of the major benefits of depcache is that it can work for static server use cases. It would be a shame to have to tell users they can only get optimized delivery of their apps by using very specific server software, which must then have full knowledge of the module graph, tying server software internals to application delivery. Server logic only really comes into its own for advanced cases of (2) and (3), and even then such work could build on lazy import map loading too, as described above.

@iteriani can you clarify what you mean by the URL limit here:
Are you referring to reaching the limits of the module map size itself?
Your points are largely the main ones, I think. I could still get behind either approach, and this has been my opinion for a while. But this depcache proposal was posted out of my frustration at getting nowhere after many continued preloading discussions with many different people. My recent realization was that solving the waterfall problem for all web assets is simply not a primary concern for the web today, because most web assets have a limited tree depth, which is not as large a factor in load time perception as it is for module graphs.

@evmar's first comment was also about how a feature like depcache can itself bloat the page load. It would be a shame if the preload manifest were to become so full-featured and verbose as to slow down page loads. Trying to flesh it out in #208 also gives some idea of the verbosity to expect. What's nice about depcache is that it is very much just solving the direct problem, without walking down the path of a "manifest to rule all manifests" that a preload spec risks becoming, which could get lost in the weeds as well.
This is a difficult one indeed, and an important question. I really don't know how to set those criteria other than on a very careful case-by-case basis. I suppose the general guide should be that it could extend, but only to features that relate specifically to modules. Under this logic, module attributes could be a possibility, but integrity, fetch options and credentials would not be suitable for import maps, and whether import maps could include execution / capability / module security options seems unclear.
Once the length of the URL for requesting modules (in addition to sending up the modules already loaded) exceeds 2048 characters, we download the whole module map.
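The fallback described here could be sketched as follows (the endpoint and query parameter names are invented; only the 2048-character cutoff comes from the comment above):

```javascript
const URL_LIMIT = 2048;

// Build a delta request listing wanted and already-loaded modules.
// Past the URL limit, give up on the delta and fetch the full module map.
function buildModuleRequest(want, have) {
  const url = `/modules?want=${want.join(',')}&have=${have.join(',')}`;
  return url.length <= URL_LIMIT ? url : '/modules?all=1';
}

console.log(buildModuleRequest(['lazy-feature'], ['main', 'dep1']));
```

As the `have` list grows over a long session, every request eventually trips the limit, which is why the whole graph ends up downloaded at some point.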
I do think it's worth having both solutions (normal module loading vs negative module loading), for a wide variety of reasons. It's just interesting to note that high-requirement customers will need a little bit more than the currently proposed solution. At Google we use a JavaScript framework that basically late-loads event handlers, so having hundreds or even thousands of modules even after graph optimizations is possible.

I'm not sure how these two pieces of information will be figured out at request time: the modules that load on page load. Does this involve some sort of server-side rendering or a statically built map?
You might find this useful: https://github.com/azukaru/progressive-fetching/blob/master/docs/dynamic-bundling/index.md

It explains why large Google applications end up with thousands of modules even for production bundles. This does not mean thousands of requests; the client framework has the ability to pick the needed subset of modules and load them together. As @iteriani said, the design principle is to "load the minimal amount of code that is needed", and this means we need to load different combinations based on different user interactions. If you have thousands of production modules, then the mapping soon becomes a burden (>60KB for some applications), which is why we moved to a different system to avoid the mapping.
I think there are three different scenarios involved here:
Each comes with its own assumptions about how much implementation effort adoption can require and how many other in-progress specs/unknowns are involved. I don't think there's a practical proposal yet for how to incrementally load a module graph using web packages.
@iteriani @fenghaolw on a brief read, if I'm understanding correctly, this sounds like these modules are effectively AMD-like wrappers on the development modules. It's important that production modules are optimized based on module merging, such that the graph is much smaller, rather than individually wrapping each dev module with a function wrapper into the output chunk.
The same process that constructs the import map needs to know which dependencies to include in the import map. There is no reason to include dependencies which aren't loaded; thus the optimal import map is one that is traced to be just what is necessary for the page load. This same tracing is the tracing needed to populate the depcache, and it could be done dynamically or statically.
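A minimal sketch of that single-pass trace, under stated assumptions: `resolve` and `readDeps` stand in for a real resolver and import-specifier parser, and all names here are illustrative.

```javascript
// Trace the graph once from an entry specifier, producing both the
// import map "imports" field and the proposed "depcache" field.
function buildMapAndDepcache(entry, resolve, readDeps) {
  const imports = {};
  const depcache = {};
  const visit = (specifier) => {
    if (specifier in imports) return; // already traced
    const url = resolve(specifier);
    imports[specifier] = url;
    const deps = readDeps(url);
    // depcache stores the *unresolved* specifiers, per the proposal,
    // so the import map stays the single source of truth for resolution.
    if (deps.length) depcache[url] = deps;
    deps.forEach(visit);
  };
  visit(entry);
  return { imports, depcache };
}
```

For the thread's example graph, tracing from `main` would yield mappings for `main` and `dep1` plus a `depcache` entry of `{"/main.js": ["dep1"]}`.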
It's complicated™ but generally they aren't AMD-like wrappers. It's doing true module merging. The difference is that it's doing it globally across the entire module graph (renaming identifiers to remove conflicts etc.) and then dynamically returns fragments of the entire "maximally merged module" on load. Each of the fragments is roughly equivalent to a merged module in a more conservative module merging approach (e.g. rollup). Paraphrasing a crafted example URL:

```js
/*_M:FirstFragment*/
debug_track_module_execution_started('FirstFragment');
/* raw module body here with identifiers adjusted to remove conflicts */
debug_track_module_execution_done();

/*_M:SecondFragment*/
debug_track_module_execution_started('SecondFragment');
/* raw module body here with identifiers adjusted to remove conflicts */
debug_track_module_execution_done();
```

Worth noting that this mostly works so nicely because it's compiling to a script in the end. That way it doesn't have to worry about re-linking across the concatenated fragments over time (it's all simply globals). That's why I said "there's some fundamental incompatibilities with how modules work" in this approach.
Why are paths used and not keys/names? Paths may be very long. Maybe this variant?
Like preload?
Let's keep this discussion in https://github.com/guybedford/import-maps-extensions. |
I'd like to propose a new `"depcache"` field in the import map, as an optimal solution to the waterfall problem for production optimization.

The core idea is to provide the ability for tools that generate import maps to also generate the metadata of the module graph as a sort of graph cache (dependency cache) at the same time: a map from URLs to their list of dependency specifiers.

With a populated depcache, as soon as a module is fetched, the depcache can be consulted, and the corresponding cached list of module dependencies preloaded in parallel immediately.

For example: say the main app loads `"main"` on startup. This is resolved to `/main.js`, and the depcache allows us to see that we also need to load `/lib/dep1.js` immediately. This trace applies recursively to all requests as well. These known dependency requests are then made in parallel with the main request, possibly from the cache, the app thus loading with zero latency waterfalls.

Later on, a dynamic `import('lazy-feature')` is executed. At this point, the depcache can again be consulted, to see that `/lazy-feature.js` will import both `dep1` and `dep2`. Since `/lib/dep1.js` is already in the module map we do not need to send a new request, so we immediately send out just two requests: one for `/lazy-feature.js` and one for `/lib/dep2.js`, again getting lazy loading with full immediate parallel requests, without any duplication and supporting far-future caching. No matter how deep the dependency tree, there is never a waterfall so long as there is a depcache entry.

Note that the unresolved dependency specifier is included in the depcache array. This allows the import map to remain the full source of truth for dependency resolution, and the depcache for e.g. a cached module doesn't go stale despite resolutions changing.
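A sketch of the combined import map and depcache for the example above (the `depcache` shape follows this proposal's URL-to-specifier-list description; the exact mappings are illustrative):

```json
{
  "imports": {
    "main": "/main.js",
    "dep1": "/lib/dep1.js",
    "dep2": "/lib/dep2.js",
    "lazy-feature": "/lazy-feature.js"
  },
  "depcache": {
    "/main.js": ["dep1"],
    "/lazy-feature.js": ["dep1", "dep2"]
  }
}
```

On fetching `/main.js`, a loader would consult `depcache["/main.js"]`, resolve `dep1` through `imports`, and preload `/lib/dep1.js` in parallel.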
The alternative, as mentioned in the readme of this spec, is a more general preloading spec. I have discussed the more general preload spec with various spec authors and implementers over the past year or two, and have found a lack of interest, mostly I think because for most web resources 2-3 round trips is the maximum, due to the nature of HTML / CSS development. I would argue that modules are actually unique in having the problem of dependency trees potentially N levels deep, where N can be over 10, so the latency problem between these depths is truly unique to module graphs.
This depcache proposal seems to me the simplest path forward right now to solve this latency waterfall problem in a fully optimal way, given that a more general preload manifest is not getting traction. But I'm hoping that by posting both this proposal and #208, we can drive these conversations forward to ensure that this production optimization problem is tackled, as it really needs to be now for modules.
I'd like to also be able to move forward with this or a similar proposal in both SystemJS and ES Module Shims. Depcache as specified here worked very well in previous versions of SystemJS for many years, and I'd like to start shipping this feature or something similar in both projects again soon, as it is a critical performance necessity for users of these projects today. Both projects aim to track these standards, so hopefully we can continue these discussions and continue to solve these problems optimally.