perf: ideas to improve startup time #15945

Closed
4 of 10 tasks
littledivy opened this issue Sep 18, 2022 · 10 comments
Labels
cli related to cli/ dir perf performance related

Comments

@littledivy
Member

littledivy commented Sep 18, 2022

Results (after applying below optimizations)

1.5x improvement

image

runjs is a barebones rusty_v8 CLI for baseline.
target/release/deno is Deno from https://github.com/littledivy/deno/tree/startup_time
deno is Deno 1.25.3
node is Node.js 18.8.0

Potential optimizations

Deno main:

image

macOS:

  • dyld (major)

Dependency on CoreGraphics & Metal slows down dyld. These come from webgpu, ideally we should lazy load them.

Without webgpu:

../deno/target/release/deno:
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1858.112.0)
	/System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices (compatibility version 1.0.0, current version 1141.1.0)
	/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 60158.100.133)
	/usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
	/usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)

With webgpu:

/Users/divy/.deno/bin/deno:
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1853.0.0)
	/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
	/System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices (compatibility version 1.0.0, current version 1141.1.0)
	/System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore (compatibility version 1.2.0, current version 1.11.0)
	/System/Library/Frameworks/Metal.framework/Versions/A/Metal (compatibility version 1.0.0, current version 257.24.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
	/System/Library/Frameworks/CoreGraphics.framework/Versions/A/CoreGraphics (compatibility version 64.0.0, current version 1557.1.1)
	/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 60157.30.13)
	/usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
	/usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)

Additionally we depend on Security.framework for hyper-tls integration. This could also be lazy loaded in some way.
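The lazy-loading idea for the webgpu and TLS dependencies can be illustrated with a small sketch. This is not how Deno does or would do it (that would be Rust and the platform's dynamic loader); it only shows the deferral pattern, so that the cost of loading a library is paid on first use instead of at process start. `LazyLibrary` and its `loader` parameter are hypothetical names introduced for this example.

```python
import ctypes
import threading


class LazyLibrary:
    """Defers loading a shared library until one of its symbols is requested."""

    def __init__(self, name, loader=ctypes.CDLL):
        self._name = name
        # Hypothetical hook: lets callers swap the loader (e.g. in tests).
        self._loader = loader
        self._handle = None
        self._lock = threading.Lock()

    def _load(self):
        # Double-checked locking: only the first caller pays the load cost.
        if self._handle is None:
            with self._lock:
                if self._handle is None:
                    self._handle = self._loader(self._name)
        return self._handle

    @property
    def loaded(self):
        """True once the underlying library has actually been loaded."""
        return self._handle is not None

    def __getattr__(self, symbol):
        # Called only for names not found on the wrapper itself,
        # i.e. symbols that live in the library.
        return getattr(self._load(), symbol)
```

Constructing `LazyLibrary(...)` is cheap; nothing is loaded until the first symbol lookup, which is the property that would keep a rarely used dependency out of the startup path.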

After applying all of that:
image

There may be some optimizations that can be made in deno_graph::create_graph for the main module.

Rest of the time is spent in:

  • MainWorker::execute_script
  • v8::Context::FromSnapshot
  • v8::Isolate::InitWithSnapshot

See https://github.com/littledivy/deno/tree/startup_time

@littledivy littledivy added perf performance related cli related to cli/ dir labels Sep 18, 2022
@bnoordhuis
Contributor

Maybe it's possible to get rid of CoreFoundation and CoreServices as well? They're pretty big frameworks that in turn pull in a bunch of additional frameworks like libobjc, ICU, etc.

In case anyone's wondering, libiconv is pulled in by the libc crate and only on apple platforms.

@dsherret
Member

dsherret commented Sep 19, 2022

ParsedSourceCache initialization (sqlite connection to cache store) is expensive. (major)

This is the ParsedSourceCacheModuleAnalyzer initialization and I believe it occurs only where necessary (edit: actually, maybe sometimes it might load it multiple times per execution. Perhaps it could skip some steps on initializations after the first?). The whole cache was a big performance improvement to startup time (e.g. 1s to 0.2s in some cases).

I had previously tried storing the cache information in a single file, but found that slow... that said, I was reusing some of our existing file system caching code to do this and that created directory structures (so it was create_dir_all-ing all the time). Maybe just hashing the specifier and using that for the file name, then storing it in a single flat directory might be faster?
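The flat-directory idea could look roughly like the sketch below: hash the module specifier, use the digest as the file name, and keep every entry in one directory so no nested directory structure has to be created per entry. The function names and the choice of SHA-256 are assumptions made for illustration, not something taken from Deno's cache code.

```python
import hashlib
from pathlib import Path


def cache_path(cache_dir, specifier):
    """Map a module specifier to a file directly inside cache_dir."""
    # Hash choice (SHA-256) is an assumption; any stable hash would do.
    digest = hashlib.sha256(specifier.encode("utf-8")).hexdigest()
    return Path(cache_dir) / digest


def write_entry(cache_dir, specifier, data):
    """Store data for a specifier; the single directory is created once."""
    root = Path(cache_dir)
    root.mkdir(parents=True, exist_ok=True)
    path = cache_path(root, specifier)
    path.write_text(data, encoding="utf-8")
    return path


def read_entry(cache_dir, specifier):
    """Return the cached data, or None when there is no entry."""
    try:
        return cache_path(cache_dir, specifier).read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
```

Whether this beats SQLite in practice would still need measuring; the sketch only removes the per-entry directory creation that was suspected of being slow.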

Or perhaps something could be done to make the sqlite initialization faster? Perhaps something with the sqlite pragma. I'm not sure.

@billywhizz
Contributor

@Divy @dsherret i think we can change the SQLite PRAGMAs as the defaults are not optimal. e.g. we should defo be able to use LOCKING_MODE=EXCLUSIVE if only a single process is ever accessing the db. I will take a look and see if we can find an improvement.
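As a rough illustration of the pragma change, here is a minimal sketch using Python's built-in `sqlite3` module. `PRAGMA locking_mode=EXCLUSIVE` is a standard SQLite pragma; the `open_cache_db` helper and the table layout are hypothetical and are not Deno's actual schema.

```python
import sqlite3


def open_cache_db(path):
    """Open a cache database that a single process is assumed to own."""
    conn = sqlite3.connect(path)
    # Ask SQLite to keep its file lock instead of re-acquiring it for each
    # transaction. The pragma returns the locking mode now in effect.
    mode = conn.execute("PRAGMA locking_mode=EXCLUSIVE").fetchone()[0]
    # Hypothetical table layout, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "specifier TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    conn.commit()
    return conn, mode
```

The trade-off is that other processes cannot open the database while the lock is held, so this only fits the single-process assumption stated above.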

@billywhizz
Contributor

@Divy. i see ~23ms startup time on linux. on gitpod it is around 30ms, running a more modern linux in some kind of VM. I am writing up benchmark stuff at the moment so i will see if i can find any other improvements we could make here on the linux side.

@billywhizz
Contributor

billywhizz commented Sep 22, 2022

Re. "Move internal JS off of V8 heap. Replace v8::String::new with v8::String::new_external_onebyte_static"

Just flagging that this may cause issues if we have any unicode characters in our internal modules. We should write a test for it if/when we implement to be sure.
https://262.ecma-international.org/6.0/#sec-ecmascript-language-types-string-type
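A test along these lines could be as simple as scanning each internal module's source for characters that do not fit in one byte. The sketch below assumes "one-byte" means Latin-1 (code points up to U+00FF); a stricter ASCII-only check is included in case that is what the API ends up requiring. Both function names are hypothetical.

```python
def find_non_onebyte(source):
    """Return (index, char) pairs for characters outside the Latin-1 range."""
    return [(i, ch) for i, ch in enumerate(source) if ord(ch) > 0xFF]


def is_ascii_only(source):
    """Stricter check: every character must be 7-bit ASCII."""
    return source.isascii()
```

For example, `find_non_onebyte("a → b")` reports the arrow at index 2, while a source containing only `é` passes the Latin-1 check but fails `is_ascii_only`.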

@billywhizz
Contributor

billywhizz commented Sep 23, 2022

On my system (a consumer laptop - core i5), the minimal JS runtime based on deno-core bootstraps in ~8ms and the latest deno release in ~24ms. Both were run in a privileged container as root to reduce system overhead.

Deno

deno

Runjs

runjs

Vast majority of system time seems to be GC/Heap based so I would think any reduction in amount of code loaded/memory consumed at startup would reduce that a little, but not significantly. Are there modules we could lazy load on demand or does snapshotting preclude that?

Deno

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.10    0.084104         321       262        41 futex
  0.49    0.000421          17        25           read
  0.38    0.000323           3       128           mprotect
  0.21    0.000184           4        42           mmap

Runjs

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.00    0.040965         288       142        14 futex
  1.66    0.000724          48        15           munmap
  1.11    0.000485          69         7           prctl
  0.97    0.000421          18        23           read
  0.70    0.000303           5        64           mprotect

I've also attached two flamegraphs captured at the highest resolution i could (41 kHz) for both. I will take a look in more detail at this and the SQLite stuff over the weekend.

Deno

start

Runjs

start

@littledivy
Member Author

Looks like something slowed down startup time in main. It is around 20ms for me now. Gotta fix that :/

@billywhizz
Contributor

> Looks like something slowed down startup time in main. It is around 20ms for me now. Gotta fix that :/

yup. getting an accurate startup time is not an easy thing to do! i think if we are in ballpark of 10-15ms for a runtime with as much in the box as Deno that's really good. in the real world it will be very rare to be running a single script with no imports etc. and in a VM/container like scenario there are other tricks you can do to reduce this overhead.
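Since single measurements at this scale are noisy, one way to get a more stable number is to spawn the process many times and look at the minimum and median. The following is a minimal sketch of that approach, not the tooling used for the numbers in this thread; `measure_startup` and `summarize` are hypothetical helpers, and the timings include process-spawn overhead from Python itself.

```python
import statistics
import subprocess
import time


def measure_startup(cmd, runs=10, warmup=2):
    """Run cmd repeatedly and return wall-clock durations in seconds."""
    def run_once():
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )

    # Warm-up runs let the OS cache the binary before timing starts.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Report min and median in milliseconds; both resist outliers."""
    return {
        "min_ms": min(samples) * 1000,
        "median_ms": statistics.median(samples) * 1000,
    }
```

Usage would be something like `summarize(measure_startup(["deno", "run", "empty.js"]))`, where `empty.js` stands for any empty script on disk.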

@billywhizz
Contributor

There's probably a bigger win here in looking at some optimizations we can do around SQLite and any edges we can shave off in the JS module caching - we should do a benchmark with a reasonably large complex codebase with lots of imports. I'll see if i can set something up. If anyone has any suggestions for a good project to use let me know.

@littledivy
Member Author

Closing since most of these optimizations were applied and we are now bottlenecked by snapshot deserialization and bootstrap JS. More work to do on that front before startup time can be optimal.


4 participants