
Splitting node in to more projects #9

Closed
mikeal opened this issue Nov 28, 2014 · 19 comments

Comments

@mikeal
Contributor

mikeal commented Nov 28, 2014

I'm going to coalesce as much as I can remember of dozens of conversations I've had with people over the last year into an actual proposal.

For some time node has been getting separated into different projects which then become dependencies of node itself in some way. These range from libuv to readable-stream (which is just stream in core).

How and when a module gets broken out into its own project has been seemingly random, although a more accurate description might be "whenever the opportunity arrived," like a module rewrite.

A specific "version of node" must be a static collection of these dependencies, as it has been throughout node's history; suggesting we change this would be crazy. Below I will describe this proposal as a collection of projects (some existing and some yet to be produced), each locked to a specific version, which together produce a single version of node.

  • v8
  • libuv
  • http-parser
  • libuv.js -- libuv bindings to v8
  • require -- the module system
  • stdlib.js -- every JS module you can require() without installing as a dependency
    • within stdlib.js a specific released version would also have, as dependencies, many other modules like readable-stream
  • npm
  • node-bin -- node's command line interface
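
The pinning described above amounts to a release manifest: a frozen mapping from each sub-project to an exact version. A minimal sketch, assuming the project split listed here (all version strings are illustrative, not real releases):

```javascript
// Hypothetical release manifest pinning each sub-project to an exact version.
// Every name/version pair below is illustrative, not a real release.
const release = {
  name: "node",
  version: "2.0.0",
  components: {
    "v8": "3.31.74",
    "libuv": "1.0.2",
    "http-parser": "2.3.0",
    "libuv.js": "0.1.0",
    "require": "1.0.0",
    "stdlib.js": "1.4.0",
    "npm": "2.1.11",
    "node-bin": "2.0.0"
  }
};

// A "version of node" is then just this frozen mapping, rendered as a string.
function describe(release) {
  return Object.keys(release.components)
    .map((name) => `${name}@${release.components[name]}`)
    .join(" ");
}

console.log(`${release.name}@${release.version}: ${describe(release)}`);
```

Upgrading one component would mean cutting a new manifest, not patching a monolithic tree.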

Note that there is no abstract C/C++ interface on top of v8's API. Native modules would still be built using a combination of nan (which has the advantage of being able to span multiple v8/node version combinations) and direct v8 bindings, as is the case today.

The biggest benefits I see with this approach are reuse and ease of contributions. I think we would see some very interesting experiments with libuv.js. I also think that the community in userland that has had a hard time getting input into node core would have an easier time directing that feedback towards a stdlib.js or require project which is entirely JS.

Concerns with this approach

  • Additional coordination cost between dependencies.
    • How well is this handled today with libuv?
  • Additional administrative overhead, maybe.
  • Increased difficulty of making performance related improvements traversing several dependencies.
@rvagg
Member

rvagg commented Nov 28, 2014

/cc @defunctzombie who has already been experimenting with libuv.js

@darrenderidder

@mikeal I like the proposal of having node core modules separated out with an interface you could call from any js runtime. You mentioned libuv bindings to v8 - how about a generic JS interface to a thread pool?

@defunctzombie
Contributor

Some notes for those interested

In experimenting with libuv.js a few things became clear to me:

  • libuv.js would be as un-opinionated as possible. The goal would be to simply have libuv exposed as JS features in the most optimized way.
  • End users would be unlikely to use libuv.js directly. There would be "distributions" that package common and useful components together. The org making libuv.js could even provide what they think is a barebones useful distro.
  • The reason you want some sort of distro is for modules which have little contention (crypto is really the only one that comes to mind). While in an ideal world it may be fun to have a module for each crypto thing, in practical C/C++ terms, building native stuff is hard for most users. This is obviously a slippery slope because there is no end to what you could decide to include.
  • I see the only true core components as those that are tied to the event loop in some special way. At the start, these were sockets, child processes, and similar.
  • Streams are NOT a core component. Streams are an opinion about how to consume data. Yes, it has proven useful for us all to adopt the same stream interface, but it has also shown its pains. Streams are a distro or package level concern.
  • HTTP is NOT a core component. It is a protocol built on top of a TCP socket.
  • nan is cool and should be leveraged at the distro level.

tl;dr libuv.js is for distribution makers. Anything that isn't tied to libuv is an opinion and should be a module. A distribution is a set of predefined modules. It should be easy to make a distribution. Modules are amazing because they are versioned and easier to upgrade piece-by-piece. Underlying software stacks are not.
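
The distro-vs-core split above can be sketched in a few lines: a distribution is nothing more than a named, predefined selection of versioned modules layered over a bare core. A minimal sketch, with a simulated registry (module names and versions are illustrative):

```javascript
// Simulated module registry; in reality these would be npm packages.
const registry = {
  "streams": { version: "1.0.0" },
  "http":    { version: "0.9.0" },
  "crypto":  { version: "2.0.0" }
};

// A "distribution" in defunctzombie's sense: a predefined set of modules.
// Opinions (streams, http) live in the distro, never in the libuv.js core.
function makeDistro(name, moduleNames) {
  const modules = {};
  for (const m of moduleNames) {
    if (!registry[m]) throw new Error(`unknown module: ${m}`);
    modules[m] = registry[m];
  }
  return { name, modules };
}

const barebones = makeDistro("barebones", ["streams", "http"]);
console.log(barebones.name, Object.keys(barebones.modules));
```

Because each module carries its own version, a distro can upgrade them piece-by-piece, which is exactly the property the underlying monolithic stack lacks.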

@indutny
Member

indutny commented Nov 28, 2014

Just 5 cents from me: the current TLS implementation in v0.12 does depend on being able to use libuv from C++. libuv.js could be used, but it would need to provide an External handle and a C++ API for tls_wrap.cc to work.

@mikeal
Contributor Author

mikeal commented Nov 29, 2014

@indutny perhaps we should add a "crypto.js" or "tls.js" to the list of top level projects?

@indutny
Member

indutny commented Nov 29, 2014

@mikeal I don't mind, but it seems to be unrelated to my previous comment ;) I was just trying to say that these modules should expose C++ APIs as well as the JS stuff, and this could be done with External objects.

@bmeck
Member

bmeck commented Nov 30, 2014

I've been using bmeck/node-module-system to do customized require shims for a long time now.

@devongovett
Contributor

👍 IMHO all the core modules should just be in npm instead of built in. Node itself could just be a require system for loading modules from npm and binding with libuv and v8. Not sure how that would work, but it would be awesome to be able to update the core modules independently from node itself and from the other core modules (i.e. more frequently).
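
The "node is just a require system" idea can be sketched in miniature: a loader with a module cache that evaluates each module body once and hands modules to each other. A toy sketch over an in-memory source table (a real loader would resolve against node_modules on disk; `tinyRequire` and `sources` are hypothetical names):

```javascript
// In-memory stand-in for module sources on disk.
const sources = {
  "greet": `module.exports = (name) => "hello " + name;`
};

const cache = {};

// Minimal CommonJS-style loader: evaluate once, cache, expose `exports`.
function tinyRequire(id) {
  if (cache[id]) return cache[id].exports;        // second require hits the cache
  const module = { exports: {} };
  cache[id] = module;                              // cache before eval (handles cycles)
  const fn = new Function("module", "exports", "require", sources[id]);
  fn(module, module.exports, tinyRequire);
  return module.exports;
}

console.log(tinyRequire("greet")("world")); // → "hello world"
```

Everything else node ships today would then be ordinary entries in `sources` (i.e. npm packages), versioned and upgradable independently.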

@brianleroux

This is interesting. We just started binding JSC to libuv for kicks doing our first pass at a ServiceWorker impl for Cordova/iOS.

Would fit nicely with the idea of libuv.js / happy to bring that work over here.

@Fishrock123
Contributor

👍

@defunctzombie
Contributor

Happy to contribute the libuv.js code to this org (or whatever org we need) or y'all can start fresh if that is easier :)

@mikeal
Contributor Author

mikeal commented Dec 2, 2014

@defunctzombie can you link us to the repo :)

@defunctzombie
Contributor

@chrisdickinson
Contributor

I have a few feels to share about the following sentiment:

IMHO all the core modules should just be in npm instead of built in. Node
itself could just be a require system for loading modules from npm and
binding with libuv and v8.

For a long time, I was really enamored of the no.js approach, but now I'm not so sure -- or at least, I don't think it will be as minimal as folks expect.

The following things have since shaped my opinion greatly about this approach, and about what the responsibility of a distribution (like Node, or IO.js) is:

  1. VoxelJS moved away from using THREE.js Vector primitives towards three-element arrays.
  2. Trying to add a new feature to Node streams.
  3. Cavorting through the various core modules.

voxeljs

VoxelJS has a peer dependency problem. The immediate problem was the plugin structure: plugins accepted an entire engine instance as a parameter, creating a peer dependency on a specific version of the voxel-engine package. Sometimes this was explicit, other times not. The end result was that some voxel plugins were incompatible with each other. This hints at a theme: packages should only rely on shared, globally available primitives to communicate.

The THREE problem was more insidious than the engine-as-parameter problem. Plugins (and even core components of voxel) would pull in THREE.js themselves. They depended on THREE for vector and matrix primitives. Often, these versions of THREE were incompatible, and would result in multiple copies of THREE manifesting in node_modules at different depths. While this would balloon out compilation times, the larger, harder to diagnose problem was that these vector and matrix instances would and could be passed around from package to package. This led to a situation where well-behaved plugins spent much of their time marshalling and unmarshalling inputs and outputs to and from object literals of {x, y, z} -- creating a bunch of garbage in the process, which is never good for something running at 60hz -- and poorly behaved plugins would send their vector and matrix instances far afield.

When THREE changed the signature for vector operators some plugins broke because vector.add(rhs) -> new Vector was now vector.add(a) -> null. Package A would get a vector of version 1 from package B, and try using it like a version 2 vector. This would cause hard to track problems -- the source of the problem could be packages away and potentially several operations distant from the exception. Well-behaved plugins, while slow, didn't exhibit this behavior because they communicated with other packages solely through globally available primitives.

Eventually, voxel moved to a three-element array-based vector system, where each package could bring in a copy of glmatrix -- data was communicated in a globally-available primitive, while operations were stored locally to the module. This was also (IIRC) the birth of ndarray -- separating operations from a globally-available backing store primitive. This approach to decoupling packages works, but only provided the operations can be extracted from the data and localized -- see the current state of Promise-based packages in the npm ecosystem for an example: the best practice is to present a callback-based API. This is not just to cater to a larger audience, I suspect -- it also effectively shields those package authors from promise implementation interop problems. Also note: Promises are extensively specified and are now global primitives to address this very issue.
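
The hazard and the fix can both be shown in a few lines. The classes below are simplified stand-ins for two incompatible versions of a vector library (not THREE's real API): one version returns a fresh vector from `add`, the other mutates in place, and an instance crossing a package boundary silently corrupts shared state.

```javascript
// "v1" of a vector library: add() returns a new vector.
class VectorV1 {
  constructor(x, y) { this.x = x; this.y = y; }
  add(rhs) { return new VectorV1(this.x + rhs.x, this.y + rhs.y); }
}

// "v2": add() mutates the receiver in place.
class VectorV2 {
  constructor(x, y) { this.x = x; this.y = y; }
  add(rhs) { this.x += rhs.x; this.y += rhs.y; return this; }
}

// Package A (built against v1) receives a vector from package B (bundling v2):
const fromB = new VectorV2(1, 2);
const sum = fromB.add(new VectorV2(3, 4)); // A expects a fresh vector back...
console.log(sum === fromB); // → true: A just mutated B's vector in place

// The fix voxel adopted: plain arrays as the shared cross-package primitive,
// with operations kept local to each module (glmatrix/ndarray style).
const add = (a, b) => [a[0] + b[0], a[1] + b[1]];
console.log(add([1, 2], [3, 4])); // → [ 4, 6 ]
```

The array-based version has no class identity to disagree about: any package can bring its own copy of the operations, because the data format is a globally available primitive.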

adding pipeline errors to streams

As a result of a conversation with @phated, it came to my attention that gulp was looking for a way to "handle" errors as part of a stream pipeline. We sketched out a plan for something called "pipeline errors" -- which I then implemented.

The feature itself (though stalled) is not the primary concern here: what I want to call attention to is the
difficulty of changing the stream API. Not only does one have to make sure that one's changes are backward compatible with the stream.Stream primitive packaged with the Node distribution, one also has to make sure it works with all previous versions of Node streams -- due to the prevalence of the readable-stream package. This would be okay for a spec'd, "finished" primitive; but streams are not finished -- there's a lot of room for improvement between now and v1.0 of IO.js, evinced by the various pain points: through2 is still prevalent, suggesting a need for a more ergonomic API; the object mode distinction causes problems for developers (especially with browserify); standardization of "resource-backed" stream events ("close", "abort", et al) should happen ...

It's hard to make improvements on primitives that are loosed into the userland module ecosystem. If a package pegs their version of streams, and exports a stream instance that's used by another package that's using platform streams, there's no telling what bugs might arise when streams change in the future. Again, packages in the module ecosystem can only meaningfully communicate using globally available primitives -- and I would contend that it's the distribution's job to choose and tend to those primitives, because the module system precludes packages from usefully selecting these abstractions.

core modules

Finally, what really moved me away from the no.js perspective was spending time with the core subsystems in Node. A lot of the "split out modules" discussion takes for granted that core modules are separate items that can be consumed piecemeal by userland.

However, because the distribution defines global primitives, and those primitives are implemented in terms of each other, the core subsystems are deeply intertwined -- for the most part, they can't be consumed piecemeal.

Streams are built on event emitters which use domains which are tied into the C++-level wrappers which themselves expose handles for streams to use. All of these use the concept of timers which are tied into domains and async listeners. The module system is built off of FS which is built off of the C++ fs wrapper, which is tied into domains / async-listener, which themselves are implemented as event emitters. There's potential to change some of this, yes -- or otherwise bootstrap it -- but I think the dissolution of core modules will look vastly different from how it's been proposed thus far, and the notion of using the module ecosystem to move the distribution forward is using a hammer where a saw is needed.

Any distribution will need to pick shareable primitives for at least the following operations to expose libuv to JS:

  1. A cross-package binary data primitive.
  2. A cross-package primitive representing a single operation happening once, pass or fail.
  3. A cross-package primitive representing a series of events happening over time, which may fail or complete once.

I suspect that by picking those, in addition to exposing the libuv APIs, one will end up with something that looks very much like Node. That said, there's definitely value in keeping core slim. It would even be interesting to reimplement the above using the primitives WHATWG has developed to address the above needs (typed arrays, promises, and whatwg/streams)!
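
The three primitives listed above map fairly directly onto primitives the language and WHATWG have since standardized; a sketch of that mapping, using typed arrays for binary data, promises for one-shot operations, and an async generator for a series of events over time (whatwg/streams fills a similar role for the third):

```javascript
// 1. Cross-package binary data: a typed array over an ArrayBuffer.
const bytes = new Uint8Array([104, 105]); // "hi"

// 2. A single operation happening once, pass or fail: a Promise.
const once = Promise.resolve(bytes.length);

// 3. A series of events over time that may fail or complete once:
//    an async generator, consumable with for await...of.
async function* series() {
  yield "connect";
  yield "data";
  yield "end";
}

(async () => {
  console.log(await once); // → 2
  const events = [];
  for await (const e of series()) events.push(e);
  console.log(events); // → [ 'connect', 'data', 'end' ]
})();
```

All three are globally available in every modern engine, which is precisely the property the comment argues a cross-package primitive must have.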

Distributions are there to bootstrap the module system, and to do the things that the module system cannot. Picking cross-package primitives is the responsibility of the distribution -- I don't think streams, buffers, or event emitters can be pushed entirely into userland in a useful way.

An aside:

I also think that the community in userland that has had a hard time getting
input in to node core would have an easier time directing that feedback
towards a stdlib.js or require project which is entirely JS.

I agree that there's a problem -- but I don't think it's the presence of C++ that's driving folks away from steering the project. Adopting something like the Rust RFC process, and ensuring that all technical decisions -- including ones from the TC -- go through that process would go a long way towards getting folks involved.

@indutny
Member

indutny commented Dec 3, 2014

Another note: this will probably affect distribution for many people. I know that Voxer, for example, is rolling out just a node binary to some of its servers to deploy stuff.

@creationix
Contributor

I love this discussion. As I mentioned in #28 (probably off topic in that thread), I've been experimenting with this idea for some time and have successfully implemented it in luvit.

I suggest splitting the project into 3 layers.

  • libuv.js is simply libuv bindings for V8. This adds no opinion on top and only exposes the C API in a manner that makes sense for V8. I've done this several times in various scripting engines with my most complete and most successful being https://github.com/luvit/luv
  • unnamed middle layer: More on this later.
  • node/iojs 2.0: This contains libuv.js as a static bundled dependency. It also contains other C++ bindings such as zlib, openssl, c-ares, etc. On top of that it bundles in some opinions about streams, event emitters, domains, a few core protocols like HTTP, and whatever is decided should be core in the core distribution.

There is a huge gap between what node currently offers and what libuv.js should offer. Writing libuv bindings for a new scripting engine is 70% figuring out how to best expose libuv in the semantics of the scripting language. The other 30% is pure grunt work applying what you decided to the full libuv feature set function by function. The remaining 90% is repeating steps 1 and 2 a few times and throwing away code till you're happy with it. Completely rewriting luv took me all of two weeks of sprinting. It helped that I knew what I was doing and already knew the libuv and lua APIs, but still, it wasn't that much work.

Rewriting luvit, on the other hand, has been full-time work for a few months for me and others and we're still not done.

The middle layer (luvi in my project) helps tremendously by separating concerns. Luvi is in charge of bundling the various C bindings and statically linking them into the scripting engine and producing a single binary result. This binary has no require system and very little in the way of standard library.

It does have one killer feature though. It can detect if a zip file is appended to it and can treat that embedded zip file as a virtual filesystem. It will automatically look in this embedded zip for main.lua and bootstrap the process there. It has no command-line argument processing of its own and delegates all of that to the embedded zip. It's controlled exclusively via two environment variables. One will run a folder as if it were an embedded zip (for development) and another will create a new copy of itself with the zip embedded. Read the luvi README for full details. https://github.com/luvit/luvi
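
The appended-zip trick works because a zip archive's End of Central Directory record carries a fixed signature ("PK\x05\x06") near the end of the file, so a binary can inspect its own tail. A simplified sketch of that detection (`hasAppendedZip` is a hypothetical name, and a real implementation would also validate the EOCD fields and comment length rather than just scan for the signature):

```javascript
// Detect whether a buffer ends with an appended zip archive by scanning its
// tail for the End of Central Directory signature "PK\x05\x06" (0x504b0506).
function hasAppendedZip(buf) {
  const sig = Buffer.from([0x50, 0x4b, 0x05, 0x06]);
  // A minimal EOCD record is 22 bytes; the zip comment can add up to 65535
  // more, so only the last 65557 bytes need to be scanned.
  const tail = buf.slice(Math.max(0, buf.length - 65557));
  return tail.lastIndexOf(sig) !== -1;
}

// A plain binary vs. one with a stub (empty) EOCD record appended:
const plain = Buffer.from("just an executable");
const eocd = Buffer.concat([
  Buffer.from([0x50, 0x4b, 0x05, 0x06]),
  Buffer.alloc(18) // remaining fields of a minimal, comment-free EOCD record
]);
const bundled = Buffer.concat([plain, eocd]);

console.log(hasAppendedZip(plain));   // → false
console.log(hasAppendedZip(bundled)); // → true
```

Because the zip directory lives at the end of the archive, the executable bytes in front of it don't disturb zip readers, which is what makes the self-extracting binary format possible.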

This splitting does wonders for improving workflow when working on luvit core, and it eases contribution. You don't even need a C compiler installed on your system to build fresh new luvit binaries. One of my collaborators worked on luvit during Thanksgiving at his relatives' house, using the Windows laptop there. He didn't need to install Visual Studio or anything, just install git and start working.

@defunctzombie
Contributor

Is anyone interested in any of these approaches, or should this issue be closed? I think it is wise to make a decision here, try experimentation, or close the issue. Lingering issues have a way of going nowhere.

@creationix
Contributor

I'm afraid I don't have a whole lot of time to do actual coding in node with this, but I am creating a new JS platform from scratch that will use the same technique I did for luvit.

If node or iojs were to adopt my suggested approach, it would mean basically rewriting the thing from scratch, porting modules over one at a time to libuv.js.

If libuv.js had a similar JS interface to my duktape bindings (and it should) then much of the higher-level sugar code could be reused between the projects. If done right, we could even have swappable js engines. I wonder if @brianleroux has time to get his JSC bindings to the same state or someone at mozilla wants to take another stab at my luvmonkey effort.

I will continue my duktape project regardless. I'm only here to share my experience and opinion and offer assistance if someone decided to go this route.

@mikeal
Contributor Author

mikeal commented Dec 4, 2014

In short, everyone likes the idea in theory but there are a variety of concerns with what the actual implementation might look like. I'm going to close the issue in favor of a future PR or new repo with real code :)

lemire added a commit to lemire/node that referenced this issue Jan 3, 2024
1. To avoid many warnings, this PR declares the C and C++ standards separately.
2. This PR extends gyp so that we can build with AVX-512. Nevertheless, getting runtime dispatching with ClangCl through Visual Studio is challenging, so we disable it. It only affects base64 and one component of zip, so the effect on runtime performance should be negligible. Note that other dependencies such as simdutf do not need this build support for runtime dispatching (so you still get AVX2, AVX-512 support in these dependencies).