Node REPL in Chrome vs. CMD. The website is not just a front end connected to some remote Node.js process (as with most online REPLs such as repl.it, runkit.com or tutorialspoint.com). An actual Node.js instance is bootstrapped in the browser, with xtermjs.org providing the UI.

To the Browser!

How to run Node.js (apps) in the browser?

Bootstrapping Node.js in the browser — no browserify required

Johannes Bader
Nov 22, 2017 · 9 min read

There are several reasons why a JavaScript developer may want to run Node.js code in the browser, including:

  • you developed a Node.js app and now want to offer it online as well
  • you found a useful npm package that relies on Node.js
  • you want to use Node.js APIs (buffer, crypto, fork, events, streams, …)
  • you prefer writing CommonJS-style modules with require

Existing Options

One will find a number of solutions fulfilling at least some of the above wishes, such as webpack.js.org, browserify.org or nodular.js. However, all of those approaches are based on some form of source code transformation — either ahead of time (webpack and browserify) or at runtime (nodular). At the end of the day, those solutions may be absolutely great and sufficient for your purposes, but really they just try to imitate or emulate certain behavior. It’s quite easy to create Node.js programs that will not behave the same way in the browser:

Browserify tries to resolve and bundle referenced modules ahead of time by looking at require calls; above, that fails due to the obscure usage of require: lib.js will not be bundled, resulting in a runtime error.

Nodular interrupts script execution (by throwing an exception) when it hits a require, loads the required module asynchronously, and tries to “resume” the interrupted script by performing magic on its source code. That doesn’t work if the require happens inside control structures or is part of a complex expression.

Webpack is arguably the most sophisticated of the three: it does a great job at CommonJS module treatment and a fairly good job at imitating Node.js APIs. Despite transforming your sources ahead of time, it allows debugging a running app against your original sources thanks to source maps. Ultimately, however, subtle differences in the API (e.g. the exact timing of process.nextTick vs. setImmediate vs. setTimeout) can easily expose the fundamental difference.
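A minimal sketch of the timing guarantee just mentioned: in Node.js, process.nextTick callbacks always run before setImmediate callbacks scheduled in the same turn, and a polyfill that maps either one onto setTimeout breaks this ordering.

```javascript
// In Node.js this ordering is deterministic: the nextTick queue drains
// before the event loop reaches the "check" phase where setImmediate
// callbacks run.
const order = [];

setImmediate(() => order.push('setImmediate'));
process.nextTick(() => order.push('nextTick'));

// A second setImmediate runs after the first one, so by then both
// callbacks above have fired:
setImmediate(() => console.log(order.join(' -> ')));
// prints: nextTick -> setImmediate
```

An emulation layer has to reproduce exactly this behavior, which is hard to get right by imitation and free if you run the original standard library.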

My Approach

As explained later, I bootstrap the JavaScript part of Node.js inside a web worker. That provides all implementation details for free! I haven’t written a single line of code of the REPL shown in the title image of this post. I have no clue how objects are displayed (the maximum level of displayed nesting, colors, things like [Circular]), and no clue where the code is that makes process.nextTick fire before setImmediate. I simply bootstrap Node.js and plug it into an xtermjs.org control.

Motivation

The Node.js APIs are in motion, and while there are polyfills for virtually everything out there, I like the idea of just using the original APIs directly. The properties and subtleties of Node.js should be available to scripts inherently. Furthermore, I want to enable running Node.js scripts directly and unmodified, with no preprocessing required. This enables REPLs and online IDEs that don’t rely on a server for the heavy lifting.

a small detail of Node.js and its event queue — behaving correctly in my port inherently

Apart from that, something inside of me just wants to see convergence of those two JavaScript worlds (Node.js and browsers). Code transformation feels more like a workaround than a solution to me. Imagine if 32-bit apps required AOT transpilation in order to run on a 64-bit machine, instead of enjoying actual hardware and OS support.

Challenge: CommonJS Semantics

One of the biggest challenges is of course dealing with CommonJS modules and require correctly. Support for the synchronous nature of require is achieved by webpack and browserify via bundling (effectively preloading) all scripts that could possibly be required at runtime. Nodular tries to fake the semantics of a blocking call.

Node.js implements require via synchronous file system operations, so let’s zoom out for a second and think about options:

  1. The browser has no direct access to the host computer’s FS anyways (and that’s okay), so that’s not a concern.
  2. One could create an in-memory FS (using variables or local storage), that’ll obviously support synchronous access. Well, that’s essentially what existing solutions do by bundling files ahead of time.
  3. One could treat network resources as an FS. “Regular” URLs are like read-only files, while cloud storage (GitHub, OneDrive, Google Drive, iCloud, …) could act as file systems with write access. But web requests are always asynchronous, right? Nope! To be clear, I’m not suggesting to use (now deprecated) synchronous XMLHttpRequests on the main thread — but in workers they are allowed and fine. We should bootstrap Node.js in a worker anyways, but more about that later.
  4. Any combination of 2 and 3. For example, just like in actual PCs, writes could first hit some in-memory cache (so they are super-fast) and then be persisted in the background.

So when running in a worker, it seems like require can work without any bundling! 🎉 For my prototype, I simply direct all writes to memory and handle reads by first looking up the path in memory and making an HTTP request on failure (against a URL computed from the path). Very basic but sufficient so far.
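The prototype’s read/write strategy can be sketched like this (a simplification, not the actual code; the BASE origin is made up, and real path-to-URL mapping would be more involved). Synchronous XHR is what makes the blocking read possible, and it is only permitted inside workers:

```javascript
// In-memory-first "file system" with a blocking HTTP fallback.
// Must run inside a web worker, where synchronous XHR is allowed.
const BASE = 'https://example.com/fs';   // hypothetical file-serving origin
const files = new Map();                 // path -> contents; writes land here

function writeFileSync(path, contents) {
  files.set(path, contents);             // writes only hit memory
}

function readFileSync(path) {
  if (files.has(path)) return files.get(path);   // memory first
  const xhr = new XMLHttpRequest();
  xhr.open('GET', BASE + path, false);   // third arg false = synchronous
  xhr.send();                            // blocks until the response is in
  if (xhr.status !== 200) throw new Error('ENOENT: ' + path);
  files.set(path, xhr.responseText);     // cache for subsequent reads
  return xhr.responseText;
}
```

With reads and writes both synchronous, require can be implemented on top of this without any bundling step.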

Now, how to actually bootstrap Node.js?

In the following, I give an overview of my approach and some technical details.

Architecture

In an extremely oversimplified sense, Node.js looks as follows:

the bindings are exposed to the library through the process object

The standard library, written in JavaScript, provides the high-level Node.js API that you are used to. This API is implemented on top of native, low-level bindings that provide access to OS features like the filesystem or processes. The bindings are communicated to the standard library through a single object: process. In other words, process is the only thing in a running Node.js instance that originates from the non-JavaScript side; more about the specifics later. V8 is the JavaScript engine that executes all JavaScript code, i.e. the standard library and all user scripts.

Strategy

In order to run in the browser entirely, the C/C++ parts have to be replaced by JavaScript implementations:

we’ll have to provide bindings that work in the browser

Specifically, the V8 part of things is essentially free, since browsers execute JavaScript out of the box (in case of Chrome, using V8 😏). Replacing the bindings means actual work on our side.

I was glad to see that a vast amount of implementation detail indeed lives in the standard library and not in the bindings: the bindings are essentially as low-level as possible while still being platform-independent.

Imitating the Startup

As mentioned, the bindings are provided through the process object. The standard library has a bootstrapping function which takes process as an argument and will prepare a large number of APIs. Specifically, there is a function process.binding which takes a name like 'constants', 'buffer', 'fs', 'os' or 'tcp_wrap' and returns an object with native functions — this mechanism is comparable to require returning modules, but on a lower level.

I observed that calling native functions provided by process.binding the wrong way often results in the process crashing, e.g. process.binding('os').getCPUs(). I assume no arity or type checking is performed. This is of course not an issue, since the standard library calls these functions correctly and does not expose them to the programmer directly.

For my port I actually started out by passing process = {} to the bootstrapping function, having no clue how things work or what is expected. I let myself be guided by the errors thrown by the standard library. One of the first errors was due to the function binding missing. I implemented binding so that it throws an error whenever it is called with a name I haven’t seen before. In case of such an error, I would merely add code to return the empty object for that name. If something is missing on that object, an error will be thrown, and so on. The benefit of this lazy approach is that the error locations usually indicate exactly what you have to implement next. No previous investigation or context required. If unsure about a function implementation, implement it as debugger; and wait until the debugger breaks on it. Inspection of the arguments will likely give you a clue. Some functions, however, modify global state (e.g. process.binding('fs').stat), in which case I checked the C sources to get an idea of what’s supposed to happen.
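The lazy approach above can be sketched as follows (all names are illustrative, not Node.js’ real internal surface; the bootstrap entry point is only hinted at):

```javascript
// Start with an almost-empty process stand-in and let every thrown
// error tell you which binding to implement next.
const bindings = {
  constants: {},   // added after the first "missing binding" error
  contextify: {},  // ...and so on, one error at a time
};

const fakeProcess = {
  argv: ['node'],
  env: {},
  binding(name) {
    if (!(name in bindings)) {
      // The error's stack trace points at exactly what is needed next.
      throw new Error('No such binding: ' + name);
    }
    return bindings[name];
  },
};

// The standard library's bootstrapping function would then be invoked
// with this object, roughly: bootstrapNode(fakeProcess);
```

Each run either gets further into bootstrapping or fails with a precise pointer to the next missing piece, so no upfront study of the binding layer is needed.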

At some point, calling the bootstrapping function will not throw an error, hooray!

What to call next?

Nothing, you’re done! The bootstrapper takes care of whatever is supposed to happen after bootstrapping as well. If process.argv contains a script as argument, that script will be invoked. If it doesn’t, the REPL starts. Slick.

Next Steps

There are more things to do, but it’s all downhill from here! Suggestions:

  • Connect process.binding('tty_wrap').TTY to something better than some <textarea>. I chose xtermjs.org.
  • Bootstrapping may have succeeded, but 99% of the bindings are still missing… as soon as you try to load a script or use the REPL for anything beyond "Hello World".toUpperCase(), errors will start raining again. But we already know how to treat those: fill in the gaps.

Web Worker Goodness

Recall that we’re running everything in a worker in order to get synchronous HTTP requests. This is actually a great concept anyways for several reasons:

  1. Isolation: It somewhat makes sense that a bootstrapped Node.js instance does not run in the same context/thread as the main process (which could take the role of an OS). Interaction with the UI can be handled via message passing, which is a core Node.js concept anyways.
  2. Isolation: If multiple Node.js instances are supposed to be runnable at the same time, the global scope has to somehow be separated, an infinite loop in one instance shouldn’t affect the other, and so on.
  3. Isolation: Workers play nicely with the child_process APIs. Even though I haven’t touched those so far, it makes perfect sense to think of workers as processes. You can create them and tear them down regardless of their internal state. Node.js instances hold resources like message queues and timers; tearing down the entire world in which they live is the easiest way to make sure that none of these resources cause memory leaks.

Limitations

I encountered a number of issues that I don’t have good answers for so far — suggestions are more than welcome! Here is a selection:

  • Node.js provides low-level networking APIs, such as TCP sockets. There is no such low-level API in browsers; all you get are higher-level APIs like HTTP or WebSockets. Since Node.js’ http API relies on the low-level ones, it is not really usable either. I guess in cases like that it is appropriate to inject, say, browserify’s http polyfill. Then again, one could actually implement memory-passing-style TCP for communicating between workers! Requests targeting the outside world could be unwrapped, inspected for being HTTP (as a firewall would) and replayed using the browser’s native HTTP APIs. Essentially simulating a NAT with an HTTP-only firewall?
  • The vm API provides methods for executing JavaScript in different contexts. This is also how required scripts are executed and how the REPL works. In a sense, this is what eval does (and that’s what I currently use to fake vm). However, there is actually more to vm, as becomes apparent when looking at the REPL: executing const x = 3 will declare variable x and make it available to subsequent statements. This is impossible to do with eval, since let and const declarations are always confined to the eval code’s own lexical environment and never reach the surrounding scope (only sloppy-mode var leaks out). Specifically, eval("var x = 3"); eval("x"); will do what you expect, but eval("const x = 3"); eval("x"); will not. Bummer. Maybe some trickery is possible using importScripts in yet another worker, not sure. This may also solve the next problem, which is that eval is uninterruptible. Ctrl+C is supposed to interrupt while(true); in the REPL. Fortunately, eval seems to be sufficient for require, so this issue is rather cosmetic so far.

Conclusion

My port is very much a prototype. While it successfully uses unmodified npm packages like cowsay or chalk (see title image), many bindings are missing. Still, I’m very happy about the results so far, since they show that bootstrapping Node.js in the browser works! Coverage-wise, my next goal is to run npm in the browser — it would be awesome to not only be able to execute packages, but also load them on the fly.

My ultimate goal is to bring those two seemingly diverging worlds of JavaScript back together, without much heavy lifting. We got used to a world in which JavaScript is transformed into other JavaScript in order to be executed. It’s always the same language, but there is a weird variety of flavors that aren’t fully compatible and interchangeable. I hope that things like the ES6 import statement will at one point cause the worlds to converge again.

Show me the code!

The code can be found at https://github.com/olydis/node-in-browser

Warning: this is a hacky, messy prototype and not a beautiful piece of software engineering.
