Compile C/C++ programs that access data from CernVM-FS repositories to WebAssembly/asm.js, and run them on any device with a modern web browser.
First, run npm install
within the project directory to install some required Node.js packages.
Then, use the emcc-cvmfs
script to compile C programs, or em++-cvmfs
for C++ programs. These scripts are essentially wrappers around emcc
and em++
, which pass along a few required arguments to Emscripten. Any arguments you pass to the wrappers will also be passed on to Emscripten.
The goal of this project is to enable C/C++ programs compiled to Web Assembly or asm.js with Emscripten, to perform POSIX read-only I/O on CernVM-FS repositories. This means, compiling a program that does
int fd = open("/cvmfs/sft.cern.ch/my/dataset", O_RDONLY);
read(fd, buf, len);
will work seamlessly by downloading the appropriate metadata and data, and also caching it on the browser's local storage for later use.
A potential application of this would be running event generator programs like Pythia and Geant4 on the browser efficiently, that is, without packaging all the required data files necessary for the computation with the HTML & JS files. Instead, as the program accesses certain files, they are fetched automatically on-demand and cached locally.
For example, here are the results for Pythia8 example main03 with and without the cvmfs backend,
Pythia8 main03 | Without CVMFS | With CVMFS |
---|---|---|
Compiled files | 36 MB | 7 MB |
Data downloaded from CernVM-FS | - | 750 KB |
Link | saurvs.github.io/pythia8-main03 | saurvs.github.io/cvmfs-pythia8-main03 |
And similarly for Geant4 example B1,
Geant4 B1 | Without CVMFS | With CVMFS |
---|---|---|
Compiled files | 270 MB | 19 MB |
Data downloaded from CernVM-FS | - | 3.3 MB |
Link | saurvs.github.io/geant4-B1 | saurvs.github.io/cvmfs-geant4-B1 |
This project is split into two parts - a CernVM-FS client written in JavaScript (inside cvmfs
), and an Emscripten filesystem backend (inside fs
) that calls into the client's APIs. Emscripten's generic filesystem backend is also slightly modified to support auto-mounting when a program accesses a repository under /cvmfs
.
Note that compiling programs to run on Node.js isn't currently supported, as the client uses browser APIs exclusively for fetching and caching data.
The only master public key included is for a test repository (emscripten.cvmfs.io
). You can add more keys by calling cvmfs.addMasterKey(pkcs8_key)
before mounting a repository. pkcs8_key
must be a string representing a PKCS8 public key in PEM format.
By default, the Local Storage API is used to cache file data and metadata, with LRU eviction when the cache is full. But since this API limits the cache size to less than 10MB on most browsers, there is an experimental caching method implemented that uses Service Workers and the new Cache API instead. Simply passing -swcache
to emcc-cvmfs
will enable this by placing a sw-cache.js
Service Worker script alongside the other output files. This would allow the filesystem to cache much larger data, however, it has only been tested to work on newer (>= 57) versions of Mozilla Firefox.
The following is implemented status of features of CernVM-FS
- Auto-mounting when repository under
/cvmfs
is accessed - Verifying signature on manifest and whitelist, and fingerprint on certificate
- Reading regular files, compressed and uncompressed
- Hash verification using SHA1, RIPEMD-160, and SHAKE-128
- Reading chunked files
- Reading symbolic links with dynamic variable substitution
- Reading nested catalogs
- Reading bind mountpoints
- Reading catalog statistics and properties
- Reading entry metadata by calling
stat
- Reading extended attributes with
getxattr
- Reading external files
Running generate-cvmfs.sh
will create a cvmfs.js
file in the source directory, which contains the entier cvmfs client code (and code from all dependencies). As mentioned above, it can currently only be run on the browser.
The central object is cvmfs.repo
, and it's constructor is function(base_url, repo_name)
. Calling this function will download the manifest, whitelist, and certificate, and then verify all of them. If any error is detected, an undefined
value is returned. It has the following methods:
-
function getManifest()
Returns an object storing the properties of the manifest. -
function getWhitelist()
Returns an object storing the properties of the whitelist. -
function getCertificate()
Returns the certificate of the repository as aKJUR.asn1.x509
object. -
function getCatalog(hash)
Returns an SQL database object representing the catalog for the givenhash
, which must be acvmfs.util.hash
object. -
function getCatalogStats(catalog)
Returns an object storing the catalog's statistics. -
function getCatalogProperties(catalog)
Returns an object storing the catalog's properties (table). -
function getEntriesForParentPath(catalog, path)
Returns an array of string entires for the given directory path. -
function getContentForRegularFile(catalog, path, flags)
Returns the string content of the given regular file at the given path and flags. -
function getChunksWithinRangeForPath(catalog, path, flags, low, high)
Returns an array of chunks (strings) within the given range of a chunked file at the given path and flags. -
function getSymlinkForPath(catalog, path)
Returns the raw string content of the symbolic link at the given path. No variable substitution is performed. -
function getStatInfoForPath(catalog, path)
Returns the following info about the entry at the given path: uid, gid, size, mtime, mode, flags. -
function getNestedCatalogHash(catalog, path)
Returns acvmfs.util.hash
object for the nested catalog whose entry is at the given path in the given catalog. -
function getBindMountpointHash(catalog, path)
Returns acvmfs.util.hash
object for the bind mountpoint whose entry is at the given path in the given catalog.
This project was done as part of a Google Summer of Code project in 2018. The student was Saurav Sachidanand and the mentors were Jakob Blomer and Radu Popescu. A short blog related to this project was also maintained at medium.com/@saurvs.