
Sharing Filesystem between multiple PHP instances #1027

Closed
adamziel opened this issue Feb 11, 2024 · 2 comments


@adamziel
Collaborator

adamziel commented Feb 11, 2024

Blueprints as a PHP library depends on sharing the filesystem between two PHP instances.

Many tasks must be delegated to a PHP sub-process, which makes no sense if that sub-process can't access the WordPress files seen by the main process.

Imagine the following scenario:

```php
<?php proc_open(['php', 'activate_theme.php', 'pendant'], [], $pipes);
```

It could be handled by PHP.wasm as follows:

```js
php1.setSpawnHandler(
	createSpawnHandler(async function (args, processApi) {
		// args is ['php', 'activate_theme.php', 'pendant']
		const php2 = await WebPHP.load();
		const result = await php2.run({
			scriptPath: args[1]
		});
		processApi.exit(result.exitCode);
	})
);
```

If, however, php2 acts on a separate filesystem, the pendant theme won't be activated from the perspective of php1.

This issue is about web browsers. In Node.js, NODEFS already solves this problem

wp-now uses the filesystem of the device it runs on via the NODEFS Emscripten API.

Reusing the same filesystem

PHP.wasm uses MEMFS by default

A newly created PHP instance handles all filesystem operations using an in-memory filesystem implementation called MEMFS. MEMFS keeps track of the files using JavaScript objects. It also contains hardcoded references to the HEAP and FS of the PHP instance it lives in.

Reusing MEMFS seems like a non-starter

Sharing MEMFS between two PHP instances seems extremely difficult. Its hardcoded HEAP and FS references make it hard to bind the same MEMFS to two PHP instances. Perhaps it could be done with a deep refactor, but I worry we'd quickly run into the problem of sharing the heap, but not the static memory or the stack, between PHP instances. It doesn't seem worth it.

IDBFS is too slow

It takes a few minutes to read the WordPress files from IDBFS into MEMFS. It's just too slow for this amount of I/O.

Conceptually, OPFS could work. Unfortunately, the Emscripten OPFS backend crashes

Ditching MEMFS and relying on OPFS would enable all the PHP instances to act on the same underlying files. Emscripten has new and undocumented support for OPFS. I explored it in this PR and, unfortunately, couldn't get it to work without crashing:

Synchronizing two filesystems

If we can't easily share the same filesystem, synchronizing two distinct filesystems is the next best approach.

Overwriting the entire filesystem

Whenever the child process yields to the event loop, we could overwrite the main process's filesystem with all the files from the child's filesystem.
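A minimal sketch of the overwrite step, under loud assumptions: `FlatFS` and `overwriteFilesystem` are hypothetical names, each filesystem is modeled as a plain map from file paths to contents rather than an Emscripten FS, and the child's filesystem is assumed to have started as a copy of the main one. Clearing the target first is what makes deletions in the child propagate.

```typescript
// Hypothetical model of a filesystem: absolute file path -> file contents.
// Directories are implied by the paths. This is NOT the Emscripten FS API.
type FlatFS = Map<string, Uint8Array>;

// Replace everything in `target` with a copy of `source`. Clearing first
// means files the child deleted also disappear from the main filesystem.
function overwriteFilesystem(source: FlatFS, target: FlatFS): void {
	target.clear();
	source.forEach((contents, path) => {
		// Copy the bytes so the two filesystems never share a buffer.
		target.set(path, contents.slice());
	});
}

// Usage: the child started from a copy of the main FS, then changed it.
const encoder = new TextEncoder();
const mainFS: FlatFS = new Map([["/wordpress/old.txt", encoder.encode("stale")]]);
const childFS: FlatFS = new Map([["/wordpress/new.txt", encoder.encode("Hello")]]);
overwriteFilesystem(childFS, mainFS);
console.log(Array.from(mainFS.keys())); // ["/wordpress/new.txt"]
```

Copying every file on each yield is the simplest possible strategy; its cost grows with the total size of the filesystem, not with the size of the change.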

I can only think of two issues with this approach:

Safety – could it affect the main FS in an unsafe way that would not happen with concurrent writes to a shared filesystem?

I'm not sure, but intuitively, I want to say it's safe. There is no real concurrency involved – this could work in a single JavaScript worker on a single event loop:

  1. The main process pauses, the child process starts
  2. The child process does some work, optionally pipes stdout data
  3. The main process optionally resumes to process the stdout data before pausing again
  4. The child process eventually finishes
  5. The main process continues
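The five steps can be modeled with plain `async`/`await` on a single event loop. In this hypothetical sketch, `mainProcess` and `childProcess` are stand-ins for the two PHP instances, not PHP.wasm APIs. Each step awaits the previous one, so the log comes out in the same order on every run. It only illustrates the ordering argument; it doesn't prove the overwrite is safe.

```typescript
// Hypothetical stand-ins for the two PHP instances; not PHP.wasm APIs.
type StdoutHandler = (chunk: string) => Promise<void>;

async function childProcess(log: string[], onStdout: StdoutHandler): Promise<number> {
	log.push("child: starts");            // 1. the child process starts
	log.push("child: does some work");    // 2. the child does some work...
	await onStdout("partial output");     // ...and pipes stdout data (step 3)
	log.push("child: finishes");          // 4. the child eventually finishes
	return 0;
}

async function mainProcess(log: string[]): Promise<number> {
	log.push("main: pauses");             // 1. the main process pauses
	const exitCode = await childProcess(log, async (chunk) => {
		log.push(`main: handles ${chunk}`); // 3. main briefly resumes for stdout
	});
	log.push("main: continues");          // 5. the main process continues
	return exitCode;
}

// Usage: print the order in which the steps ran.
const eventLog: string[] = [];
mainProcess(eventLog).then(() => console.log(eventLog));
```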

Because everything happens in order, replaying the filesystem operations intuitively seems safe to me, or at least no less safe than running both processes in parallel on my Mac.

Is there a flaw in this reasoning?

Speed – would overwriting the entire filesystem take ages?

My intuition says yes, but rotatedPHP performs exactly this kind of filesystem overwrite and it's barely noticeable. If, however, the speed turns out to be a problem after all, we could turn to the next approach on the list.

Replaying the filesystem operations

Playground supports synchronizing two Playground instances by journaling and replaying the MEMFS operations – see the [demo](https://playground.wordpress.net/demos/sync.html).
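A rough sketch of journaling and replaying, assuming a made-up `FSOperation` format and a toy `JournaledFS` class; Playground's actual journal entries and MEMFS hooks likely look different. Every mutation is recorded, and `replay` applies the recorded operations, in order, to another filesystem.

```typescript
// Hypothetical journal entry format; Playground's real journal may differ.
type FSOperation =
	| { type: "mkdir"; path: string }
	| { type: "write"; path: string; data: string }
	| { type: "unlink"; path: string };

// A toy in-memory filesystem that journals every mutation it performs.
class JournaledFS {
	readonly dirs = new Set<string>(["/"]);
	readonly files = new Map<string, string>();
	readonly journal: FSOperation[] = [];

	apply(op: FSOperation, record = true): void {
		switch (op.type) {
			case "mkdir":
				this.dirs.add(op.path);
				break;
			case "write":
				this.files.set(op.path, op.data);
				break;
			case "unlink":
				this.files.delete(op.path);
				break;
		}
		if (record) {
			this.journal.push(op);
		}
	}
}

// Replay journaled operations, in order, onto another filesystem. They are
// not re-journaled on the target, so the changes don't echo back.
function replay(journal: readonly FSOperation[], target: JournaledFS): void {
	for (const op of journal) {
		target.apply(op, false);
	}
}

// Usage: the child writes a file, then the main process replays the journal.
const childJournalFS = new JournaledFS();
const mainJournalFS = new JournaledFS();
childJournalFS.apply({ type: "mkdir", path: "/wordpress" });
childJournalFS.apply({ type: "write", path: "/wordpress/new.txt", data: "Hello, world!" });
replay(childJournalFS.journal, mainJournalFS);
console.log(mainJournalFS.files.get("/wordpress/new.txt")); // "Hello, world!"
```

Unlike overwriting, replay cost scales with the number of operations since the last sync rather than with the size of the whole filesystem.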

This PR explores syncing and replaying FS operations:

Related points

Emscripten supports PThreads and WASM Workers. They seem tempting at first, but they aren't a silver bullet. The underlying implementation is just built with web workers and SharedArrayBuffer, neither of which solves our problem here.

We could potentially use the same SharedArrayBuffer in both PHP instances to keep track of MEMFS files. However, and I'm making a few guesses that could be wrong, we'd also end up sharing the heap, the stack, the global variables, and the locks. That would make it the same as just calling run() on the current PHP instance, which we can't do. Please tell me if I'm wrong about anything here!

@adamziel adamziel changed the title Shared Filesystem between multiple PHP instances Sharing Filesystem between multiple PHP instances Feb 11, 2024
adamziel added a commit that referenced this issue Feb 11, 2024
@adamziel
Collaborator Author

This PR explores syncing and replaying FS operations:

@adamziel
Collaborator Author

adamziel commented Feb 28, 2024

Actually, Emscripten's native PROXYFS provides this exact feature 🎉 This means we can have two (or more!) Emscripten modules acting on the same filesystem.

```js
// Module 2 can use the path "/fs1" to access and modify Module 1's filesystem
module2.FS.mkdir("/fs1");
module2.FS.mount(module2.PROXYFS, {
    root: "/",
    fs: module1.FS
}, "/fs1");
```
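To illustrate what the mount achieves, here is a toy model. It is not how Emscripten implements PROXYFS: `ToyFS` is a hypothetical class whose mount table forwards any path under a mount point to another instance's files, so writes made through `/fs1` in one module land directly in the other module's filesystem and nothing has to be synchronized afterwards.

```typescript
// Toy model of a mountable filesystem; NOT Emscripten's FS or PROXYFS.
class ToyFS {
	private readonly files = new Map<string, string>();
	private readonly mounts = new Map<string, { fs: ToyFS; root: string }>();

	// Forward every path under `mountPoint` to `root` inside `fs`,
	// loosely mirroring FS.mount(PROXYFS, { root, fs }, mountPoint).
	mount(fs: ToyFS, root: string, mountPoint: string): void {
		this.mounts.set(mountPoint, { fs, root });
	}

	writeFile(path: string, data: string): void {
		const [fs, localPath] = this.resolve(path);
		if (fs === this) {
			this.files.set(localPath, data);
		} else {
			fs.writeFile(localPath, data);
		}
	}

	readFile(path: string): string | undefined {
		const [fs, localPath] = this.resolve(path);
		return fs === this ? this.files.get(localPath) : fs.readFile(localPath);
	}

	// Find the filesystem that owns `path` and the path inside that filesystem.
	private resolve(path: string): [ToyFS, string] {
		let resolved: [ToyFS, string] = [this, path];
		this.mounts.forEach(({ fs, root }, mountPoint) => {
			if (path === mountPoint || path.startsWith(mountPoint + "/")) {
				const rest = path.slice(mountPoint.length); // "" or "/..."
				resolved = [fs, root.replace(/\/$/, "") + (rest || "/")];
			}
		});
		return resolved;
	}
}

// Usage: module 2 writes through /fs1, module 1 sees the file at its own path.
const module1FS = new ToyFS();
const module2FS = new ToyFS();
module2FS.mount(module1FS, "/", "/fs1");
module2FS.writeFile("/fs1/wordpress/new.txt", "Hello");
console.log(module1FS.readFile("/wordpress/new.txt")); // "Hello"
```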

adamziel added a commit that referenced this issue Feb 28, 2024
Adds support for spawning PHP subprocesses via `<?php proc_open(['php',
'activate_theme.php']);`. The spawned subprocess affects the filesystem
used by the parent process.

## Implementation details

This PR updates the default `spawnHandler` in `worker-thread.ts` so that it
creates another WebPHP instance and mounts the parent filesystem using
Emscripten's PROXYFS.

[A shared filesystem didn't pan out. Synchronizing is the second best
option.](#1027)

This code snippet illustrates the idea – note the actual implementation
is more nuanced:

```ts
php.setSpawnHandler(
	createSpawnHandler(async function (args, processApi) {
		const childPHP = new WebPHP();
		const { exitCode, stdout, stderr } = await childPHP.run({
			scriptPath: args[1]
		});
		processApi.stdout(stdout);
		processApi.stderr(stderr);
		processApi.exit(exitCode);
	})
);
```

## Future work

* Stream `stdout` and `stderr` from `childPHP` to `processApi` instead
of buffering the output and passing everything at once
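One possible shape for the streaming follow-up, sketched under assumptions: `ProcessApiLike` is a hypothetical subset of the process API, and `StreamingRunner` assumes a child runner that reports output through callbacks as it is produced. Nothing here is an existing PHP.wasm API.

```typescript
// Hypothetical subset of the spawn handler's process API.
interface ProcessApiLike {
	stdout(data: string): void;
	stderr(data: string): void;
	exit(code: number): void;
}

type ChunkListener = (chunk: string) => void;

// Hypothetical child runner: reports output as it is produced and
// resolves with the exit code once the child finishes.
type StreamingRunner = (onStdout: ChunkListener, onStderr: ChunkListener) => Promise<number>;

// Forward each chunk to processApi immediately instead of buffering it.
async function runAndStream(runChild: StreamingRunner, processApi: ProcessApiLike): Promise<void> {
	const exitCode = await runChild(
		(chunk) => processApi.stdout(chunk),
		(chunk) => processApi.stderr(chunk)
	);
	processApi.exit(exitCode);
}

// Usage with a fake child that yields to the event loop mid-run.
const fakeChild: StreamingRunner = async (onStdout, onStderr) => {
	onStdout("Spawned, ");
	await Promise.resolve();
	onStderr("a warning");
	onStdout("running");
	return 0;
};
const events: string[] = [];
runAndStream(fakeChild, {
	stdout: (d) => { events.push(`stdout:${d}`); },
	stderr: (d) => { events.push(`stderr:${d}`); },
	exit: (c) => { events.push(`exit:${c}`); },
}).then(() => console.log(events));
```

Forwarding chunks from the callbacks lets the parent read output while the child is still running, rather than receiving everything at once after it exits.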

## Example of how it works

<img width="500"
src="https://github.com/WordPress/wordpress-playground/assets/205419/470d79b2-2f10-4f1a-806c-5f26463766da"
/>

#### /wordpress/spawn.php

```php
<?php
echo "<plaintext>";
echo "Spawning /wordpress/child.php\n";
$handle = proc_open('php /wordpress/child.php', [
	0 => ['pipe', 'r'],
	1 => ['pipe', 'w'],
	2 => ['pipe', 'w'],
], $pipes);

echo "stdout: " . stream_get_contents($pipes[1]) . "\n";
echo "stderr: " . stream_get_contents($pipes[2]) . "\n";
echo "Finished\n";
echo "Contents of the created file: " . file_get_contents("/wordpress/new.txt") . "\n";
```

#### /wordpress/child.php

```php
<?php
echo "<plaintext>";
echo "Spawned, running";
error_log("Here's a message logged to stderr! " . rand());
file_put_contents("/wordpress/new.txt", "Hello, world!" . rand() . "\n");
```

## Testing instructions

1. Update `worker-thread.ts` to create the two files listed above
2. In Playground, navigate to `/spawn.php`
3. Confirm the output is the same as on the screenshot above