Instrument OS-Lib to block filesystem writes outside of designated Task.dest directories (2000USD Bounty)
#3746
Comments
Are you sure you want to make os-lib that deeply dependent on a fairly specialized feature? Might it be better to have an API-compatible alternative build so you could depend on os-lib-fenced instead of os-lib if you needed fenced filesystem operations? Also, everything you mention is JVM-specific but os-lib has a native build as well. Is fencing supposed to work with native?
@Ichoran it seems the most reasonable option. The alternative is we do bytecode instrumentation. The implementation cost of stuffing this into OS-Lib seems relatively low: just a single Nothing here seems JVM-specific or Mill-specific. Seems plausible that there would be other use cases for best-effort limiting of filesystem access as well. The JVM baked this stuff in 30 years ago, and although it is now taking it out, I think it's actually a pretty reasonable feature to have as long as expectations are set appropriately (e.g. best-effort guardrails, not a security-sensitive sandbox as the JVM was hoping for)
@lihaoyi - Well, the interface change is small, but it's going to touch a lot of code, and make every user of threads add an extra ThreadLocal variable to everything. That's a big deal for Project Loom-style usage. "Don't use os-lib and virtual threads together" isn't a great message.
Why is it a big deal for Project Loom usage?
The ThreadLocal variable has to be duplicated for every use of every thread since otherwise ThreadLocal is broken. So it adds to the per-thread overhead. The Oracle docs all caution against it. I haven't actually run a microbenchmark myself, though.
@Ichoran does that mean |
Yeah, I'm going to say performance is not a problem, until benchmarks demonstrate it is |
Well, it's measurable, but I guess it's not really important. It seems to be negligible on creation, just usage, and it's only if there's a creation issue that it's really something to avoid. It's a little hard to do benchmarking of threads on a laptop CPU; even if I turn off as many CPU features as I can, it's still a bit inconsistent. Using the benchmark I wrote here, I get the following:
and if you look through the individual counts, you see the ~640 to ~655 shift as a very common outcome. So, maybe a 3% slowdown above the thread handling overhead. (Most of the time is thread overhead--probably 90% or so. I didn't measure carefully.) Mod vs. Use depends on whether the thread changes the value or just consumes it from a withValue block. The key point, though is the So, I'm incorrect! A lot of the advice, including from Oracle, also seems to be incorrect or at least too cautious. One can use Sorry about the unwarranted concern! Original plan sounds fine!
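To make the "Mod vs. Use" distinction concrete, here is a minimal sketch using `scala.util.DynamicVariable`. It is not the benchmark referred to above; the object name, the `allowedDest` variable, and the example paths are hypothetical and exist only for illustration.

```scala
import scala.util.DynamicVariable

object DynamicVariableDemo {
  // Hypothetical variable: the directory the current thread may write to.
  val allowedDest = new DynamicVariable[Option[String]](None)

  // "Use": the value is bound by an enclosing withValue block and only read here.
  // withValue restores the previous value when the block exits.
  def useOnly(): Option[String] =
    allowedDest.withValue(Some("out/foo.dest")) {
      allowedDest.value
    }

  // "Mod": the thread assigns the value itself, so it must restore it by hand.
  def modify(): Option[String] = {
    val saved = allowedDest.value
    try {
      allowedDest.value = Some("out/bar.dest")
      allowedDest.value
    } finally {
      allowedDest.value = saved
    }
  }
}
```

In both cases the variable is back at its default once the call returns, which is the property a scoped filesystem check would rely on.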
Only for
I think we can start with an API as follows:
trait Checker {
  def onWrite(path: os.Path): Unit
  def onRead(path: os.Path): Unit
}
where
…325) Instrumented path based operations using hooks defined in `Checker`.

```scala
trait Checker {
  def onRead(path: ReadablePath): Unit
  def onWrite(path: Path): Unit
}
```

### Exceptions

The following operations were not instrumented:

- `followLink`, `readLink`
- `list`, `walk`
- `exists`, `isLink`, `isFile`, `isDir`
- read operations for permissions/stats
- `watch`

### Future work

- A more comprehensive design would add hooks for each core operation. This would eliminate the special check handling in operations like `move` and `symlink`.
- As such, the methods of `ReadablePath` represent escape hatches. These cannot be "plugged" without breaking binary compatibility.

This resolves part 1 of [mill #3746](com-lihaoyi/mill#3746).
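As a rough illustration of how a `Checker`-style hook could enforce the write rule discussed in this issue, the sketch below blocks writes that land inside a workspace but outside one allowed directory. It is an assumption-laden stand-in, not the OS-Lib implementation: the trait is redeclared over `java.nio.file.Path` so the example needs no OS-Lib dependency, and `WriteFence`, `DestOnlyChecker`, `workspace`, and `dest` are hypothetical names.

```scala
import java.nio.file.Path

object WriteFence {
  // Stand-in for the Checker trait, redeclared over java.nio paths (assumption).
  trait Checker {
    def onRead(path: Path): Unit
    def onWrite(path: Path): Unit
  }

  // Hypothetical checker: inside `workspace`, only `dest` is writable.
  // Paths outside `workspace` (caches, temp files) are left alone.
  final class DestOnlyChecker(workspace: Path, dest: Path) extends Checker {
    private val ws = workspace.toAbsolutePath.normalize()
    private val allowed = dest.toAbsolutePath.normalize()

    // Reads are not restricted in this sketch.
    def onRead(path: Path): Unit = ()

    def onWrite(path: Path): Unit = {
      // Normalize first so that `..` segments cannot step out of `dest` unnoticed.
      val p = path.toAbsolutePath.normalize()
      if (p.startsWith(ws) && !p.startsWith(allowed))
        throw new IllegalStateException(s"Write to $p is outside the allowed directory $allowed")
    }
  }
}
```

`Path.startsWith` compares whole path elements, so a sibling such as `foo.dest2` is not mistaken for something under `foo.dest`. Symlinks are not resolved here, which fits the best-effort guardrail framing rather than a secure sandbox.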
From the maintainer Li Haoyi: I'm putting a 2000USD bounty on this issue, payable by bank transfer on a merged PR implementing this.
The goal of this bounty is to ensure that users on the "happy path" of accessing the filesystem via OS-Lib do not accidentally do the wrong thing by writing to places on disk they shouldn't. This isn't meant to be a 100% secure sandbox to protect against malicious code, but rather just guardrails for users to bump into when they accidentally do the wrong thing and help nudge them back in the right direction.
In particular, we want to block:

- Code inside a task writing to paths on disk outside of that task's designated `Task.dest` folder
- Code outside of tasks writing to disk at all, i.e. during module initialization, which is meant to be pure
- Code inside of tasks reading from disk outside of the `.dest` folders of upstream paths or `Task.Source`/`Task.Sources`/`Task.Input`s

Some subtleties:
- We only want to block unexpected writes to disk inside of `WorkspaceRoot`. Locations outside `WorkspaceRoot` still need to be writable, due to things like caches in `/User` or `/home` and temporary files/folders in `/tmp`.
- We only want to block OS-Lib APIs for now. Users can still use `java.io`/`java.nio`/JNI/subprocesses/etc. to work around it, but at least in the common path there'll be something to nudge them when they do something wrong.

Milestones:
1. Instrument the various APIs in https://github.com/com-lihaoyi/os-lib to expose a `scala.util.DynamicVariable` allowing thread-local gating of reads and writes to specific paths, allowing us to throw errors if a read or write is disallowed (500USD)
2. Use the patched OS-Lib to limit writes from code running inside a task to locations within `WorkspaceRoot` but outside the task's `Task.dest` folder, with a flag to opt-out if desired (500USD)
3. Limit writes from code running during module-initialization/task-resolution to any location within `WorkspaceRoot` but outside the task's `Task.dest` folder, with a flag to opt-out if desired (500USD)
4. Limit reads from code running inside of tasks accessing locations outside of `PathRef`s received by that task's direct upstream dependencies. (500USD)

Since adding such sandbox restrictions is a breaking change, they should be placed behind a flag in the upcoming Mill 0.12.x series and only turned on by default in 0.13.0.
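The thread-local gating described in the first milestone could look roughly like the sketch below. It only shows the mechanism (a `scala.util.DynamicVariable` holding a hook that instrumented operations consult); `Gate`, `WriteHook`, `write`, and `runTask` are hypothetical names, the `write` function is a stand-in that performs no I/O, and the policy is simplified to "only the task's destination is writable".

```scala
import java.nio.file.Path
import scala.util.DynamicVariable

object Gate {
  // Hypothetical hook type: called with the target path before each write.
  type WriteHook = Path => Unit

  // The default hook allows everything, so code outside a gated scope is unaffected.
  val writeHook = new DynamicVariable[WriteHook](_ => ())

  // Stand-in for an instrumented write: consult the hook, then do the work.
  // A real operation would touch the filesystem here; this one just returns the path.
  def write(path: Path): Path = {
    writeHook.value(path)
    path
  }

  // Run `body` with writes restricted to `dest` for the current thread.
  def runTask[T](dest: Path)(body: => T): T = {
    val allowed = dest.toAbsolutePath.normalize()
    val hook: WriteHook = { p =>
      val target = p.toAbsolutePath.normalize()
      if (!target.startsWith(allowed))
        throw new IllegalStateException(s"Write to $target is outside $allowed")
    }
    // withValue restores the previous hook when `body` finishes or throws.
    writeHook.withValue(hook)(body)
  }
}
```

Because the restriction is scoped by `withValue`, an opt-out flag could simply skip the `runTask` wrapping and leave the permissive default in place.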