Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument OS-Lib to block filesystem writes outside of designated Task.dest directories (2000USD Bounty) #3746

Open
1 of 4 tasks
lihaoyi opened this issue Oct 15, 2024 · 11 comments
Labels
Milestone

Comments

@lihaoyi
Copy link
Member

lihaoyi commented Oct 15, 2024


From the maintainer Li Haoyi: I'm putting a 2000USD bounty on this issue, payable by bank transfer on a merged PR implementing this.


The goal of this bounty is to ensure that users on the "happy path" of accessing the filesystem via OS-Lib do not accidentally do the wrong thing by writing to places on disk they shouldn't. This isn't meant to be a 100% secure sandbox to protect against malicious code, but rather just guardrails for users to bump into when they accidentally do the wrong thing and help nudge them back in the right direction.

In particular, we want to block:

  • Code inside task writing to paths on disk outside of that task's designated Task.dest folder

  • Code outside of tasks writing to disk at all, i.e. during module initialization, which is meant to be pure

  • Code inside of tasks reading from disk outside of the .dest folders of upstream paths or Task.Source/Task.Sources/Task.Inputs

Some subtleties:

  • We only want to block unexpected writes to disk inside of WorkspaceRoot.

    • Outside of the WorkspaceRoot still needs to be writable, due to things like caches in /User or /home and temporary files/folders in /tmp.
    • This is a similar approach as tools like Bazel
  • We only want to block OS-Lib APIs for now.

    • People can always use java.io/java.nio/JNI/subprocesses/etc. to work around it, but at least in the common path there'll be something to nudge them when they do something wrong.
    • A fancier approach could use the JVM SecurityManager/bytecode-instrumenting-agent to block java-level file access as well, but for the purposes of this "guardrails" approach that is probably not necessary

Milestones

  1. Instrument the various APIs in https://github.com/com-lihaoyi/os-lib to expose a scala.util.DynamicVariable allowing thread-local gating of reads and writes to specific paths, allowing us to throw errors if a read or write is disallowed (500USD)

  2. Use the patched OS-Lib to limit writes from code running inside a task to locations within WorkspaceRoot but outside the task's Task.dest folder, with a flag to opt-out if desired (500USD)

  3. Limit writes from code running during module-initialization/task-resolution to any location within WorkspaceRoot but outside the task's Task.dest folder, with a flag to opt-out if desired (500USD)

  4. Limit reads from code running inside of tasks accessing locations outside of PathRefs received by that task's direct upstream dependencies. (500USD)

As adding such sandbox restrictions are breaking changes, they should be placed behind a flag in the upcoming Mill 0.12.x series and only turned on by default in 0.13.0

@lihaoyi lihaoyi added the bounty label Oct 15, 2024
@lihaoyi lihaoyi modified the milestone: 0.13.0 Oct 15, 2024
@Ichoran
Copy link

Ichoran commented Oct 16, 2024

Are you sure you want to make os-lib that deeply dependent on a fairly specialized feature? Might it be better to have an API-compatible alternative build so you could depend on os-lib-fenced instead of os-lib if you needed fenced filesystem operations?

Also, everything you mention is JVM-specific but os-lib has a native build as well. Is fencing supposed to work with native?

@lihaoyi
Copy link
Member Author

lihaoyi commented Oct 16, 2024

@Ichoran it seems the most reasonable option. The alternative is we do bytecode instrumentation.

The implementation cost of stuffing this into OS-Lib seems relatively low: just a single DynamicVariable containing a trait Gating{ def allowRead(p: os.Path): Unit; def allowWrite(p: os.Path): Unit } that we can expand upon over time if necessary.

Nothing here seems JVM-specific or Mill-specific. Seems plausible that there would be other use cases for best-effort limiting of filesystem access as well. The JVM baked this stuff in 30 years ago, and although it is now taking it out, I think it's actually a pretty reasonable feature to have as long as expectation are set appropriately (e.g. best-effort guardrails, not a security-sensitive sandbox as the JVM was hoping for)

@Ichoran
Copy link

Ichoran commented Oct 16, 2024

@lihaoyi - Well, the interface change is small, but it's going to touch a lot of code, and make every user of threads add an extra ThreadLocal variable to everything. That's a big deal for Project Loom-style usage. "Don't use os-lib and virtual threads together" isn't a great message.

@lihaoyi
Copy link
Member Author

lihaoyi commented Oct 16, 2024

Why is it a big deal for project loom usage?

@Ichoran
Copy link

Ichoran commented Oct 16, 2024

The ThreadLocal variable has to be duplicated for every use of every thread since otherwise ThreadLocal is broken. So it adds to the per-thread overhead. The Oracle docs all caution against it. I haven't actually run a microbenchmark myself, though.

@lihaoyi
Copy link
Member Author

lihaoyi commented Oct 16, 2024

@Ichoran does that mean println is banned in project loom?

@Ichoran
Copy link

Ichoran commented Oct 16, 2024

println uses ThreadLocal? Huh. Did not know that. Regardless, maybe I should actually do the microbenchmarking and then raise the concern if it's a non-negligible impact.

@lihaoyi
Copy link
Member Author

lihaoyi commented Oct 16, 2024

Yeah, I'm going to say performance is not a problem, until benchmarks demonstrate it is

@Ichoran
Copy link

Ichoran commented Oct 16, 2024

Well, it's measurable, but I guess it's not really important. It seems to be negligible on creation, just usage, and it's only if there's a creation issue that it's really something to avoid. It's a little hard to do benchmarking of threads on a laptop CPU; even if I turn off as many CPU features as I can, it's still a bit inconsistent. Using the benchmark I wrote here, I get the following:

Benchmark                 Mode  Cnt    Score    Error  Units
ThreadLocalLoom.base      avgt   40  641.756 ±  6.503  us/op
ThreadLocalLoom.local10   avgt   40  639.881 ±  9.644  us/op
ThreadLocalLoom.localmod  avgt   40  672.530 ± 17.414  us/op
ThreadLocalLoom.localuse  avgt   40  657.495 ±  9.196  us/op

and if you look through the individual counts, you see the ~640 to ~655 shift as a very common outcome. So, maybe a 3% slowdown above the thread handling overhead. (Most of the time is thread overhead--probably 90% or so. I didn't measure carefully.) Mod vs. Use depends on whether the thread changes the value or just consumes it from a withValue block.

The key point, though is the local10 one, where ten extra DynamicVariable instances are created but they're not used. This has no performance penalty, and that's the one I was worried about.

So, I'm incorrect! A lot of the advice, including from Oracle, also seems to be incorrect or at least too cautious. One can use ThreadLocal with about the usual degree of slowdown for ThreadLocal, probably on the order of 30 ns, because it's about 15 us in a test which uses it 500-odd times.

Sorry about the unwarranted concern! Original plan sounds fine!

@ajaychandran
Copy link
Contributor

@lihaoyi

Instrument the various APIs in https://github.com/com-lihaoyi/os-lib to expose a scala.util.DynamicVariable allowing thread-local gating of reads and writes to specific paths, allowing us to throw errors if a read or write is disallowed (500USD)

  • Does this extend to get/set ops for permissions/stats?
  • Should file/folder create/remove ops require write access on the parent folder?
    • When creating intermediate parent folders, write access should be required on the closest existing parent?
  • When moving files/folders, both read/write access should be required on the source parent folder?

@lihaoyi
Copy link
Member Author

lihaoyi commented Oct 19, 2024

Does this extend to get/set ops for permissions/stats?

Only for set operations

Should file/folder create/remove ops require write access on the parent folder?
When creating intermediate parent folders, write access should be required on the closest existing parent?
When moving files/folders, both read/write access should be required on the source parent folder?

I think we can start with an API as follows

trait Checker{
  def onWrite(path: os.Path): Unit
  def onRead(path: os.Path): Unit
}

where onWrite and onRead are given the fully qualified path where any operation is taking place (not the parent folder). That way it is up to the implementor of Checker (or whatever we call it) how they want to model permissions, deal with nested permissions and so on. OS-Lib does not need to have an opinion on that, it just needs to provide the minimal hooks and the downstream implementor (e.g. Mill) can come up with whatever model is suitable

lihaoyi pushed a commit to com-lihaoyi/os-lib that referenced this issue Oct 27, 2024
…325)

Instrumented path based operations using hooks defined in `Checker`.
```scala
trait Checker {
  def onRead(path: ReadablePath): Unit
  def onWrite(path: Path): Unit
}
```

### Exceptions
The following operations were not instrumented:
- `followLink`, `readLink`
- `list`, `walk`
- `exists`, `isLink`, `isFile`, `isDir`
- read operations for permissions/stats
- `watch`

### Future work
- A more comprehensive design would add hooks for each core operation.
This would eliminate the special check handling in operations like
`move` and `symlink`.
- As such, the methods of `ReadablePath` represent escape hatches. These
cannot be "plugged" without breaking binary compatibility.

This resolves part 1 of [mill
#3746](com-lihaoyi/mill#3746).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants