Can't compile Swift for TensorFlow quickly #15

philipturner · 2022-06-01T19:38:36Z

The main reason I made the overhauls present in Swift-Colab 2.0 was so that in the future, I could run S4TF code without facing bottlenecks that make it virtually unusable. However, I am unable to compile S4TF for use in the interactive experience. This is after avoiding the problems described in #14.

The test notebook S4TF with TF 2.4 shows my effort to compile S4TF for use in the Swift interpreter. Even though that failed, I can technically compile it using %system flags like in s4tf-on-colab-example-1.ipynb and add custom code to the test suite. But that isn't ergonomic or reproducible in any way.

Specifically, the debugger shows an error when I run the following code. Back in the swift-jupyter era, the TensorFlow module was embedded in the toolchain. So the error below was likely never encountered.

import TensorFlow
print(Tensor<Float>.self)

<Cell 1>:2:7: error: cannot find 'Tensor' in scope
print(Tensor<Float>.self)
      ^~~~~~


philipturner · 2022-06-04T02:44:11Z

One simple solution to #14 and #15 is a new magic command: %install-x10. But I have to be 100% sure it is necessary. If I change my mind, it's source-breaking.

philipturner · 2022-06-04T18:40:20Z

It actually does load, but you need to restart the runtime first. I haven't tested it yet because I got sidetracked with a bug on the Python side. Either way, I need to investigate why this won't load into the Swift interpreter without restarting the runtime first. That restriction is not present on PythonKit, SwiftPlot, and other libraries.

mikowals · 2022-06-05T11:45:55Z

Hi @philipturner. I can get past your error above by removing some of the flags you set to install S4TF.

I commented out these flags:

//%install-swiftpm-flags -c release -Xswiftc -Onone

And then TensorFlow is available to import. I believe setting those flags actually breaks the import of any package, not just TensorFlow. I tried some other packages without clearing the flags and they also silently failed to import.

Sadly though problems still remain. The specific error is:

import TensorFlow
let x = Tensor(0)

Produces:

Couldn't lookup symbols:
  TensorFlow.Tensor.init(_: τ_0_0, on: TensorFlow.Device) -> TensorFlow.Tensor<τ_0_0>
  TensorFlow.Tensor.init(_: τ_0_0, on: TensorFlow.Device) -> TensorFlow.Tensor<τ_0_0>

It looks similar to swift-apis issue 1016, which I don't believe was ever fixed. The error is a generic linking or runtime availability problem, though, so it likely has a different cause.

The env var LD_LIBRARY_PATH in Colab looks a bit strange and points to /usr/local/nvidia/lib. I don't think any TensorFlow files end up there so maybe that is the cause.
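One way to test the LD_LIBRARY_PATH hypothesis is to list the directories it names and check whether any of them contains one of the X10 libraries. The following is only a sketch, not something taken from the notebook: the helper names `searchDirectories` and `directoriesContaining` are hypothetical, and `libx10.so` is used only because that file name appears later in this thread.

```swift
import Foundation

/// Splits an LD_LIBRARY_PATH-style string into its non-empty directory entries.
func searchDirectories(in pathList: String) -> [String] {
    // `split` drops empty entries such as the one produced by "::".
    pathList.split(separator: ":").map(String.init)
}

/// Returns the directories that contain a file named `library`.
func directoriesContaining(_ library: String, among directories: [String]) -> [String] {
    directories.filter { directory in
        let candidate = URL(fileURLWithPath: directory)
            .appendingPathComponent(library).path
        return FileManager.default.fileExists(atPath: candidate)
    }
}

// Read the variable from the current process; treat "unset" as an empty list.
let ldLibraryPath = ProcessInfo.processInfo.environment["LD_LIBRARY_PATH"] ?? ""
let directories = searchDirectories(in: ldLibraryPath)

print("LD_LIBRARY_PATH entries:", directories)
print("Entries containing libx10.so:",
      directoriesContaining("libx10.so", among: directories))
```

An empty second list only shows that the library is not reachable through LD_LIBRARY_PATH. It does not by itself prove that this is what causes the missing symbols.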

Thanks for the work you are doing on this. You are making impressive progress!

philipturner · 2022-06-05T12:13:41Z

Thanks for investigating! I should be able to narrow this problem down to a small reproducer. Other packages like PythonKit behave just fine, so there must be some specific reason S4TF is being uncooperative.

philipturner · 2022-06-05T15:14:18Z

I have encountered your error "Couldn't lookup symbols" multiple times today when using PythonKit. It always happens when I forget to execute the %install command after restarting the runtime. Did you execute the command that does %install .package(...) TensorFlow before receiving that error?

I have also used PythonKit multiple times with the -c release -Xswiftc -Onone flags. What packages didn't work when you used those flags? Also, remember to $clear the SwiftPM flags when appropriate.

mikowals · 2022-06-05T21:55:48Z

Success! I added -rpath flag.

%install-swiftpm-flags -Xlinker "-rpath=/content/Library/tensorflow-2.4.0/usr/lib"

The full colab is here.
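To see why an extra search path could matter, it may help to ask the dynamic loader directly whether it can find a library by bare name versus by full path. This is a hedged sketch rather than part of the linked Colab: `loadError(forLibraryAt:)` is a hypothetical helper, and the full path simply joins the -rpath directory above with the `libx10.so` file name mentioned later in this thread.

```swift
#if canImport(Glibc)
import Glibc
#elseif canImport(Darwin)
import Darwin
#endif

/// Tries to load a shared library.
/// Returns nil on success, or the loader's error message on failure.
func loadError(forLibraryAt path: String) -> String? {
    guard let handle = dlopen(path, RTLD_NOW) else {
        if let message = dlerror() {
            return String(cString: message)
        }
        return "unknown dlopen failure"
    }
    // The library loaded; release the handle again.
    dlclose(handle)
    return nil
}

// A bare name goes through the loader's default search (LD_LIBRARY_PATH,
// the ld.so cache, system directories); a full path bypasses that search.
let candidates = [
    "libx10.so",
    "/content/Library/tensorflow-2.4.0/usr/lib/libx10.so",
]

for candidate in candidates {
    if let message = loadError(forLibraryAt: candidate) {
        print("\(candidate): failed to load (\(message))")
    } else {
        print("\(candidate): loaded")
    }
}
```

If the bare name fails while the full path loads, the directory is missing from the default search, which is the situation an -rpath entry or a copy into /usr/lib would address.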

Now this:

import TensorFlow

print(Device.default) 
let x = Tensor(0)
print(x)
print(x.device)

let y = Tensor(0, on: .defaultXLA)
print(y.device)

Shows this:

Device(kind: .CPU, ordinal: 0, backend: .TF_EAGER)
0.0
Device(kind: .CPU, ordinal: 0, backend: .TF_EAGER)
Device(kind: .CPU, ordinal: 0, backend: .XLA)

I also did some fiddling with the -c release -Xswiftc -Onone flags and determined that -c release is the one causing the problem. The output shows the flag working correctly: building for production when included and building for debug when excluded. But the production build leads to the import not actually working.

philipturner · 2022-06-05T22:19:13Z

I actually ran through the entire Model Training Walkthrough tutorial on tensorflow/swift, using -c release -Xswiftc -Onone. That specific set of flags makes it take 2 minutes to compile, while standard debug mode compiles in 3 minutes. I haven't tested compiling it in debug mode. You're saying that if it's in debug mode, you don't have to restart the runtime to load the library?

I will definitely narrow this down and find the culprit, because I believe that is a bug with SwiftPM or the Swift compiler. SwiftPlot depends on C dependencies and doesn't have that issue.

%system cp /content/Library/tensorflow-2.4.0/usr/lib/libx10_optimizers_optimizer.so /usr/lib/libx10_optimizers_optimizer.so
%system cp /content/Library/tensorflow-2.4.0/usr/lib/libx10_optimizers_tensor_visitor_plan.so /usr/lib/libx10_optimizers_tensor_visitor_plan.so
%system cp /content/Library/tensorflow-2.4.0/usr/lib/libx10.so /usr/lib/libx10.so
%system cp /content/Library/tensorflow-2.4.0/usr/lib/libx10_training_loop.so /usr/lib/libx10_training_loop.so

%install-swiftpm-flags $clear
%install-swiftpm-flags -c release -Xswiftc -Onone
%install-swiftpm-flags -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN
%install '.package(url: "https://github.com/philipturner/s4tf", .branch("fan/resurrection"))' TensorFlow

I haven't tried using -Xlinker or -rpath yet; I just copied the binaries to a system library path (/usr/lib). If I can use your workaround to fix the issue with linking the binary files, then that solves half of my problem. The other half is copying the headers' paths into Clang modulemap files, so that they don't have to be copied into system header directories. I'm working on narrowing a SwiftPM bug affecting the latter task right now.

One fruit of this effort, although not the bug I'm tracking down: swiftlang/swift-package-manager#5482 (comment)

The bug I'm tracking is (from #14):

Two module.modulemap files that declare the same Clang module can overwrite each other, even if one is part of the documentation of a Swift package and never actually involved in the build process. This happened with the modulemap currently in the Utilities directory of s4tf/s4tf.

Utilities/module.modulemap shouldn't appear in the "build.db", and whether it appears or not is highly fickle.
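A quick way to spot this kind of collision is to scan a checkout for module.modulemap files and report any module name declared by more than one of them. The sketch below is an assumption-laden diagnostic, not a description of how SwiftPM populates "build.db": all function names are hypothetical, and the parser is a heuristic that only recognizes unindented lines of the form `[framework] module Name`.

```swift
import Foundation

/// Heuristically extracts top-level module names from module map text.
func topLevelModuleNames(in moduleMapText: String) -> [String] {
    var names: [String] = []
    for line in moduleMapText.split(separator: "\n") {
        // Indented lines are treated as submodules or directives and skipped.
        guard let first = line.first, !first.isWhitespace else { continue }
        var tokens = line.split(separator: " ").map(String.init)
        if tokens.first == "framework" { tokens.removeFirst() }
        guard tokens.count >= 2, tokens[0] == "module" else { continue }
        // Strip a trailing brace or quotes attached to the name.
        names.append(tokens[1].trimmingCharacters(in: CharacterSet(charactersIn: "{\"")))
    }
    return names
}

/// Maps each module name declared by more than one file to those files.
/// `moduleMaps` maps a file path to that file's contents.
func conflictingModules(in moduleMaps: [String: String]) -> [String: [String]] {
    var filesByModule: [String: [String]] = [:]
    for (path, text) in moduleMaps {
        for name in Set(topLevelModuleNames(in: text)) {
            filesByModule[name, default: []].append(path)
        }
    }
    return filesByModule.filter { $0.value.count > 1 }.mapValues { $0.sorted() }
}

/// Reads every file named module.modulemap under `root`.
func moduleMaps(under root: String) -> [String: String] {
    var result: [String: String] = [:]
    guard let enumerator = FileManager.default.enumerator(atPath: root) else {
        return result
    }
    for case let relativePath as String in enumerator
    where URL(fileURLWithPath: relativePath).lastPathComponent == "module.modulemap" {
        let fullPath = URL(fileURLWithPath: root).appendingPathComponent(relativePath).path
        if let text = try? String(contentsOfFile: fullPath, encoding: .utf8) {
            result[fullPath] = text
        }
    }
    return result
}

// In-memory demonstration with a hypothetical module name, CExample.
let sample = [
    "Sources/CExample/module.modulemap": "module CExample {\n    header \"a.h\"\n}\n",
    "Utilities/module.modulemap": "module CExample {\n    header \"b.h\"\n}\n",
]
print(conflictingModules(in: sample))
```

To scan a real checkout, pass the result of `moduleMaps(under:)` for the package root to `conflictingModules(in:)`. A reported conflict only means two files declare the same name, not that both take part in the build.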

philipturner · 2022-06-05T22:24:12Z

The reason I initially decided to compile S4TF with the old TF 2.4 binary was to narrow down the source of s4tf/s4tf#14, not to make it accessible on Colab. You are welcome to see if that bug exists on the older X10 binary, or even better - help me fix that bug :)

mikowals · 2022-06-05T23:20:11Z

> You're saying that if it's in debug mode, you don't have to restart the runtime to load the library?

Maybe. I did not try restarting when compiled with -c release because TensorFlow was not listed in the list of libraries in the output instructions about restarting. Building in debug mode definitely doesn't require a restart.

> The reason I initially decided to compile S4TF with the old TF 2.4 binary was to narrow down the source of s4tf/s4tf#14, not to make it accessible on Colab. You are welcome to see if that bug exists on the older X10 binary, or even better - help me fix that bug :)

Yes, I can see that the methods done in this Colab aren't ideal. It should allow me to run some old S4TF models using X10 on TPU if I want. I haven't tried this yet but it would be handy. However elaborate the methods to get there are...

> I actually ran through the entire Model Training Walkthrough tutorial on tensorflow/swift, using -c release -Xswiftc -Onone.

I have no doubt that those flags can work and are useful. In this instance, though, there appears to be some interaction with the runtime in Colab, the %install command, or the build process. There are many moving pieces here. Not sure what to say other than that it consistently works with those flags commented out and fails with them enabled.

philipturner · 2022-06-05T23:29:33Z

Also, I'm planning to uncomment x10_training_loop in the package manifest on both the head branch and this TF 2.4 branch. It was deactivated in January 2021 because of some build failure with SwiftPM, but I hypothesize that has long since been fixed.

philipturner · 2022-06-16T01:18:08Z

SwiftPlot has started failing to import on the first try if you use -c release -Xswiftc -Onone. You have to restart the runtime and rerun the %install command. This strange import behavior appeared some time between the 2021-12-06 and 2021-12-23 toolchains. This is a different time frame from when S4TF started experiencing the behavior (??? to 2021-11-12). To clarify, SwiftPlot and S4TF started failing to import at different times.

This is confirmation that the behavior is a bug. Something incorrect started happening in the Swift compiler before 2021-11-12. It was exposed to a greater extent in December, causing SwiftPlot to fail. Hopefully I can fix the compiler bug and integrate a patch into the 5.7 or 5.7.1 release.

philipturner · 2022-06-18T18:31:42Z

Even weirder - you now have to restart 2 times to use S4TF on s4tf/s4tf:main! You only need to restart once when using fan/resurrection. Both branches were tried on the same 2022-05-11 toolchain and with a factory reset Colab instance, but I need to double-check that there are no confounding variables. Doing so is time-intensive because each test takes around 3 minutes, so I don't feel like it right now.

Something is very off here, and I'll instruct the user to avoid -c release -Xswiftc -Onone until this is narrowed down. The nature of the bug (interacting with LLDB, not yet reproducible on macOS, reproducers exist only in massive code bases) makes it time-intensive to narrow down. The -c release -Xswiftc -Onone flags only reduce compile time by 33%, so the user will just have to deal with it.

philipturner · 2022-06-20T13:11:28Z

v2.2 was released, and the README has instructions for compiling Swift for TensorFlow. I noted the issue with -c release -Xswiftc -Onone, keeping the SwiftPM flags directive commented out. Let me know whether this works for you!

philipturner · 2022-06-21T00:25:17Z

> It should allow me to run some old S4TF models using X10 on TPU if I want. I haven't tried this yet but it would be handy. However elaborate the methods to get there are...

@mikowals I just got S4TF to run on a TPU. Look at the "TPU Tests" notebook at the bottom of the README. It was 8 TPUs at once, on the Colab free tier! I had never experienced using a TPU before. Could you provide some old X10 models designed for TPU, so that I can include them in the test suite?

philipturner mentioned this issue Jun 1, 2022

Compiling with CUDA support s4tf/s4tf#15

Open

philipturner changed the title ~~Can't compile Swift for TensorFlow~~ Can't run Swift for TensorFlow Jun 1, 2022

philipturner mentioned this issue Jun 2, 2022

-parseable-output being dumped by driver or SPM with non-traditional package layout. swiftlang/swift-package-manager#5482

Open

philipturner mentioned this issue Jun 4, 2022

« Dockerizing » Swift-Colab #16

Open

philipturner added the bug label Jun 17, 2022

philipturner self-assigned this Jun 18, 2022

philipturner mentioned this issue Jun 18, 2022

LLDB doesn't import C-dependent package unless debug symbols are present swiftlang/swift#59569

Closed

philipturner added the help wanted label Jun 18, 2022

philipturner changed the title ~~Can't run Swift for TensorFlow~~ Can't compile Swift for TensorFlow quickly Jun 18, 2022

philipturner removed their assignment Jun 18, 2022

philipturner mentioned this issue Jul 7, 2022

Remove SR-15884 workaround and CMake support s4tf/s4tf#18

Merged

philipturner mentioned this issue Jul 26, 2022

Provides a JupyterDisplay protocol package? #21

Open

philipturner closed this as not planned Nov 2, 2023
