Parallel compilation is buggy #3244

Closed
lukaszcz opened this issue Dec 9, 2024 · 3 comments · Fixed by #3251

@lukaszcz
Collaborator

lukaszcz commented Dec 9, 2024

I can deterministically reproduce a parallelism error on my Linux laptop:

  1. Check out devnet (commit 15c2cf78796321f6d0b9039e705747754f8e8dea) in anoma-apps.
  2. Run juvix clean --global && juvix dependencies update && juvix typecheck
    The result:
juvix: /home/heliax/Documents/anoma-apps/.juvix-build/0.6.8/home/heliax/Documents/anoma-apps/.juvix-build/0.6.8/deps/f6bbef2610b14aa5b27ede28104218d3b0a1c6ba9a42fafa37cea79ab3890f40/.juvix-build/0.6.8/stdlib/Stdlib/Cairo/Ec.jvo: withBinaryFile: resource busy (file is locked)

I get this error every time, but for a different file each time. The error doesn't appear with -N 1.
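For reference, the "resource busy (file is locked)" message is what GHC's runtime reports when a file is opened while the same process already holds a conflicting handle on it (GHC enforces single-writer, multiple-reader locking inside one process). Below is a minimal sketch of that behaviour, not the Juvix code path; the function name and the file name lock-demo.tmp are made up for illustration, and whether this is what actually happens to the .jvo files is an assumption.

```haskell
import Control.Exception (try)
import System.IO (IOMode (ReadMode, WriteMode), withBinaryFile)
import System.IO.Error (isAlreadyInUseError)

-- | Hold a write handle on the file, then try to open it a second time
-- for reading from the same process. Returns True if the second open is
-- rejected with an "already in use" (resource busy) error.
-- Hypothetical helper, for illustration only.
secondOpenIsRejected :: FilePath -> IO Bool
secondOpenIsRejected path =
  withBinaryFile path WriteMode $ \_writer -> do
    result <- try (withBinaryFile path ReadMode (\_reader -> pure ()))
    pure $ case result of
      Left err -> isAlreadyInUseError err
      Right () -> False

main :: IO ()
main = do
  -- Creates (and leaves behind) a scratch file in the current directory.
  rejected <- secondOpenIsRejected "lock-demo.tmp"
  putStrLn ("second open rejected: " ++ show rejected)
```

If two workers ever read and write the same .jvo file at the same time, this is the kind of error one would expect to see, which would also fit with it disappearing under -N 1.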

@lukaszcz lukaszcz added this to the 0.6.9 milestone Dec 9, 2024
@lukaszcz
Collaborator Author

lukaszcz commented Dec 9, 2024

This part in ParallelTemplate.compile looks fishy to me, but I don't fully understand how it's supposed to work:

  runReader varCompilationState
    . runReader nodesIx
    . runReader args
    . runReader compileQ
    . runReader deps
    . crashOnError
    $ do
      replicateConcurrently_ _compileArgsNumWorkers $
        lookForWork @nodeId @node @compileProof

Why are the runReader calls outside of replicateConcurrently_? Couldn't that leave lookForWork not fully evaluated? It returns some data structure representing a delayed computation, which needs the "reader" values before it can be evaluated. Those values are provided outside replicateConcurrently_, so possibly outside the mutex the code is supposed to be evaluated under.
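To make the question concrete, here is a small sketch using plain ReaderT from transformers rather than the effect library used in Juvix, so it only illustrates the general idea. Env, worker and runWorkers are hypothetical stand-ins (worker plays the role of lookForWork). With a reader supplied outside the concurrent part, each thread simply receives the same environment value, and any synchronisation comes from the shared MVar inside it, not from where the reader is run.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (replicateM_)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, asks, runReaderT)

-- Hypothetical stand-in for the shared compilation state.
newtype Env = Env {envCounter :: MVar Int}

-- Hypothetical stand-in for lookForWork: bump the shared counter once.
worker :: ReaderT Env IO ()
worker = do
  counter <- asks envCounter
  liftIO (modifyMVar_ counter (pure . (+ 1)))

-- Start n workers from inside the reader. The environment is captured
-- once and handed to every thread unchanged.
runWorkers :: Int -> ReaderT Env IO ()
runWorkers n = do
  env <- ask
  liftIO $ do
    done <- newEmptyMVar
    replicateM_ n (forkIO (runReaderT worker env >> putMVar done ()))
    replicateM_ n (takeMVar done) -- wait for all workers to finish

-- The reader is run outside the concurrent part, as in the snippet above.
countWith :: Int -> IO Int
countWith n = do
  counter <- newMVar 0
  runReaderT (runWorkers n) (Env counter)
  readMVar counter

main :: IO ()
main = countWith 4 >>= print -- prints 4
```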

@lukaszcz
Collaborator Author

lukaszcz commented Dec 9, 2024

Well, the simple fix of moving the runReader calls inside replicateConcurrently_ doesn't seem to help, but the issue may be more subtle. I'm not sure whether what we execute under atomically is really evaluated atomically in the way we expect, or whether some half-evaluated data structure is returned and then evaluated further, non-atomically, when we do runReader. But I might just not understand the code well enough.
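As a point of comparison, here is a minimal STM sketch of workers claiming items from a shared queue. The names claim, workerLoop and claimAll are hypothetical and this is not the ParallelTemplate code. The read and write of the TVar commit as one transaction, so each item is handed out exactly once. Values stored in a TVar can still be unevaluated thunks, but forcing a pure value later does not undo the atomicity of the claim itself.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)
import Control.Monad (replicateM, replicateM_)
import Data.List (sort)

-- Take the next item, if any. Read and write commit as a single step.
claim :: TVar [Int] -> STM (Maybe Int)
claim queue = do
  items <- readTVar queue
  case items of
    [] -> pure Nothing
    x : rest -> writeTVar queue rest >> pure (Just x)

-- Keep claiming until the queue is empty; return what this worker got.
workerLoop :: TVar [Int] -> IO [Int]
workerLoop queue = go []
  where
    go acc = do
      next <- atomically (claim queue)
      case next of
        Nothing -> pure acc
        Just x -> go (x : acc)

-- Run several workers over the same queue and collect everything claimed.
claimAll :: Int -> [Int] -> IO [Int]
claimAll workers items = do
  queue <- newTVarIO items
  results <- newEmptyMVar
  replicateM_ workers (forkIO (workerLoop queue >>= putMVar results))
  concat <$> replicateM workers (takeMVar results)

main :: IO ()
main = do
  claimed <- claimAll 4 [1 .. 100]
  -- No item is lost or claimed twice.
  print (sort claimed == [1 .. 100]) -- prints True
```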

@lukaszcz
Collaborator Author

lukaszcz commented Dec 9, 2024

Ah, no, but it should work because runConcurrent is somewhere at the top.
