Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with DefaultDataSetLock #366

Closed
dedeibel opened this issue Feb 18, 2021 · 2 comments · Fixed by #367
Closed

Problem with DefaultDataSetLock #366

dedeibel opened this issue Feb 18, 2021 · 2 comments · Fixed by #367
Assignees
Labels

Comments

@dedeibel
Copy link
Contributor

dedeibel commented Feb 18, 2021

Hi,

my application experienced a dead lock that seems to be related to DefaultDataSetLock. I can see the following exception shortly before the lock-up. It was not accepting any more incoming data for the dataset after that:

StackTrace
    java.lang.IllegalMonitorStateException
    	at java.base/java.util.concurrent.locks.StampedLock.unlockRead(StampedLock.java:708)
    	at de.gsi.dataset.locks.DefaultDataSetLock.readUnLock(DefaultDataSetLock.java:207)
    	at de.gsi.chart.XYChart.lambda$updateAxisRange$10(XYChart.java:285)
    	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    	at de.gsi.chart.XYChart.updateAxisRange(XYChart.java:285)
    	at de.gsi.chart.Chart.layoutChildren(Chart.java:742)
    	at javafx.graphics/javafx.scene.Parent.layout(Parent.java:1206)
[…]
    	at javafx.graphics/com.sun.glass.ui.gtk.GtkApplication.lambda$runLoop$11(GtkApplication.java:277)
    	at java.base/java.lang.Thread.run(Thread.java:834)

I noticed that the writing threads were blocked.

DefaultDataSetLock uses a StampedLock internally, the API says that it does "not unlock" the read lock in case of a stamp validation failure (above exception).

I created a minimal example test to verify if my problem is reproducible. And it was using a "simple" test case that stressed the data set with multiple writers and readers.

Please see:
5c7d8f6#diff-26f7ddc574723fa7614e1a15765109cd1d4842ec1eab439bf72235eb9f860d63
(https://github.com/GSI-CS-CO/chart-fx/compare/demo-data-set-lock-stress-test)

Execution time can be from around a few seconds to a couple of minutes. If it does not fail, adjust the thread count or just repeat the execution until it does.

The exception confuses me a bit though. I can understand why the stamp validation would fail on an optimistic read but this is not applicable here, readLock causes a pessimistic read. Maybe there is a problem regarding the lastReadStamp handling in DefaultDataSetLock since it is only guarded by the thread count atomic var but is not synchronized or even volatile.

My idea: Shortly after a an unlocking thread changed readerCount to zero the next reader could call DefaultDataSetLock's read lock method and cause the lastReadStamp to change. The thread that was trying to unlock the StampedLock before might now use the value returned by the new reader thread.
A hint into that direction could be that I could not observe the problem yet with only one reader thread and the error chance increased with the number of reader threads.

I created a DefaultDataSetLockSynced to validate my guess and I did not see any failures with this version. However I am not suggesting this version as it is surely driving the use of StampedLock into an absurd place. Maybe DefaultDataSetLock should be changed in a way that the stamp is only used as a local var (in case of the methods with lambda parameters) or the stamp is returned to the caller in order to simplify the code and go to a similar interface like StampedLock.

But I am not completely sure about the StampedLock semantics here – it confuses my why a stamp is used at all in the API in case of readLock? Unless StampedLock is always a bit optimistic and makes no guarantees (I could not find a note about that in the docs though). In this case we would have to retry reads every time. But I can find no example source on the web that does that when using readLock only when using tryOptimisticRead. So I am not suggesting that either.

@RalphSteinhagen
Copy link
Member

RalphSteinhagen commented Feb 18, 2021

@dedeibel thanks for reporting. I'll have a look at it.

What makes it more complex, is that this lock is write-reentrant and multiple read reentrant as well, which the other locks do not support. But the issue of the AtomicInteger not being strictly synchronised to the StampedLock could be a (potential) issue.

Need more time to investigate this.

@RalphSteinhagen
Copy link
Member

Some preliminary observations:

it is only guarded by the thread count atomic var but is not synchronized or even volatile.

N.B. AtomicInteger is intrinsically volatile but is IMO not the culprit.

Maybe DefaultDataSetLock should be changed in a way that the stamp is only used as a local var [..]

Tried this before and again now but this does not seem to solve the issue.

I think your proposal to guard the pessimistic read(Un)Lock(..) using synchronize seems to be the better solution.
@wirew0rm recently introduced a similar fix leading to the new unequalToLockHoldingThread(...) function.

What remains to be checked is:

  • re-entrability of the write- and read-lock, and
  • overall performance-penalty (strongly linked to the above)
    N.B. if this localised synchronized works, then we could probably also drop the AtomicInteger for a regular int.

As mentioned before the non-trivial nature is the re-entrability for write (albeit always the same thread) and multiple reader (in different theads). We all wished several times before that there would be a default JDK implementation for that. But even StampedLock (N.B. not reentrable by itself) -- as you also noted before -- isn't that much used around the web.
Most seem to prefer either the nuclear synchronised (or derivative) option and/or to make local immutable copies ... both being bad for performance. 🤔

Again, @dedeibel thanks for reporting and the consise unit-test. This helped a lot to reproduce and isolate that problem. 👍

@RalphSteinhagen RalphSteinhagen linked a pull request Feb 18, 2021 that will close this issue
RalphSteinhagen added a commit that referenced this issue Feb 23, 2021
* added DataSetLockStressTest
  original unit-test idea courtesy Benjamin Peters (@dedeibel)
* fixes DataSetLock readLock race issue #366
* modified read-lock and re-introduced atomic guards
  N.B. only the first reader acquires a lock, subsequent readers increase atomic counter, and only the last remaining reader unlocks. detects race-condition on initial lock and final unlock
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants