Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimpleIdGenerator is not thread safe, neither it is documented to be #25485

Closed
wind57 opened this issue Jul 28, 2020 · 16 comments
Closed

SimpleIdGenerator is not thread safe, neither it is documented to be #25485

wind57 opened this issue Jul 28, 2020 · 16 comments
Assignees
Labels
in: messaging Issues in messaging modules (jms, messaging) type: task A general task
Milestone

Comments

@wind57
Copy link
Contributor

wind57 commented Jul 28, 2020

I have a small miss-understanding about SimpleIdGenerator, that is a fairly trivial class:

public class SimpleIdGenerator implements IdGenerator {

	private final AtomicLong mostSigBits = new AtomicLong(0);

	private final AtomicLong leastSigBits = new AtomicLong(0);


	@Override
	public UUID generateId() {
		long leastSigBits = this.leastSigBits.incrementAndGet();
		if (leastSigBits == 0) {
			this.mostSigBits.incrementAndGet();
		}
		return new UUID(this.mostSigBits.get(), leastSigBits);
	}

}

The presence of AtomicLong hints into the fact that this is a thread-safe continuous incremented UUID, but it's not the case:

long leastSigBits = this.leastSigBits.incrementAndGet();
if (leastSigBits == 0) {

For the sake of the discussion let's suppose that currently leastSigBits holds a -1 (it has been incremented quite a lot, yes).

ThreadA does long leastSigBits = this.leastSigBits.incrementAndGet();, so it puts the value into 0 (-1 + 1 = 0); but before it does the check if (leastSigBits == 0), ThreadB did long leastSigBits = this.leastSigBits.incrementAndGet(); too, now on a value that is 0, so it put the value in 1. ThreadA does the check and sees a value of 1, that if statement is not entered and a such a duplicate UUID.

This is very far stretched and I have doubts it has ever impacted any users as for this to happen they would need to generate all the long range of IDs, which is highly highly improbable. Still, this code is wrong.

If this is suppose to provide thread-safe variant :

  • document it as such
  • fix the code

if this isn't supposed to be thread safe, simply dropping the un-necessary AtomicLong (with it's volatile overhead) is going to be a big performance gain.

Either way, I would be more than glad to fix this, if someone tells me the path I should be taking. Thank you.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Jul 28, 2020
@abhishek-abhi
Copy link

Can I fix this .

@rstoyanchev
Copy link
Contributor

ThreadA does the check and sees a value of 1

Why would that be? The value of incrementAndGet is saved in a local variable.

@wind57
Copy link
Contributor Author

wind57 commented Jul 28, 2020

@rstoyanchev I should have had my coffee before posting this. you're right and I am wrong.

Either way there is a race here.

ThreadA does this:

long leastSigBits = this.leastSigBits.incrementAndGet();
		if (leastSigBits == 0) {
			this.mostSigBits.incrementAndGet();
		}

it sets long leastSigBits as 0, enters the if statement, but does not do this.mostSigBits.incrementAndGet(); just yet. ThreadB acts at the same time; it sets its local long leastSigBits to 1, does not enter the if statement and does return new UUID(this.mostSigBits.get(), leastSigBits);. this.mostSigBits.get() == 0 and leastSigBits == 1; as such it returns an UUID that was already returned.

@quaff
Copy link
Contributor

quaff commented Jul 30, 2020

ThreadA does the check and sees a value of 1

Why would that be? The value of incrementAndGet is saved in a local variable.

The result of this.mostSigBits.incrementAndGet() should be saved in a local variable too and instead of this.mostSigBits.get() later.

@quaff
Copy link
Contributor

quaff commented Jul 30, 2020

@rstoyanchev I should have had my coffee before posting this. you're right and I am wrong.

Either way there is a race here.

ThreadA does this:

long leastSigBits = this.leastSigBits.incrementAndGet();
		if (leastSigBits == 0) {
			this.mostSigBits.incrementAndGet();
		}

it sets long leastSigBits as 0, enters the if statement, but does not do this.mostSigBits.incrementAndGet(); just yet. ThreadB acts at the same time; it sets its local long leastSigBits to 1, does not enter the if statement and does return new UUID(this.mostSigBits.get(), leastSigBits);. this.mostSigBits.get() == 0 and leastSigBits == 1; as such it returns an UUID that was already returned.

@wind57 I agree.

@b4nnertail
Copy link

@rstoyanchev I should have had my coffee before posting this. you're right and I am wrong.
Either way there is a race here.
ThreadA does this:

long leastSigBits = this.leastSigBits.incrementAndGet();
		if (leastSigBits == 0) {
			this.mostSigBits.incrementAndGet();
		}

it sets long leastSigBits as 0, enters the if statement, but does not do this.mostSigBits.incrementAndGet(); just yet. ThreadB acts at the same time; it sets its local long leastSigBits to 1, does not enter the if statement and does return new UUID(this.mostSigBits.get(), leastSigBits);. this.mostSigBits.get() == 0 and leastSigBits == 1; as such it returns an UUID that was already returned.

@wind57 I agree.

I prepared a unit test to show the issue:

class SimpleIdGeneratorTest {

	private final AtomicLong mostSigBits = new AtomicLong(0);
	private final AtomicLong leastSigBits = new AtomicLong(-1);

	Set<UUID> ids = new HashSet<>();

	@Test
	void generateIdTest() throws InterruptedException {

		ids.add(new UUID(0, 1));
		ExecutorService executorService = Executors.newFixedThreadPool(2);
		executorService.execute(() -> ids.add(generateId()));
		executorService.execute(() -> ids.add(generateId()));

		Thread.sleep(200);
		assertTrue(ids.contains(new UUID(0, 1)));
		assertTrue(ids.contains(new UUID(1, 0)));

                 // This UUID is missing, instead UUID(mostSigBits = 0, leastSigBits = 1) got created twice
		assertFalse(ids.contains(new UUID(1, 1)));
	}

	public UUID generateId() {
		long leastSigBits = this.leastSigBits.incrementAndGet();
		if (leastSigBits == 0) {
			try {
				Thread.sleep(100);
			} catch (InterruptedException e) {
				e.printStackTrace();
			}
			this.mostSigBits.incrementAndGet();
		}
		return new UUID(this.mostSigBits.get(), leastSigBits);
	}
}

I guess, to be truly thread safe, we would need to introduce a synchronized block as shown below. Please correct me if I'm wrong.

class SimpleIdGeneratorTest {

	private final AtomicLong mostSigBits = new AtomicLong(0);
	private final AtomicLong leastSigBits = new AtomicLong(-1);

	Set<UUID> ids = new HashSet<>();

	@Test
	void generateIdTest() throws InterruptedException {

		ids.add(new UUID(0, 1));
		ExecutorService executorService = Executors.newFixedThreadPool(2);
		executorService.execute(() -> ids.add(generateId()));
		executorService.execute(() -> ids.add(generateId()));

		Thread.sleep(200);
		assertTrue(ids.contains(new UUID(0, 1)));
		assertTrue(ids.contains(new UUID(1, 0)));
		assertTrue(ids.contains(new UUID(1, 1)));
	}

	public UUID generateId() {
		long leastSigBits = this.leastSigBits.incrementAndGet();
		synchronized (this) {
			if (leastSigBits == 0) {
				try {
					Thread.sleep(100);
				} catch (InterruptedException e) {
					e.printStackTrace();
				}
				this.mostSigBits.incrementAndGet();
			}
			return new UUID(this.mostSigBits.get(), leastSigBits);
		}
	}
}

@wind57 wind57 closed this as completed Jul 30, 2020
@wind57 wind57 reopened this Jul 30, 2020
@wind57
Copy link
Contributor Author

wind57 commented Jul 30, 2020

@puhlerblet usually a unit test proves very little in case of thread safety, though, I admit, in this case it is fairly trivial to prove the point. Otherwise, you could also look into this tool, that I usually use. It does instruction re-ordering that can prove things, see more here for example.

Also both synchronized and AtomicLong? Under contention, an intrinsic lock is "inflated" and when a GC needs to happen - it needs to deflate that, which is a process that takes time (and easily provable with JMH). I would suggest a ReentrantLock, instead.

But the problem is a bit more subtle, it seems to me. Keep in mind that for this problem to be observed, we need to go through the entire range of positive and negative values of a long. Let's take the easy case:

 Long.MAX_VALUE == 9223372036854775807

So there are 9223372036854775807 positive possibilities in a long. We now look of how big is an UUID:

java.util.UUID object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 00 00 00 (00000101 00000000 00000000 00000000) (5)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           2c 19 17 00 (00101100 00011001 00010111 00000000) (1513772)
     12     4        (alignment/padding gap)
     16     8   long UUID.mostSigBits                          0
     24     8   long UUID.leastSigBits                          0
Instance size: 32 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total

take 32 bytes multiply by 9223372036854775807 and you will need that much space, just for the UUID, just for positive part of the least significant bits, if you want to cover the entire range.

And if I now look at the documentation of the class itself:

A simple IdGenerator that starts at 1 and increments by 1 with each call.

I could say that there is no need for two AtomicLong as, realistically, no one will ever hit so many IDs.

@b4nnertail
Copy link

@wind57 since i am quite inexperienced in dealing with concurrency I really appreciate your input. Your assumption, that no one will ever hit so many IDs sounds reasonable.

@wind57
Copy link
Contributor Author

wind57 commented Aug 15, 2020

@rstoyanchev what do you think about this one?

@rstoyanchev
Copy link
Contributor

rstoyanchev commented Sep 11, 2020

@wind57 indeed every time leastSigBits rolls over, some id's may be based on the current rather than on the next mostSigBits value which in practice means that some id's may be issued more than once, but that would only happen once in a very long while and the duplicates would have been last issued a very long time ago

I think it's simply not worth optimizing and we can just update the docs to say that uniqueness is not fully guaranteed and that on occasion, after every Long.MAX_VALUE number of id's are issued, there may be repeated ones for a short period. If that's an issue for some reason then don't use this generator.

Thoughts? /cc @garyrussell @artembilan

@rstoyanchev rstoyanchev added the status: waiting-for-feedback We need additional information before we can continue label Sep 11, 2020
@rstoyanchev
Copy link
Contributor

This is by the way one alternative I thought of but again I'm not sure it's necessary:

public class SimpleIdGenerator implements IdGenerator {

	private final AtomicLong mostSigBits = new AtomicLong(0);

	private final AtomicLong leastSigBits = new AtomicLong(0);

	private final AtomicLong lastMostSigBits = new AtomicLong(-1);


	@Override
	public UUID generateId() {
		long leastSigBits = this.leastSigBits.incrementAndGet();
		if (leastSigBits == 0) {
			this.mostSigBits.incrementAndGet();
		}
		else if (leastSigBits == Long.MAX_VALUE - 10000) {
			this.lastMostSigBits.incrementAndGet();
		}
		else if (leastSigBits < 10000) {
			while (true) {
				if (!this.mostSigBits.equals(this.lastMostSigBits)) {
					break;
				}
			}
		}
		return new UUID(this.mostSigBits.get(), leastSigBits);
	}

}

@garyrussell
Copy link
Contributor

Bear in mind that this generator is pretty much useless in any environment that persists the ID in a DB and/or a multi-instance application.

Not only does the problem (potentially) occur after Long.MAX_VALUE allocations, but those allocations must have occurred in the same JVM.

This generator was provided long ago for a very specific use case where the IDs were not persisted and UUID.randomUUID() was found to be too expensive for the ID allocation rate.

So, I would say documentation should be enough.

If performance is really an issue, consider using the AlternativeJdkIdGenerator instead.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Sep 11, 2020
@rstoyanchev rstoyanchev self-assigned this Sep 11, 2020
@rstoyanchev rstoyanchev added in: messaging Issues in messaging modules (jms, messaging) type: task A general task and removed status: feedback-provided Feedback has been provided status: waiting-for-triage An issue we've not yet triaged or decided on labels Sep 11, 2020
@rstoyanchev rstoyanchev added this to the 5.2.9 milestone Sep 11, 2020
@rstoyanchev
Copy link
Contributor

Thanks @garyrussell those will be good clarifications. For same JVM, not persisted, and reasonably close to unique. If those trade-offs are okay you'll probably get better performance than AlternativeJDkIdGenerator.

@wind57
Copy link
Contributor Author

wind57 commented Sep 11, 2020

  • I agree on the part about documentation.

  • I disagree on the implementation prototype, though. Why not simply use a single AtomicLong and specify in the documentation that it supports unique values up to Long.MAX_VALUE and then it will simply overflow?

@wind57 wind57 closed this as completed Sep 11, 2020
@wind57 wind57 reopened this Sep 11, 2020
@rstoyanchev
Copy link
Contributor

rstoyanchev commented Sep 11, 2020

and specify in the documentation that it supports unique values up to Long.MAX_VALUE and then it will simply overflow

That's an option I guess, make it more official that after Long.MAX_VALUE it won't be fully unique. If that is an issue to begin with then it's the wrong generator to use one way or another. @garyrussell you agree with that?

@garyrussell
Copy link
Contributor

Yes, I agree; users are always able to define their own implementation if they are not happy with this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: messaging Issues in messaging modules (jms, messaging) type: task A general task
Projects
None yet
Development

No branches or pull requests

7 participants