Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new collection operation queue mechanism #2010

Merged

Conversation

maca88
Copy link
Contributor

@maca88 maca88 commented Feb 16, 2019

The main purpose of this PR is to minimize flushes when several operations are done to an uninitialized collection. Currently, a flush will occur after a second operation as already explained by @fredericDelaporte, which can lead to various unexpected behavior like throwing a PropertyValueException with a message not-null property references a null or transient value, when adding an item to the collection or checking its existence, which occurred to me several times in the past.
The new mechanism in this PR replaces the current one which eliminates the need for a flush for all collections except for two cases when using PersistentGenericList<>:

  1. Using an indexed operation after a non indexed one (e.g. use IList<>.RemoveAt after using IList<>.Add)
  2. Using IList<>.Remove after using an indexed operation (e.g. use IList<>.Remove after using IList<>.Insert)

We could eliminate the flushing also for the above cases if we would add an additional method to ICollectionPersister, which would return the index of the given element that could be also used to queue IList<>.IndexOf operation. This can be done in a separate PR if desired.

In addition an optimization was done for PersistentGenericSet<> when using ICollection<>.Add or ISet<>.Add for a transient entity (only when we are able to detect that it is transient) which will not trigger a query for checking its existence.

This PR contains several breaking changes (applies for collections mapped with inverse="true"):

  • Calling IList.Add for an uninitialized collection mapped as lazy="true" will trigger the collection initialization and return the correct index, currently the collection is not initialized and -1 is returned (applies for Bag and List)
  • Calling IList.Add for an uninitialized collection mapped as lazy="extra" will return the correct index, currently -1 is returned (applies for Bag and List)
  • Calling IList.RemoveAt or IList<>.RemoveAt with a negative number will throw an ArgumentOutOfRangeException in order to mimic List<>.RemoveAt behavior, currently it throws a IndexOutOfRangeException (applies for List)
  • Calling IList.RemoveAt or IList<>.RemoveAt with a number that is equal or higher that the current collection size for an uninitialized collection mapped as lazy="extra" will throw an ArgumentOutOfRangeException, currently it allows to queue the operation without throwing an exception (applies for List)
  • Calling IList.Insert or IList<>.Insert with a negative number will throw an ArgumentOutOfRangeException in order to mimic List<>.Insert behavior, currently it throws an IndexOutOfRangeException (applies for List)
  • Calling IList.Insert or IList<>.Insert with a number that is higher that the current collection size for an uninitialized collection mapped as lazy="extra" will throw an ArgumentOutOfRangeException, currently it allows to queue the operation without throwing an exception (applies for List)
  • Getting or setting a value for IList.this[int index] or IList<>.this[int index] with a negative number will throw an ArgumentOutOfRangeException in order to mimic List<>.this[int index] behavior, currently it throws an IndexOutOfRangeException (applies for List)
  • setting a value for IList.this[int index] or IList<>.this[int index] with a number that is equal or higher that the current collection size for an uninitialized collection mapped as lazy="extra" will throw an ArgumentOutOfRangeException, currently it allows to queue the operation without throwing an exception (applies for List)
  • Calling IDictionary<,>.Add or ICollection<>.Add for an uninitialized collection mapped as lazy="extra" with a key that already exists will throw an ArgumentException, currently it allows to queue the operation without throwing an exception (applies for Map)
  • Calling IDictionary<,>.Remove or ICollection<>.Remove for an uninitialized collection mapped as lazy="extra" with a key that does not exist will return false, currently it returns true (applies for Map)
  • Collection dirtiness will be evaluated by EqualityComparer<TValue>.Default when setting an existing key value with IDictionary<,>.this[] for an initialized collection, currently the old and the new value are compared by using ReferenceEquals (applies for Map)
  • Calling ISet<>.Add for an uninitialized collection mapped as lazy="extra" with a transient element that already exists in the set will return false, currently it returns true (applies for Set)
  • Calling ISet<>.Add for an uninitialized collection mapped as lazy="true" with a transient element that does not override Equals method will not initialize the collection.

// (It is also likely one-to-many lists have a similar issue, but nothing is done yet for them. And
// their case is more complex due to having to care for the indexes and to handle many more delayed
// operation kinds.)
if (CollectionPersister.IsOneToMany && loadedCollection.Any(l => ReferenceEquals(l, toAdd)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from PersistentGenericBag<>.SimpleAddDelayedOperation. Instead of the proposed fixes I will try to solve the issue by removing the queued operation for the entity that is immediately inserted because of the id generator from all uninitialized collections that currently contain it. I will try to make a separated PR on top of this one if the mentioned solution will work.

@maca88 maca88 changed the title Add new collection operation queue mechanism WIP - Add new collection operation queue mechanism Feb 16, 2019
@fredericDelaporte
Copy link
Member

fredericDelaporte commented Feb 16, 2019

From reading this PR description, it seems to me that this PR will cause more operations on extra-lazy collections to cause some database queries, for the sake of better honouring the collection contract.

Returning the correct index when adding to an uninitialized list is surely done by initializing the list count of elements by querying this count in the database. (I guess the next adds, or checks for insert/remove at/set at index range, will not re-trigger a count in db.)
So with this PR, the first add/insert/remove at/set at on an uninitialized extra lazy indexed collection will trigger a count, while I guess it was not doing that previously.

Returning false when adding to a set an element it already has surely requires to query the db for its existence in the set. So each add on an uninitialized set would trigger a check in db for the element existence.
Same for immediately throwing in the map case.

The only documentation for extra lazy seems currently to be:

"Extra-lazy" collection fetching - individual elements of the collection are accessed from the database as needed. NHibernate tries not to fetch the whole collection into memory unless absolutely needed (suitable for very large collections)

(From here.)

So, nothing contradicting what this PR does.

But it seemed to me there was a choice made for extra lazy to somewhat break collection contracts when it allows avoiding a database query.
For the extra count query, presumably occurring only on first operation, well, why not. (Still I guess that in one-to-many cases, where an element cannot appears many times in the list, the returned/checked index will be wrong if some add/insert operation is done on an element already in the list, followed by some more adds/inserts/removes.)
But for a set or map, having a query triggered at each add does quite undermine the usefulness of extra lazy. Maybe we should document those contract violations instead of fixing them.

@maca88
Copy link
Contributor Author

maca88 commented Feb 16, 2019

Returning the correct index when adding to an uninitialized list is surely done by initializing the list count

Correct.

I guess the next adds, or checks for insert/remove at/set at index range, will not re-trigger a count in db

Again correct, the count query is triggered only once when the requested operation needs it and it is stored for further operations.

So with this PR, the first add/insert/remove at/set at on an uninitialized extra lazy indexed collection will trigger a count

That is partially correct, Add will not trigger a count if it is the first operation to be executed as it does not need it. The List queue mechanism is composed by two trackers (NonIndexedListQueueOperationTracker and IndexedListQueueOperationTracker) and the first operation will decide which tracker will be used. So when Add is the first operation, the NonIndexedListQueueOperationTracker will be used which does not use Count and when an indexed operation is triggered then a flush is done and the tracker is then switched to IndexedListQueueOperationTracker which does use Count for the Add operation.

Returning false when adding to a set an element it already has surely requires to query the db for its existence in the set

I've updated the set breaking change as it was wrongly written. The query for checking an element existance is already done by the current code:

bool? exists = IsOperationQueueEnabled ? ReadElementExistence(o) : null;
if (!exists.HasValue)
{
Initialize(true);
if (WrappedSet.Add(o))
{
Dirty();
return true;
}
return false;
}
if (exists.Value)
{
return false;
}
QueueOperation(new SimpleAddDelayedOperation(this, o));
return true;

The problem with the current code is when a new entity is added twice in the set, both times true is returned (line 325) when it should be returned only the first time.

Still I guess that in one-to-many cases, where an element cannot appears many times in the list, the returned/checked index will be wrong if some add/insert operation is done on an element already in the list, followed by some more adds/inserts/removes.

I didn't test this scenario yet, but the same issue applies for the current code as there is no existence check when adding an item to an indexed collection.

But for a set or map, having a query triggered at each add does quite undermine the usefulness of extra lazy.

As already mentioned this check is already in the current code and that is why I optimized ICollection<>.Add for the set in order to skip that check when possible. Also map already checks the key existence when adding/removing keys.

@bahusoid
Copy link
Member

  • Calling IList.Add for an uninitialized collection mapped as lazy="true" will trigger the collection initialization and return the correct index, currently the collection is not initialized and -1 is returned (applies for Bag and List)

This potentially could be a pretty serious drawback. As if I understand correctly there is now no way to "lazy add" via IList interface.

@maca88
Copy link
Contributor Author

maca88 commented Feb 17, 2019

As if I understand correctly there is now no way to "lazy add" via IList interface.

It is possible only when the collection is mapped as lazy="extra".

@maca88
Copy link
Contributor Author

maca88 commented Feb 18, 2019

With the latest commit the set optimization for ICollection<>.Add is now also applied for ISet<>.Add.

@maca88
Copy link
Contributor Author

maca88 commented Feb 20, 2019

Still I guess that in one-to-many cases, where an element cannot appears many times in the list, the returned/checked index will be wrong if some add/insert operation is done on an element already in the list, followed by some more adds/inserts/removes.

I've added some tests that show how this PRs works when an element is added multiple times. The behavior is indeed different that the current one for lists as currently, when checking the Count of the list or when retrieving an element by its index a flush is triggered that will clear the queue and take care of duplicate elements. With this PR the flush is not triggered, which means that the collection will work exactly like List<> does, Count will return the number of all elements including duplicated. In order to obtain the old behavior in this case, the user would need to trigger the flush manually before checking the Count of the list or retrieving an element by its index. If we would like to mimic the old behavior for lists and bags we would need to check the element existance for each addition, similar as for a set, which is not worth it. For me, adding an element multiple times into a list is an user error.

gavin.Companies.Add(addedItems[i]);
}

Assert.That(gavin.Companies.Count, Is.EqualTo(10));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By running the test with the current code, Count would be equal to 5.

@maca88 maca88 changed the title WIP - Add new collection operation queue mechanism Add new collection operation queue mechanism Mar 10, 2019
[TestCase(false, true)]
[TestCase(true, false)]
[TestCase(true, true)]
public async Task ListAddDuplicatedAsync(bool initialize, bool flush, CancellationToken cancellationToken = default(CancellationToken))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maca88 can you please update AsyncGenerator's NUnit plugin not to generate cancellation token for [TestCase]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also updated the NHibernate configuration #2126.

Copy link
Member

@hazzik hazzik Apr 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised that this is because the test method lacks [Theory]/[Test] attribute.

@hazzik
Copy link
Member

hazzik commented Apr 14, 2019

@fredericDelaporte @bahusoid any conclusions on this PR?

@bahusoid
Copy link
Member

  • Calling IList.Add for an uninitialized collection mapped as lazy="true" will trigger the collection initialization and return the correct index, currently the collection is not initialized and -1 is returned (applies for Bag and List)
  • Calling IList.Add for an uninitialized collection mapped as lazy="extra" will return the correct index, currently -1 is returned (applies for Bag and List)

As I've already said I don't like both changes. IMHO it limits functionality (kills delayed add feature for lazy IList and triggers in most cases unnecessary query for extra lazy IList ) and adds no practical value as you can retrieve count via IList.Count. It's the case when contract violation is better than unsupported feature.

@maca88
Copy link
Contributor Author

maca88 commented Apr 15, 2019

@bahusoid I will revert those changes and update the tests.

@maca88
Copy link
Contributor Author

maca88 commented Jun 16, 2019

Reverted IList.Add behavior and updated the breaking changes.

Copy link
Member

@fredericDelaporte fredericDelaporte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some preliminary comments on the test. I will go on later.

src/NHibernate.Test/Extralazy/ExtraLazyFixture.cs Outdated Show resolved Hide resolved
src/NHibernate.Test/Extralazy/ExtraLazyFixture.cs Outdated Show resolved Hide resolved
src/NHibernate.Test/Extralazy/ExtraLazyFixture.cs Outdated Show resolved Hide resolved
src/NHibernate.Test/Extralazy/ExtraLazyFixture.cs Outdated Show resolved Hide resolved
@hazzik hazzik added this to the 5.3 milestone May 16, 2020
@fredericDelaporte fredericDelaporte mentioned this pull request Jun 21, 2020
@fredericDelaporte
Copy link
Member

@bahusoid, you were having some concerns about some changes in this PR, which have been reverted. Do you have some other concerns?

@bahusoid
Copy link
Member

Do you have some other concerns?

Nope. I can't make proper code review but in principle it's all good changes

@fredericDelaporte fredericDelaporte merged commit a6a8267 into nhibernate:master Jul 15, 2020
@fredericDelaporte
Copy link
Member

We have forgotten to label this as "possible breaking change" and so it has not be mentioned in the possible breaking changes section of the release notes.
May be related to #2476.

@maca88
Copy link
Contributor Author

maca88 commented Aug 13, 2020

I've added the missing possible breaking change that is related to #2476.

@fredericDelaporte
Copy link
Member

I am not seeing where you have added that. I am not seeing a change on the release notes. (I usually include missing possible breaking changes in the release pr of the next patch, with a note in that release notes about completing the possible breaking changes of the minor.)

maca88 added a commit to maca88/nhibernate-core that referenced this pull request Aug 13, 2020
@maca88
Copy link
Contributor Author

maca88 commented Aug 13, 2020

I've added the missing one in the description of this PR and now I've made a PR for them. Feel free to reword the added notes, as I am not that good at writing :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants