Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transactions logic upgrade #10014

Closed
wants to merge 3 commits into from

Conversation

symbolist
Copy link
Contributor

https://openedx.atlassian.net/browse/TNL-3428

As part of the Django upgrade, we need to move to the transaction framework that was introduced in Django 1.7.

https://docs.djangoproject.com/en/1.8/topics/db/transactions/

This Stackoverflow answer explains the differences of atomicity and durability between the two APIs: http://stackoverflow.com/a/24108491

The different cases I came across and how they are being handled is as follows:

The most important constraint to keep in mind is that we cannot have commit() or rollback() calls inside an atomic block. This means if a function has either of these calls any view which can call it cannot be wrapped in atomic.

  1. Most of the views did not have any custom transaction management. We do not need to do anything besides enabling the new API for wrapping the views.
  2. Some of the functions which were calling commit() or rollback() can just use the atomic context manager. This does change the behaviour sometimes because previously a commit would happen when commit() was called but if the atomic context manager is nested inside another, the commit would happen at the end of the outer atomic. I think in some cases this a reasonable thing to do. student/view.py:confirm_email_change is one such case.
  3. In 3 cases we are doing a commit() if an IntegrityError happens followed by a database read. In all these cases this was quite deep in the call stack and those particular functions can be called from multiple places. I cannot think of a good way to retain this behaviour and have removed it for now.
  4. In grades.py we do commit calls frequently to lesson the chance of overlap between multiple concurrent requests and hence IntegrityErrors. Similarly in the instructor_task app, we commit after creating InstructorTask objects and before sending off the task to celery. In these cases, I think the right approach would be to not have these views wrapped in atomic and instead have a context manager which behaves similarly to the current commit_on_success() which can wrap the views and any functions in the call graph. The goal is that transaction managements in these views should continue to behave like it does now.
  5. A number of other functions in the code base are decorated with commit_on_success(). But since these are called from multiple views, for now these have been wrapped in atomic(). However, this atomic is nested inside the atomic which wraps the calling view and so the commits would happen once the view returns instead of when these functions returns as it does now. The only alternative is use the solution for 4 in a lot of places and I have opted to avoid this if I think we can.

@symbolist symbolist changed the title New Transactions logic upgrade Oct 2, 2015
@macdiesel
Copy link
Contributor

@symbolist is this ready for review?

@symbolist
Copy link
Contributor Author

@macdiesel Not yet.

@symbolist symbolist force-pushed the usman/transactions-update branch 2 times, most recently from aedb2cc to 0210770 Compare October 7, 2015 12:20
@@ -52,7 +52,6 @@ class Command(TrackedCommand):
help="If True, try to transfer certificate items to the new course.")
)

@transaction.commit_manually
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe just change this to @transaction.atomic and remove the commit_on_success context mgr below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I have already replaced commit_on_success with atomic below. The reason for keeping it inside the loop is that change for each student should get committed as it happens. Otherwise an error would revert all of the work. Or do you think we should be doing that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@symbolist @doctoryes since the transaction isn't in a try block. Does that mean that if command fails on a user part way through the transaction the database would end up in an inconsistent state as there is nothing to catch the exception? You would end up with all the students to that point transferred but the remainder not transferred.

Also would this end up failing on the rerun as the student has already been transferred to the new course?

We have two routes of action here:

  1. add the try/catch and migrate students with no errors and report the ones that were not moved
    or
  2. fail the entire transaction and keep all students in a consistent state.

I think that #2 is better as it would then allow us to troubleshoot and fix the issue without any inconsistencies in the data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@macdiesel "if command fails on a user part way through the transaction" atomic() will 1. catch the exception, 2. rollback the transaction and 3. re-raise the exception. And yes, that means some of the students would have been transferred and the others not. But this is how it currently works too.

Actually, in Django 1.4 commit_manually is only used to disable automatic transaction management for views. Commands do not need it. So its not doing anything right now either.
https://docs.djangoproject.com/en/1.4/topics/db/transactions/#django.db.transaction.commit_manually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@doctoryes I created a branch, reverted @nedbat's commit and then created this PR against it. So all the migration is visible in this PR. I thought it would give us a picture of all of the transaction related logic in one place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know how this all plays out in the large scheme of how students and courses are handled in edx. I realize we would be changing behavior of this command, but I would rather see us keep the state of our data consistent.

Let me pose this question what will be easier to track down in the future:

  1. a command that fails to move all the students and provides a clear and concise error message
    or
  2. a command that succeeds, maybe puts something to stdout or stderr that someone will read or maybe not, then later when someone realizes that not all the students are moved they call in support or devops. Then someone has to comb through the data looking for said students, then they have to look to the code to try and figure out what happened.

I just want to see us strive for both consistency in both our data and program behavior.

/soapbox

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - let me on that soapbox! 😄

Totally agree that 1) is better code/procedure. Just worried about the "hey - this command changed - rollback the release!" scenario. If we're willing to take on some of that risk (maybe release notes would help?), let's do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a transaction.atomic on the command itself. This is a rarely used command so it shouldn't be an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below.

dog_stats_api.increment('instructor_task.subtask.update_exception')
raise
else:
TASK_LOG.debug("about to commit....")
transaction.commit()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The atomic context manager takes care of the commit/rollback.

@symbolist symbolist force-pushed the usman/transactions-update branch 4 times, most recently from 46f58a7 to 65a6e00 Compare October 7, 2015 16:29
@@ -494,7 +494,6 @@ def update_subtask_status(entry_id, current_task_id, new_subtask_status, retry_c
_release_subtask_lock(current_task_id)


@transaction.commit_manually
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use the @transaction.atomic decorator here and avoid the reformatting below?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this change from the deprecated commit_manually to atomic will fix a few bulk email False is not true failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated below.

@doctoryes
Copy link
Contributor

PHEW! I've finished my first pass and left comments. Some points:

  • I agree that the added commit_on_success decorator will mirror the current Django 1.4 behavior.
  • I'd prefer to make all transactions conform to Django1.8-style savepoints and the like. But it's unclear how much work that would be - likely significant.
  • We'll still be able to identify the places where we've done this - and fix them in the FUUUUTUUURE (yes - a future that may never exist) if we choose to do so.
  • More testing, demonstrating expected behavior in different nesting situations, is needed.

transaction.commit()
kwargs = {'ccx': ccx, 'location': block.location, 'field': name}
override = CcxFieldOverride.objects.get(**kwargs)
else:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of trying to create an object first and do a get on an IntegrityError, lets just do a get first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@symbolist , would this increase the likelihood of race conditions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adampalay Yes, to an extent. But this function is only called from 3 Instructor dashboard CCX tab views so I doubt we will be seeing many parallel requests causing IntegrityErrors. If we do we can refactor the views a bit so that we can put outer_atomic around this whole function or maybe shift to READ COMMITTED for the views which do create IntegrityErrors.

I have put in a ticket for this: https://openedx.atlassian.net/browse/TNL-3614

transaction.commit()
return cls.objects.get(course_id=course_id, checkpoint_location=checkpoint_location)
checkpoint, __ = cls.objects.get_or_create(course_id=course_id, checkpoint_location=checkpoint_location)
return checkpoint
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the view below to use read_committed so we shouldn't need this anymore.

@symbolist
Copy link
Contributor Author

@doctoryes @macdiesel I have addressed all comments. Besides that, I have:

  1. Made tweaks in a number of places.
  2. Reduced the usages of commit_on_success to just the progress page views in courseware.
  3. Fixed and added tests for commit_on_success and outer_atomic.

This is ready for another review.

@symbolist
Copy link
Contributor Author

@adampalay Here is the PR I mentioned today. If you would like to discuss any of the changes, please ping me.

@doctoryes
Copy link
Contributor

Nit: There's no docstring or comments explaining why the outer_atomic() decorator exists and the pattern that developers are expected to follow for transactions in Django 1.8 post-upgrade. Such comments would be useful - perhaps they could be added to tests showing how outer_atomic works and demonstrating the recommended pattern.

@doctoryes
Copy link
Contributor

In other words, it should be clear what's deprecated.

@doctoryes
Copy link
Contributor

Also, do you plan to create a ticket for commit_on_success removal for the progress page methods in lms/djangoapps/courseware/{grades|views}.py? What blocks those from being removed?

@doctoryes
Copy link
Contributor

A few nits and follow-ons - but no blockers for merging: 👍


Arguments:
using (str): the name of the database.
read_committed (bool): Whether to use read committed isolation level.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@doctoryes I have added more detailed documentation to outer_atomic().

@symbolist
Copy link
Contributor Author

@doctoryes I have added the tests you suggested as well.

Yes, I have created a ticket for replacing the remaining instances of commit_on_success with atomic (https://openedx.atlassian.net/browse/TNL-3607). I haven't looked at the grade functions in courseware very closely yet. I don't see why it shouldn't be possible though we may need to refactor things a bit. But I would like to wait until we can get a sandbox hooked up to prod data before attempting that so that we can test with real data that everything works fine after the migration.

try:
connection.rollback()
except Error:
# An error during rollback means that something
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will we want to know if the rollback didn't work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll put in some logging statements here. Though ideally we want to get rid of this commit_on_success decorator before we deploy the upgrade. Right now it is only being used in courseware/grades.py and on the progress page view. I have created a separate story for it.

@symbolist
Copy link
Contributor Author

@doctoryes @adampalay I have also eliminated the rest of the instances of commit_on_success in lms/djangoapps/courseware/{grades|views}.py and refactored the code to use outer_atomic. You can see the changes in the last two commits.

A bit of context: The reason for adding commit_on_success to the progress page view and the helper functions for grading was that because calculating the grade for a course can take a long time – 10-15 seconds is normal for large courses – student's often click the progress page link multiple times because it does not look like it is responding and this creates a strong chance of two parallel requests which try to do get_or_create as they try to initialize xblocks and save data for them. Adding commit_on_success in these functions divided up the work for calculating the grade into multiple smaller transactions which had a much lower chance of collision. When migrating to the new API I have kept this in mind and used outer_atomic to divide this work into a number of smaller transaction.

@symbolist
Copy link
Contributor Author

@doctoryes Can you review the changes in the last two commits?

@macdiesel
Copy link
Contributor

👍 😎

@symbolist
Copy link
Contributor Author

Closing and creating a new PR against the upgrade branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants