Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve upon (currently dangerous) .from_schedule code #269

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 31 additions & 0 deletions django_celery_beat/migrations/0012_safe_schedule_uniqueness.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# Generated by Django 2.2.2 on 2019-06-15 17:06

from __future__ import absolute_import, unicode_literals

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
('django_celery_beat', '0011_auto_20190508_0153'),
]

operations = [
migrations.AlterField(
model_name='clockedschedule',
name='clocked_time',
field=models.DateTimeField(
help_text='Run the task at clocked time', unique=True,
verbose_name='Clock Time'),
),
migrations.AlterUniqueTogether(
name='crontabschedule',
unique_together={('minute', 'hour', 'day_of_week', 'day_of_month',
'month_of_year', 'timezone')},
),
migrations.AlterUniqueTogether(
name='intervalschedule',
unique_together={('every', 'period')},
),
]
42 changes: 13 additions & 29 deletions django_celery_beat/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
from celery import schedules
from celery.five import python_2_unicode_compatible
from django.conf import settings
from django.core.exceptions import MultipleObjectsReturned, ValidationError
from django.core.exceptions import ValidationError
from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models
from django.db.models import signals
Expand Down Expand Up @@ -87,13 +87,7 @@ def from_schedule(cls, schedule):
spec = {'event': schedule.event,
'latitude': schedule.lat,
'longitude': schedule.lon}
try:
return cls.objects.get(**spec)
except cls.DoesNotExist:
return cls(**spec)
except MultipleObjectsReturned:
cls.objects.filter(**spec).delete()
return cls(**spec)
return cls.objects.get_or_create(**spec)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can still raise MultipleObjectsReturned, why is not handling that better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a uniqueness constraint covering all fields that go into spec so the database should not be able to contain more than a single match at any time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but even the previous logic seems terrible tbh.

As an example, lets say I have 2 different schedules for 2 different tasks, one running every sunrise and one running every sunset (just using solarschedule as the example but this applies to any of them). Now, lets say that my boss comes over and says all tasks running at sunset should be changed to sunrise for only the next week for whatever business reason. Instead of modifying a single schedule, I now have to go through and modify potentially hundreds of different periodic tasks to change their schedule... and then I have to remember which ones I changed so I can change them back later.

It seems like we should be allowing duplicate schedules. I'm really curious why this limitation ever existed in the first place?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm digging into where all from_schedule is used to see if we can sort this out some, feel free to do the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, lets say that my boss comes over and says all tasks running at sunset should be changed to sunrise for only the next week for whatever business reason. Instead of modifying a single schedule, I now have to go through and modify potentially hundreds of different periodic tasks to change their schedule... and then I have to remember which ones I changed so I can change them back later.

I was thinking of the same scenario earlier. I think this feature also has downsides: Once you have multiple schedules with the same values, you'll have a hard time telling them apart when creating new jobs.

Personally, I see the value in a unique constraint but also the pain with not supporting to change a schedule into something that already exists.

To fully support the scenario that you have in mind, it would take another layer of indirection, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm digging into where all from_schedule is used to see if we can sort this out some, feel free to do the same.

There is a single place outside the test suite:

model_schedule = model_type.from_schedule(schedule)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'm reading through the celery scheduler code those inherit from to work out the logic on what scenarios this call stack actually happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, based on what I've looked through, the whole reason for all this is that when you use the django settings to set up the schedules, it deletes/inserts those into the DB. However, instead of having a flag for "inserted from django conf" or whatever, they compare it to what's already in the DB and treat the schedule items as if they are unique constraints. Still kind of playing around with this, but if that's the case then the deletes may be necessary for now, although there are probably several scenarios where things can get borked. Still playing around with this though.


def __str__(self):
return '{0} ({1}, {2})'.format(
Expand Down Expand Up @@ -138,6 +132,7 @@ class Meta:
verbose_name = _('interval')
verbose_name_plural = _('intervals')
ordering = ['period', 'every']
unique_together = ('every', 'period')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I want two different jobs to run on the same interval? This would disallow that, that seems like an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a mixup between jobs and schedules here: You assign the same "every 4 days" schedule to two distinct jobs and they both run at the time. That works before and after this pull request.

The current cls.objects.filter(**spec).delete() is effectively a unique constraint, except that it might cascade destroy data and does not prevent creation in the first place.

I think what the code needs is one of:

  • a) A uniqueness constraint for all schedules (as done in the pull request)
  • b) A uniqueness constraint for none of the schedules and no dangerous cls.objects.filter(**spec).delete() either

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a mixup between jobs and schedules here: You assign the same "every 4 days" schedule to two distinct jobs and they both run at the time. That works before and after this pull request.

The current cls.objects.filter(**spec).delete() is effectively a unique constraint, except that it might cascade destroy data and does not prevent creation in the first place.

I think what the code needs is one of:

* a) A uniqueness constraint for _all_ schedules (as done in the pull request)

* b) A uniqueness constraint for _none_ of the schedules and no dangerous  `cls.objects.filter(**spec).delete()` either

What do you think?

just curious if we add UniqueContraint Check here would that be any useful?

Copy link
Contributor Author

@hartwork hartwork Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @auvipy I would need get back into the whole topic since it's been so long ago by now. I can do that work but I would need some "guarantee" that we'll actually get somewhere with this in the next few weeks to not invest in a dead end. (In case you're a friend of voice meetings for stuff like that, we can definitely do a Jitsi or Google Meet on the topic if you like, but no pressure, just one more option.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will release a new version tomorrow and do some more analysis to get back to you to reach a consensus

auvipy marked this conversation as resolved.
Show resolved Hide resolved

@property
def schedule(self):
Expand All @@ -149,13 +144,11 @@ def schedule(self):
@classmethod
def from_schedule(cls, schedule, period=SECONDS):
every = max(schedule.run_every.total_seconds(), 0)
try:
return cls.objects.get(every=every, period=period)
except cls.DoesNotExist:
return cls(every=every, period=period)
except MultipleObjectsReturned:
cls.objects.filter(every=every, period=period).delete()
return cls(every=every, period=period)
spec = {
'every': every,
'period': period,
}
return cls.objects.get_or_create(**spec)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about MultipleObjectsReturned


def __str__(self):
if self.every == 1:
Expand All @@ -174,6 +167,7 @@ class ClockedSchedule(models.Model):
clocked_time = models.DateTimeField(
verbose_name=_('Clock Time'),
help_text=_('Run the task at clocked time'),
unique=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why limit it like this? Having two jobs run at the same time is completely valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #269 (comment)

)
enabled = models.BooleanField(
default=True,
Expand Down Expand Up @@ -202,13 +196,7 @@ def schedule(self):
def from_schedule(cls, schedule):
spec = {'clocked_time': schedule.clocked_time,
'enabled': schedule.enabled}
try:
return cls.objects.get(**spec)
except cls.DoesNotExist:
return cls(**spec)
except MultipleObjectsReturned:
cls.objects.filter(**spec).delete()
return cls(**spec)
return cls.objects.get_or_create(**spec)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about MultipleObjectsReturned



@python_2_unicode_compatible
Expand Down Expand Up @@ -280,6 +268,8 @@ class Meta:
verbose_name_plural = _('crontabs')
ordering = ['month_of_year', 'day_of_month',
'day_of_week', 'hour', 'minute', 'timezone']
unique_together = ('minute', 'hour', 'day_of_week', 'day_of_month',
'month_of_year', 'timezone')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue about limiting by the schedule.


def __str__(self):
return '{0} {1} {2} {3} {4} (m/h/d/dM/MY) {5}'.format(
Expand Down Expand Up @@ -317,13 +307,7 @@ def from_schedule(cls, schedule):
'month_of_year': schedule._orig_month_of_year,
'timezone': schedule.tz
}
try:
return cls.objects.get(**spec)
except cls.DoesNotExist:
return cls(**spec)
except MultipleObjectsReturned:
cls.objects.filter(**spec).delete()
return cls(**spec)
return cls.objects.get_or_create(**spec)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about MultipleObjectsReturned



class PeriodicTasks(models.Model):
Expand Down