
django and transitions [previously about model extensions] #146

Closed
aleneum opened this issue Sep 23, 2016 · 32 comments


aleneum commented Sep 23, 2016

We documented the results of this discussion in the FAQ. Make sure to check it out!

Occasionally, we discuss generic model functionality which might be handy for some users. Just to keep track of these requests, here is the current candidate list.

Edit: I split the list into separate comments. This way it might be easier to "upvote" the extension you are looking forward to.

Please comment/"upvote" if you would like to see any of these features integrated into transitions.

Already implemented:
0.6.0:


aleneum commented Mar 28, 2017

default/fallback transitions (see #134)


aleneum commented Mar 28, 2017

django database integration (see #111/#142)


aleneum commented Mar 28, 2017

sqlalchemy support (see #141)


aleneum commented Mar 28, 2017

Coroutine/asyncio Support (see #181)

@proofit404

I don't think django integration is necessary as a separate extension. There is a convenient way to attach state machines to django models without any boilerplate. Maybe an explicit example in the readme would be sufficient.

from django.db import models
from django.db.models.signals import post_init
from django.dispatch import receiver
from django.utils.translation import ugettext_lazy as _
from transitions import Machine


class ModelWithState(models.Model):
    ASLEEP = 'asleep'
    HANGING_OUT = 'hanging out'
    HUNGRY = 'hungry'
    SWEATY = 'sweaty'
    SAVING_THE_WORLD = 'saving the world'
    STATE_TYPES = [
        (ASLEEP, _('asleep')),
        (HANGING_OUT, _('hanging out')),
        (HUNGRY, _('hungry')),
        (SWEATY, _('sweaty')),
        (SAVING_THE_WORLD, _('saving the world')),
    ]
    state = models.CharField(
        _('state'),
        max_length=100,
        choices=STATE_TYPES,
        default=ASLEEP,
        help_text=_('actual state'),
    )


@receiver(post_init, sender=ModelWithState)
def init_state_machine(instance, **kwargs):
    # Attach a machine to every instance right after Django initializes it;
    # the machine's initial state mirrors the value stored in the state field.
    states = [state for state, _ in instance.STATE_TYPES]
    machine = instance.machine = Machine(model=instance, states=states, initial=instance.state)
    machine.add_transition('work_out', instance.HANGING_OUT, instance.HUNGRY)
    machine.add_transition('eat', instance.HUNGRY, instance.HANGING_OUT)
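
For illustration, a minimal usage sketch (hypothetical, assuming the example model above lives in a configured django project): transitions keeps the state field in sync, so a regular save() persists the new state.

obj = ModelWithState(state=ModelWithState.HANGING_OUT)
obj.work_out()                            # trigger added in init_state_machine
assert obj.state == ModelWithState.HUNGRY
obj.save()                                # persists 'hungry' to the database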


aleneum commented May 10, 2017

@proofit404: Thanks for the remark. We will definitely add this to the documentation.


jxskiss commented Jul 24, 2017

Has anyone used this in a project? I am fixing a memory leak problem after attaching a state Machine to the django object.

The problem:
Before using the state Machine, a django process takes about 100M of memory; with the state Machine, it takes about 800M (maybe more; the multiple django processes have exhausted the server's memory and are being killed).

Code:

class ItemMachineMixin(object):
    def __init__(self, *args, **kwargs):
        super(ItemMachineMixin, self).__init__(*args, **kwargs)
        self.machine = transitions.Machine(
            model=self,
            states=SM_STATES,
            initial=self.status,
            transitions=SM_TRANSITIONS,
            auto_transitions=False,
            send_event=True,
        )

I'm not sure if the problem is related to the state Machine, or if I am missing something. Any advice is welcome, thank you.

@proofit404

Does this memory usage start immediately at 800M or does it take some time?


jxskiss commented Jul 24, 2017

@proofit404 I'm debugging this issue. With the state machine disabled, in a freshly started django (with nginx/gunicorn), processing a django admin request increases the process memory from ~100M to ~250M and it stops there. With the state machine enabled, also in a freshly started django with the same request, the memory grows from ~100M to about 4GB (8GB * 48%).


jxskiss commented Jul 24, 2017

Looks strange, but I have no idea how to inspect where the memory goes. Are there any instructions for profiling django or Python memory problems?


aleneum commented Aug 22, 2017

I added the example from proofit404 as the first answer in the FAQ notebook.


jxskiss commented Aug 23, 2017

@aleneum The "memory leak" issue I mentioned above, as I dive into it, I found it may be not a leak, but an expected behavior.

In my situation, there are 7 states, 10 transitions and 4 conditions, according to the way pytransitions working, when a django model object is initialized, there will be more than (7 + 10 + 4 = 21) Machine/State/Transition/Condition objects being initialized, django has cache mechanism for model object for performance, so the huge amount objects will stay in memory, and then the memory blows.

I suggest don't use this SM for models which have many records, especially for admin usage. Maybe we should mention this someway in the FAQ.

Also, as a workaround, a used a custom version of "fysom" library to handle the model states, I changed the state machine to be a singleton class, by passing model object as parameter to transitions and conditions.
The memory stays about like without state machine and it works well, my project has been published to production about two weeks ago and it works like a charm.

Finally, I'm very appreciate this awesome project and like to use it in some other situations, thanks.

@proofit404

@jxskiss Thanks for your work! If you can provide your solution as an example filled with comments, it would be really helpful! I think this topic is worth its own page in the documentation.


aleneum commented Aug 23, 2017

@jxskiss: Thanks for the update. I am not completely familiar with how django handles its data.
Is a django model object equivalent to a data entry? Your remark about records sounds like it.
I can imagine that if you attach a machine and all its content to each entry in a table/model, it blows up the memory footprint.

But is it really necessary to attach a machine to every entry/record? The cool thing about transitions is the separation of model and machine, which means you just need to attach the model to your entry and can use one machine instance to handle all models. A global machine instance or singleton might be all you need. The best part is that all model objects will be updated in case the machine changes.
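
To make this concrete, here is a tiny standalone sketch of the "one machine, many models" idea (plain objects instead of django entries; the names are made up):

from transitions import Machine

class Entry(object):              # stands in for a django model instance
    pass

# One shared machine holds all states and transitions; entries are only registered.
machine = Machine(model=None, states=['new', 'done'], initial=None,
                  transitions=[['finish', 'new', 'done']], auto_transitions=False)

a, b = Entry(), Entry()
machine.add_model(a, initial='new')
machine.add_model(b, initial='new')
a.finish()                        # only a changes state
assert a.state == 'done' and b.state == 'new'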

Edit:
The moment you add the model instance to the machine, it will be extended with several partial functions which might, in sum, also add to the increased memory footprint. This can be reduced by disabling auto_transitions (which means fewer functions attached to the model) in case they are not needed. But State and Transition objects as well as the Machine object do not need to be copied to work with multiple models. The machine instance itself is actually stateless (except if it acts as a model itself).

If you could distill a minimal example (project) of your current approach with transitions (does not have to reach 800M memory usage of course), I could check if I am on the right track here.


jxskiss commented Aug 26, 2017

@aleneum Yes, a model object is a data entry. A global machine is the solution I finally used, but I didn't find an easy way to set this up with transitions; also, each model object has its own state, independent of the others.

As proofit404 mentioned above, I initialized a machine instance (with states, transitions and condition objects) for each model object. The problem is right here.

Maybe I didn't use transitions the right way; I would be glad to learn how to use it as a global machine or singleton to handle independent state for model objects. Thanks!

Sample code with memory issue:

class ItemStatus(object):
    NEW = 'new'
    NEED_INFO = 'need_info'
    REVIEWING = 'reviewing'
    REDOING = 'redoing'
    CONFLICT = 'conflict'
    VERIFIED = 'verified'
    DELETED = 'deleted'

    SM_STATES = [
        NEW, NEED_INFO, REVIEWING, REDOING, CONFLICT, VERIFIED, DELETED
    ]

    SM_TRANSITIONS = [
        # trigger, source, destination
        ['sm_prepare_new', NEW, NEED_INFO],
        {
            'trigger': 'sm_commit_review',
            'source': NEED_INFO,
            'dest': REVIEWING,
            'conditions': ['check_review_ready'],
        },
        {
            'trigger': 'sm_done_verified',
            'source': [REVIEWING, REDOING],
            'dest': VERIFIED,
            'conditions': ['check_required_fields', 'check_barcodes_valid', 'check_no_conflict'],
        },
        ['sm_mark_conflict', [REVIEWING, REDOING], CONFLICT],
        ['sm_revert_verified', [VERIFIED, CONFLICT], REDOING],
        ['sm_require_info', [REVIEWING, REDOING], NEED_INFO],
        {
            'trigger': 'sm_mark_deleted',
            'source': [
                NEW, NEED_INFO, REVIEWING, REDOING, CONFLICT, VERIFIED
            ],
            'dest': DELETED
        },
        ['sm_revert_deleted', DELETED, REDOING],
        {
            'trigger': 'sm_update',
            'source': [NEW, NEED_INFO, REVIEWING, REDOING, VERIFIED],
            'dest': '=',
        }
    ]


class ItemMachineMixin(object):
    def __init__(self, *args, **kwargs):
        super(ItemMachineMixin, self).__init__(*args, **kwargs)
        self.machine = transitions.Machine(
            model=self,
            states=ItemStatus.SM_STATES,
            initial=self.status,
            transitions=ItemStatus.SM_TRANSITIONS,
            auto_transitions=False,
            send_event=True,
        )


class Item(ItemMachineMixin, models.Model):
    status = models.CharField(max_length=16, default='new')
    # many other fields


jxskiss commented Aug 26, 2017

@proofit404 Thanks. For anyone interested in my solution, I have posted it here:

https://gist.github.com/jxskiss/01816eec9a2b64bae341f4d07f58646e


aleneum commented Aug 27, 2017

Hi @jxskiss and @proofit404,

I just finished some test runs and wanted to share the results. The full code can be found here and is based on what jxskiss provided. This is the most important stuff:

# As a model, I used a simple one field model. `Item` was extended with a *MixIn*
class Item(<MIXIN>, models.Model):
    state = models.CharField(max_length=16, default='new')

# This is a Singleton meta class from
# https://stackoverflow.com/questions/6760685/creating-a-singleton-in-python
class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]

class GlobalMachine(transitions.Machine):
    __metaclass__ = Singleton

    # # In case you want to use set instead of list to store models
    # def __init__(self, *args, **kwargs):
    #     super(GlobalMachine, self).__init__(*args, **kwargs)
    #     self.models = SetWrapper()

# Since transitions expects a list, extend set by a method 'append'
# You just need that in case you want to use set instead of list for storing models in Machine
class SetWrapper(set):

    def append(self, item):
        self.add(item)

# instead of adding a machine to each entry, we use a global machine here
# the constructor is only called ONCE and ignored after the global
# instance has been created.
# Note that we do not add a model or set an initial state;
# this will be done for each model entry instead
class ItemSingletonMachineMixin(object):
    def __init__(self, *args, **kwargs):
        super(ItemSingletonMachineMixin, self).__init__(*args, **kwargs)
        machine = GlobalMachine(
            model=None,
            states=ItemStatus.SM_STATES,
            transitions=ItemStatus.SM_TRANSITIONS,
            initial=None,
            auto_transitions=False,
            send_event=True
        )
        # save item to be hashable
        self.save()
        machine.add_model(self, initial=self.state)

Evaluation

I tested the following kinds of models with different MixIns:

  • Item: add a transitions.Machine instance to each record/item
  • ItemSingleton: add each model to a global transitions.Machine
  • ItemSingletonSet: Extend transitions.Machine to use (a subclass of) set instead of list to store models; this increases look up speed in add_model
  • ItemNoMachine: just a minimal model without a state machine (for referencing purposes)
  • ItemFysom: add a customized fysom machine to each record/item
  • ItemFysomClass: use fysom as a class variable, extend the model with an __getattribute__ wrapper which passes the object to the static machine.

Note that I wrote "add a customized fysom machine" since, imho, fysom.StateMachine is not handled as a singleton in the items but gains its much better memory footprint by being more lightweight in general and using static methods wherever possible. But maybe I overlooked something. Edit: I did. It's a class attribute and not bound to an instance as in my current gist. I will update the comparison soon. I added ItemFysomClass, which sticks to jxskiss's actual idea.

Process

I wrote some helper functions to create instances of Item in memory until the django process exceeded a previously defined memory limit. I also tracked how long it took to create the instances (you will see why). This was done on my laptop while I did other things, so it's not 100% scientific. Additionally, I added self.save() to each __init__ method of the mixins. Creation times are slightly increased because of this, but it was necessary to make the resulting model hashable (which is required for ItemSingletonSet as you will see later).

Results

With a memory limit of 250MB I could create:

  • 3120 instances of Item
  • 18680 instances of ItemSingleton
  • 18740 instances of ItemSingletonSet
  • 14300 instances of ItemFysom
  • 278500 instances of ItemFysomClass

[plot_items: number of instances created per mixin within the 250MB memory limit]

Discussion

As already mentioned by jxskiss, adding a transitions.Machine to each Item results in a big memory footprint and is therefore not encouraged. Using a lightweight state machine such as the customized version of fysom with static methods produces much better results. However, using transitions.Machine as a Singleton allows fitting even more instances into 250MB (about 30% more).
The least overhead can be achieved by passing the model to a static state machine as illustrated in jxskiss's ItemFysomClass. With the help of a small convenience wrapper (__getattribute__) and a transition name prefix (e.g. 'sm_'), the memory footprint of the added state machine logic is trivial, as the actual memory consumption is more or less determined by the record/entry and its fields. Both test cases, ItemFysomClass and ItemNoMachine, resulted in almost the same number of objects.

Note that the tested model was close to the smallest possible model. In case your entry includes more fields and complex datatypes, the relative impact of decorating a model with a state machine will be smaller. These results may also vary depending on when and how Python does garbage collection and/or rearranges things in memory.

Using Singleton seems to considerably reduce the object overhead when using transitions. However, using just one Machine has its drawbacks:

[plot_timing: time needed to add new instances as the number of managed models grows]

transitions.Machine uses a list to keep track of added models. This becomes an issue when lots of models are added consecutively, since Machine.add_model always checks if the model is already in Machine.models. This takes longer the more models are already in the list.
At the end of the test, it takes up to 0.6 seconds to add 10 Item instances to the machine.
If this does not satisfy an intended use case, I propose two solutions:

Solution 1: Skip check

A subclass of Machine could override add_model and basically copy the whole method but just remove the `if model not in self.models` check to skip the test.

Solution 2: Use set instead of list

This approach has been tested with ItemSingletonSet and, as you can see, it improves the execution time of add_model dramatically. Drawbacks of using set instead of list: a set is unordered and model instances cannot be retrieved by index. Currently, at some points transitions expects Machine.models to be a list, but we can work around this by subclassing set.
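
A rough sketch of this swap (hypothetical class name, reusing the SetWrapper defined in the snippet above; not the exact code from my test project):

class SetBackedMachine(transitions.Machine):
    def __init__(self, *args, **kwargs):
        super(SetBackedMachine, self).__init__(*args, **kwargs)
        # replace the model list with a set for O(1) membership checks in add_model;
        # models must be hashable for this to work (hence the self.save() remark above)
        self.models = SetWrapper(self.models)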

Feedback welcome

These tests were focused on memory footprint. I also tested for basic functionality but cannot guarantee everything will work without further adaptations to a specific use case. Feel free to ask questions or provide further feedback to improve the interplay of django and transitions. If you think this comment should be part of the FAQ, please tell me.

aleneum changed the title from "model extension module" to "django and transitions [previously about model extensions]" on Aug 27, 2017

jxskiss commented Aug 27, 2017

@aleneum Very impressive comparison! There are two points I want to mention.

First, when using Machine.add_model to add an object to the models list, the destruction of a django model object should remove that object from the models list in some way.

Second, as I posted in the gist, I used the Machine as a class attribute, so there is only one simple Machine object globally, without a registry. The machine is totally stateless; anything object-related is passed to the machine methods as the obj parameter, though the calling style looks ugly.

class ItemMachineMixin(object):

    sm = StateMachine(
        state_field='status',
        states=ItemStatus.SM_STATES,
        transitions=ItemStatus.SM_TRANSITIONS,
    )

# Usage:
# obj = Item()
# obj.sm.sm_prepare_new(obj)
# can use `obj.sm_prepare_new()` by overriding the `__getattribute__` method

Thanks for your instructions on how to use transitions as a global machine, quite helpful!


jxskiss commented Aug 27, 2017

Also, as a bonus of attaching the StateMachine as a class attribute, one can use different state machines for different Django Models within one application, since it's not really a singleton.


aleneum commented Aug 27, 2017

the deconstruction of django model objects should remove the object from models list in some way.

You can use Machine.remove_model for that in __delete__ for instance.
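
A rough sketch of that idea, as a variant that hooks django's post_delete signal instead (hypothetical receiver; assumes the Item model and GlobalMachine singleton from the report above):

from django.db.models.signals import post_delete
from django.dispatch import receiver

@receiver(post_delete, sender=Item)
def remove_from_machine(instance, **kwargs):
    # unregister the entry so the shared machine's model list does not keep it alive
    GlobalMachine().remove_model(instance)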

Also as bonus of attaching StateMachine as a class attribute, one can use different state machine for different Django Models

Ah! That's what I overlooked. Thanks for the clarification. Sounds very reasonable. I will update the previous report accordingly.


aleneum commented Aug 31, 2017

@jxskiss, hello once again.

Just for the sake of completeness: the approach you have chosen (machine as class attribute; pass the model as an argument rather than using partial model bindings) also works with transitions:

from transitions import Machine
from functools import partial
from mock import MagicMock


class Model(object):

    machine = Machine(model=None, states=['A', 'B', 'C'], initial=None,
                      transitions=[
                          {'trigger': 'go', 'source': 'A', 'dest': 'B', 'before': 'before'},
                          {'trigger': 'check', 'source': 'B', 'dest': 'C', 'conditions': 'is_large'},
                      ], finalize_event='finalize')

    def __init__(self):
        self.state = 'A'
        self.before = MagicMock()
        self.after = MagicMock()
        self.finalize = MagicMock()

    @staticmethod
    def is_large(value=0):
        return value > 9000

    def __getattribute__(self, item):
        try:
            return super(Model, self).__getattribute__(item)
        except AttributeError:
            if item in self.machine.events:
                return partial(self.machine.events[item].trigger, self)
            raise


model = Model()
model.go()
assert model.state == 'B'
assert model.before.called
assert model.finalize.called
model.check()
assert model.state == 'B'
model.check(value=500)
assert model.state == 'B'
model.check(value=9001)
assert model.state == 'C'
assert model.finalize.call_count == 4

At first glance, the only functionality you lose (compared to adding the model to the machine) is the is_<state> convenience function.
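If needed, that check can be approximated by asking the class-level machine directly, e.g. Model.machine.is_state('C', model) for the example above (is_state is the method that backs the bound is_<state> helpers).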


jxskiss commented Aug 31, 2017

@aleneum Thanks for your serious work! Nice way to make "self" behave as the model by passing it to the event as a parameter.
Glad to have another choice 👍

@aleneum aleneum mentioned this issue Jan 3, 2018

aleneum commented Jan 3, 2018

The solutions presented here will be moved to the examples section.

@aleneum aleneum closed this as completed Jan 3, 2018

cleder commented Nov 9, 2018

I started writing a django_transitions package which will cut the above boilerplate down.
https://github.com/PrimarySite/django-transitions/

  • Example workflow implementation.
  • Base classes and mixins to keep your transitions consistent and avoid boilerplate.
  • Admin mixin to add workflow actions to the django admin.


cleder commented Nov 14, 2018

I pushed the package to pypi https://pypi.org/project/django-transitions/


mvanderlee commented Jun 26, 2021

@aleneum Great work, thank you!

It does break the on_enter and on_exit callbacks on the states themselves.
I was able to get around that by overriding the state class on the machine.

import logging

from transitions import Machine, State

logger = logging.getLogger(__name__)


class DynamicState(State):
    '''
        Need to dynamically look up the on_enter and on_exit callbacks since the
        model can not be registered to the Machine due to memory limitations
    '''

    def enter(self, event_data):
        """ Triggered when a state is entered. """
        logger.debug("%sEntering state %s. Processing callbacks...", event_data.machine.name, self.name)
        if hasattr(event_data.model, f'on_enter_{self.name}'):
            event_data.machine.callbacks([getattr(event_data.model, f'on_enter_{self.name}')], event_data)
        logger.info("%sFinished processing state %s enter callbacks.", event_data.machine.name, self.name)

    def exit(self, event_data):
        """ Triggered when a state is exited. """
        logger.debug("%sExiting state %s. Processing callbacks...", event_data.machine.name, self.name)
        if hasattr(event_data.model, f'on_exit_{self.name}'):
            event_data.machine.callbacks([getattr(event_data.model, f'on_exit_{self.name}')], event_data)
        logger.info("%sFinished processing state %s exit callbacks.", event_data.machine.name, self.name)


class DynamicMachine(Machine):
    '''Required to use DynamicState'''
    state_cls = DynamicState

Edit (2021-07-24): This breaks calls to set_state as well. Fix is to ensure that you always provide the otherwise optional model kwarg. i.e.:

self.machine.set_state(new_state, model=self)


aleneum commented Jun 28, 2021

Hello @mvanderlee,

good remark 👍

It does break the on_enter and on_exit callbacks on the states themselves.

I guess you mean callbacks defined on the model. on_enter/on_exit callbacks defined on states (such as {"name": "B", "on_enter": "callback"} or State(name="B", on_enter="callback")) should work.

But considering models, you are absolutely right! You lose the is_<state> convenience functions AND the automatic detection (and addition) of callbacks defined on the model; basically everything happening in Machine.add_model and Machine._add_model_to_state. If someone wants to define on_enter_<state> functions on the model, your solution looks good to me.

If memory footprint is critical, one could consider moving callbacks to States or making them static, though. States are shared (by the static Machine) so that one State object (and its callbacks) can be used by all models. When you exclusively work with string callbacks, I don't expect much difference between State.enter and Model.on_enter_<state> though. I guess the benefit only kicks in when you use references to callbacks defined on states or other 'singleton' structures.
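
As a quick, hedged sketch of that last point (illustrative names; not code from this thread): a string callback attached to the shared State object is resolved against whichever model triggered the event, so one State and its callback list serve all models.

from transitions import Machine

class Record(object):
    def notify(self):
        print('%r entered B' % self)

machine = Machine(model=None, initial=None, auto_transitions=False,
                  states=['A', {'name': 'B', 'on_enter': 'notify'}],
                  transitions=[['go', 'A', 'B']])

rec = Record()
machine.add_model(rec, initial='A')
rec.go()      # State B's on_enter resolves 'notify' on rec and calls it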

@mvanderlee

If memory footprint is critical

Just a note on that: our footprint dropped from more than 1 GB for 1000 objects to less than 5 MB. That's no longer an 'if critical' issue.

Our use case involved syncing databases with custom processing logic; for database performance reasons we needed to work in large batches. We created an SQLAlchemy model with a Machine. Using memory-profiler we found that our instance creation step took 1.25 GB. When I updated the class with the above-mentioned changes it dropped to 3.3 MB. When I replaced the class with a native dict without pytransitions as a benchmark, it dropped to 1.2 MB.


aleneum commented Jun 28, 2021

Thanks a lot for taking the time to write that down. Much appreciated!

aleneum added a commit that referenced this issue Jun 28, 2021
@sauravmahuri2007

Just a note for anyone referring to this for Python 3: the snippet below will not work and will keep creating new instances of Machine, as __call__ will never be called for the Singleton class.

class GlobalMachine(transitions.Machine):
    __metaclass__ = Singleton

Below will work in Python 3:

class GlobalMachine(transitions.Machine, metaclass=Singleton):
    pass
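
A quick check that the Python 3 form really yields a single instance (the constructor arguments are just example values):

m1 = GlobalMachine(model=None, states=['A'], initial=None)
m2 = GlobalMachine()
assert m1 is m2   # the metaclass returns the cached instance on every later call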


Adamantish commented Jul 22, 2021

Just reading through, I'm trying to understand why the default usage pattern for transitions is to duplicate the whole machine setup on instantiation of every model object. I don't understand why that pattern is required for any use case. Does the machine instance contain any further mutable state beyond the value of state?

I guess what I mean is: why not create a new standard way of using the package, with internal plumbing based around your singleton example, @aleneum, and deprecate the current approach?


aleneum commented Aug 9, 2021

Hello @Adamantish

the default usage pattern for transitions is duplicate the whole machine setup on instantiation of every model object

Sure, there is the 'beginners/initial' way where a machine also acts as a model. This is usually sufficient for many simple use cases. As this thread shows, it is not suitable for django or any other approach involving databases and a large number of models.

If you plan to use a large number of models, I'd say the 'default' pattern is to have one machine and multiple models. This 'separation of concerns' is already illustrated in the first example of the documentation. States, Events and Transitions are not duplicated in this case. Every model gets some convenience functions assigned when it is added to the machine. The number of convenience functions depends on how many events (transition names) have been added to the machine. This causes some overhead that is negligible for most use cases. However, if you plan to use thousands of models it might become a notable memory impact as well. If so, skipping model bindings could tackle this.

Does the machine instance contain any further mutable state than the value of state?

It does contain the list of models, states, events and transitions (and graphs). This can be updated at runtime and will affect all currently managed models immediately.

what I mean is why not create a new standard way of using the package with internal plumbing based around your singleton example

With transitions, we want to provide a flexible and (hopefully) convenient solution for a wide variety of common FSM tasks. Without knowing all the details of your suggestion, the first thing that comes to my mind is that this would mean a major obstacle for all scenarios where users plan to use multiple machines of the same type (with different states, events and transitions) with a set of models that might even be managed by more than one machine.
Right now, I'd say it's way easier to apply a singleton pattern when needed than to try to get rid of it if you don't. Furthermore, there are other ways to handle a machine instance to be used by distributed callbacks and/or models. Forcing one way onto library users might be counterproductive.
