Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 275: Thow exception on infinite loops #425

Closed
wants to merge 4 commits into from

Conversation

WilliamParker
Copy link
Collaborator

…s not addressed yet.

This is still a fairly rough draft at this point. Please don't merge it; I expect the durability tests to fail. I wanted to give some visibility into my thinking here though.

Non-exhaustive lists of items for consideration:

  • The value of the :max-cycles setting is just a number, which means we could serialize it along with the session in durability. Should we? Pro: it is one less thing for a user to think about upon deserialization. Con: The Fressian implementation doesn't serialize other options that can't be reliably serialized. Also this option doesn't impact the semantics of the ruleset, while other options do, so it arguably is different in kind.
  • What should the default value be?

@WilliamParker WilliamParker changed the title Issue 275: First draft of exception on infinite loops. Durability i… Issue 275: Thow exception on infinite loops Mar 18, 2019
@WilliamParker
Copy link
Collaborator Author

After some reflection I think the :max-cycles setting actually makes more sense as an option to fire-rules. We already added options to fire-rules for issue 249. The maximum cycles that are plausible is obviously partly dependent on the rules in question, but the specific input data and environment where the rules are run (i.e. prod, dev, or anything that impacts the cost of a false positive versus a false negative in detection of infinite cycles) are also important. If users want to have a session always have the same max-cycles option for all fire-rules calls it should be readily possible for them to wrap the fire-rules function to always have the option given.

@WilliamParker
Copy link
Collaborator Author

As usual I'll want to take a second look with fresh eyes before merging, but this can be reviewed as a candidate PR for merge now. @mrrodriguez @EthanEChristian

Copy link
Collaborator

@mrrodriguez mrrodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only done a fast preliminary review. I am posting since I had a few questions right away.

[]
(l/get-children p-listener)))
(catch #?(:clj Exception :cljs :default)
listener-exception
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation makes this confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you indent it? This looks fairly normal to me but open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(catch #?(:clj Exception :cljs :default)  listener-exception
     listener-exception)

is probably how i would write it, but that just because i am used to seeing:

(try 
  <some logic>
  (catch Exception ex
    <do something>))

though the reader conditional makes this a bit different.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was weird to see the first listener-exception on the next line. It just makes it blend with the body.
I don't consider it to be a critical point though if others thing it is fine how it is. I know reader-conditionals can make things get a bit weird.

@@ -1817,13 +1835,13 @@
(when (some-> node :production :props :no-loop)
(flush-updates *current-session*)))))

(recur (mem/next-activation-group transient-memory) next-group)))
(recur (mem/next-activation-group transient-memory) next-group flushed-updates-count)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why no inc this time?'
Should it be set back to 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning here was that this path is reached every time a :no-loop rule is fired, while the other paths where I incremented the count are only reached upon the end of an activation group. I expect activation group switches to be much less numerous than individual rules firing, and I thought the risk of throwing in ultimately terminating cases here to be too high.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that explanation makes sense. However, you may want to note that in a comment above this since I think it could be hard to understand looking at it later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this easier to understand now @mrrodriguez ? 13ef0df

@@ -1867,7 +1885,8 @@
(fire-rules [session opts]

(let [transient-memory (mem/to-transient memory)
transient-listener (l/to-transient listener)]
transient-listener (l/to-transient listener)
max-cycles (get opts :max-cycles 600000)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure on the default chosen here. It could be breaking.

I haven't thought it completely through, but is the counting no specifically counting cycles of rules happening, but instead just steps happening during fire-rules? Perhaps that is not granular enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The count is intended to be incremented each time an activation group is fully fired. I'm expecting that to be much smaller than the number of times a rule is fired, and so safer in terms of avoiding false positives and incorrectly throwing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to post more details later on how I chose the count and maybe change it - I measured the count for 30 seconds of a simple case on my device machine. Open to discussion on what the count should be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with Mike here:

I'm not sure on the default chosen here. It could be breaking.

Given the fabricated set of rules

(ns clara.loop-detection
  (:require [clojure.test :refer :all]
            [clara.rules :as r]
            [clara.rules.accumulators :as acc]))

(defrecord Total [x])

(defrecord Subs [y])
(defrecord Sum [z])

(r/defrule sum-rule
  [Total (= ?total x)]
  [?s <- (acc/sum :y) :from [Subs]]
  [:test (> ?total ?s)]
  =>
  (r/insert! (->Sum (- ?total ?s))))

(r/defrule subs-rule
  [Sum]
  =>
  (r/insert-unconditional! (->Subs 1)))

(r/defquery sum-query
  []
  [?s <- (acc/count) :from [Subs]])

(-> (r/mk-session)
    (r/insert (->Total 600000))
    (r/fire-rules)
    (r/query sum-query))

This will break due to loop detection, when in fact it is a finite execution. While I won't expect to find this exact pattern in rules that we(cerner) have, I wouldn't be surprised if someone was doing something similar or using unconditional inserts to break loops.

On one hand, i would think that the addition of this would maintain the previous functionality of the rules engine by default(loop forever, or till done), but on the other I understand that someone just starting clara might not be aware of the functionality thus miss out on the detection option.

Perhaps a large default is correct, but also allowing a sentinel value to disable loop detection to maintain passivity.

Copy link
Collaborator Author

@WilliamParker WilliamParker Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning here was that it just didn't seem likely to hit it, but I see your points that it is always going to be possible and there is enough usage of Clara for odd use cases like this to exist.

How would we feel about something like the following?

  • Having the limit be configurable, as is. We could revisit the limit as well.
  • Making the behaviour on reaching the limit configurable, with the options being to throw an exception, log a warning, or do nothing, with the log option being the default. If a warning just went to standard out that would probably take care of most of the confusion on the part of users who misunderstand looping behaviour in Clara, while being less likely to cause problems for users than simply throwing an exception. That said, the problems wouldn't go down to zero since if it bubbled up to the end-user (not a developer) in an app of some kind it could be confusing, as well as if Clara was in an app that had specific requirements around printing to standard out.
  • In the future possibly adding a comprehensive "development mode" that would do things like enable throwing here, enable tracing, etc. This would be an easier thing to explain in the docs than lots of different options requiring an understanding of Clara to make good choices.

If we go down the logging path, tools.logging might be an option. I'm not delighted about adding dependencies, but it looks pretty lightweight and has no dependencies itself other than Clojure. In essence it inspects the environment at runtime and uses whatever JVM logging framework is available, falling back to java.util.logging if nothing else is available. In ClojureScript I think we'd probably need to just print to the console, but I don't think the JS console is exposed to website users who aren't using developer tools so perhaps that isn't a major issue.

Sorry to take so long to respond here... I've been a bit preoccupied lately.

Thoughts @EthanEChristian @mrrodriguez ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that approach seems reasonable to me, particularly:

Making the behavior on reaching the limit configurable

I feel like this allows us to have this functionality and still maintain passivity for consumers that have "odd" rules.

I don't think I have any objections to the addition of logging to clara's dependencies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WilliamParker I like the default to be logging, however I don't think this should require a new dependency.

I think it may actually be best to expose 2 args for this feature:

max-cycles - what you already have
max-cycles-exceeded-fn - tentative name, but it gives the consumer the ability to decide what to do

If they do not provide anything, perhaps we do a "print once" default fn. We can also provide a fn that can be used to throw the exception if that is what someone wants. These can be exposed as plain fn's or "special" keywords the argument accepts.

:max-cycles 60000
:max-cycles-exceeded-fn r/max-cycles-exceeded-print-once
;; Alternatives:
;; Special kw :clara-rules/max-cycles-exceeded-print-once
;; r/max-cycles-exceeded-throw

If no :max-cycles-exceeded-fn is given, use a default.

If we go this path, we should consider what args to this fn would be useful.

I think giving a handler to the consumer for cases like this is a common clj pattern.

@WilliamParker @EthanEChristian

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per an offline conversation with @mrrodriguez, for my use-case the addition of the dependency wouldn't be an issue. After that conversation, I think that i can agree with Mike's hesitance on clara mandating a logger on it's consumers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for summarizing @EthanEChristian

I'll re-iterate my concern is that I'm not a particular fan of adding dependencies in general in a general-use library - unless they are providing enough value to justify. Logging in a single case doesn't seem too compelling in that area.

Also, I know that org.clojure/tools.logging is quite flexible (I use it fairly often), but it can still lead to issues when added to any given classpath. It does do some implicit logger impl searching/loading and I think it could get particularly awkward if someone was using a logger framework that doesn't fit into org.clojure/tools.logging's pattern at all. It may end up logging to it's own logger instead of their chosen one/may be annoying to try to silence or integrate with.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK it sounds like there is a consensus for a default behaviour of printing to standard out then. I'll proceed with that.

[]
t)]
(into []
(comp (map ex-data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just use (keep ex-data) here and remove the comp and filter line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to filter out empty but non-nil exception data too, but I agree that using keep can simplify this a bit.

@@ -35,6 +35,12 @@

This take an additional map of options as a second argument. Current options:

:max-cycles (positive integer): Each time Clara determines that there are no more insertions process in a given activation group (rules with the same salience)
it records that one cycle of processing rules has been performed. When the max-cycles limit is reached Clara throws an exception indicating that the rules
are in an infinite loop. This exception can be regarded as analogous to a StackOverflowError. The default is 600000, which is a level at which it is unlikely that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really is indicating that the rules may be in an infinite loop. I think that it'd be better stated that way.

[]
(l/get-children p-listener)))
(catch #?(:clj Exception :cljs :default)
listener-exception
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was weird to see the first listener-exception on the next line. It just makes it blend with the body.
I don't consider it to be a critical point though if others thing it is fine how it is. I know reader-conditionals can make things get a bit weird.

@@ -1817,13 +1835,13 @@
(when (some-> node :production :props :no-loop)
(flush-updates *current-session*)))))

(recur (mem/next-activation-group transient-memory) next-group)))
(recur (mem/next-activation-group transient-memory) next-group flushed-updates-count)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that explanation makes sense. However, you may want to note that in a comment above this since I think it could be hard to understand looking at it later.

@@ -1867,7 +1885,8 @@
(fire-rules [session opts]

(let [transient-memory (mem/to-transient memory)
transient-listener (l/to-transient listener)]
transient-listener (l/to-transient listener)
max-cycles (get opts :max-cycles 600000)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WilliamParker I like the default to be logging, however I don't think this should require a new dependency.

I think it may actually be best to expose 2 args for this feature:

max-cycles - what you already have
max-cycles-exceeded-fn - tentative name, but it gives the consumer the ability to decide what to do

If they do not provide anything, perhaps we do a "print once" default fn. We can also provide a fn that can be used to throw the exception if that is what someone wants. These can be exposed as plain fn's or "special" keywords the argument accepts.

:max-cycles 60000
:max-cycles-exceeded-fn r/max-cycles-exceeded-print-once
;; Alternatives:
;; Special kw :clara-rules/max-cycles-exceeded-print-once
;; r/max-cycles-exceeded-throw

If no :max-cycles-exceeded-fn is given, use a default.

If we go this path, we should consider what args to this fn would be useful.

I think giving a handler to the consumer for cases like this is a common clj pattern.

@WilliamParker @EthanEChristian

@WilliamParker
Copy link
Collaborator Author

Travis seems to be confused because I pushed the branch to cerner/clara-rules first, realized my mistake, then pushed it to my fork instead, but failed to build the branch that I deleted. A build on the PR did succeed though.

@WilliamParker
Copy link
Collaborator Author

This has been superseded by a listener-based approach in #447

Closing without merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants