Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 275: Thow exception on infinite loops #425

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion src/main/clojure/clara/rules.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,12 @@

This take an additional map of options as a second argument. Current options:

:max-cycles (positive integer): Each time Clara determines that there are no more insertions process in a given activation group (rules with the same salience)
it records that one cycle of processing rules has been performed. When the max-cycles limit is reached Clara throws an exception indicating that the rules
are in an infinite loop. This exception can be regarded as analogous to a StackOverflowError. The default is 600000, which is a level at which it is unlikely that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really is indicating that the rules may be in an infinite loop. I think that it'd be better stated that way.

most users will encounter this error in cases where the rules actually would eventually terminate. See issue 275 for details on how this level was chosen.
Nevertheless, it is possible that such cases exist, and so an option is provided to increase the limit. Furthermore, some infinite loop scenarios will cause memory
errors before this ceiling is reached, and so users who do not expect to need a large number of rule cycles may wish to lower this limit.
:cancelling true (EXPERIMENTAL, subject to change/removal. Not supported in ClojureScript.):
Simultaneously propagate insertions and retractions through the rules network, at every step using the insertion and retractions of equals facts to cancel each
other out and avoid operations deeper in the rules network. The behavior of unconditional insertions and RHS (right-hand side) retractions
Expand Down Expand Up @@ -419,4 +425,4 @@ See the [query authoring documentation](http://www.clara-rules.org/docs/queries/
(map first) ; Take the symbols for the rule/query vars
)]
(doseq [psym production-syms]
(ns-unmap *ns* psym))))))
(ns-unmap *ns* psym))))))
44 changes: 36 additions & 8 deletions src/main/clojure/clara/rules/engine.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -1719,7 +1719,7 @@

(defn fire-rules*
"Fire rules for the given nodes."
[rulebase nodes transient-memory transport listener get-alphas-fn update-cache]
[rulebase nodes transient-memory transport listener get-alphas-fn update-cache max-cycles]
(binding [*current-session* {:rulebase rulebase
:transient-memory transient-memory
:transport transport
Expand All @@ -1729,7 +1729,25 @@
:listener listener}]

(loop [next-group (mem/next-activation-group transient-memory)
last-group nil]
last-group nil
group-change-count 0]

(when (> group-change-count max-cycles)
(throw (ex-info
(str "It appears that the rules are in an infinite loop."
" Incorrect use of truth maintenance is a frequent source of such loops; see the website."
" If the rules are not in fact in a loop it is possible to increase the ceiling by use of the :max-cycles setting."
" See the clara.rules/fire-rules docstring for details.")
{:clara-rules/infinite-loop-suspected true
:listeners (try
(let [p-listener (l/to-persistent! listener)]
(if (l/null-listener? p-listener)
[]
(l/get-children p-listener)))
(catch #?(:clj Exception :cljs :default)
listener-exception
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation makes this confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you indent it? This looks fairly normal to me but open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(catch #?(:clj Exception :cljs :default)  listener-exception
     listener-exception)

is probably how i would write it, but that just because i am used to seeing:

(try 
  <some logic>
  (catch Exception ex
    <do something>))

though the reader conditional makes this a bit different.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was weird to see the first listener-exception on the next line. It just makes it blend with the body.
I don't consider it to be a critical point though if others thing it is fine how it is. I know reader-conditionals can make things get a bit weird.

listener-exception))})))


(if next-group

Expand All @@ -1739,7 +1757,7 @@
;; group before continuing.
(do
(flush-updates *current-session*)
(recur (mem/next-activation-group transient-memory) next-group))
(recur (mem/next-activation-group transient-memory) next-group (inc group-change-count)))

(do

Expand Down Expand Up @@ -1817,13 +1835,20 @@
(when (some-> node :production :props :no-loop)
(flush-updates *current-session*)))))

(recur (mem/next-activation-group transient-memory) next-group)))
;; If the activation group changed, then the updates will be flushed after the loop statement
;; and we will recur again, this time with the group-change-count incremented. It is expected
;; that changes between activation groups will be less numerous than each single activation
;; of a rule, and the former can remain constant even for large numbers of facts. That is,
;; it is more dependent on the logical structure of fact flow than the simple volume.
;; Therefore counting only upon activation groups changes will reduce the likelihood of a false
;; positive in infinite cycle detection.
(recur (mem/next-activation-group transient-memory) next-group group-change-count)))

;; There were no items to be activated, so flush any pending
;; updates and recur with a potential new activation group
;; since a flushed item may have triggered one.
(when (flush-updates *current-session*)
(recur (mem/next-activation-group transient-memory) next-group))))))
(recur (mem/next-activation-group transient-memory) next-group (inc group-change-count)))))))

(deftype LocalSession [rulebase memory transport listener get-alphas-fn pending-operations]
ISession
Expand Down Expand Up @@ -1867,7 +1892,8 @@
(fire-rules [session opts]

(let [transient-memory (mem/to-transient memory)
transient-listener (l/to-transient listener)]
transient-listener (l/to-transient listener)
max-cycles (get opts :max-cycles 600000)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure on the default chosen here. It could be breaking.

I haven't thought it completely through, but is the counting no specifically counting cycles of rules happening, but instead just steps happening during fire-rules? Perhaps that is not granular enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The count is intended to be incremented each time an activation group is fully fired. I'm expecting that to be much smaller than the number of times a rule is fired, and so safer in terms of avoiding false positives and incorrectly throwing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to post more details later on how I chose the count and maybe change it - I measured the count for 30 seconds of a simple case on my device machine. Open to discussion on what the count should be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with Mike here:

I'm not sure on the default chosen here. It could be breaking.

Given the fabricated set of rules

(ns clara.loop-detection
  (:require [clojure.test :refer :all]
            [clara.rules :as r]
            [clara.rules.accumulators :as acc]))

(defrecord Total [x])

(defrecord Subs [y])
(defrecord Sum [z])

(r/defrule sum-rule
  [Total (= ?total x)]
  [?s <- (acc/sum :y) :from [Subs]]
  [:test (> ?total ?s)]
  =>
  (r/insert! (->Sum (- ?total ?s))))

(r/defrule subs-rule
  [Sum]
  =>
  (r/insert-unconditional! (->Subs 1)))

(r/defquery sum-query
  []
  [?s <- (acc/count) :from [Subs]])

(-> (r/mk-session)
    (r/insert (->Total 600000))
    (r/fire-rules)
    (r/query sum-query))

This will break due to loop detection, when in fact it is a finite execution. While I won't expect to find this exact pattern in rules that we(cerner) have, I wouldn't be surprised if someone was doing something similar or using unconditional inserts to break loops.

On one hand, i would think that the addition of this would maintain the previous functionality of the rules engine by default(loop forever, or till done), but on the other I understand that someone just starting clara might not be aware of the functionality thus miss out on the detection option.

Perhaps a large default is correct, but also allowing a sentinel value to disable loop detection to maintain passivity.

Copy link
Collaborator Author

@WilliamParker WilliamParker Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning here was that it just didn't seem likely to hit it, but I see your points that it is always going to be possible and there is enough usage of Clara for odd use cases like this to exist.

How would we feel about something like the following?

  • Having the limit be configurable, as is. We could revisit the limit as well.
  • Making the behaviour on reaching the limit configurable, with the options being to throw an exception, log a warning, or do nothing, with the log option being the default. If a warning just went to standard out that would probably take care of most of the confusion on the part of users who misunderstand looping behaviour in Clara, while being less likely to cause problems for users than simply throwing an exception. That said, the problems wouldn't go down to zero since if it bubbled up to the end-user (not a developer) in an app of some kind it could be confusing, as well as if Clara was in an app that had specific requirements around printing to standard out.
  • In the future possibly adding a comprehensive "development mode" that would do things like enable throwing here, enable tracing, etc. This would be an easier thing to explain in the docs than lots of different options requiring an understanding of Clara to make good choices.

If we go down the logging path, tools.logging might be an option. I'm not delighted about adding dependencies, but it looks pretty lightweight and has no dependencies itself other than Clojure. In essence it inspects the environment at runtime and uses whatever JVM logging framework is available, falling back to java.util.logging if nothing else is available. In ClojureScript I think we'd probably need to just print to the console, but I don't think the JS console is exposed to website users who aren't using developer tools so perhaps that isn't a major issue.

Sorry to take so long to respond here... I've been a bit preoccupied lately.

Thoughts @EthanEChristian @mrrodriguez ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that approach seems reasonable to me, particularly:

Making the behavior on reaching the limit configurable

I feel like this allows us to have this functionality and still maintain passivity for consumers that have "odd" rules.

I don't think I have any objections to the addition of logging to clara's dependencies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WilliamParker I like the default to be logging, however I don't think this should require a new dependency.

I think it may actually be best to expose 2 args for this feature:

max-cycles - what you already have
max-cycles-exceeded-fn - tentative name, but it gives the consumer the ability to decide what to do

If they do not provide anything, perhaps we do a "print once" default fn. We can also provide a fn that can be used to throw the exception if that is what someone wants. These can be exposed as plain fn's or "special" keywords the argument accepts.

:max-cycles 60000
:max-cycles-exceeded-fn r/max-cycles-exceeded-print-once
;; Alternatives:
;; Special kw :clara-rules/max-cycles-exceeded-print-once
;; r/max-cycles-exceeded-throw

If no :max-cycles-exceeded-fn is given, use a default.

If we go this path, we should consider what args to this fn would be useful.

I think giving a handler to the consumer for cases like this is a common clj pattern.

@WilliamParker @EthanEChristian

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per an offline conversation with @mrrodriguez, for my use-case the addition of the dependency wouldn't be an issue. After that conversation, I think that i can agree with Mike's hesitance on clara mandating a logger on it's consumers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for summarizing @EthanEChristian

I'll re-iterate my concern is that I'm not a particular fan of adding dependencies in general in a general-use library - unless they are providing enough value to justify. Logging in a single case doesn't seem too compelling in that area.

Also, I know that org.clojure/tools.logging is quite flexible (I use it fairly often), but it can still lead to issues when added to any given classpath. It does do some implicit logger impl searching/loading and I think it could get particularly awkward if someone was using a logger framework that doesn't fit into org.clojure/tools.logging's pattern at all. It may end up logging to it's own logger instead of their chosen one/may be annoying to try to silence or integrate with.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK it sounds like there is a consensus for a default behaviour of printing to standard out then. I'll proceed with that.


(if-not (:cancelling opts)
;; We originally performed insertions and retractions immediately after the insert and retract calls,
Expand Down Expand Up @@ -1912,7 +1938,8 @@
transport
transient-listener
get-alphas-fn
(uc/get-ordered-update-cache)))
(uc/get-ordered-update-cache)
max-cycles))

#?(:cljs (throw (ex-info "The :cancelling option is not supported in ClojureScript"
{:session session :opts opts}))
Expand Down Expand Up @@ -1961,7 +1988,8 @@
get-alphas-fn
;; This continues to use the cancelling cache after the first batch of insertions and retractions.
;; If this is suboptimal for some workflows we can revisit this.
update-cache)))))
update-cache
max-cycles)))))

(LocalSession. rulebase
(mem/to-persistent! transient-memory)
Expand Down
17 changes: 17 additions & 0 deletions src/main/clojure/clara/tools/testing_utils.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -152,3 +152,20 @@
(str "Actual mean value: " mean))
{:mean mean
:std std}))

#?(:clj
(defn ex-data-maps
"Given a Throwable, return in order the ExceptionInfo data maps for all items in the
chain that implement IExceptionInfo and have nonempty data maps."
[t]
(let [throwables ((fn append-self
[prior t1]
(if t1
(append-self (conj prior t1) (.getCause ^Throwable t1))
prior))
[]
t)]
(into []
(comp (map ex-data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just use (keep ex-data) here and remove the comp and filter line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to filter out empty but non-nil exception data too, but I agree that using keep can simplify this a bit.

(filter not-empty))
throwables))))
173 changes: 173 additions & 0 deletions src/test/clojure/clara/test_infinite_loops.clj
Original file line number Diff line number Diff line change
@@ -0,0 +1,173 @@
(ns clara.test-infinite-loops
(:require [clojure.test :refer :all]
[clara.rules :refer :all]
[clara.rules.testfacts :refer [->Cold ->Hot ->First ->Second]]
[clara.tools.testing-utils :refer [def-rules-test
ex-data-maps
side-effect-holder-fixture
side-effect-holder]]
[clara.test-rules :refer [assert-ex-data]]
[clara.tools.tracing :as tr]
[clara.rules.accumulators :as acc])
(:import [clara.rules.testfacts Cold Hot First Second]
[clara.tools.tracing
PersistentTracingListener]))

(use-fixtures :each side-effect-holder-fixture)

(def-rules-test test-truth-maintenance-loop

;; Test of an infinite loop to an endless cycle caused by truth maintenance,
;; that is an insertion that causes its support to be retracted.

{:rules [hot-rule [[[:not [Hot]]]
(insert! (->Cold nil))]

cold-rule [[[Cold]]
(insert! (->Hot nil))]]

:sessions [empty-session [hot-rule cold-rule] {}]}

(assert-ex-data {:clara-rules/infinite-loop-suspected true} (fire-rules empty-session {:max-cycles 3000})))

(def-rules-test test-truth-maintenance-loop-with-salience

;; Test of an infinite loop to an endless cycle caused by truth maintenance,
;; that is an insertion that causes its support to be retracted, when the
;; two rules that form the loop have salience and are thus parts of different
;; activation groups.


{:rules [hot-rule [[[:not [Hot]]]
(insert! (->Cold nil))
{:salience 1}]

cold-rule [[[Cold]]
(insert! (->Hot nil))
{:salience 2}]]

:sessions [empty-session [hot-rule cold-rule] {}
empty-session-negated-salience [hot-rule cold-rule] {:activation-group-fn (comp - :salience :props)}]}

(doseq [session [empty-session empty-session-negated-salience]]
;; Validate that the results are the same in either rule ordering.
(assert-ex-data {:clara-rules/infinite-loop-suspected true} (fire-rules session {:max-cycles 3000}))))


(def-rules-test test-recursive-insertion

;; Test of an infinite loop due to runaway insertions without retractions.

{:rules [cold-rule [[[Cold]]
(insert! (->Cold "ORD"))]]

:sessions [empty-session [cold-rule] {}]}

(assert-ex-data {:clara-rules/infinite-loop-suspected true}
(-> empty-session
(insert (->Cold "ARN"))
;; Use a small value here to ensure that we don't run out of memory before throwing.
;; There is unfortunately a tradeoff where making the default number of cycles allowed
;; high enough for some use cases will allow others to create OutOfMemory errors in cases
;; like these but we can at least throw when there is enough memory available to hold the facts
;; created in the loop.
(fire-rules {:max-cycles 10}))))

(def-rules-test test-recursive-insertion-loop-no-salience

{:rules [first-rule [[[First]]
(insert! (->Second))]

second-rule [[[Second]]
(insert! (->First))]]

:sessions [empty-session [first-rule second-rule] {}]}

(assert-ex-data {:clara-rules/infinite-loop-suspected true}
(-> empty-session
(insert (->First))
(fire-rules {:max-cycles 10}))))

(def-rules-test test-recursive-insertion-loop-with-salience

{:rules [first-rule [[[First]]
(insert! (->Second))
{:salience 1}]

second-rule [[[Second]]
(insert! (->First))
{:salience 2}]]

:sessions [empty-session [first-rule second-rule] {}
empty-session-negated-salience [first-rule second-rule] {:activation-group-fn (comp - :salience :props)}]}

(doseq [session [empty-session empty-session-negated-salience]]
(assert-ex-data {:clara-rules/infinite-loop-suspected true}
(-> session
(insert (->First))
(fire-rules {:max-cycles 10})))))

(def-rules-test test-tracing-infinite-loop

{:rules [cold-rule [[[Cold]]
(insert! (->Cold "ORD"))]]

:sessions [empty-session [cold-rule] {}]}

(try
(do (-> empty-session
tr/with-tracing
(insert (->Cold "ARN"))
;; Use a small value here to ensure that we don't run out of memory before throwing.
;; There is unfortunately a tradeoff where making the default number of cycles allowed
;; high enough for some use cases will allow others to create OutOfMemory errors in cases
;; like these but we can at least throw when there is enough memory available to hold the facts
;; created in the loop.
(fire-rules {:max-cycles 10}))
(is false "The infinite loop did not throw an exception."))
(catch Exception e
(let [data-maps (ex-data-maps e)
loop-data (filter :clara-rules/infinite-loop-suspected data-maps)]
(is (= (count loop-data)
1)
"There should only be one exception in the chain from infinite rules loops.")
(is (= (-> loop-data first :listeners count) 1)
"There should only be one listener.")
(is (-> loop-data first :listeners ^PersistentTracingListener first .-trace not-empty)
"There should be tracing data available when a traced session throws an exception on an infinite loop.")))))

(def-rules-test test-max-cycles-respected

;; As the name suggests, this is a test to validate that setting the max-cycles to different
;; values actually works. The others tests are focused on validating that infinite loops
;; throw exceptions, and the max-cycles values there are simply set to restrict resource usage,
;; whether CPU or memory, by the test. Here, on the other hand, we construct a rule where we control
;; how many rule cycles will be executed, and then validate that an exception is thrown when that number is
;; greater than the max-cycles and is not when it isn't. For good measure, we validate that the non-exception-throwing
;; case has queryable output to make sure that the logic was actually run, not ommitted because of a typo or similar.

{:rules [recursive-rule-with-end [[[First]]
(when (< @side-effect-holder 20)
(do
(insert-unconditional! (->First))
(swap! side-effect-holder inc)))]]

:queries [first-query [[] [[?f <- First]]]]

:sessions [empty-session [recursive-rule-with-end first-query] {}]}


(reset! side-effect-holder 0)

(assert-ex-data {:clara-rules/infinite-loop-suspected true}
(-> empty-session
(insert (->First))
(fire-rules {:max-cycles 10})))

(reset! side-effect-holder 0)

(is (= (count (-> empty-session
(insert (->First))
(fire-rules {:max-cycles 30})
(query first-query)))
21)))
9 changes: 8 additions & 1 deletion src/test/common/clara/test_testing_utils.cljc
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
#?(:clj
(ns clara.test-testing-utils
(:require [clara.tools.testing-utils :refer [def-rules-test
run-performance-test]]
run-performance-test
ex-data-maps]]
[clara.rules :as r]

[clara.rules.testfacts :refer [->Temperature ->Cold]]
Expand Down Expand Up @@ -71,3 +72,9 @@
:iterations 5
:mean-assertion (partial > 500)})
(is (= @fire-rules-counter 5)))

#?(:clj
(deftest test-ex-data-maps
(let [exception (ex-info "Top" {:level :top} (IllegalArgumentException. "" (ex-info "Bottom" {:level :bottom})))]
(is (= (ex-data-maps exception)
[{:level :top} {:level :bottom}])))))