In this work we employ Bayesian surprise to detect interesting or anomalous patterns in discrete sequence data. Discrete sequential data arise in many domains, such as DNA analysis, online transactions, web click-stream navigation, cyber-attacks, financial transactions and, especially, sociological life-course data. The difficulty is that each data set has its own unique characteristics and many anomalies defy categorization. Because anomalies are by nature infrequent and elusive, we often do not have enough examples of them for a supervised approach. Novelty and surprise, however, play a fundamental role in human and animal behavior for survival, attention and adaptation, which makes surprise a natural measure of interest. We use regular expressions to collect the longest repeating subsequences and define these as motifs (which may or may not represent novel patterns). The sequences are then re-expressed in terms of these simpler motifs, which are used to build Probabilistic Suffix Trees (PSTs) that can capture complex relationships based on motif location and frequency of occurrence. New data that deviate from the established motifs in location of appearance, frequency of appearance, or motif composition may represent recurring patterns that differ from those already observed. Bayesian surprise results from mismatches between our expectations and actual outcomes, so the degree of surprise, or anomalousness, attached to a pattern varies with the size of these differences. Large surprise values identify the patterns most likely to be useful and interesting to the user.
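
To make the idea of "mismatch between expectation and outcome" concrete, the sketch below uses one common formalization of Bayesian surprise: the Kullback-Leibler divergence between the posterior and the prior over a set of hypotheses, after a new sequence is observed. The abstract does not give a formula, so this choice is an assumption. The sketch also does not use Probabilistic Suffix Trees: as a stand-in, each hypothesis is a hypothetical fixed distribution over motifs (`routine` and `unusual`), and the motif labels `A`, `B`, `C`, the prior weights and the function names are all illustrative.

```python
import math


def sequence_likelihood(model, motifs):
    """Probability of a motif sequence under a model that treats motifs as
    independent draws (a simplification; a PST would condition on context)."""
    prob = 1.0
    for motif in motifs:
        prob *= model.get(motif, 0.0)
    return prob


def bayesian_surprise(prior, likelihoods):
    """Return (surprise_in_bits, posterior) for a finite set of hypotheses.

    Surprise is taken to be KL(posterior || prior), an assumed formalization.
    """
    evidence = sum(p * l for p, l in zip(prior, likelihoods))
    if evidence == 0.0:
        raise ValueError("observation has zero probability under every hypothesis")
    posterior = [p * l / evidence for p, l in zip(prior, likelihoods)]
    surprise = sum(
        q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0.0
    )
    return surprise, posterior


# Hypothetical models of which motif appears next (illustrative numbers only).
routine = {"A": 0.80, "B": 0.15, "C": 0.05}
unusual = {"A": 0.20, "B": 0.20, "C": 0.60}
models = [routine, unusual]
prior = [0.9, 0.1]  # we expect routine behavior most of the time

expected_seq = ["A", "A", "B"]  # matches expectations
odd_seq = ["C", "C", "A"]       # dominated by a normally rare motif

s_expected, _ = bayesian_surprise(
    prior, [sequence_likelihood(m, expected_seq) for m in models]
)
s_odd, post_odd = bayesian_surprise(
    prior, [sequence_likelihood(m, odd_seq) for m in models]
)

print(f"surprise (expected sequence): {s_expected:.3f} bits")
print(f"surprise (odd sequence):      {s_odd:.3f} bits")
```

Under these illustrative numbers, the sequence dominated by the rare motif shifts belief strongly toward the `unusual` hypothesis and so receives the larger surprise value, while the sequence that matches expectations barely moves the prior.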
kenmcgarry/BayesSurprise