This package provides a Monte Carlo Decision Tree Search[^1] to achieve an optimal time-delay embedding of time series. The embedding of a set of time series is chosen such that the delays and the choice of time series optimize a given objective function.
What an optimal embedding is depends in general on the application that the practitioner has in mind. `TreeEmbedding.jl` allows its users to freely choose the objective function used by `mcdts_embedding`.
`TreeEmbedding.jl` provides a default setting that can be considered a good "all-rounder" embedding. This embedding is a refinement of the PECUZAL algorithm[^2], which uses the L-statistic (`uzal_cost_pecuzal_mcdts`, based on `uzal_cost`) as the objective function and the continuity statistic (`pecora`) as the delay pre-selection statistic.
As a first example, let us consider a Lorenz63 system. This is meant as a toy example: we generate data from a Lorenz63 system and then try to reconstruct the full state space from two of the three observables.
First we import the needed modules and generate a trajectory of the Lorenz system, discarding any transient dynamics:

```julia
using DynamicalSystems, TreeEmbedding, Random

# simulate the Lorenz-63 system and keep two of the three observables
Random.seed!(1234)
ds = Systems.lorenz()
data = trajectory(ds, 50; Ttr=100.0)[:, 1:2]
```
Then, we simply call the default embedding option with

```julia
tree = mcdts_embedding(data, 50)
```
This decision tree search is computationally relatively expensive; you should expect it to take a few minutes. If the keyword `verbose=true` is used, the current state of the optimization is printed after each sample. After completion it will directly return the best embedding it found:
```
Embedding tree with current best embedding: L=-1.0529700545830651 - full embd. τ=[0, 8, 0] ,i_ts=[1, 2, 2]
```
Here `L` denotes the achieved value of the chosen objective/loss function (the L-statistic in this case), `τ` the found delays, and `i_ts` the index of the time series each delay belongs to.
!!! note "Reproducibility of results"
    Since the decision tree is randomly sampled, the shown results might differ each time you run the code. For reproducible results a random seed must be set, e.g.
    ```julia
    Random.seed!(1234)
    ```
`TreeEmbedding.jl` provides much more than just a refinement of the PECUZAL algorithm. It allows finding optimal embeddings with different objective functions for a wide variety of applications. Configuring the embedding breaks down into three parts.
## The objective function
So far, `TreeEmbedding.jl` predefines four different objective functions:
- The $L$-statistic from Uzal et al.[^3]: `L_statistic`
- The False Nearest Neighbor (FNN) statistic based on Hegger & Kantz[^5]: `FNN_statistic`
- A loss based on the correlation coefficient of the convergent cross mapping from Sugihara et al.[^6]: `CCM_ρ`
- A loss based on a prediction performed with the current reconstruction: `Prediction_error`
All of these can be directly initialized with their default parameters, e.g. by `L_statistic()`, but they can also be further adjusted; for that, please see the reference of the individual objective functions. Further objective functions can be defined as subtypes of `AbstractLoss`. Most importantly, they must have a method `compute_loss` attached.
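To illustrate, a custom objective could be sketched as follows. Note that the struct name, its field, and the exact `compute_loss` signature shown here are hypothetical placeholders; consult the `AbstractLoss` reference for the actual interface:

```julia
# Hypothetical sketch of a user-defined objective function.
# The real compute_loss signature is documented in the AbstractLoss
# reference; the arguments shown here are placeholders.
struct MyLoss <: TreeEmbedding.AbstractLoss
    threshold::Float64
end
MyLoss() = MyLoss(0.05)  # default constructor, analogous to L_statistic()

function TreeEmbedding.compute_loss(loss::MyLoss, args...)
    # ... evaluate the current reconstruction from `args`
    # and return the loss value to be minimized ...
end
```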
## The delay pre-selection statistic
So far, `TreeEmbedding.jl` predefines two delay pre-selection statistics:
- The continuity statistic $\langle \varepsilon^\star \rangle(\tau)$ by Pecora et al.[^4]: `Continuity_function` (based on `pecora`)
- A given range of delays to consider, without a proper pre-selection: `Range_function`
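Assuming default constructors analogous to those of the objective functions, both statistics can be initialized directly, e.g.:

```julia
# continuity-statistic based pre-selection of delays
delay_pre = Continuity_function()
# or: consider a plain range of delays without pre-selection
delay_pre = Range_function()
```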
## The tree search

Further, one can modify how the decision tree is sampled and searched. As we strive to find a global minimum of the objective function, the tree is searched in two alternating steps:
- **Expand**: Starting from the root, for each embedding cycle $D_d$, possible next steps $(s_{i_j},\tau_j,\Gamma_j)$ are either computed using suitable statistics $\Lambda_{\tau}$ and $\Gamma$ or, if they were already computed previously, they are looked up from the tree. Consider the first embedding cycle $D_2$, using the continuity statistic $\langle \varepsilon^\star \rangle(\tau)$ for $\Lambda_{\tau}$. Then, for each time series $s_{i}$, the local maxima of all $\langle \varepsilon^\star \rangle(\tau)$ determine the set of possible delay values $\tau_2$ corresponding to $D_2$. One of the possible $\tau_2$'s is then randomly chosen with probabilities computed from a softmax of the corresponding values of $\Gamma_j$: due to its normalization, the softmax function converts all possible values of $\Gamma_j$ to probabilities $p_j=\exp(-\beta \Gamma_j)/\sum_k\exp(-\beta \Gamma_k)$. This procedure is repeated until the very last computed embedding cycle $D_{m+1}$, i.e., when the objective function $\Gamma_{m+1}$ cannot be further decreased for any of the $\tau_{m+1}$-candidates.
- **Backpropagation**: After the tree is expanded, the final value $\Gamma_m$ is backpropagated through the taken path of this trial, i.e., to all leafs (previous embedding cycles $d$) that were visited during this expand, updating their $\Gamma_d$ values to that of the final embedding cycle.
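The softmax weighting used in the expand step can be sketched in plain Julia. This is a minimal standalone illustration of $p_j=\exp(-\beta \Gamma_j)/\sum_k\exp(-\beta \Gamma_k)$, not the package's internal implementation:

```julia
# Convert candidate loss values Γ into selection probabilities:
# lower loss ⇒ higher probability; β controls the greediness.
function softmax_probabilities(Γ::AbstractVector; β=2.0)
    w = exp.(-β .* Γ)
    return w ./ sum(w)
end

Γ = [0.5, 1.0, 2.0]
p = softmax_probabilities(Γ)  # p sums to 1; the smallest Γ gets the largest p
```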
We can modify this tree search in `TreeEmbedding.jl` by setting different values for how the next step is chosen. The choose function can be adjusted by supplying the keyword argument `choose_func` to `mcdts_embedding`; the default is `(L)->(TreeEmbedding.softmaxL(L,β=2.))`. Additionally, we can specify a maximum depth for the tree search with the `max_depth` keyword.
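For example, a customized call could look like this sketch, with a greedier softmax and a capped search depth (the β value and depth are arbitrary illustrative choices; `data`, `optimgoal`, `w`, `delays`, and `N` are defined as in the full example at the end of this section):

```julia
# greedier sampling (larger β) and a capped search depth
tree = mcdts_embedding(data, optimgoal, w, delays, N;
                       choose_func = L -> TreeEmbedding.softmaxL(L; β=4.0),
                       max_depth = 15)
```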
We can put these three configuration steps together to get our personal TreeEmbedding by first specifying the `MCDTSOptimGoal` that holds both the objective function and the delay pre-selection statistic. This is the key constructor for using `mcdts_embedding`. For example, when we want to use the PECUZAL idea, we combine the L-statistic with the continuity statistic:

```julia
optimgoal = MCDTSOptimGoal(L_statistic(), Continuity_function())
```
There are some predefined optimization goals:

- The above-mentioned PECUZAL refinement: `optimgoal = PecuzalOptim()`
- FNN statistic + continuity function: `optimgoal = FNNOptim()`
- CCM causality analysis + range function: `optimgoal = CCMOptim()`
- A zeroth-order predictor + continuity statistic: `optimgoal = PredictOptim()`
Next, we also have to specify the Theiler window `w` (neighbors in time with indices closer than `w` to the point under consideration are excluded from being true neighbors), e.g. with the mutual information minimum method of `DelayEmbeddings.jl`:

```julia
w1 = DelayEmbeddings.estimate_delay(data[:,1], "mi_min")
w2 = DelayEmbeddings.estimate_delay(data[:,2], "mi_min")
w = max(w1, w2)
```
Additionally, we set which delays are considered, e.g. all delays up to 100, and how many samples we want to compute during the tree search:

```julia
delays = 0:100
N = 50
```
Then, we can finally compute our embedding:

```julia
tree = mcdts_embedding(data, optimgoal, w, delays, N)
```
[^1]: Kraemer, K.H., Gelbrecht, M., Pavithran, I., Sujith, R. I. and Marwan, N. (2022). Optimal state space reconstruction via Monte Carlo decision tree search. Nonlinear Dynamics.
[^2]: Kraemer, K.H., Datseris, G., Kurths, J., Kiss, I.Z., Ocampo-Espindola, Marwan, N. (2021). A unified and automated approach to attractor reconstruction. New Journal of Physics 23(3), 033017.
[^3]: Uzal, L. C., Grinblat, G. L., Verdes, P. F. (2011). Optimal reconstruction of dynamical systems: A noise amplification approach. Physical Review E 84, 016223.
[^4]: Pecora, L. M., Moniz, L., Nichols, J., & Carroll, T. L. (2007). A unified approach to attractor reconstruction. Chaos 17(1).
[^5]: Hegger, R. & Kantz, H. (1999). Improved false nearest neighbor method to detect determinism in time series data. Physical Review E 60, 4970.
[^6]: Sugihara, G. et al. (2012). Detecting Causality in Complex Ecosystems. Science 338(6106), 496-500.