Hi, this is a question, not an issue.
I have a bunch of features that I track over time. I am feeding them into
import ruptures as rpt

algo = rpt.Pelt(model=model, min_size=1, jump=1)  # model is a cost string, e.g. "l1" or "l2"
algo.fit(signal)
result = algo.predict(pen=p)  # result of change point detection: list of breakpoint indices
signal here is (for example) a 500x16 array (timepoints x features). The features themselves live on pretty different scales, so I thought some kind of scaling/normalization (for example via sklearn.preprocessing.scale, https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html#sklearn.preprocessing.scale) could make sense. Now I wonder, though, how the different cost functions would be affected by that. In the example I am attaching below you can see the normalized signal for the L1 and L2 norms; change points are depicted with dashed lines. You can see that there are some obvious misses (calibrating the penalty helps sometimes, but it is a finicky process).
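For concreteness, the preprocessing step I have in mind is just per-feature standardization before fitting, roughly as in the sketch below (the "l2" cost and the re-use of the penalty p are illustrative; the penalty would need re-tuning after scaling):

# Hedged sketch: standardize each feature (column) to zero mean and
# unit variance, then run PELT on the scaled signal.
import ruptures as rpt
from sklearn.preprocessing import scale

signal_scaled = scale(signal)  # signal: (n_timepoints, n_features) array, scaled column-wise
algo = rpt.Pelt(model="l2", min_size=1, jump=1).fit(signal_scaled)
result = algo.predict(pen=p)  # p must be re-calibrated after scaling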
Should normalization be skipped altogether, or is there a better alternative cost for these kinds of signals?
As an unrelated question: what are you using to draw these graphs?
Should normalization be skipped altogether, or is there a better alternative cost for these kinds of signals?
I do agree that in some instances there might be a need to remove any preprocessing of the data; this can be done upstream if needed, unless it's an inherent part of the PELT algorithm.
It's not inherent to the PELT algorithm, I think? Unless there is some hidden preprocessing going on.
I would like to know whether I should do my own normalization up front, and how it might affect the different cost functions available in PELT (L1, L2, ...).
Whether to normalize or not is task-dependent, and there is no definitive answer. For multivariate signals, PELT will detect the largest shifts, i.e., those with a large norm ||m_before - m_after||, where m_before and m_after are the multivariate means just before and after the change.
As an example, consider the following 2D signal.
One dimension has large shifts and the other has small shifts. Without normalization, only changes in the large dimension are detected.
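A minimal sketch reproducing this effect (the signal construction, change-point locations, and penalty value are all made up for illustration):

# Two features changing at different times: one large shift on a
# large scale, one small shift on a small scale.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(0)
n = 500
big = np.where(np.arange(n) < 200, 0.0, 5.0) + rng.normal(scale=1.0, size=n)
small = np.where(np.arange(n) < 350, 0.0, 0.5) + rng.normal(scale=0.1, size=n)
signal = np.c_[big, small]

# Without scaling, the L2 cost is dominated by the large-scale feature,
# so only the change at t=200 tends to be found with this penalty.
print(rpt.Pelt(model="l2").fit(signal).predict(pen=50))

# After per-feature standardization, both dimensions contribute
# comparably, and the change at t=350 is usually recovered as well.
scaled = (signal - signal.mean(axis=0)) / signal.std(axis=0)
print(rpt.Pelt(model="l2").fit(scaled).predict(pen=50))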