A machine learning pipeline for detecting fine-scale behavioral events in bio-logging data.
Install with pip.
pip install stickleback
- Behavioral events are brief behaviors that can be represented as a point in time, e.g. feeding or social interactions.
- High-resolution bio-logging data (e.g. from accelerometers and magnetometers) are multi-variate time series. Traditional classifiers struggle with time series data.
stickleback takes a time series classification approach to detect behavioral events in longitudinal bio-logging data.
The included sensor data contains the depth, pitch, roll, and speed of six blue whales at 10 Hz, and the event data contains the times of lunge-feeding behaviors.
import pandas as pd
import sktime.classification.interval_based
import sktime.classification.compose
from stickleback.stickleback import Stickleback
import stickleback.data
import stickleback.util
import stickleback.visualize
# Load sample data
sensors, events = stickleback.data.load_lunges()
# Split into test and train (2 test deployments, 4 train)
def split_dict(d, ks):
    # dict1 holds the entries whose keys are in ks; dict2 holds the rest
    dict1 = {k: v for k, v in d.items() if k in ks}
    dict2 = {k: v for k, v in d.items() if k not in ks}
    return dict1, dict2
test_deployids = list(sensors.keys())[0:2]
sensors_test, sensors_train = split_dict(sensors, test_deployids)
events_test, events_train = split_dict(events, test_deployids)
sensors[test_deployids[0]]
datetime | depth | pitch | roll | speed |
---|---|---|---|---|
2018-09-05 11:55:52.400 | 14.911083 | -0.059933 | -0.012899 | 4.274450 |
2018-09-05 11:55:52.500 | 14.910864 | -0.067072 | -0.010815 | 4.044154 |
2018-09-05 11:55:52.600 | 14.915853 | -0.075173 | -0.008335 | 3.820568 |
2018-09-05 11:55:52.700 | 14.923190 | -0.085225 | -0.005727 | 3.602702 |
2018-09-05 11:55:52.800 | 14.928955 | -0.096173 | -0.002803 | 3.432342 |
... | ... | ... | ... | ... |
2018-09-05 13:55:51.900 | 22.552306 | -0.010861 | 0.005441 | 2.246061 |
2018-09-05 13:55:52.000 | 22.571625 | -0.010534 | 0.004674 | 2.257525 |
2018-09-05 13:55:52.100 | 22.588129 | -0.010081 | 0.003841 | 2.267966 |
2018-09-05 13:55:52.200 | 22.603341 | -0.009627 | 0.003042 | 2.272327 |
2018-09-05 13:55:52.300 | 22.619537 | -0.009355 | 0.002164 | 2.277328 |
72000 rows × 4 columns
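As a quick sanity check on the output above, the row count and the first and last timestamps are consistent with the stated 10 Hz sampling rate. The sketch below uses only the standard library and the values visible in the table; the variable names are illustrative and not part of stickleback.

```python
from datetime import datetime, timedelta

SAMPLE_RATE_HZ = 10  # sampling rate stated in the text
N_ROWS = 72000       # row count reported under the table

# First timestamp shown in the table
start = datetime(2018, 9, 5, 11, 55, 52, 400000)
# 10 Hz means one sample every 100 ms
step = timedelta(milliseconds=1000 // SAMPLE_RATE_HZ)

# The last sample is N_ROWS - 1 steps after the first
end = start + (N_ROWS - 1) * step
duration_hours = (end - start).total_seconds() / 3600

print(end)                       # 2018-09-05 13:55:52.300000
print(round(duration_hours, 6))  # 1.999972
```

The computed end time matches the last row of the table, so this deployment covers just under two hours of data.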
plot_sensors_events() produces an interactive figure for exploring bio-logger data.
# Choose one deployment to visualize
deployid = list(sensors.keys())[0]
stickleback.visualize.plot_sensors_events(deployid, sensors, events)
Initialize a Stickleback model using Supervised Time Series Forests and a 5 s window (win_size=50 samples at 10 Hz).
# Supervised Time Series Forests ensembled across the columns of `sensors`
cols = sensors[list(sensors.keys())[0]].columns
tsc = sktime.classification.interval_based.SupervisedTimeSeriesForest(
    n_estimators=2,
    random_state=4321
)
stsf = sktime.classification.compose.ColumnEnsembleClassifier(
    estimators=[('STSF_{}'.format(col), tsc, [i])
                for i, col in enumerate(cols)]
)
sb = Stickleback(
    local_clf=stsf,
    win_size=50,
    tol=pd.Timedelta("5s"),
    nth=10,
    n_folds=4,
    seed=1234
)
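The two numbers worth double-checking in the setup above are the window size, which is given in samples rather than seconds, and the column indices passed to each per-column estimator. The following is a minimal sketch of that bookkeeping in plain Python. The helper `seconds_to_samples` and the name `estimator_specs` are hypothetical and only serve the illustration; the column names are taken from the sample data shown earlier.

```python
def seconds_to_samples(seconds, sample_rate_hz):
    """Convert a duration in seconds to a whole number of samples."""
    return round(seconds * sample_rate_hz)

# A 5 s window at 10 Hz is 50 samples, matching win_size=50
win_size = seconds_to_samples(5, 10)

# Each sensor column gets its own named estimator and column index,
# mirroring the (name, estimator, [index]) tuples built above
# (the estimator object itself is omitted here)
cols = ["depth", "pitch", "roll", "speed"]
estimator_specs = [("STSF_{}".format(col), [i]) for i, col in enumerate(cols)]

print(win_size)            # 50
print(estimator_specs[0])  # ('STSF_depth', [0])
```

For data recorded at a different sampling rate, the same conversion gives the win_size needed to keep a 5 s window.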
Fit the Stickleback object to the training data.
sb.fit(sensors_train, events_train)
Use the fitted Stickleback model to predict the occurrence of lunge-feeding events in the test dataset.
predictions = sb.predict(sensors_test)
Use the temporal tolerance (in this example, 5 s) to assess model predictions. Visualize with an outcome table and an interactive visualization. In the figure, blue = true positive, hollow red = false negative, and solid red = false positive.
outcomes = sb.assess(predictions, events_test)
stickleback.visualize.outcome_table(outcomes, sensors_test)
deployid | F1 | TP | FP | FN | Duration (hours) |
---|---|---|---|---|---|
bw180905-49 | 1.000000 | 44 | 0 | 0 | 1.999972 |
bw180905-53 | 0.943396 | 25 | 2 | 1 | 1.999972 |
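To make the tolerance-based assessment concrete, here is a minimal sketch of one way to score predictions against known events: each predicted time is greedily matched to the nearest unmatched actual event, and it counts as a true positive only if the two fall within the tolerance. This is an illustration under that assumption, not necessarily how `sb.assess` works internally. The functions `match_events` and `f1_score` are hypothetical names, and the event times are made up.

```python
from datetime import datetime, timedelta

def match_events(predicted, actual, tol):
    """Greedy one-to-one matching; returns (TP, FP, FN) counts."""
    unmatched = sorted(actual)
    tp = fp = 0
    for p in sorted(predicted):
        # Nearest actual event that has not been matched yet
        nearest = min(unmatched, key=lambda a: abs(a - p), default=None)
        if nearest is not None and abs(nearest - p) <= tol:
            unmatched.remove(nearest)
            tp += 1
        else:
            fp += 1
    # Actual events that were never matched are false negatives
    return tp, fp, len(unmatched)

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no events."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

t0 = datetime(2018, 9, 5, 12, 0, 0)
actual = [t0 + timedelta(seconds=s) for s in (0, 60, 120)]
predicted = [t0 + timedelta(seconds=s) for s in (2, 70, 121)]

print(match_events(predicted, actual, timedelta(seconds=5)))  # (2, 1, 1)
print(round(f1_score(25, 2, 1), 6))                           # 0.943396
```

The second printed value reproduces the F1 reported for bw180905-53 in the table above from its TP, FP, and FN counts.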
deployid = list(events_test.keys())[0]
stickleback.visualize.plot_predictions(deployid,
                                       sensors_test,
                                       predictions,
                                       outcomes)