diff --git a/CHANGELOG.md b/CHANGELOG.md index ed425ea6e..8d821d734 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -62,6 +62,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Fix `BasePipeline.forecast` when prediction intervals are estimated on history data with presence of NaNs ([#1291](https://github.com/tinkoff-ai/etna/pull/1291)) - Teach `BaseMixin.set_params` to work with nested `list` and `tuple` ([#1201](https://github.com/tinkoff-ai/etna/pull/1201)) - Fix `get_anomalies_prediction_interval` to work when segments have different start date ([#1296](https://github.com/tinkoff-ai/etna/pull/1296)) +- Fix `classification` notebook to download `FordA` dataset without error ([#1298](https://github.com/tinkoff-ai/etna/pull/1298)) ## [2.0.0] - 2023-04-11 ### Added diff --git a/examples/classification.ipynb b/examples/classification.ipynb index 9419a8d2e..6c97bd34f 100644 --- a/examples/classification.ipynb +++ b/examples/classification.ipynb @@ -5,7 +5,11 @@ "id": "cb9e5d62", "metadata": {}, "source": [ - "# Classification notebook" + "# Classification notebook\n", + "\n", + "\n", + " \n", + "" ] }, { @@ -56,36 +60,81 @@ "source": [ "### Load Dataset \n", "\n", - "Consider the example `FordA` dataset from [UCR archive](https://www.cs.ucr.edu/~eamonn/time_series_data/). Dataset consists of engine noise measurements and the problem is to diagnose whether a certain symptom exists in the engine. The comprehensive description of `FirdA` dataset can be found [here](http://www.timeseriesclassification.com/description.php?Dataset=FordA). \n", + "Consider the example `FordA` dataset from the [UCR archive](https://www.cs.ucr.edu/~eamonn/time_series_data/). The dataset consists of engine noise measurements, and the problem is to diagnose whether a certain symptom exists in the engine. A comprehensive description of the `FordA` dataset can be found [here](http://www.timeseriesclassification.com/description.php?Dataset=FordA). 
\n", "\n", - "To load the dataset, we will use `fetch_ucr_dataset` util form [`pyts` library](https://pyts.readthedocs.io/en/stable/index.html). " + "It is possible to load the dataset using the `fetch_ucr_dataset` function from the [`pyts` library](https://pyts.readthedocs.io/en/stable/index.html), but let's do it manually." ] }, { "cell_type": "code", "execution_count": 2, + "id": "39bd234e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " % Total % Received % Xferd Average Speed Time Time Time Current\n", + " Dload Upload Total Spent Left Speed\n", + "100 34.6M 100 34.6M 0 0 521k 0 0:01:08 0:01:08 --:--:-- 562k 0 0 346k 0 0:01:42 0:00:09 0:01:33 388k 0 0:01:13 0:00:24 0:00:49 658k 0 0:01:08 0:00:58 0:00:10 481k\n", + "Archive: data/ford_a.zip\n", + "caution: filename not matched: -q\n" + ] + } + ], + "source": [ + "!curl \"http://www.timeseriesclassification.com/ClassificationDownloads/FordA.zip\" -o data/ford_a.zip\n", + "!unzip -q data/ford_a.zip -d data/ford_a" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "972bc4d0", + "metadata": {}, + "outputs": [], + "source": [ + "import pathlib\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 4, "id": "d5c515aa", "metadata": {}, "outputs": [], "source": [ - "from pyts.datasets.ucr import fetch_ucr_dataset\n", - "import matplotlib.pyplot as plt" + "def load_ford_a(path: pathlib.Path, dataset_name: str):\n", + " train_path = path / (dataset_name + \"_TRAIN.txt\")\n", + " test_path = path / (dataset_name + \"_TEST.txt\")\n", + " data_train = np.genfromtxt(train_path)\n", + " data_test = np.genfromtxt(test_path)\n", + "\n", + " X_train, y_train = data_train[:, 1:], data_train[:, 0]\n", + " X_test, y_test = data_test[:, 1:], data_test[:, 0]\n", + "\n", + " y_train = y_train.astype(\"int64\")\n", + " y_test = y_test.astype(\"int64\")\n", + "\n", + " return X_train, X_test, y_train, y_test" ] }, { "cell_type": "code", - 
"execution_count": 3, - "id": "da9ae9ad", + "execution_count": 5, + "id": "4d97eb8e", "metadata": {}, "outputs": [], "source": [ - "X_train, X_test, y_train, y_test = fetch_ucr_dataset(dataset=\"FordA\", return_X_y=True)\n", + "X_train, X_test, y_train, y_test = load_ford_a(pathlib.Path(\"data\") / \"ford_a\", \"FordA\")\n", "y_train[y_train == -1], y_test[y_test == -1] = 0, 0 # transform labels to 0,1" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 6, "id": "c6f62d48", "metadata": {}, "outputs": [ @@ -95,7 +144,7 @@ "((3601, 500), (1320, 500), (3601,), (1320,))" ] }, - "execution_count": 4, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -114,7 +163,17 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 7, + "id": "e356e8e7", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": 8, "id": "60e2be7c", "metadata": {}, "outputs": [ @@ -143,15 +202,15 @@ "source": [ "### Feature extraction \n", "\n", - "Raw time series values usually are not the best features for the classifier. Series length is usually much higher than the number of samples in the dataset, this case classifiers to work poorly. There exists special technique to extract more informative features from the time series, you can find a comprehensive review of them in this [paper](https://hal.inria.fr/hal-03558165/document).\n", + "Raw time series values are usually not the best features for a classifier. The length of the series is often much greater than the number of samples in the dataset, and in this case classifiers tend to perform poorly. 
There are special techniques to extract more informative features from time series; a comprehensive review of them can be found in this [paper](https://hal.inria.fr/hal-03558165/document).\n", "\n", - "In our library we offer two methods for feature extraction able to work with the time series of different length\n", - "1. `TSFreshFeatureExtractor` - extract features using `extract_features` method form [tsfresh](https://tsfresh.readthedocs.io/en/latest/)\n" + "In our library we offer two methods for feature extraction that can work with time series of different lengths:\n", + "1. `TSFreshFeatureExtractor` — extract features using the `extract_features` method from [tsfresh](https://tsfresh.readthedocs.io/en/latest/)." ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 9, "id": "a582691c", "metadata": {}, "outputs": [], @@ -164,12 +223,12 @@ "id": "48934419", "metadata": {}, "source": [ - "Constructor expects parameters of `extract_features` method, see the full list [here](https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html?highlight=feature_extraction#tsfresh.feature_extraction.extraction.extract_features) and `fill_na_value` parameter defines the value to fill the possible NaNs in the generated features" + "The constructor expects the parameters of the `extract_features` method; see the full list [here](https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html?highlight=feature_extraction#tsfresh.feature_extraction.extraction.extract_features). It also has the `fill_na_value` parameter, which defines the value used to fill possible NaNs in the generated features." ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 10, "id": "854a393a", "metadata": {}, "outputs": [], @@ -181,7 +240,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 11, "id": "a26404cb", "metadata": {}, "outputs": [], @@ -194,20 +253,19 @@ "id": "1341d8d4", "metadata": {}, "source": [ - "2. 
`WEASELFeatureExtractor` -- extract features using the WEASEL algorithm, see the original [paper](https://arxiv.org/pdf/1701.07681.pdf)\n", + "2. `WEASELFeatureExtractor` — extract features using the WEASEL algorithm; see the original [paper](https://arxiv.org/pdf/1701.07681.pdf).\n", "\n", "This method has a long list of parameters, the most important of them are: \n", - "- **padding_value** -- value to pad the series on test set to fit the shortest series in train set\n", - "- **word_size**, **n_bins** -- word size and the alphabet size to approximate the series(strongly influence on the performance)\n", - "- **window_sizes** -- sizes of the sliding windows\n", - "- **window_steps** -- steps of the windows\n", - "- **chi2_threshold** -- feature selection threshold(the greter, the fewer features are selected)\n", - " " + "- **padding_value** — value to pad the series in the test set to fit the shortest series in the train set\n", + "- **word_size**, **n_bins** — word size and alphabet size used to approximate the series (they strongly influence the performance)\n", + "- **window_sizes** — sizes of the sliding windows\n", + "- **window_steps** — steps of the windows\n", + "- **chi2_threshold** — feature selection threshold (the greater it is, the fewer features are selected)" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 12, "id": "39de5856", "metadata": {}, "outputs": [], @@ -217,7 +275,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 13, "id": "efac0a3f", "metadata": {}, "outputs": [], @@ -244,7 +302,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 14, "id": "9d9cb6a8", "metadata": {}, "outputs": [], @@ -263,7 +321,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 15, "id": "473ce6ae", "metadata": {}, "outputs": [], @@ -282,18 +340,17 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 16, "id": "3e58c3ee", "metadata": {}, "outputs": [], "source": [ 
- "from sklearn.model_selection import KFold\n", - "import numpy as np" + "from sklearn.model_selection import KFold" ] }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 17, "id": "bea29ea8", "metadata": {}, "outputs": [], @@ -313,7 +370,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 18, "id": "825794c8", "metadata": {}, "outputs": [ @@ -321,16 +378,16 @@ "name": "stderr", "output_type": "stream", "text": [ - "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2880/2880 [00:00<00:00, 2913.77it/s]\n", - "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 721/721 [00:00<00:00, 3044.54it/s]\n", - "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3282.90it/s]\n", - "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 3163.29it/s]\n", - "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3156.11it/s]\n", - "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 3177.47it/s]\n", - "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3074.36it/s]\n", - "Feature Extraction: 
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 2563.62it/s]\n", - "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 2949.87it/s]\n", - "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 2967.27it/s]\n" + "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2880/2880 [00:00<00:00, 3198.77it/s]\n", + "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 721/721 [00:00<00:00, 3384.68it/s]\n", + "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3294.64it/s]\n", + "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 3469.26it/s]\n", + "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3324.07it/s]\n", + "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 3412.13it/s]\n", + "Feature Extraction: 
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 2980.69it/s]\n", + "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 2783.82it/s]\n", + "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3105.03it/s]\n", + "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 2932.66it/s]\n" ] } ], @@ -348,7 +405,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 19, "id": "1ea88a41", "metadata": {}, "outputs": [ @@ -377,7 +434,7 @@ " 0.5629105765287568]}" ] }, - "execution_count": 17, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" } @@ -388,7 +445,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 20, "id": "211d5c5d", "metadata": {}, "outputs": [ @@ -401,7 +458,7 @@ " 'AUC': 0.5478953232026702}" ] }, - "execution_count": 18, + "execution_count": 20, "metadata": {}, "output_type": "execute_result" } @@ -420,7 +477,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 21, "id": "5eac234c", "metadata": {}, "outputs": [], @@ -431,7 +488,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 22, "id": "1482b8a9", "metadata": {}, "outputs": [ @@ -460,7 +517,7 @@ " 0.9500847267465704]}" ] }, - "execution_count": 20, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" } @@ -471,7 +528,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 23, "id": "bdcfe547", "metadata": {}, "outputs": [ @@ 
-484,7 +541,7 @@ " 'AUC': 0.9409755698264062}" ] }, - "execution_count": 21, + "execution_count": 23, "metadata": {}, "output_type": "execute_result" } @@ -525,7 +582,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 24, "id": "eb184f09", "metadata": {}, "outputs": [], @@ -537,7 +594,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 25, "id": "9f102d39", "metadata": {}, "outputs": [ @@ -547,34 +604,34 @@ "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", - "100 91.3M 100 91.3M 0 0 2831k 0 0:00:33 0:00:33 --:--:-- 1812k 0:00:10 0:00:10 6153k\n" + "100 91.3M 100 91.3M 0 0 1132k 0 0:01:22 0:01:22 --:--:-- 1373k-:--:-- 0:00:01 --:--:-- 0 0 1025k 0 0:01:31 0:00:38 0:00:53 1215k\n" ] } ], "source": [ - "!curl \"https://raw.githubusercontent.com/Mcompetitions/M4-methods/master/Dataset/Train/Daily-train.csv\" -o m4.csv" + "!curl \"https://raw.githubusercontent.com/Mcompetitions/M4-methods/master/Dataset/Train/Daily-train.csv\" -o data/m4.csv" ] }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 26, "id": "fa077ffa", "metadata": {}, "outputs": [], "source": [ - "df_raw = pd.read_csv(\"m4.csv\")" + "df_raw = pd.read_csv(\"data/m4.csv\")" ] }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 27, "id": "b37dec40", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "102f7ab2b4214358a16921c61912412e", + "model_id": "8e52dcb4ef5e4955baf35f677edd7b5d", "version_major": 2, "version_minor": 0 }, @@ -601,7 +658,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 28, "id": "e2665588", "metadata": {}, "outputs": [], @@ -621,7 +678,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 29, "id": "fe0ed6c9", "metadata": {}, "outputs": [ @@ -658,7 +715,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 
30, "id": "89be6d07", "metadata": {}, "outputs": [], @@ -670,7 +727,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 31, "id": "75132e0d", "metadata": {}, "outputs": [], @@ -683,12 +740,12 @@ "id": "8b293b2b", "metadata": {}, "source": [ - "It takes about 2 minutes even for naive model to evaluate the performance on this dataset, imagine what time it takes for more complex one." + "It takes about 2 minutes even for a naive model to evaluate the performance on this dataset; imagine how long it takes for a more complex one." ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 32, "id": "4d37dc70", "metadata": {}, "outputs": [ @@ -698,25 +755,25 @@ "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.5s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 3.1s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.6s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.6s finished\n", + "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 3.0s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.9s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.9s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", - "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:277: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", + "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:279: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", " future_dates = pd.date_range(\n", - "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 3.5s remaining: 0.0s\n", - "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:277: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", + 
"[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 5.0s remaining: 0.0s\n", + "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:279: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", " future_dates = pd.date_range(\n", - "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 6.9s remaining: 0.0s\n", - "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:277: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", + "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 8.3s remaining: 0.0s\n", + "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:279: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", " future_dates = pd.date_range(\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 10.2s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 10.2s finished\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 11.4s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 11.4s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", - "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 10.7s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 21.7s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 32.3s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 32.4s finished\n" + "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 9.7s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 20.5s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 31.5s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 31.6s finished\n" ] } ], @@ -738,7 +795,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 33, "id": "9be354ae", "metadata": {}, "outputs": [], @@ -748,7 +805,7 @@ }, { "cell_type": "code", - "execution_count": 32, + 
"execution_count": 34, "id": "0cf82f66", "metadata": {}, "outputs": [ @@ -769,7 +826,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 35, "id": "19c3dfd9", "metadata": {}, "outputs": [ @@ -798,7 +855,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 36, "id": "5c3709f5", "metadata": {}, "outputs": [ @@ -826,7 +883,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 37, "id": "b6f66180", "metadata": {}, "outputs": [], @@ -844,7 +901,7 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 38, "id": "8cad8d7e", "metadata": {}, "outputs": [ @@ -854,7 +911,7 @@ "['weasel', 'tsfresh', 'tsfresh_min']" ] }, - "execution_count": 36, + "execution_count": 38, "metadata": {}, "output_type": "execute_result" } @@ -873,7 +930,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 39, "id": "e7cbca5d", "metadata": {}, "outputs": [], @@ -891,7 +948,7 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 40, "id": "da34a6e5", "metadata": {}, "outputs": [], @@ -910,7 +967,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 41, "id": "4b1f3b5a", "metadata": {}, "outputs": [], @@ -934,7 +991,7 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 42, "id": "7981bd34", "metadata": {}, "outputs": [ @@ -942,8 +999,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 11.4 s, sys: 1.56 s, total: 13 s\n", - "Wall time: 13.1 s\n" + "CPU times: user 11.5 s, sys: 1.08 s, total: 12.6 s\n", + "Wall time: 12.7 s\n" ] } ], @@ -954,7 +1011,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 43, "id": "1b1b783c", "metadata": {}, "outputs": [ @@ -980,7 +1037,7 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 44, "id": "3dded441", "metadata": {}, "outputs": [ @@ -988,8 +1045,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU 
times: user 13.5 s, sys: 1.95 s, total: 15.5 s\n", - "Wall time: 15.7 s\n" + "CPU times: user 11.2 s, sys: 1 s, total: 12.2 s\n", + "Wall time: 12.1 s\n" ] } ], @@ -1001,18 +1058,18 @@ }, { "cell_type": "code", - "execution_count": 43, + "execution_count": 45, "id": "42466b71", "metadata": {}, "outputs": [], "source": [ - "threthold = 0.4\n", - "predictability = {segment: int(predictability_score[i] > threthold) for i, segment in enumerate(sorted(ts.segments))}" + "threshold = 0.4\n", + "predictability = {segment: int(predictability_score[i] > threshold) for i, segment in enumerate(sorted(ts.segments))}" ] }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 46, "id": "6586de3e", "metadata": {}, "outputs": [ @@ -1038,7 +1095,7 @@ }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 47, "id": "b0cd3965", "metadata": {}, "outputs": [ @@ -1147,7 +1204,7 @@ "2778 D35 14.327464 1.0" ] }, - "execution_count": 45, + "execution_count": 47, "metadata": {}, "output_type": "execute_result" } @@ -1158,7 +1215,7 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 48, "id": "40942d28", "metadata": {}, "outputs": [ @@ -1195,7 +1252,7 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 49, "id": "652392f3", "metadata": {}, "outputs": [ @@ -1304,7 +1361,7 @@ "1348 D2211 0.000000 1.0" ] }, - "execution_count": 47, + "execution_count": 49, "metadata": {}, "output_type": "execute_result" } @@ -1315,7 +1372,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 50, "id": "01619c6d", "metadata": {}, "outputs": [