From 151da62ee56292120d1e9f07eb9ed289fd48b2b2 Mon Sep 17 00:00:00 2001 From: "d.a.bunin" Date: Tue, 27 Jun 2023 16:11:27 +0300 Subject: [PATCH 1/6] fix: fix classification notebook to download dataset without error --- examples/classification.ipynb | 276 +++++++++++++++++++++------------- 1 file changed, 168 insertions(+), 108 deletions(-) diff --git a/examples/classification.ipynb b/examples/classification.ipynb index 9419a8d2e..4eeb5b838 100644 --- a/examples/classification.ipynb +++ b/examples/classification.ipynb @@ -5,7 +5,11 @@ "id": "cb9e5d62", "metadata": {}, "source": [ - "# Classification notebook" + "# Classification notebook\n", + "\n", + "\n", + " \n", + "" ] }, { @@ -56,36 +60,81 @@ "source": [ "### Load Dataset \n", "\n", - "Consider the example `FordA` dataset from [UCR archive](https://www.cs.ucr.edu/~eamonn/time_series_data/). Dataset consists of engine noise measurements and the problem is to diagnose whether a certain symptom exists in the engine. The comprehensive description of `FirdA` dataset can be found [here](http://www.timeseriesclassification.com/description.php?Dataset=FordA). \n", + "Consider the example `FordA` dataset from [UCR archive](https://www.cs.ucr.edu/~eamonn/time_series_data/). Dataset consists of engine noise measurements and the problem is to diagnose whether a certain symptom exists in the engine. The comprehensive description of `FordA` dataset can be found [here](http://www.timeseriesclassification.com/description.php?Dataset=FordA). \n", "\n", - "To load the dataset, we will use `fetch_ucr_dataset` util form [`pyts` library](https://pyts.readthedocs.io/en/stable/index.html). " + "To load the dataset, we can use `fetch_ucr_dataset` util form [`pyts` library](https://pyts.readthedocs.io/en/stable/index.html), but it currently doesn't work on version 0.12.0 that we use. So, we will load the dataset manually." 
] }, { "cell_type": "code", "execution_count": 2, + "id": "39bd234e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " % Total % Received % Xferd Average Speed Time Time Time Current\n", + " Dload Upload Total Spent Left Speed\n", + "100 34.6M 100 34.6M 0 0 521k 0 0:01:08 0:01:08 --:--:-- 562k 0 0 346k 0 0:01:42 0:00:09 0:01:33 388k 0 0:01:13 0:00:24 0:00:49 658k 0 0:01:08 0:00:58 0:00:10 481k\n", + "Archive: data/ford_a.zip\n", + "caution: filename not matched: -q\n" + ] + } + ], + "source": [ + "!curl \"http://www.timeseriesclassification.com/ClassificationDownloads/FordA.zip\" -o data/ford_a.zip\n", + "!unzip data/ford_a.zip -d data/ford_a -q" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "972bc4d0", + "metadata": {}, + "outputs": [], + "source": [ + "import pathlib\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 4, "id": "d5c515aa", "metadata": {}, "outputs": [], "source": [ - "from pyts.datasets.ucr import fetch_ucr_dataset\n", - "import matplotlib.pyplot as plt" + "def load_ford_a(path: pathlib.Path, dataset_name: str):\n", + " train_path = path / (dataset_name + \"_TRAIN.txt\")\n", + " test_path = path / (dataset_name + \"_TEST.txt\")\n", + " data_train = np.genfromtxt(train_path)\n", + " data_test = np.genfromtxt(test_path)\n", + "\n", + " X_train, y_train = data_train[:, 1:], data_train[:, 0]\n", + " X_test, y_test = data_test[:, 1:], data_test[:, 0]\n", + " \n", + " y_train = y_train.astype(\"int64\")\n", + " y_test = y_test.astype(\"int64\")\n", + " \n", + " return X_train, X_test, y_train, y_test" ] }, { "cell_type": "code", - "execution_count": 3, - "id": "da9ae9ad", + "execution_count": 5, + "id": "4d97eb8e", "metadata": {}, "outputs": [], "source": [ - "X_train, X_test, y_train, y_test = fetch_ucr_dataset(dataset=\"FordA\", return_X_y=True)\n", + "X_train, X_test, y_train, y_test = load_ford_a(pathlib.Path(\"data\") / \"ford_a\", 
\"FordA\")\n", "y_train[y_train == -1], y_test[y_test == -1] = 0, 0 # transform labels to 0,1" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 6, "id": "c6f62d48", "metadata": {}, "outputs": [ @@ -95,7 +144,7 @@ "((3601, 500), (1320, 500), (3601,), (1320,))" ] }, - "execution_count": 4, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -114,7 +163,17 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 7, + "id": "e356e8e7", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": 8, "id": "60e2be7c", "metadata": {}, "outputs": [ @@ -143,15 +202,15 @@ "source": [ "### Feature extraction \n", "\n", - "Raw time series values usually are not the best features for the classifier. Series length is usually much higher than the number of samples in the dataset, this case classifiers to work poorly. There exists special technique to extract more informative features from the time series, you can find a comprehensive review of them in this [paper](https://hal.inria.fr/hal-03558165/document).\n", + "Raw time series values are usually not the best features for the classifier. The length of the series is usually much greater than the number of samples in the dataset, in which case classifiers will perform poorly. There are special techniques to extract more informative features from the time series, you can find a comprehensive review of them in this [paper](https://hal.inria.fr/hal-03558165/document).\n", "\n", - "In our library we offer two methods for feature extraction able to work with the time series of different length\n", - "1. `TSFreshFeatureExtractor` - extract features using `extract_features` method form [tsfresh](https://tsfresh.readthedocs.io/en/latest/)\n" + "In our library we offer two methods for feature extraction methods that can work with the time series of different lengths:\n", + "1. 
`TSFreshFeatureExtractor` — extract features using `extract_features` method form [tsfresh](https://tsfresh.readthedocs.io/en/latest/)." ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 9, "id": "a582691c", "metadata": {}, "outputs": [], @@ -164,12 +223,12 @@ "id": "48934419", "metadata": {}, "source": [ - "Constructor expects parameters of `extract_features` method, see the full list [here](https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html?highlight=feature_extraction#tsfresh.feature_extraction.extraction.extract_features) and `fill_na_value` parameter defines the value to fill the possible NaNs in the generated features" + "Constructor expects parameters of `extract_features` method, see the full list [here](https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html?highlight=feature_extraction#tsfresh.feature_extraction.extraction.extract_features). It also has parameter `fill_na_value` that defines the value for filling the possible NaNs in the generated features." ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 10, "id": "854a393a", "metadata": {}, "outputs": [], @@ -181,7 +240,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 11, "id": "a26404cb", "metadata": {}, "outputs": [], @@ -194,20 +253,19 @@ "id": "1341d8d4", "metadata": {}, "source": [ - "2. `WEASELFeatureExtractor` -- extract features using the WEASEL algorithm, see the original [paper](https://arxiv.org/pdf/1701.07681.pdf)\n", + "2. 
`WEASELFeatureExtractor` — extract features using the WEASEL algorithm, see the original [paper](https://arxiv.org/pdf/1701.07681.pdf).\n", "\n", "This method has a long list of parameters, the most important of them are: \n", - "- **padding_value** -- value to pad the series on test set to fit the shortest series in train set\n", - "- **word_size**, **n_bins** -- word size and the alphabet size to approximate the series(strongly influence on the performance)\n", - "- **window_sizes** -- sizes of the sliding windows\n", - "- **window_steps** -- steps of the windows\n", - "- **chi2_threshold** -- feature selection threshold(the greter, the fewer features are selected)\n", - " " + "- **padding_value** — value to pad the series on test set to fit the shortest series in train set\n", + "- **word_size**, **n_bins** — word size and the alphabet size to approximate the series (strongly influence on the performance)\n", + "- **window_sizes** — sizes of the sliding windows\n", + "- **window_steps** — steps of the windows\n", + "- **chi2_threshold** — feature selection threshold (the greter, the fewer features are selected)" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 12, "id": "39de5856", "metadata": {}, "outputs": [], @@ -217,7 +275,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 13, "id": "efac0a3f", "metadata": {}, "outputs": [], @@ -244,7 +302,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 14, "id": "9d9cb6a8", "metadata": {}, "outputs": [], @@ -263,7 +321,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 15, "id": "473ce6ae", "metadata": {}, "outputs": [], @@ -282,18 +340,17 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 16, "id": "3e58c3ee", "metadata": {}, "outputs": [], "source": [ - "from sklearn.model_selection import KFold\n", - "import numpy as np" + "from sklearn.model_selection import KFold" ] }, { "cell_type": "code", 
- "execution_count": 15, + "execution_count": 17, "id": "bea29ea8", "metadata": {}, "outputs": [], @@ -313,7 +370,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 18, "id": "825794c8", "metadata": {}, "outputs": [ @@ -321,16 +378,16 @@ "name": "stderr", "output_type": "stream", "text": [ - "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2880/2880 [00:00<00:00, 2913.77it/s]\n", - "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 721/721 [00:00<00:00, 3044.54it/s]\n", - "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3282.90it/s]\n", - "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 3163.29it/s]\n", - "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3156.11it/s]\n", - "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 3177.47it/s]\n", - "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3074.36it/s]\n", - "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 2563.62it/s]\n", - 
"Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 2949.87it/s]\n", - "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 2967.27it/s]\n" + "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2880/2880 [00:00<00:00, 3198.77it/s]\n", + "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 721/721 [00:00<00:00, 3384.68it/s]\n", + "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3294.64it/s]\n", + "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 3469.26it/s]\n", + "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3324.07it/s]\n", + "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 3412.13it/s]\n", + "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 2980.69it/s]\n", + "Feature Extraction: 
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 2783.82it/s]\n", + "Feature Extraction: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2881/2881 [00:00<00:00, 3105.03it/s]\n", + "Feature Extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 720/720 [00:00<00:00, 2932.66it/s]\n" ] } ], @@ -348,7 +405,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 19, "id": "1ea88a41", "metadata": {}, "outputs": [ @@ -377,7 +434,7 @@ " 0.5629105765287568]}" ] }, - "execution_count": 17, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" } @@ -388,7 +445,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 20, "id": "211d5c5d", "metadata": {}, "outputs": [ @@ -401,7 +458,7 @@ " 'AUC': 0.5478953232026702}" ] }, - "execution_count": 18, + "execution_count": 20, "metadata": {}, "output_type": "execute_result" } @@ -420,7 +477,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 21, "id": "5eac234c", "metadata": {}, "outputs": [], @@ -431,7 +488,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 22, "id": "1482b8a9", "metadata": {}, "outputs": [ @@ -460,7 +517,7 @@ " 0.9500847267465704]}" ] }, - "execution_count": 20, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" } @@ -471,7 +528,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 23, "id": "bdcfe547", "metadata": {}, "outputs": [ @@ -484,7 +541,7 @@ " 'AUC': 0.9409755698264062}" ] }, - "execution_count": 21, + "execution_count": 23, "metadata": {}, "output_type": "execute_result" } @@ -525,7 +582,7 @@ }, { "cell_type": "code", - 
"execution_count": 22, + "execution_count": 24, "id": "eb184f09", "metadata": {}, "outputs": [], @@ -537,7 +594,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 25, "id": "9f102d39", "metadata": {}, "outputs": [ @@ -547,34 +604,34 @@ "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", - "100 91.3M 100 91.3M 0 0 2831k 0 0:00:33 0:00:33 --:--:-- 1812k 0:00:10 0:00:10 6153k\n" + "100 91.3M 100 91.3M 0 0 1132k 0 0:01:22 0:01:22 --:--:-- 1373k-:--:-- 0:00:01 --:--:-- 0 0 1025k 0 0:01:31 0:00:38 0:00:53 1215k\n" ] } ], "source": [ - "!curl \"https://raw.githubusercontent.com/Mcompetitions/M4-methods/master/Dataset/Train/Daily-train.csv\" -o m4.csv" + "!curl \"https://raw.githubusercontent.com/Mcompetitions/M4-methods/master/Dataset/Train/Daily-train.csv\" -o data/m4.csv" ] }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 26, "id": "fa077ffa", "metadata": {}, "outputs": [], "source": [ - "df_raw = pd.read_csv(\"m4.csv\")" + "df_raw = pd.read_csv(\"data/m4.csv\")" ] }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 27, "id": "b37dec40", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "102f7ab2b4214358a16921c61912412e", + "model_id": "8e52dcb4ef5e4955baf35f677edd7b5d", "version_major": 2, "version_minor": 0 }, @@ -601,7 +658,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 28, "id": "e2665588", "metadata": {}, "outputs": [], @@ -621,7 +678,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 29, "id": "fe0ed6c9", "metadata": {}, "outputs": [ @@ -658,7 +715,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 30, "id": "89be6d07", "metadata": {}, "outputs": [], @@ -670,7 +727,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 31, "id": "75132e0d", "metadata": {}, "outputs": [], @@ 
-683,12 +740,12 @@ "id": "8b293b2b", "metadata": {}, "source": [ - "It takes about 2 minutes even for naive model to evaluate the performance on this dataset, imagine what time it takes for more complex one." + "It takes about 2 minutes even for naive model to evaluate the performance on this dataset, imagine how long it takes for more complex one." ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 32, "id": "4d37dc70", "metadata": {}, "outputs": [ @@ -698,25 +755,25 @@ "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.5s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 3.1s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.6s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.6s finished\n", + "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 3.0s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.9s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.9s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", - "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:277: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", + "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:279: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", " future_dates = pd.date_range(\n", - "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 3.5s remaining: 0.0s\n", - "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:277: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", + "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 5.0s remaining: 0.0s\n", + "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:279: FutureWarning: Argument `closed` is deprecated in favor 
of `inclusive`.\n", " future_dates = pd.date_range(\n", - "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 6.9s remaining: 0.0s\n", - "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:277: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", + "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 8.3s remaining: 0.0s\n", + "/Users/d.a.binin/Documents/tasks/etna-github/etna/datasets/tsdataset.py:279: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.\n", " future_dates = pd.date_range(\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 10.2s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 10.2s finished\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 11.4s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 11.4s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", - "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 10.7s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 21.7s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 32.3s remaining: 0.0s\n", - "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 32.4s finished\n" + "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 9.7s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 20.5s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 31.5s remaining: 0.0s\n", + "[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 31.6s finished\n" ] } ], @@ -738,7 +795,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 33, "id": "9be354ae", "metadata": {}, "outputs": [], @@ -748,7 +805,7 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 34, "id": "0cf82f66", "metadata": {}, "outputs": [ @@ -769,7 +826,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 35, "id": "19c3dfd9", "metadata": {}, "outputs": [ @@ -798,7 
+855,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 36, "id": "5c3709f5", "metadata": {}, "outputs": [ @@ -826,7 +883,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 37, "id": "b6f66180", "metadata": {}, "outputs": [], @@ -844,7 +901,7 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 38, "id": "8cad8d7e", "metadata": {}, "outputs": [ @@ -854,7 +911,7 @@ "['weasel', 'tsfresh', 'tsfresh_min']" ] }, - "execution_count": 36, + "execution_count": 38, "metadata": {}, "output_type": "execute_result" } @@ -873,7 +930,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 39, "id": "e7cbca5d", "metadata": {}, "outputs": [], @@ -891,7 +948,7 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 40, "id": "da34a6e5", "metadata": {}, "outputs": [], @@ -910,7 +967,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 41, "id": "4b1f3b5a", "metadata": {}, "outputs": [], @@ -934,7 +991,7 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 42, "id": "7981bd34", "metadata": {}, "outputs": [ @@ -942,8 +999,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 11.4 s, sys: 1.56 s, total: 13 s\n", - "Wall time: 13.1 s\n" + "CPU times: user 11.5 s, sys: 1.08 s, total: 12.6 s\n", + "Wall time: 12.7 s\n" ] } ], @@ -954,7 +1011,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 43, "id": "1b1b783c", "metadata": {}, "outputs": [ @@ -980,7 +1037,7 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 44, "id": "3dded441", "metadata": {}, "outputs": [ @@ -988,8 +1045,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 13.5 s, sys: 1.95 s, total: 15.5 s\n", - "Wall time: 15.7 s\n" + "CPU times: user 11.2 s, sys: 1 s, total: 12.2 s\n", + "Wall time: 12.1 s\n" ] } ], @@ -1001,18 +1058,21 @@ }, { "cell_type": "code", - 
"execution_count": 43, + "execution_count": 45, "id": "42466b71", "metadata": {}, "outputs": [], "source": [ - "threthold = 0.4\n", - "predictability = {segment: int(predictability_score[i] > threthold) for i, segment in enumerate(sorted(ts.segments))}" + "threshold = 0.4\n", + "predictability = {\n", + " segment: int(predictability_score[i] > threshold) \n", + " for i, segment in enumerate(sorted(ts.segments))\n", + "}" ] }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 46, "id": "6586de3e", "metadata": {}, "outputs": [ @@ -1038,7 +1098,7 @@ }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 47, "id": "b0cd3965", "metadata": {}, "outputs": [ @@ -1147,7 +1207,7 @@ "2778 D35 14.327464 1.0" ] }, - "execution_count": 45, + "execution_count": 47, "metadata": {}, "output_type": "execute_result" } @@ -1158,7 +1218,7 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 48, "id": "40942d28", "metadata": {}, "outputs": [ @@ -1195,7 +1255,7 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 49, "id": "652392f3", "metadata": {}, "outputs": [ @@ -1304,7 +1364,7 @@ "1348 D2211 0.000000 1.0" ] }, - "execution_count": 47, + "execution_count": 49, "metadata": {}, "output_type": "execute_result" } @@ -1315,7 +1375,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 50, "id": "01619c6d", "metadata": {}, "outputs": [ From 62164fbd6f229d353c6068b33fc4c586613bec20 Mon Sep 17 00:00:00 2001 From: "d.a.bunin" Date: Tue, 27 Jun 2023 16:12:11 +0300 Subject: [PATCH 2/6] chore: update changelog --- CHANGELOG.md | 1 + 1 file changed, 1 insertion(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index b9c39fa2b..f6fa72bbb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -61,6 +61,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Fix problem with segment name "target" in `StackingEnsemble` ([#1262](https://github.com/tinkoff-ai/etna/pull/1262)) - Fix 
`BasePipeline.forecast` when prediction intervals are estimated on history data with presence of NaNs ([#1291](https://github.com/tinkoff-ai/etna/pull/1291)) - Teach `BaseMixin.set_params` to work with nested `list` and `tuple` ([#1201](https://github.com/tinkoff-ai/etna/pull/1201)) +- Fix `classification` notebook to download `FordA` dataset without error ([]()) ## [2.0.0] - 2023-04-11 ### Added From 1a3a8b25c3ceb2e567f7938d8e1273b9718caf04 Mon Sep 17 00:00:00 2001 From: "d.a.bunin" Date: Tue, 27 Jun 2023 16:14:02 +0300 Subject: [PATCH 3/6] chore: update changelog --- CHANGELOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index f6fa72bbb..3ed297dd2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -61,7 +61,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Fix problem with segment name "target" in `StackingEnsemble` ([#1262](https://github.com/tinkoff-ai/etna/pull/1262)) - Fix `BasePipeline.forecast` when prediction intervals are estimated on history data with presence of NaNs ([#1291](https://github.com/tinkoff-ai/etna/pull/1291)) - Teach `BaseMixin.set_params` to work with nested `list` and `tuple` ([#1201](https://github.com/tinkoff-ai/etna/pull/1201)) -- Fix `classification` notebook to download `FordA` dataset without error ([]()) +- Fix `classification` notebook to download `FordA` dataset without error ([#1298](https://github.com/tinkoff-ai/etna/pull/1298)) ## [2.0.0] - 2023-04-11 ### Added From 04ad3c1526a138521d0db3d774e9c3158ee258ce Mon Sep 17 00:00:00 2001 From: "d.a.bunin" Date: Tue, 27 Jun 2023 16:16:26 +0300 Subject: [PATCH 4/6] style: reformat code --- examples/classification.ipynb | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/examples/classification.ipynb b/examples/classification.ipynb index 4eeb5b838..9d97acb58 100644 --- a/examples/classification.ipynb +++ b/examples/classification.ipynb @@ -114,10 +114,10 @@ "\n", " X_train, 
y_train = data_train[:, 1:], data_train[:, 0]\n", " X_test, y_test = data_test[:, 1:], data_test[:, 0]\n", - " \n", + "\n", " y_train = y_train.astype(\"int64\")\n", " y_test = y_test.astype(\"int64\")\n", - " \n", + "\n", " return X_train, X_test, y_train, y_test" ] }, @@ -1064,10 +1064,7 @@ "outputs": [], "source": [ "threshold = 0.4\n", - "predictability = {\n", - " segment: int(predictability_score[i] > threshold) \n", - " for i, segment in enumerate(sorted(ts.segments))\n", - "}" + "predictability = {segment: int(predictability_score[i] > threshold) for i, segment in enumerate(sorted(ts.segments))}" ] }, { From f29224e30692c80263eb98d6c43faac25eda4b39 Mon Sep 17 00:00:00 2001 From: "d.a.bunin" Date: Wed, 28 Jun 2023 14:38:29 +0300 Subject: [PATCH 5/6] fix: fix comments on PR --- examples/classification.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/classification.ipynb b/examples/classification.ipynb index 9d97acb58..d3a417644 100644 --- a/examples/classification.ipynb +++ b/examples/classification.ipynb @@ -62,7 +62,7 @@ "\n", "Consider the example `FordA` dataset from [UCR archive](https://www.cs.ucr.edu/~eamonn/time_series_data/). Dataset consists of engine noise measurements and the problem is to diagnose whether a certain symptom exists in the engine. The comprehensive description of `FordA` dataset can be found [here](http://www.timeseriesclassification.com/description.php?Dataset=FordA). \n", "\n", - "To load the dataset, we can use `fetch_ucr_dataset` util form [`pyts` library](https://pyts.readthedocs.io/en/stable/index.html), but it currently doesn't work on version 0.12.0 that we use. So, we will load the dataset manually." + "It was possible to load the dataset using `fetch_ucr_dataset` function from [`pyts` library](https://pyts.readthedocs.io/en/stable/index.html), but it currently doesn't work on version 0.12.0 that we use. So, we will load the dataset manually." 
] }, { @@ -204,7 +204,7 @@ "\n", "Raw time series values are usually not the best features for the classifier. The length of the series is usually much greater than the number of samples in the dataset, in which case classifiers will perform poorly. There are special techniques to extract more informative features from the time series, you can find a comprehensive review of them in this [paper](https://hal.inria.fr/hal-03558165/document).\n", "\n", - "In our library we offer two methods for feature extraction methods that can work with the time series of different lengths:\n", + "In our library we offer two methods for feature extraction that can work with the time series of different lengths:\n", "1. `TSFreshFeatureExtractor` — extract features using `extract_features` method form [tsfresh](https://tsfresh.readthedocs.io/en/latest/)." ] }, From 02c2b3d24bc6bb532f89657571104c0d0293660c Mon Sep 17 00:00:00 2001 From: "d.a.bunin" Date: Wed, 28 Jun 2023 14:54:53 +0300 Subject: [PATCH 6/6] fix: remove notion about version of pyts --- examples/classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/classification.ipynb b/examples/classification.ipynb index d3a417644..6c97bd34f 100644 --- a/examples/classification.ipynb +++ b/examples/classification.ipynb @@ -62,7 +62,7 @@ "\n", "Consider the example `FordA` dataset from [UCR archive](https://www.cs.ucr.edu/~eamonn/time_series_data/). Dataset consists of engine noise measurements and the problem is to diagnose whether a certain symptom exists in the engine. The comprehensive description of `FordA` dataset can be found [here](http://www.timeseriesclassification.com/description.php?Dataset=FordA). \n", "\n", - "It was possible to load the dataset using `fetch_ucr_dataset` function from [`pyts` library](https://pyts.readthedocs.io/en/stable/index.html), but it currently doesn't work on version 0.12.0 that we use. So, we will load the dataset manually." 
+ "It is possible to load the dataset using `fetch_ucr_dataset` function from [`pyts` library](https://pyts.readthedocs.io/en/stable/index.html), but let's do it manually." ] }, {
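Note on the extraction cell added in PATCH 1/6: its recorded output ends with `caution: filename not matched: -q`, which suggests that `unzip` read the trailing `-q` as an archive member name rather than as the quiet flag. Below is a minimal sketch of one alternative, assuming only the Python standard library: extract the archive with `zipfile` instead of shelling out. The helper name `extract_archive` is hypothetical and is not part of the notebook or of etna; the paths in the usage comment are the ones the notebook already uses.

```python
import pathlib
import zipfile
from typing import List


def extract_archive(archive_path: pathlib.Path, target_dir: pathlib.Path) -> List[str]:
    """Extract all members of a zip archive into target_dir and return their names.

    Hypothetical helper for illustration; not part of the notebook or of etna.
    """
    # Create the destination so extraction does not depend on a prior mkdir.
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Usage with the paths from the notebook (assumes data/ford_a.zip was already downloaded):
# extract_archive(pathlib.Path("data") / "ford_a.zip", pathlib.Path("data") / "ford_a")
```

The files land in the same `data/ford_a` directory that `load_ford_a` reads from, so the loading cell would not need to change. If keeping the shell cell is preferred, moving the flag before the archive name (`unzip -q data/ford_a.zip -d data/ford_a`) is the smaller change.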