Commit
feat(pipelines): updated notebook introducing the concepts
Showing 6 changed files with 606 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,217 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "b773bf8e-c420-44e1-80a6-99f75dd12268", | ||
"metadata": {}, | ||
"source": [ | ||
"## Pipelines and Transformers\n", | ||
"\n", | ||
"This notebook showcases the current version of data processing pipelines in CapyMOA. \n", | ||
"\n", | ||
"* Includes an example of how preprocessing can be accomplished via pipelines and transformers.\n", | ||
"* Transformers transform an instance, e.g., using standardization, normalization, etc.\n", | ||
"* Pipelines bundle transformers and can also act as classifiers or regressors\n", | ||
"\n", | ||
"*Please note that this feature is still under development; some functionality might not yet be available or change in future releases.*\n", | ||
"\n", | ||
"**notebook last updated on 24/05/2024**" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "55d070de-8697-4f98-a11b-eab4e3d5c281", | ||
"metadata": {}, | ||
"source": [ | ||
"### 1. Running onlineBagging without any preprocessing\n", | ||
"\n", | ||
"First, let us have a look at a simple test-then-train classification example without pipelines. \n", | ||
"- We loop over the instances of the data stream\n", | ||
"- make a prediction,\n", | ||
"- update the evaluator with the prediction and label\n", | ||
"- and then train the classifier on the instance." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "14681f54-23a1-4f93-9145-abf484c91c54", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"79.05190677966102" | ||
] | ||
}, | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"## Test-then-train loop\n", | ||
"from capymoa.stream import stream_from_file\n", | ||
"from capymoa.classifier import OnlineBagging\n", | ||
"from capymoa.evaluation import ClassificationEvaluator\n", | ||
"\n", | ||
"## Opening a file as a stream\n", | ||
"DATA_PATH = \"../data/\"\n", | ||
"elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+\"electricity.csv\")\n", | ||
"\n", | ||
"# Creating a learner\n", | ||
"ob_learner = OnlineBagging(schema=elec_stream.get_schema(), ensemble_size=5)\n", | ||
"\n", | ||
"# Creating the evaluator\n", | ||
"ob_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema())\n", | ||
"\n", | ||
"while elec_stream.has_more_instances():\n", | ||
" instance = elec_stream.next_instance()\n", | ||
" \n", | ||
" prediction = ob_learner.predict(instance)\n", | ||
" ob_evaluator.update(instance.y_index, prediction)\n", | ||
" ob_learner.train(instance)\n", | ||
"\n", | ||
"ob_evaluator.accuracy()" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "0c1360ef-0583-4c87-8645-1e2d701fffca", | ||
"metadata": {}, | ||
"source": [ | ||
"### 2. Online Bagging using pipelines and transformers\n", | ||
"\n", | ||
"If we want to perform some preprocessing, such as normalization or feature transformation, or a combination of both, we can chain multiple such `Transformer`s within a pipeline. The last step of a pipeline is a learner, such as capymoa classifier or regressor.\n", | ||
"\n", | ||
"Similar as classifiers and regressors, pipelines support `train` and `test`. Hence, we can use them in the same way as we would use other capymoa learners. Internally, the pipeline object passes an incoming instance from one transformer to the next. It then returns the prediction of the classifier / regressor using the transformed instance.\n", | ||
"\n", | ||
"Creating a pipeline consists of the following steps:\n", | ||
"1. Create a stream instance\n", | ||
"2. Initialize the transformers\n", | ||
"3. Initialize the learner\n", | ||
"4. Create the pipeline. Here, we use a `ClassifierPipeline`\n", | ||
"5. Use the pipeline the same way as any other learner." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "ae9bb646-e0d1-4de6-b5a1-cff0f0a1b172", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"ename": "ImportError", | ||
"evalue": "Failed to import 'moa.streams.FilteredQueueStream'", | ||
"output_type": "error", | ||
"traceback": [ | ||
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", | ||
"\u001b[1;31mException\u001b[0m Traceback (most recent call last)", | ||
"File \u001b[1;32morg.jpype.JPypeContext.java:-1\u001b[0m, in \u001b[0;36morg.jpype.JPypeContext.callMethod\u001b[1;34m()\u001b[0m\n", | ||
"\u001b[1;31mException\u001b[0m: Java Exception", | ||
"\nThe above exception was the direct cause of the following exception:\n", | ||
"\u001b[1;31mjava.lang.ClassNotFoundException\u001b[0m Traceback (most recent call last)", | ||
"File \u001b[1;32m~\\.virtualenvs\\CapyMOA-pLr6U80W\\Lib\\site-packages\\jpype\\imports.py:195\u001b[0m, in \u001b[0;36m_JImportLoader.find_spec\u001b[1;34m(self, name, path, target)\u001b[0m\n\u001b[0;32m 193\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m 194\u001b[0m \u001b[38;5;66;03m# Use forname because it give better diagnostics\u001b[39;00m\n\u001b[1;32m--> 195\u001b[0m \u001b[38;5;28mcls\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[43m_jpype\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_java_lang_Class\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mforName\u001b[49m\u001b[43m(\u001b[49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m_jpype\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mJPypeClassLoader\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 197\u001b[0m \u001b[38;5;66;03m# This code only is hit if an error was not thrown\u001b[39;00m\n", | ||
"\u001b[1;31mjava.lang.ClassNotFoundException\u001b[0m: java.lang.ClassNotFoundException: moa.streams.FilteredQueueStream", | ||
"\nThe above exception was the direct cause of the following exception:\n", | ||
"\u001b[1;31mImportError\u001b[0m Traceback (most recent call last)", | ||
"Cell \u001b[1;32mIn[2], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mcapymoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mstream\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpreprocessing\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m MOATransformer\n\u001b[0;32m 2\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mcapymoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mstream\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpreprocessing\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m ClassifierPipeline\n\u001b[0;32m 3\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mcapymoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mstream\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Stream\n", | ||
"File \u001b[1;32m~\\Documents\\code\\CapyMOA\\src\\capymoa\\stream\\preprocessing\\__init__.py:1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpipeline\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[0;32m 2\u001b[0m BasePipeline, ClassifierPipeline, RegressorPipeline\n\u001b[0;32m 3\u001b[0m )\n\u001b[0;32m 4\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtransformer\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[0;32m 5\u001b[0m Transformer, MOATransformer\n\u001b[0;32m 6\u001b[0m )\n\u001b[0;32m 8\u001b[0m __all__ \u001b[38;5;241m=\u001b[39m [\n\u001b[0;32m 9\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mBasePipeline\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m 10\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mClassifierPipeline\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 13\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mMOATransformer\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m 14\u001b[0m ]\n", | ||
"File \u001b[1;32m~\\Documents\\code\\CapyMOA\\src\\capymoa\\stream\\preprocessing\\pipeline.py:7\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mcapymoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mbase\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Classifier, Regressor\n\u001b[0;32m 6\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mcapymoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01minstance\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m LabeledInstance, Instance, RegressionInstance\n\u001b[1;32m----> 7\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mcapymoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mstream\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpreprocessing\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtransformer\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Transformer\n\u001b[0;32m 8\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mcapymoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mtype_alias\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m LabelProbabilities, LabelIndex, TargetValue\n\u001b[0;32m 11\u001b[0m \u001b[38;5;28;01mclass\u001b[39;00m \u001b[38;5;21;01mBasePipeline\u001b[39;00m:\n", | ||
"File \u001b[1;32m~\\Documents\\code\\CapyMOA\\src\\capymoa\\stream\\preprocessing\\transformer.py:7\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mcapymoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mstream\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Schema, Stream\n\u001b[0;32m 6\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mcapymoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01minstance\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Instance\n\u001b[1;32m----> 7\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mmoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mstreams\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m FilteredQueueStream\n\u001b[0;32m 8\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mmoa\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mstreams\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mfilters\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m StreamFilter\n\u001b[0;32m 11\u001b[0m \u001b[38;5;28;01mclass\u001b[39;00m \u001b[38;5;21;01mTransformer\u001b[39;00m(ABC):\n", | ||
"File \u001b[1;32m~\\.virtualenvs\\CapyMOA-pLr6U80W\\Lib\\site-packages\\jpype\\imports.py:203\u001b[0m, in \u001b[0;36m_JImportLoader.find_spec\u001b[1;34m(self, name, path, target)\u001b[0m\n\u001b[0;32m 201\u001b[0m \u001b[38;5;66;03m# Not found is acceptable\u001b[39;00m\n\u001b[0;32m 202\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m ex:\n\u001b[1;32m--> 203\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mFailed to import \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m%\u001b[39m name) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mex\u001b[39;00m\n\u001b[0;32m 205\u001b[0m \u001b[38;5;66;03m# Import the java module\u001b[39;00m\n\u001b[0;32m 206\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m _ModuleSpec(name, \u001b[38;5;28mself\u001b[39m)\n", | ||
"\u001b[1;31mImportError\u001b[0m: Failed to import 'moa.streams.FilteredQueueStream'" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from capymoa.stream.preprocessing import MOATransformer\n", | ||
"from capymoa.stream.preprocessing import ClassifierPipeline\n", | ||
"from capymoa.stream import Stream\n", | ||
"from moa.streams.filters import AddNoiseFilter, NormalisationFilter\n", | ||
"from moa.streams import FilteredStream\n", | ||
"\n", | ||
"# Open the stream from an ARFF file\n", | ||
"elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+\"electricity.arff\")\n", | ||
"\n", | ||
"# Creating the transformer\n", | ||
"normalisation_transformer = MOATransformer(schema=elec_stream.get_schema(), moa_filter=NormalisationFilter())\n", | ||
"add_noise_transformer = MOATransformer(schema=normalisation_transformer.get_schema(), moa_filter=AddNoiseFilter())\n", | ||
"\n", | ||
"# Creating a learner\n", | ||
"ob_learner = OnlineBagging(schema=add_noise_transformer.get_schema(), ensemble_size=5)\n", | ||
"\n", | ||
"# Creating and populating the pipeline\n", | ||
"pipeline = ClassifierPipeline(transformers=[normalisation_transformer],\n", | ||
" learner=ob_learner)\n", | ||
"\n", | ||
"# Alternative:\n", | ||
"# pipeline = ClassifierPipeline()\n", | ||
"# pipeline.add_transformer(normalization_transformer)\n", | ||
"# pipeline.add_transformer(add_noise_transformer)\n", | ||
"# pipeline.set_learner(ob_learner)\n", | ||
"\n", | ||
"# Creating the evaluator\n", | ||
"ob_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema()) #TODO: Change to transformer.get_schema() to pipeline.get_schema() or something like that.\n", | ||
"\n", | ||
"while elec_stream.has_more_instances():\n", | ||
" instance = elec_stream.next_instance()\n", | ||
" prediction = pipeline.predict(instance)\n", | ||
" ob_evaluator.update(instance.y_index, prediction)\n", | ||
" pipeline.train(instance)\n", | ||
"\n", | ||
"ob_evaluator.accuracy()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "5f747c04-59ab-41cb-87ca-3d66ae75731a", | ||
"metadata": {}, | ||
"source": [ | ||
"Last, we can also get a textual representation of the pipeline:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c11f5b39-bf53-496e-b42c-25f89458ff03", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"str(pipeline)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f3c8271b-bbb7-4ca9-97f5-fb41e27ec4fd", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
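
The notebook above describes a pipeline as a chain of transformers followed by a learner, used inside a test-then-train loop. The following is a minimal, dependency-free sketch of that idea. It is not CapyMOA's implementation: `RunningMinMaxScaler`, `NearestCentroid`, `ToyPipeline`, and `prequential_accuracy` are hypothetical names chosen for illustration, instances are plain `(features, label)` tuples rather than CapyMOA instances, and the data is a small synthetic stream.

```python
# Toy sketch only: every class here is a hypothetical stand-in, not CapyMOA's API.


class RunningMinMaxScaler:
    """Scales each feature to [0, 1] using the min/max observed so far."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def transform(self, x):
        # Widen the running bounds with the new instance, then scale it.
        if self.lo is None:
            self.lo, self.hi = list(x), list(x)
        else:
            self.lo = [min(a, b) for a, b in zip(self.lo, x)]
            self.hi = [max(a, b) for a, b in zip(self.hi, x)]
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(x, self.lo, self.hi)
        ]


class NearestCentroid:
    """Online nearest-centroid classifier (stand-in for a real learner)."""

    def __init__(self):
        self.sums = {}    # label -> per-feature sum of training instances
        self.counts = {}  # label -> number of training instances

    def predict(self, x):
        if not self.sums:
            return None  # nothing learned yet

        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=sq_dist)

    def train(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1


class ToyPipeline:
    """Passes features through each transformer in order, then to the learner."""

    def __init__(self, transformers, learner):
        self.transformers = transformers
        self.learner = learner

    def _transform(self, x):
        for transformer in self.transformers:
            x = transformer.transform(x)
        return x

    def predict(self, x):
        return self.learner.predict(self._transform(x))

    def train(self, x, y):
        self.learner.train(self._transform(x), y)


def prequential_accuracy(stream, model):
    """Test-then-train: predict on each instance first, then train on it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        total += 1
        model.train(x, y)
    return 100.0 * correct / total if total else 0.0


# Synthetic two-class stream whose features live on very different scales.
stream = [
    ([(i % 2) * 10.0 + (i % 3), (i % 2) * 1000.0 + (i % 5)], i % 2)
    for i in range(20)
]
pipeline = ToyPipeline([RunningMinMaxScaler()], NearestCentroid())
accuracy = prequential_accuracy(stream, pipeline)
print(f"accuracy: {accuracy:.1f}%")
```

Because the pipeline exposes the same `predict` and `train` methods as the bare learner, the evaluation loop does not need to know whether any preprocessing happens, which is the design point the notebook makes about `ClassifierPipeline`.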