
Commit

Addressing review comments
nuwangunasekara authored and hmgomes committed Apr 29, 2024
1 parent f7cffd1 commit 958456f
Showing 3 changed files with 294 additions and 508 deletions.
notebooks/03_0_using_sklearn.ipynb: 252 changes (59 additions, 193 deletions)
@@ -9,46 +9,21 @@
"# Using sklearn with CapyMOA\n",
"\n",
"* Demonstrate how someone can directly use sklearn learners in CapyMOA.\n",
"* Ideally, one should be free to use other learners\n",
"\n",
"**Accessing the input data x()**\n",
"\n",
"* Accessing the input data as a double array from an ```Instance``` through function ```x()```\n",
"* Instances are represented internally as MOA Instances."
"* Ideally, one should be free to use other learners"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "96cb3df1-190c-49ea-959b-292559df13e6",
"metadata": {},
"source": "## Reading data and accessing x()"
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3b7be7ed-97d2-437a-9ed9-fb71e4f33328",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-26T01:24:50.704474Z",
"start_time": "2024-04-26T01:24:45.332800Z"
"start_time": "2024-04-29T11:55:56.144446Z"
},
"jupyter": {
"is_executing": true
}
},
"source": [
"from capymoa.stream import stream_from_file\n",
"\n",
"DATA_PATH = \"../data/\"\n",
"\n",
"## Opening a file as a stream\n",
"elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+\"electricity.csv\")\n",
"\n",
"elec_stream.restart()\n",
"i = 0\n",
"while elec_stream.has_more_instances():\n",
" instance = elec_stream.next_instance()\n",
" if i < 20: # prevent printing all the instances\n",
" print(f'x: {instance.x}, y: {instance.y_index}')\n",
" i+=1"
],
"outputs": [
{
"name": "stdout",
@@ -59,96 +34,52 @@
"JVM Location (system): \n",
"JAVA_HOME: /Users/ng98/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home\n",
"JVM args: ['-Xmx8g', '-Xss10M']\n",
"Sucessfully started the JVM and added MOA jar to the class path\n",
"x: [0. 0.056443 0.439155 0.003467 0.422915 0.414912], y: 1\n",
"x: [0.021277 0.051699 0.415055 0.003467 0.422915 0.414912], y: 1\n",
"x: [0.042553 0.051489 0.385004 0.003467 0.422915 0.414912], y: 1\n",
"x: [0.06383 0.045485 0.314639 0.003467 0.422915 0.414912], y: 1\n",
"x: [0.085106 0.042482 0.251116 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.106383 0.041161 0.207528 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.12766 0.041161 0.171824 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.148936 0.041161 0.152782 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.170213 0.041161 0.13493 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.191489 0.041161 0.140583 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.212766 0.044374 0.168997 0.003467 0.422915 0.414912], y: 1\n",
"x: [0.234043 0.049868 0.212437 0.003467 0.422915 0.414912], y: 1\n",
"x: [0.255319 0.051489 0.298721 0.003467 0.422915 0.414912], y: 1\n",
"x: [0.276596 0.042482 0.39036 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.297872 0.040861 0.402261 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.319149 0.040711 0.462214 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.340426 0.040861 0.488248 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.361702 0.040711 0.493306 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.382979 0.041041 0.53258 0.003467 0.422915 0.414912], y: 0\n",
"x: [0.404255 0.041161 0.546415 0.003467 0.422915 0.414912], y: 0\n"
"Sucessfully started the JVM and added MOA jar to the class path\n"
]
}
],
"execution_count": 1
},
{
"cell_type": "code",
"id": "38d831be-3560-4efd-89bd-1ec71f001833",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-26T01:24:50.714965Z",
"start_time": "2024-04-26T01:24:50.707409Z"
}
},
"source": [
"# Getting some extra information about the instance through the MOA representation. \n",
"moa_instance = instance.java_instance.getData()\n",
"from capymoa.datasets import ElectricityTiny\n",
"\n",
"for i in range(0, moa_instance.numInputAttributes()):\n",
" print(moa_instance.attribute(i))\n",
" print(moa_instance.value(i))"
],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@attribute attrib_0 numeric\n",
"1.0\n",
"@attribute attrib_1 numeric\n",
"0.050679\n",
"@attribute attrib_2 numeric\n",
"0.288753\n",
"@attribute attrib_3 numeric\n",
"0.003542\n",
"@attribute attrib_4 numeric\n",
"0.355256\n",
"@attribute attrib_5 numeric\n",
"0.23114\n"
]
}
],
"execution_count": 2
"DATA_PATH = \"../data/\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "948e37ec-fbfd-43ea-a04a-7c11215452cc",
"metadata": {},
"source": [
"## Using scikit-learn\n",
"## 1. Using scikit-learn\n",
"\n",
"* Example showing how a model from scikit-learn can be used with our ```Instance``` representation"
"* Example showing how a model from scikit-learn can be used with our ```Instance``` representation with an instance loop "
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2745848d-43d1-4d9e-a191-14e6be61bf7b",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-26T01:24:54.644204Z",
"start_time": "2024-04-26T01:24:50.717020Z"
"end_time": "2024-04-29T11:35:22.576729Z",
"start_time": "2024-04-29T11:35:20.707805Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"84.7"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"from sklearn import linear_model\n",
"from capymoa.evaluation import ClassificationEvaluator\n",
"from capymoa.datasets import ElectricityTiny\n",
"\n",
"# Creating a stream. Using the tiny version of the electricity dataset to speed\n",
"# up the process\n",
@@ -175,94 +106,39 @@
" partial_fit_count += 1\n",
"\n",
"ob_evaluator.accuracy()"
],
"outputs": [
{
"data": {
"text/plain": [
"84.7"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 3
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8765c774-d5f0-468b-8bb0-8ff09dcea49a",
"id": "eb71a0ef-44ff-4168-b5dd-62530f74d112",
"metadata": {},
"source": [
"### Example using a MOA learner\n",
"\n"
"## 2. Using SKClassifier\n",
"* Instead of sklearn ```SGDClassifier``` here we use CapyMOA ```SKClassifier``` on the same instance loop.\n"
]
},
{
"cell_type": "code",
"id": "8d9efdb8-fbe1-430d-b5c9-cc738d13e598",
"execution_count": 3,
"id": "1f457d2f-1b5d-4f41-a883-f11bc7a59814",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-26T01:24:55.269835Z",
"start_time": "2024-04-26T01:24:54.647439Z"
"end_time": "2024-04-29T11:35:24.504861Z",
"start_time": "2024-04-29T11:35:22.580171Z"
}
},
"source": [
"from moa.classifiers.trees import HoeffdingAdaptiveTree\n",
"from capymoa.evaluation import ClassificationEvaluator\n",
"from capymoa.base import MOAClassifier\n",
"\n",
"## Opening a file as a stream\n",
"elec_stream = ElectricityTiny()\n",
"\n",
"# Creating a learner\n",
"moa_HAT = MOAClassifier(schema=elec_stream.get_schema(), moa_learner=HoeffdingAdaptiveTree())\n",
"\n",
"# Creating the evaluator\n",
"hat_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema())\n",
"\n",
"while elec_stream.has_more_instances():\n",
" instance = elec_stream.next_instance()\n",
"\n",
" prediction = moa_HAT.predict(instance)\n",
" hat_evaluator.update(instance.y_index, prediction)\n",
" moa_HAT.train(instance)\n",
" partial_fit_count += 1\n",
"\n",
"hat_evaluator.accuracy()"
],
"outputs": [
{
"data": {
"text/plain": [
"82.75"
"84.7"
]
},
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 4
},
{
"attachments": {},
"cell_type": "markdown",
"id": "eb71a0ef-44ff-4168-b5dd-62530f74d112",
"metadata": {},
"source": "### Using SKClassifier\n"
},
{
"cell_type": "code",
"id": "1f457d2f-1b5d-4f41-a883-f11bc7a59814",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-26T01:24:57.155460Z",
"start_time": "2024-04-26T01:24:55.281047Z"
}
},
"source": [
"from sklearn import linear_model\n",
"from capymoa.base import SKClassifier\n",
@@ -285,63 +161,53 @@
" sklearn_SGD.train(instance)\n",
"\n",
"sklearn_SGD_evaluator.accuracy()"
],
"outputs": [
{
"data": {
"text/plain": [
"84.7"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 5
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a4ff1ac9-07a1-4a0b-9bb5-f2afa79dd928",
"metadata": {},
"source": "### Using prequential evaluation + SKClassifier"
"source": [
"## 3. Using prequential evaluation + SKClassifier\n",
"* Instead of an instance loop we use CapyMOA ```prequential_evaluation()``` in this example."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "da2bba35-c258-4fc0-8932-f97d56e4e276",
"metadata": {
"ExecuteTime": {
"end_time": "2024-04-26T01:25:46.918434Z",
"start_time": "2024-04-26T01:24:57.157654Z"
"end_time": "2024-04-29T11:35:26.365823Z",
"start_time": "2024-04-29T11:35:24.506697Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"84.7"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from capymoa.evaluation import prequential_evaluation\n",
"\n",
"## Opening a file as a stream\n",
"elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+\"electricity.csv\")\n",
"elec_stream = ElectricityTiny()\n",
"\n",
"# Creating a learner\n",
"sklearn_SGD = SKClassifier(schema=elec_stream.get_schema(), sklearner=linear_model.SGDClassifier())\n",
"\n",
"results_sklearn_SGD = prequential_evaluation(stream=elec_stream, learner=sklearn_SGD, window_size=4500)\n",
"\n",
"results_sklearn_SGD['cumulative'].accuracy()"
],
"outputs": [
{
"data": {
"text/plain": [
"83.88064971751412"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 6
]
}
],
"metadata": {
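Section 3 of the notebook replaces the hand-written loop with `prequential_evaluation(stream=elec_stream, learner=sklearn_SGD, window_size=4500)` and reads `results_sklearn_SGD['cumulative'].accuracy()`. As a rough illustration of what cumulative versus windowed prequential accuracy means, the sketch below assumes tumbling (non-overlapping) windows of `window_size` instances; this commit does not show how CapyMOA computes its windowed results, and `windowed_prequential`, `LastLabelLearner` and the returned dict keys are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


class LastLabelLearner:
    """Hypothetical learner: predicts the label of the previous instance."""

    def __init__(self) -> None:
        self.last_label = 0

    def predict(self, x: List[float]) -> int:
        return self.last_label

    def train(self, x: List[float], y: int) -> None:
        self.last_label = y


def windowed_prequential(
    stream: Sequence[Tuple[List[float], int]], learner, window_size: int
) -> Dict[str, object]:
    """Test-then-train, tracking cumulative and per-window accuracy in percent.

    Assumption: windows are consecutive, non-overlapping blocks of window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    total_correct = 0
    window_correct = 0
    window_seen = 0
    windowed: List[float] = []
    for x, y in stream:
        if learner.predict(x) == y:
            total_correct += 1
            window_correct += 1
        window_seen += 1
        learner.train(x, y)
        # Close the current window once window_size instances were evaluated.
        if window_seen == window_size:
            windowed.append(100.0 * window_correct / window_seen)
            window_correct = 0
            window_seen = 0
    # Report a trailing partial window, if any.
    if window_seen:
        windowed.append(100.0 * window_correct / window_seen)
    cumulative = 100.0 * total_correct / len(stream) if stream else 0.0
    return {"cumulative": cumulative, "windowed": windowed}


# Toy labels made up for illustration only.
toy_stream = [([float(i)], y) for i, y in enumerate([1, 1, 0, 0, 1, 1])]
result = windowed_prequential(toy_stream, LastLabelLearner(), window_size=3)
print(result["cumulative"])  # 50.0
print(result["windowed"])    # two windows: about 33.33 and 66.67
```

The cumulative figure summarizes the whole stream, while the windowed list shows how accuracy changes over time, which is what makes windowed results useful for spotting concept drift.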