feature/add-getting-started-notebook #20

Masimba007 · 2023-06-29T14:27:38Z

No description provided.

Getting started/Getting-started.ipynb

mnicstruwig · 2023-06-30T09:12:01Z

Nit: Branch naming convention.

Co-authored-by: MichaelMeyer01 <105858978+MichaelMeyer01@users.noreply.github.com>

mnicstruwig · 2023-07-04T08:39:38Z

Something very weird is going on with this branch -- why does it not have the rest of the notebooks in it? Why wasn't it branched off of main?

Updating branch

Getting started/Getting-started.ipynb

mnicstruwig · 2023-07-04T09:44:05Z

Getting started/Getting-started.ipynb

+   "outputs": [
+    {
+     "data": {
+      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAG3CAYAAABVDkpwAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC0bElEQVR4nOydd3zU5B/HP3fXRemAlhZaCpS9h4Ai06JIWQqWLSgoQ5S9VByAKCJ7+GMIqKjIpsiUTdlbQfYso0BLW2hLS+ddfn/kkktyyY3S9tb3/Xr11cuTJ7knubvkk+/zHSqGYRgQBEEQBEG4MGpbD4AgCIIgCMLWkCAiCIIgCMLlIUFEEARBEITLQ4KIIAiCIAiXhwQRQRAEQRAuDwkigiAIgiBcHhJEBEEQBEG4PCSICIIgCIJweUgQEQRBEATh8pAgIgiC0PPHH3+gRo0acHd3R4kSJQAAERERiIiI4PvcuXMHKpUKK1asKLJxhYeHo3///kX2fgThipAgIgjCiFu3buGjjz5CpUqV4OXlBT8/PzRv3hzz589HZmamrYdXKFy9ehX9+/dH5cqVsWzZMixdutTibXfs2IHJkycX3uAIgih03Gw9AIIg7Ivt27eje/fu8PT0xPvvv486deogJycHR44cwfjx43Hp0iWrxIKjEBMTA51Oh/nz56NKlSp8++7du81uu2PHDixcuLDQRNG1a9egVtPzK0EUJiSICILgiY2NRa9evVChQgXs378fISEh/LqhQ4fi5s2b2L59uw1HWHg8fvwYAPipMg4PDw8bjAZgGAZZWVkoVqwYPD09bTIGgnAl6JGDIAieGTNmID09HT///LNIDHFUqVIFI0eOBGDal0alUomsJc+ePcOoUaMQHh4OT09PBAcH480338Q///zD91Hyk5H68MTExEClUmHdunX45ptvULZsWfj6+qJbt25ITU1FdnY2Ro0aheDgYPj4+OCDDz5Adna2yeMODw/HpEmTAABBQUGi8UvfX0r//v2xcOFC/ri5Pw6dTod58+ahdu3a8PLyQunSpfHRRx/h6dOnRmPo1KkTdu3ahcaNG6NYsWL46aefZM/NihUroFKpcPToUYwZMwZBQUEoXrw43nnnHSQmJor2q9PpMHnyZISGhsLb2xutW7fG5cuXyS+JICSQhYggCJ6tW7eiUqVKaNasWYHud8iQIdiwYQOGDRuGWrVqITk5GUeOHMGVK1fQsGHDfO1z2rRpKFasGD7//HPcvHkTP/74I9zd3aFWq/H06VNMnjwZJ06cwIoVK1CxYkVMnDhRcV/z5s3D77//jk2bNmHx4sXw8fFBvXr1LBrHRx99hIcPH2LPnj34448/ZNevWLECH3zwAUaMGIHY2Fj873//w7///oujR4/C3d2d73vt2jX07t0bH330EQYNGoTq1aubfO/hw4ejZMmSmDRpEu7cuYN58+Zh2LBhWLt2Ld9nwoQJmDFjBt566y1ERkbi/PnziIyMRFZWlkXHRxCuAgkigiAAAGlpaXjw4AE6d+5c4Pvevn07Bg0ahNmzZ/Ntn3766QvtMy8vDwcPHuQFRWJiItasWYN27dphx44dAIBPPvkEN2/exC+//GJSEHXp0gXnzp3Dpk2b0K1bN5QqVcricTRt2hTVqlXDnj170LdvX9G6I0eOYPny5fjzzz/x7rvv8u2tW7dGu3btsH79elH7zZs3sXPnTkRGRlr03oGBgdi9


Getting started/Getting-started.ipynb

mnicstruwig · 2023-07-04T09:46:40Z

I don't think Sampling has a home in a getting started notebook -- it's something the user can investigate on their own. @MichaelMeyer01 do you agree?

mnicstruwig · 2023-07-04T09:47:38Z

I also don't see a section on actually making predictions using our labels and training a model -- this is a crucial aspect to include in a Getting Started notebook.

mnicstruwig · 2023-07-04T09:49:48Z

And what about backtesting?

Remember -- the goal is to take the user from ticks ------> model prediction for a strategy.

MichaelMeyer01 · 2023-07-04T09:56:47Z

I don't think Sampling has a home in a getting started notebook -- it's something the user can investigate on their own. @MichaelMeyer01 do you agree?

@MichaelMeyer01 I think it's not that import to include, but it would be relatively straight forward to determine the average uniqueness which shows why sampling into finance can be tricky, without going into sampling.

Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>

mnicstruwig · 2023-07-04T11:17:29Z

I don't think Sampling has a home in a getting started notebook -- it's something the user can investigate on their own. @MichaelMeyer01 do you agree?

@MichaelMeyer01 I think it's not that import to include, but it would be relatively straight forward to determine the average uniqueness which shows why sampling into finance can be tricky, without going into sampling.

I do think it's important as well, just not relevant to a "Getting Started" guide. It's something we can cover in a separate notebook.

mnicstruwig · 2023-07-11T11:13:18Z

Since the notebook cells were executed without restarting the kernel, I've just re-run them so that comments can be made here relevant to the correct cell :)

mnicstruwig · 2023-07-11T11:47:12Z

[3] # Data cleaning and preparation

This should instead be a markdown header.

[3] # We are combining the 'date' and 'time' column and making it 'date'

Not a clear explanation. It's also not been converted to a DateTime, not sure if that's necessary for the functions below or not.

[Below 3]

We need to change our data format to ensure that our tools will read the data accordingly. Our documentation, for each function, will tell you the format the data should be in order to use the function. In this case, our data should be in [date_time, price, volume] format.

Explain that it's the columns, and a DataFrame. It's not just a "format". Be more explicit.

[*] Use backticks when talking about MLFinlab so that it renders it using monospace font:

`mlfinlab` # rather than bold *mlfinlab*

[6] # dollar bars

This comment isn't useful, since we have a good variable name.

[6] You should explain what the parameters represent.

[Above 7] This is the process of assigning target labels or categories to data points. It involves assigning a specific outcome or classification to each data instance based on certain criteria or rules. It is important when it comes to predictive modeling tasks. The following techniques are available in MlFinLab , accompanied with an example notebook in our documentation :

Remove what's in bold above.

There's also a formatting mistake in the list "Trend Scanning" doesn't have a space between the dash.

"fixed-time horizon method" . Please link an article about this here.

Use backticks when you're referring to code to render it as in-line code! https://www.markdownguide.org/basic-syntax/#code

[9] Explain what lookback parameter does

[Above 10] > Our documentation covers a lot more filters that can be used, for this notebook we will focus on the CUSUM Filter.

[Below 12] typo.

[13] Code formatting. Space missing after ,

[14] Bad code formatting. Indents aren't aligned.

[Below 14] Not using backticks again when referring to parameters.

[Below 15] You need to explain what those columns represent, not just what they are. What is a profit taking multiple? (for example).

[Above 16] > These are the meta-labels, and can also be used where you give the 'side' of the trade.

You'll need to explain what the meta-labels are, or how they're relevant. This concept hasn't been introduced yet, so I think the readers will be a little bit confused.

[21] Now we multiply the mean volatility by 2, where previously we didn't. Why?

Please remove the sampling section -- it's been raised as an issue in other comments on this PR.

mnicstruwig · 2023-07-11T11:58:01Z

[Above 43] Starting from MlFinLab version 1.5.0 the execution is up to 10 times faster compared to the models from version 1.4.0 and earlier. (The speed improvement depends on the size of the input dataset)

Remove. Irrelevant.

[43] # We will use a subset of the data for illustration purposes

[45] > The minimum d inm this case is 0.2.

Why?

[Above 51] Explosive tests are just done, without any explanation or motivation.

Do we even use the structural breaks anywhere?

[57] Volatility estimators are very useful. But do we use this anywhere?

[Clustering section] How do we use clustering? For what? How to we incorporate it into our models?

[Feature Importance] Code is just copy-pasted from the examples without any explanation.

[Cross validation] Just copy-pasted from the documentation. Not useful if this is the case -- you need to put it in context of a getting started guide! Obvious tell: Why are we mentioning the book here?

Examples are also just taken from the page without any additional explanation added.

[Bet concentration] Grammar.

[Drawdown and time underwater]: Spelling and grammar.

[84] I don't think we use cumulated_return or annual_return anywhere else. Did you forget to print them out?

[86] Bad code formatting.

[87] Bad code formatting.

mnicstruwig

Left comments.

docs/getting-started-notebook

e9b5188