Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature/add-getting-started-notebook #20

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open

Conversation

Masimba007
Copy link
Contributor

No description provided.

@mnicstruwig
Copy link
Contributor

Nit: Branch naming convention.

Masimba007 and others added 2 commits June 30, 2023 11:43
Co-authored-by: MichaelMeyer01 <105858978+MichaelMeyer01@users.noreply.github.com>
@Masimba007 Masimba007 changed the title docs/getting-started-notebook feature/add-getting-started-notebook Jun 30, 2023
@mnicstruwig
Copy link
Contributor

Something very weird is going on with this branch -- why does it not have the rest of the notebooks in it? Why wasn't it branched off of main?

"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAG3CAYAAABVDkpwAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC0bElEQVR4nOydd3zU5B/HP3fXRemAlhZaCpS9h4Ai06JIWQqWLSgoQ5S9VByAKCJ7+GMIqKjIpsiUTdlbQfYso0BLW2hLS+ddfn/kkktyyY3S9tb3/Xr11cuTJ7knubvkk+/zHSqGYRgQBEEQBEG4MGpbD4AgCIIgCMLWkCAiCIIgCMLlIUFEEARBEITLQ4KIIAiCIAiXhwQRQRAEQRAuDwkigiAIgiBcHhJEBEEQBEG4PCSICIIgCIJweUgQEQRBEATh8pAgIgiC0PPHH3+gRo0acHd3R4kSJQAAERERiIiI4PvcuXMHKpUKK1asKLJxhYeHo3///kX2fgThipAgIgjCiFu3buGjjz5CpUqV4OXlBT8/PzRv3hzz589HZmamrYdXKFy9ehX9+/dH5cqVsWzZMixdutTibXfs2IHJkycX3uAIgih03Gw9AIIg7Ivt27eje/fu8PT0xPvvv486deogJycHR44cwfjx43Hp0iWrxIKjEBMTA51Oh/nz56NKlSp8++7du81uu2PHDixcuLDQRNG1a9egVtPzK0EUJiSICILgiY2NRa9evVChQgXs378fISEh/LqhQ4fi5s2b2L59uw1HWHg8fvwYAPipMg4PDw8bjAZgGAZZWVkoVqwYPD09bTIGgnAl6JGDIAieGTNmID09HT///LNIDHFUqVIFI0eOBGDal0alUomsJc+ePcOoUaMQHh4OT09PBAcH480338Q///zD91Hyk5H68MTExEClUmHdunX45ptvULZsWfj6+qJbt25ITU1FdnY2Ro0aheDgYPj4+OCDDz5Adna2yeMODw/HpEmTAABBQUGi8UvfX0r//v2xcOFC/ri5Pw6dTod58+ahdu3a8PLyQunSpfHRRx/h6dOnRmPo1KkTdu3ahcaNG6NYsWL46aefZM/NihUroFKpcPToUYwZMwZBQUEoXrw43nnnHSQmJor2q9PpMHnyZISGhsLb2xutW7fG5cuXyS+JICSQhYggCJ6tW7eiUqVKaNasWYHud8iQIdiwYQOGDRuGWrVqITk5GUeOHMGVK1fQsGHDfO1z2rRpKFasGD7//HPcvHkTP/74I9zd3aFWq/H06VNMnjwZJ06cwIoVK1CxYkVMnDhRcV/z5s3D77//jk2bNmHx4sXw8fFBvXr1LBrHRx99hIcPH2LPnj34448/ZNevWLECH3zwAUaMGIHY2Fj873//w7///oujR4/C3d2d73vt2jX07t0bH330EQYNGoTq1aubfO/hw4ejZMmSmDRpEu7cuYN58+Zh2LBhWLt2Ld9nwoQJmDFjBt566y1ERkbi/PnziIyMRFZWlkXHRxCuAgkigiAAAGlpaXjw4AE6d+5c4Pvevn07Bg0ahNmzZ/Ntn3766QvtMy8vDwcPHuQFRWJiItasWYN27dphx44dAIBPPvkEN2/exC+//GJSEHXp0gXnzp3Dpk2b0K1bN5QqVcricTRt2hTVqlXDnj170LdvX9G6I0eOYPny5fjzzz/x7rvv8u2tW7dGu3btsH79elH7zZs3sXPnTkRGRlr03oGBgdi9
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling.

@mnicstruwig
Copy link
Contributor

I don't think Sampling has a home in a getting started notebook -- it's something the user can investigate on their own. @MichaelMeyer01 do you agree?

@mnicstruwig
Copy link
Contributor

I also don't see a section on actually making predictions using our labels and training a model -- this is a crucial aspect to include in a Getting Started notebook.

@mnicstruwig
Copy link
Contributor

And what about backtesting?

Remember -- the goal is to take the user from ticks ------> model prediction for a strategy.

@MichaelMeyer01
Copy link

I don't think Sampling has a home in a getting started notebook -- it's something the user can investigate on their own. @MichaelMeyer01 do you agree?

@MichaelMeyer01 I think it's not that import to include, but it would be relatively straight forward to determine the average uniqueness which shows why sampling into finance can be tricky, without going into sampling.

@MichaelMeyer01 MichaelMeyer01 reopened this Jul 4, 2023
Masimba007 and others added 9 commits July 4, 2023 12:11
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
@mnicstruwig
Copy link
Contributor

I don't think Sampling has a home in a getting started notebook -- it's something the user can investigate on their own. @MichaelMeyer01 do you agree?

@MichaelMeyer01 I think it's not that import to include, but it would be relatively straight forward to determine the average uniqueness which shows why sampling into finance can be tricky, without going into sampling.

I do think it's important as well, just not relevant to a "Getting Started" guide. It's something we can cover in a separate notebook.

@mnicstruwig
Copy link
Contributor

Since the notebook cells were executed without restarting the kernel, I've just re-run them so that comments can be made here relevant to the correct cell :)

@mnicstruwig
Copy link
Contributor

[3] # Data cleaning and preparation

This should instead be a markdown header.

[3] # We are combining the 'date' and 'time' column and making it 'date'

Not a clear explanation. It's also not been converted to a DateTime, not sure if that's necessary for the functions below or not.

[Below 3]

We need to change our data format to ensure that our tools will read the data accordingly. Our documentation, for each function, will tell you the format the data should be in order to use the function. In this case, our data should be in [date_time, price, volume] format.

Explain that it's the columns, and a DataFrame. It's not just a "format". Be more explicit.

[*] Use backticks when talking about MLFinlab so that it renders it using monospace font:

`mlfinlab` # rather than bold *mlfinlab*

[6] # dollar bars

This comment isn't useful, since we have a good variable name.

[6] You should explain what the parameters represent.

[Above 7] This is the process of assigning target labels or categories to data points. It involves assigning a specific outcome or classification to each data instance based on certain criteria or rules. It is important when it comes to predictive modeling tasks. The following techniques are available in MlFinLab , accompanied with an example notebook in our documentation :

Remove what's in bold above.

There's also a formatting mistake in the list "Trend Scanning" doesn't have a space between the dash.

"fixed-time horizon method" . Please link an article about this here.

Use backticks when you're referring to code to render it as in-line code! https://www.markdownguide.org/basic-syntax/#code

[9] Explain what lookback parameter does

[Above 10] > Our documentation covers a lot more filters that can be used, for this notebook we will focus on the CUSUM Filter.

[Below 12] typo.

[13] Code formatting. Space missing after ,

[14] Bad code formatting. Indents aren't aligned.

[Below 14] Not using backticks again when referring to parameters.

[Below 15] You need to explain what those columns represent, not just what they are. What is a profit taking multiple? (for example).

[Above 16] > These are the meta-labels, and can also be used where you give the 'side' of the trade.

You'll need to explain what the meta-labels are, or how they're relevant. This concept hasn't been introduced yet, so I think the readers will be a little bit confused.

[21] Now we multiply the mean volatility by 2, where previously we didn't. Why?

Please remove the sampling section -- it's been raised as an issue in other comments on this PR.

@mnicstruwig
Copy link
Contributor

mnicstruwig commented Jul 11, 2023

[Above 43] Starting from MlFinLab version 1.5.0 the execution is up to 10 times faster compared to the models from version 1.4.0 and earlier. (The speed improvement depends on the size of the input dataset)

Remove. Irrelevant.

[43] # We will use a subset of the data for illustration purposes

[45] > The minimum d inm this case is 0.2.

Why?

[Above 51] Explosive tests are just done, without any explanation or motivation.

Do we even use the structural breaks anywhere?

[57] Volatility estimators are very useful. But do we use this anywhere?

[Clustering section] How do we use clustering? For what? How to we incorporate it into our models?

[Feature Importance] Code is just copy-pasted from the examples without any explanation.

[Cross validation] Just copy-pasted from the documentation. Not useful if this is the case -- you need to put it in context of a getting started guide! Obvious tell: Why are we mentioning the book here?

Examples are also just taken from the page without any additional explanation added.

[Bet concentration] Grammar.

[Drawdown and time underwater]: Spelling and grammar.

[84] I don't think we use cumulated_return or annual_return anywhere else. Did you forget to print them out?

[86] Bad code formatting.

[87] Bad code formatting.

Copy link
Contributor

@mnicstruwig mnicstruwig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants