-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature/add-getting-started-notebook #20
base: main
Are you sure you want to change the base?
Conversation
Nit: Branch naming convention. |
Co-authored-by: MichaelMeyer01 <105858978+MichaelMeyer01@users.noreply.github.com>
Something very weird is going on with this branch -- why does it not have the rest of the notebooks in it? Why wasn't it branched off of |
Updating branch
"outputs": [ | ||
{ | ||
"data": { | ||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAG3CAYAAABVDkpwAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC0bElEQVR4nOydd3zU5B/HP3fXRemAlhZaCpS9h4Ai06JIWQqWLSgoQ5S9VByAKCJ7+GMIqKjIpsiUTdlbQfYso0BLW2hLS+ddfn/kkktyyY3S9tb3/Xr11cuTJ7knubvkk+/zHSqGYRgQBEEQBEG4MGpbD4AgCIIgCMLWkCAiCIIgCMLlIUFEEARBEITLQ4KIIAiCIAiXhwQRQRAEQRAuDwkigiAIgiBcHhJEBEEQBEG4PCSICIIgCIJweUgQEQRBEATh8pAgIgiC0PPHH3+gRo0acHd3R4kSJQAAERERiIiI4PvcuXMHKpUKK1asKLJxhYeHo3///kX2fgThipAgIgjCiFu3buGjjz5CpUqV4OXlBT8/PzRv3hzz589HZmamrYdXKFy9ehX9+/dH5cqVsWzZMixdutTibXfs2IHJkycX3uAIgih03Gw9AIIg7Ivt27eje/fu8PT0xPvvv486deogJycHR44cwfjx43Hp0iWrxIKjEBMTA51Oh/nz56NKlSp8++7du81uu2PHDixcuLDQRNG1a9egVtPzK0EUJiSICILgiY2NRa9evVChQgXs378fISEh/LqhQ4fi5s2b2L59uw1HWHg8fvwYAPipMg4PDw8bjAZgGAZZWVkoVqwYPD09bTIGgnAl6JGDIAieGTNmID09HT///LNIDHFUqVIFI0eOBGDal0alUomsJc+ePcOoUaMQHh4OT09PBAcH480338Q///zD91Hyk5H68MTExEClUmHdunX45ptvULZsWfj6+qJbt25ITU1FdnY2Ro0aheDgYPj4+OCDDz5Adna2yeMODw/HpEmTAABBQUGi8UvfX0r//v2xcOFC/ri5Pw6dTod58+ahdu3a8PLyQunSpfHRRx/h6dOnRmPo1KkTdu3ahcaNG6NYsWL46aefZM/NihUroFKpcPToUYwZMwZBQUEoXrw43nnnHSQmJor2q9PpMHnyZISGhsLb2xutW7fG5cuXyS+JICSQhYggCJ6tW7eiUqVKaNasWYHud8iQIdiwYQOGDRuGWrVqITk5GUeOHMGVK1fQsGHDfO1z2rRpKFasGD7//HPcvHkTP/74I9zd3aFWq/H06VNMnjwZJ06cwIoVK1CxYkVMnDhRcV/z5s3D77//jk2bNmHx4sXw8fFBvXr1LBrHRx99hIcPH2LPnj34448/ZNevWLECH3zwAUaMGIHY2Fj873//w7///oujR4/C3d2d73vt2jX07t0bH330EQYNGoTq1aubfO/hw4ejZMmSmDRpEu7cuYN58+Zh2LBhWLt2Ld9nwoQJmDFjBt566y1ERkbi/PnziIyMRFZWlkXHRxCuAgkigiAAAGlpaXjw4AE6d+5c4Pvevn07Bg0ahNmzZ/Ntn3766QvtMy8vDwcPHuQFRWJiItasWYN27dphx44dAIBPPvkEN2/exC+//GJSEHXp0gXnzp3Dpk2b0K1bN5QqVcricTRt2hTVqlXDnj170LdvX9G6I0eOYPny5fjzzz/x7rvv8u2tW7dGu3btsH79elH7zZs3sXPnTkRGRlr03oGBgdi9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling.
I don't think Sampling has a home in a getting started notebook -- it's something the user can investigate on their own. @MichaelMeyer01 do you agree? |
I also don't see a section on actually making predictions using our labels and training a model -- this is a crucial aspect to include in a Getting Started notebook. |
And what about backtesting? Remember -- the goal is to take the user from ticks ------> model prediction for a strategy. |
@MichaelMeyer01 I think it's not that import to include, but it would be relatively straight forward to determine the average uniqueness which shows why sampling into finance can be tricky, without going into sampling. |
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
Co-authored-by: Michael Struwig <michael.struwig@hudsonthames.org>
I do think it's important as well, just not relevant to a "Getting Started" guide. It's something we can cover in a separate notebook. |
Since the notebook cells were executed without restarting the kernel, I've just re-run them so that comments can be made here relevant to the correct cell :) |
[3] # Data cleaning and preparation This should instead be a markdown header. [3] # We are combining the 'date' and 'time' column and making it 'date' Not a clear explanation. It's also not been converted to a [Below 3]
Explain that it's the columns, and a DataFrame. It's not just a "format". Be more explicit. [*] Use backticks when talking about MLFinlab so that it renders it using monospace font:
[6] # dollar bars This comment isn't useful, since we have a good variable name. [6] You should explain what the parameters represent. [Above 7] This is the process of assigning target labels or categories to data points. It involves assigning a specific outcome or classification to each data instance based on certain criteria or rules. It is important when it comes to predictive modeling tasks. The following techniques are available in MlFinLab , accompanied with an example notebook in our documentation : Remove what's in bold above. There's also a formatting mistake in the list "Trend Scanning" doesn't have a space between the dash.
Use backticks when you're referring to code to render it as in-line code! https://www.markdownguide.org/basic-syntax/#code [9] Explain what [Above 10] > Our documentation covers a lot more filters that can be used, for this notebook we will focus on the CUSUM Filter. [Below 12] typo. [13] Code formatting. Space missing after [14] Bad code formatting. Indents aren't aligned. [Below 14] Not using backticks again when referring to parameters. [Below 15] You need to explain what those columns represent, not just what they are. What is a profit taking multiple? (for example). [Above 16] > These are the meta-labels, and can also be used where you give the 'side' of the trade. You'll need to explain what the meta-labels are, or how they're relevant. This concept hasn't been introduced yet, so I think the readers will be a little bit confused. [21] Now we multiply the mean volatility by 2, where previously we didn't. Why? Please remove the sampling section -- it's been raised as an issue in other comments on this PR. |
[Above 43] Starting from MlFinLab version 1.5.0 the execution is up to 10 times faster compared to the models from version 1.4.0 and earlier. (The speed improvement depends on the size of the input dataset) Remove. Irrelevant. [43] # We will use a subset of the data for illustration purposes [45] > The minimum d inm this case is 0.2. Why? [Above 51] Explosive tests are just done, without any explanation or motivation. Do we even use the structural breaks anywhere? [57] Volatility estimators are very useful. But do we use this anywhere? [Clustering section] How do we use clustering? For what? How to we incorporate it into our models? [Feature Importance] Code is just copy-pasted from the examples without any explanation. [Cross validation] Just copy-pasted from the documentation. Not useful if this is the case -- you need to put it in context of a getting started guide! Obvious tell: Why are we mentioning the book here? Examples are also just taken from the page without any additional explanation added. [Bet concentration] Grammar. [Drawdown and time underwater]: Spelling and grammar. [84] I don't think we use [86] Bad code formatting. [87] Bad code formatting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left comments.
No description provided.