Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raw_to_Xy raises KeyError #118

Closed
AlexKnowsIt opened this issue Aug 12, 2021 · 7 comments
Closed

raw_to_Xy raises KeyError #118

AlexKnowsIt opened this issue Aug 12, 2021 · 7 comments

Comments

@AlexKnowsIt
Copy link

AlexKnowsIt commented Aug 12, 2021

Hello,
I am running into a weird error, that I don't understand. I used Yahoo Finance to download a DataFrame with the Index of
MultiIndex([( 'SIE.DE', 'Open'),
( 'SIE.DE', 'High'),
( 'SIE.DE', 'Low'),
( 'SIE.DE', 'Close'),
( 'SIE.DE', 'Volume'),
('EOAN.DE', 'Open'),
('EOAN.DE', 'High'),
('EOAN.DE', 'Low'),
('EOAN.DE', 'Close'),
('EOAN.DE', 'Volume'),
...
('BAYN.DE', 'Open'),
('BAYN.DE', 'High'),
('BAYN.DE', 'Low'),
('BAYN.DE', 'Close'),
('BAYN.DE', 'Volume'),
( 'BEI.DE', 'Open'),
( 'BEI.DE', 'High'),
( 'BEI.DE', 'Low'),
( 'BEI.DE', 'Close'),
( 'BEI.DE', 'Volume')],
length=150)
So I have a DataFrame, where there are the Assets in the zero level and the first level being indecators. If i run now the raw_to_Xy Function like this

n_timesteps = len(raw_df)
n_channels = len(raw_df.columns.levels[0]) 
n_assets = len(raw_df.columns.levels[1]) 

lookback, gap, horizon = 14, 1, 2

X, timestamps, y, asset_names, indicators = raw_to_Xy(raw_df,
                                                      lookback=lookback,
                                                      gap=gap,
                                                      freq="B",
                                                      horizon=horizon)

I receive an KeyError: 'Open' with the explanation

The label is present in self.levels[level] but unused

I tried to stay as close to the documentation as possible and don't see, where I am wrong.
Other than that: Great repository, I am really fascinated by all the possibilties that this gives for modern Portfolio Optimization!

@AlexKnowsIt
Copy link
Author

This is the whole Error-message:

/usr/local/lib/python3.7/dist-packages/deepdow/utils.py in raw_to_Xy(raw_data, lookback, horizon, gap, freq, included_assets, included_indicators, use_log)
    275     y_list = []
    276     for ind in indicators:
--> 277         X, timestamps, y = (returns_to_Xy(returns.xs(ind, axis=1, level=1),
    278                                           lookback=lookback,
    279                                           horizon=horizon,

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   3471             if not isinstance(labels, MultiIndex):
   3472                 raise TypeError("Index must be a MultiIndex")
-> 3473             loc, new_ax = labels.get_loc_level(key, level=level, drop_level=drop_level)
   3474 
   3475             # create the tuple of the indexer

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/multi.py in get_loc_level(self, key, level, drop_level)
   2883                 return indexer, maybe_mi_droplevels(indexer, ilevels, drop_level)
   2884         else:
-> 2885             indexer = self._get_level_indexer(key, level=level)
   2886             return indexer, maybe_mi_droplevels(indexer, [level], drop_level)
   2887 

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/multi.py in _get_level_indexer(self, key, level, indexer)
   2974                 if not locs.any():
   2975                     # The label is present in self.levels[level] but unused:
-> 2976                     raise KeyError(key)
   2977                 return locs
   2978 

KeyError: 'Open'

@jankrepl
Copy link
Owner

Hey! Thank you for your interest!

Hmmm, I am not sure where it comes from. Could you please upload your raw_df here? Ideally just a minimal version of it that reproduces the error.

Cheers

@AlexKnowsIt
Copy link
Author

Hello Jan,

thanks for helping me out. I am not sure if this is the best way to share my DataFrame, if you need it in a different format just tell me. Its the result of the yfinance package with the function

mein_portfolio = yf.download(tickers = ticker,
          start="2010-01-01", 
          end="2021-01-01",
          interval = "1d",

          # group by ticker (to access via data['SPY'])
          # (optional, default is 'column')
          group_by = 'ticker',
      )
<!--StartFragment-->

  | 1COV.DE | MRK.DE | LIN.DE | FRE.DE | DWNI.DE | BEI.DE | MUV2.DE | DPW.DE | ... | BAS.DE | VOW3.DE | SIE.DE | CON.DE | VNA.DE | RWE.DE | HEN3.DE | ALV.DE
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --


NaN | NaN | NaN | NaN | NaN | 27.215351 | 27.428233 | 27.077607 | 27.219526 | 752504.0 | 39.533226 | 39.533226 | 39.533226 | 39.533226 | 0.0 | 12.068662 | 12.510301 | 12.068662 | 12.393932 | 96522.0 | 4.828712 | 5.049168 | 4.800266 | 5.006499 | 193278.0 | 39.338344 | 39.998988 | 39.295446 | 39.848843 | 336203.0 | 62.419827 | 62.733922 | 61.677412 | 61.848740 | 873193.0 | 8.497876 | 8.767550 | 8.497876 | 8.767550 | 3020790.0 | ... | 25.048560 | 25.849699 | 24.985161 | 25.849699 | 3245626.0 | 49.865665 | 50.309594 | 48.564486 | 49.368153 | 906136.0 | 41.763412 | 42.234259 | 41.666662 | 42.227810 | 2734897.0 | 27.314565 | 27.993757 | 27.171385 | 27.993757 | 133257.0 | NaN | NaN | NaN | NaN | NaN | 39.725997 | 40.033593 | 39.638945 | 40.021984 | 2022078.0 | 30.165420 | 30.165420 | 29.641154 | 29.842794 | 784200.0 | 50.129575 | 50.751911 | 50.015382 | 50.552078 | 1676777.0
NaN | NaN | NaN | NaN | NaN | 27.169440 | 27.202834 | 26.873077 | 27.019175 | 488178.0 | NaN | NaN | NaN | NaN | NaN | 12.339255 | 12.410758 | 12.050434 | 12.109323 | 137034.0 | 5.120282 | 5.153706 | 5.042056 | 5.042056 | 279601.0 | 39.831689 | 40.024737 | 39.522816 | 39.638645 | 331772.0 | 61.763061 | 62.020048 | 61.563181 | 61.962940 | 797939.0 | 8.739330 | 8.864759 | 8.673479 | 8.780094 | 2735783.0 | ... | 25.800709 | 25.820882 | 25.293508 | 25.457775 | 3292371.0 | 49.368148 | 49.666657 | 47.592428 | 48.411404 | 893352.0 | 42.182657 | 42.498704 | 41.911759 | 42.253605 | 2584717.0 | 27.894637 | 30.185536 | 27.843238 | 29.744978 | 420128.0 | NaN | NaN | NaN | NaN | NaN | 40.045200 | 40.213507 | 39.540283 | 39.766628 | 2040667.0 | 29.850860 | 29.875059 | 29.003970 | 29.318529 | 1147700.0 | 50.529229 | 51.208663 | 50.415040 | 50.706223 | 1742206.0
NaN | NaN | NaN | NaN | NaN | 27.048393 | 27.219531 | 26.772899 | 27.002478 | 511138.0 | 39.589531 | 39.589531 | 39.589531 | 39.589531 | 0.0 | 12.222885 | 12.222885 | 11.964912 | 12.046229 | 77268.0 | 5.034945 | 5.148729 | 5.034945 | 5.034945 | 98025.0 | 39.681540 | 39.810238 | 39.595743 | 39.608612 | 342784.0 | 62.105725 | 62.191389 | 61.363310 | 61.563194 | 727120.0 | 8.767549 | 8.924336 | 8.733056 | 8.877300 | 2595800.0 | ... | 25.486592 | 25.754599 | 25.379965 | 25.619154 | 2345128.0 | 48.671639 | 49.896274 | 47.929201 | 49.750847 | 1202268.0 | 42.214906 | 42.479355 | 42.079458 | 42.408405 | 2166413.0 | 29.744973 | 30.086398 | 29.047423 | 29.939554 | 231260.0 | NaN | NaN | NaN | NaN | NaN | 39.714390 | 39.934928 | 39.464832 | 39.691177 | 1535087.0 | 29.439508 | 29.737935 | 29.286262 | 29.576622 | 993200.0 | 50.900349 | 51.265758 | 50.671968 | 51.100182 | 1362139.0
NaN | NaN | NaN | NaN | NaN | 26.960732 | 27.624421 | 26.960732 | 27.570158 | 1088580.0 | 39.617683 | 39.617683 | 39.617683 | 39.617683 | 0.0 | 12.057447 | 12.079879 | 11.917244 | 12.029407 | 74370.0 | 5.034944 | 5.089703 | 4.999387 | 5.047034 | 115634.0 | 39.625772 | 39.681540 | 38.960837 | 39.209652 | 481551.0 | 61.334759 | 61.763075 | 61.134880 | 61.705967 | 948916.0 | 8.849081 | 8.902388 | 8.792638 | 8.858488 | 2907679.0 | ... | 25.457778 | 25.578815 | 25.218591 | 25.449133 | 2505352.0 | 49.727887 | 50.753518 | 49.314570 | 50.324898 | 1108024.0 | 42.376153 | 43.085651 | 42.034307 | 42.911499 | 3336865.0 | 31.279587 | 33.915587 | 31.261229 | 33.776077 | 1470941.0 | NaN | NaN | NaN | NaN | NaN | 39.551892 | 39.685378 | 39.197872 | 39.522873 | 1765257.0 | 29.383050 | 29.471772 | 28.778129 | 29.124950 | 877300.0 | 50.769027 | 51.065924 | 50.443584 | 50.512100 | 1926360.0
NaN | NaN | NaN | NaN | NaN | 27.570161 | 27.707905 | 26.964909 | 27.290491 | 730274.0 | NaN | NaN | NaN | NaN | NaN | 12.141570 | 12.149982 | 11.777042 | 11.777042 | 74403.0 | 5.042056 | 5.081880 | 5.034944 | 5.056279 | 90482.0 | 39.252552 | 39.484207 | 37.871205 | 38.077122 | 978115.0 | 62.020063 | 62.020063 | 61.020660 | 61.591747 | 902151.0 | 8.867895 | 9.109347 | 8.817723 | 9.068583 | 5462896.0 | ... | 25.503885 | 25.610512 | 25.126371 | 25.371323 | 3307737.0 | 50.439708 | 51.358185 | 50.294281 | 50.868328 | 937837.0 | 43.117893 | 43.362992 | 42.685749 | 43.182396 | 2634386.0 | 33.812793 | 34.756320 | 32.718743 | 34.400204 | 449586.0 | NaN | NaN | NaN | NaN | NaN | 39.673771 | 39.784040 | 39.041174 | 39.464836 | 1665122.0 | 29.116886 | 29.358857 | 28.447441 | 28.818459 | 1224500.0 | 50.917475 | 50.917475 | 49.490096 | 50.238041 | 2425290.0
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
48.683556 | 49.280001 | 48.556444 | 48.800888 | 544896.0 | 134.279385 | 135.566712 | 134.031820 | 134.972565 | 193144.0 | 204.061152 | 206.339508 | 203.367744 | 206.339508 | 751699.0 | 36.296987 | 37.071978 | 36.159648 | 36.738438 | 1330030.0 | 41.357513 | 41.710410 | 41.220276 | 41.543766 | 569336.0 | 90.810540 | 92.596589 | 90.592251 | 92.358452 | 327184.0 | 227.299843 | 229.993200 | 226.626507 | 228.165558 | 310667.0 | 39.068311 | 39.233857 | 38.834605 | 38.951458 | 2387989.0 | ... | 60.852591 | 61.519835 | 60.661949 | 60.862122 | 2923353.0 | 146.430482 | 147.387668 | 145.023997 | 145.434219 | 952157.0 | 111.475366 | 111.884555 | 110.695958 | 111.397423 | 1390621.0 | 117.800003 | 118.900002 | 117.500000 | 118.400002 | 325628.0 | 55.863411 | 56.387859 | 55.669170 | 56.154770 | 1031102.0 | 31.953356 | 32.382001 | 31.846195 | 32.372257 | 1600706.0 | 87.669159 | 88.728963 | 87.551408 | 88.454201 | 329138.0 | 186.491139 | 188.844513 | 185.936275 | 186.759003 | 969429.0
48.986666 | 49.553779 | 48.986666 | 49.240891 | 615538.0 | 134.972562 | 136.012339 | 134.378399 | 135.071579 | 202684.0 | 207.726336 | 209.113168 | 206.042339 | 208.617874 | 549094.0 | 36.895399 | 37.699818 | 36.846350 | 37.483997 | 1204434.0 | 41.465345 | 41.877059 | 41.435938 | 41.739822 | 380069.0 | 92.517205 | 93.053019 | 92.159995 | 92.457672 | 231334.0 | 228.454137 | 233.071319 | 228.357940 | 232.590363 | 246062.0 | 39.146216 | 39.185168 | 38.883293 | 38.951458 | 2036098.0 | ... | 60.976510 | 61.939249 | 60.966976 | 61.662819 | 1802073.0 | 147.250934 | 149.575542 | 146.801649 | 148.911377 | 944403.0 | 111.904042 | 113.988968 | 111.670221 | 113.988968 | 1149343.0 | 118.949997 | 122.199997 | 118.550003 | 121.650002 | 471544.0 | 56.154770 | 56.873459 | 56.154770 | 56.659794 | 603944.0 | 32.537871 | 33.044445 | 32.440450 | 32.927544 | 1523201.0 | 88.552333 | 88.846725 | 87.963555 | 88.787849 | 212154.0 | 186.567662 | 191.666657 | 186.548525 | 191.217026 | 987162.0
49.788442 | 49.827554 | 49.201777 | 49.299553 | 439192.0 | 135.319144 | 137.497730 | 135.319144 | 136.210388 | 155267.0 | 211.193396 | 213.967046 | 209.212220 | 211.688690 | 692211.0 | 37.866585 | 38.298227 | 37.837156 | 37.915638 | 968546.0 | 42.151532 | 42.680877 | 41.779029 | 42.396599 | 366515.0 | 93.072874 | 94.422332 | 92.933960 | 94.104805 | 190535.0 | 235.764677 | 236.245633 | 232.975123 | 234.802765 | 225114.0 | 40.022620 | 40.158953 | 39.691533 | 39.964195 | 2223149.0 | ... | 62.148946 | 62.673212 | 62.120351 | 62.168015 | 2008445.0 | 152.329905 | 152.408030 | 149.204376 | 149.204376 | 686060.0 | 115.352942 | 116.327207 | 115.274999 | 116.229782 | 1067229.0 | 123.000000 | 124.150002 | 122.400002 | 123.050003 | 321404.0 | 57.028850 | 57.514450 | 56.601520 | 57.456177 | 602391.0 | 33.122383 | 34.116055 | 33.122383 | 34.116055 | 1845090.0 | 89.474755 | 90.671940 | 89.298121 | 90.299042 | 353813.0 | 193.053823 | 195.062798 | 192.814658 | 193.819153 | 738979.0
49.573336 | 50.003557 | 49.475556 | 49.592892 | 662659.0 | 137.151130 | 138.042358 | 136.507466 | 136.903564 | 151454.0 | 213.967040 | 214.066105 | 210.400916 | 211.193390 | 508977.0 | 38.131456 | 38.200126 | 37.582098 | 37.601719 | 864433.0 | 42.553442 | 43.082788 | 42.426007 | 42.710285 | 457634.0 | 94.700160 | 95.176435 | 94.263568 | 94.660469 | 193822.0 | 235.668482 | 237.592307 | 233.937037 | 234.514191 | 258539.0 | 40.266066 | 40.314758 | 39.516253 | 39.662319 | 1880277.0 | ... | 62.663677 | 62.844789 | 61.824860 | 62.025032 | 2112445.0 | 150.024838 | 151.079687 | 147.797908 | 148.364395 | 738894.0 | 116.327207 | 116.658453 | 114.144856 | 114.378677 | 902680.0 | 123.800003 | 123.949997 | 122.050003 | 122.750000 | 388663.0 | 57.611568 | 58.388527 | 57.261936 | 57.825233 | 668538.0 | 34.193985 | 34.505725 | 33.960182 | 34.047859 | 1677246.0 | 90.671937 | 91.084080 | 90.338291 | 90.730812 | 245532.0 | 194.536643 | 195.445466 | 192.814654 | 193.197311 | 665400.0
49.475555 | 49.768892 | 49.358223 | 49.358223 | 339126.0 | 137.497732 | 138.983124 | 136.953086 | 138.983124 | 166548.0 | 210.995285 | 211.193400 | 209.410338 | 210.995285 | 340269.0 | 37.523235 | 37.572284 | 36.856155 | 37.121025 | 1072853.0 | 42.690678 | 42.994563 | 42.641666 | 42.827915 | 266469.0 | 94.541393 | 94.660466 | 93.707909 | 93.707909 | 121069.0 | 234.321809 | 236.245634 | 233.552277 | 233.552277 | 165429.0 | 39.633109 | 39.788914 | 39.428614 | 39.438351 | 1227197.0 | ... | 62.339593 | 62.339593 | 61.548432 | 61.691414 | 1144797.0 | 148.090917 | 148.872299 | 146.489090 | 148.872299 | 578887.0 | 114.144854 | 115.274997 | 114.086393 | 114.495583 | 796354.0 | 122.400002 | 126.500000 | 121.250000 | 121.250000 | 464126.0 | 57.786384 | 58.116592 | 57.669841 | 58.038895 | 505759.0 | 33.969921 | 34.155019 | 33.599733 | 33.677666 | 1036749.0 | 90.887829 | 91.064463 | 90.338297 | 90.573814 | 159727.0 | 192.527646 | 194.967129 | 192.001480 | 192.001480 | 526249.0


<p>2792 rows × 150 columns</p><!--EndFragment-->

@jankrepl
Copy link
Owner

jankrepl commented Aug 14, 2021

So the problem is that raw_to_Xy does not support zero (and negative) entries inside of raw_df. Unfortunately, your DataFrame does have a couple of zero entries for the volumes (which I guess is not impossible).

is_valid = np.all(np.isfinite(new[a])) and np.all(new[a] > 0)

The reason why 0's are not supported is because one then computes per day percentage changes which would lead to division by zero.

I guess the simplest way around this is to replace 0's with NAN's and that way the raw_to_Xy will simply forward fill those values.

from deepdow.utils import raw_to_Xy
import numpy as np
import yfinance as yf

tickers = [
    "1COV.DE",
    "MRK.DE",
    "LIN.DE",
    "FRE.DE",
    "DWNI.DE",
    "BEI.DE",
    "MUV2.DE",
    "DPW.DE",
    "BAS.DE",
    "VOW3.DE",
    "SIE.DE",
    "CON.DE",
    "VNA.DE",
    "RWE.DE",
    "HEN3.DE",
    "ALV.DE",
]

raw_df = yf.download(tickers = tickers,
          start="2010-01-01",
          end="2021-01-01",
          interval = "1d",
          group_by = 'ticker',
      )

print((raw_df == 0).sum().sum())
raw_df_adj = raw_df.replace(0, np.nan)
print((raw_df_adj == 0).sum().sum())

lookback, gap, horizon = 5, 2, 4

X, timestamps, y, asset_names, indicators = raw_to_Xy(raw_df_adj,
                                                      lookback=lookback,
                                                      gap=gap,
                                                      freq="B",
                                                      horizon=horizon)

Standard output:

[*********************100%***********************]  16 of 16 completed
223
0

Anyway, maybe deepdow should raise an exception to let the user know that none of the assets made it through the check. The exception you saw in your traceback is coming from pandas because things do not work out evenutally (asset_names=[] in the code).

@AlexKnowsIt
Copy link
Author

Ah, that make sense! The code is running now, thanks!
Another short question because it seems my problems are directly related to #30:
My n_sample size is not n_timesteps - lookback - horizon - gap + 1, it seems my vector from the raw_to_df function is significantly larger (2776 vs 2851). Is that due to the forward and backward fill of the gaps in my dataset?

@jankrepl
Copy link
Owner

Ah, that make sense! The code is running now, thanks!
Another short question because it seems my problems are directly related to #30:
My n_sample size is not n_timesteps - lookback - horizon - gap + 1, it seems my vector from the raw_to_df function is significantly larger (2776 vs 2851). Is that due to the forward and backward fill of the gaps in my dataset?

Yes, exactly! The assert in the docs actually only holds for the data in the docs, not in general:) It is actually also related to this #71 (comment).

@AlexKnowsIt
Copy link
Author

Perfect! Thanks a lot for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants