From 35f8f188c8294db4ed45d94df48d3ebf63b1cbe3 Mon Sep 17 00:00:00 2001 From: Willem Pienaar Date: Sun, 24 May 2020 14:45:36 +0800 Subject: [PATCH] Update Feast documentation based on 0.4 and new functionality --- README.md | 11 +- docs/.gitbook/assets/architecture.png | Bin 0 -> 24680 bytes .../assets/basic-architecture-diagram (2).svg | 1 + .../assets/basic-architecture-diagram (3).svg | 1 + .../feast-docs-overview-diagram-2 (2).svg | 1 + .../feast-docs-overview-diagram-2 (3).svg | 1 + docs/README.md | 4 +- docs/SUMMARY.md | 51 +- docs/{.gitbook => }/assets/feast_logo.png | Bin docs/contributing/adding-a-new-store-1.md | 87 +++ docs/contributing/adding-a-new-store.md | 87 +++ docs/contributing/contributing.md | 519 +----------------- docs/contributing/release-process.md | 45 +- docs/contributing/style-guide.md | 22 +- docs/getting-help.md | 36 ++ docs/getting-started/README.md | 14 + .../connecting-to-feast-1/README.md | 21 + .../connecting-to-feast.md | 38 ++ .../connecting-to-feast-1/python-sdk.md | 20 + .../getting-started/deploying-feast/README.md | 16 + .../deploying-feast/docker-compose.md | 112 ++++ .../deploying-feast/kubernetes.md | 211 +++++++ docs/installation/gke.md | 2 +- docs/reference/api.md | 10 + docs/reference/api/README.md | 10 + docs/reference/configuration-reference.md | 157 ++++++ docs/roadmap.md | 31 ++ docs/user-guide/architecture.md | 41 ++ docs/user-guide/data-ingestion.md | 126 +++++ docs/user-guide/entities.md | 43 ++ docs/user-guide/feature-retrieval.md | 116 ++++ docs/user-guide/feature-sets.md | 54 ++ docs/user-guide/features.md | 38 ++ docs/user-guide/overview.md | 58 ++ docs/user-guide/sources.md | 32 ++ docs/user-guide/stores.md | 37 ++ docs/why-feast.md | 32 ++ 37 files changed, 1512 insertions(+), 573 deletions(-) create mode 100644 docs/.gitbook/assets/architecture.png create mode 100644 docs/.gitbook/assets/basic-architecture-diagram (2).svg create mode 100644 docs/.gitbook/assets/basic-architecture-diagram (3).svg create mode 100644 docs/.gitbook/assets/feast-docs-overview-diagram-2 (2).svg create mode 100644 docs/.gitbook/assets/feast-docs-overview-diagram-2 (3).svg rename docs/{.gitbook => }/assets/feast_logo.png (100%) create mode 100644 docs/contributing/adding-a-new-store-1.md create mode 100644 docs/contributing/adding-a-new-store.md create mode 100644 docs/getting-help.md create mode 100644 docs/getting-started/README.md create mode 100644 docs/getting-started/connecting-to-feast-1/README.md create mode 100644 docs/getting-started/connecting-to-feast-1/connecting-to-feast.md create mode 100644 docs/getting-started/connecting-to-feast-1/python-sdk.md create mode 100644 docs/getting-started/deploying-feast/README.md create mode 100644 docs/getting-started/deploying-feast/docker-compose.md create mode 100644 docs/getting-started/deploying-feast/kubernetes.md create mode 100644 docs/reference/api.md create mode 100644 docs/reference/api/README.md create mode 100644 docs/reference/configuration-reference.md create mode 100644 docs/roadmap.md create mode 100644 docs/user-guide/architecture.md create mode 100644 docs/user-guide/data-ingestion.md create mode 100644 docs/user-guide/entities.md create mode 100644 docs/user-guide/feature-retrieval.md create mode 100644 docs/user-guide/feature-sets.md create mode 100644 docs/user-guide/features.md create mode 100644 docs/user-guide/overview.md create mode 100644 docs/user-guide/sources.md create mode 100644 docs/user-guide/stores.md create mode 100644 docs/why-feast.md diff --git a/README.md b/README.md 
index 533c123874..2c2970c6d9 100644 --- a/README.md +++ b/README.md @@ -1,17 +1,16 @@



[![Unit Tests](https://github.com/feast-dev/feast/workflows/unit%20tests/badge.svg?branch=master)](https://github.com/feast-dev/feast/actions?query=workflow%3A%22unit+tests%22+branch%3Amaster)
[![Docker Compose Tests](https://github.com/feast-dev/feast/workflows/docker%20compose%20tests/badge.svg?branch=master)](https://github.com/feast-dev/feast/actions?query=workflow%3A%22docker+compose+tests%22+branch%3Amaster)
[![Code Standards](https://github.com/feast-dev/feast/workflows/code%20standards/badge.svg?branch=master)](https://github.com/feast-dev/feast/actions?query=workflow%3A%22code+standards%22+branch%3Amaster)
-[![Docs latest](https://img.shields.io/badge/Docs-latest-blue.svg)](https://docs.feast.dev/)
+[![Docs Latest](https://img.shields.io/badge/docs-latest-blue.svg)](https://docs.feast.dev/)
[![GitHub Release](https://img.shields.io/github/release/feast-dev/feast.svg?style=flat)](https://github.com/feast-dev/feast/releases)
-
## Overview

Feast (Feature Store) is a tool for managing and serving machine learning features. Feast is the bridge between models and data.
@@ -24,7 +23,7 @@ Feast aims to:

![](docs/.gitbook/assets/feast-docs-overview-diagram-2.svg)

-TL;DR: Feast decouples feature engineering from feature usage. Features that are added to Feast become available immediately for training and serving. Models can retrieve the same features used in training from a low latency online store in production.
+Feast decouples feature engineering from feature usage, allowing features to be developed and consumed independently. Features that are added to Feast become available immediately for training and serving. Models can retrieve the same features used in training from a low latency online store in production. This means that new ML projects start with a process of feature selection from a catalog instead of having to do feature engineering from scratch.

```
diff --git a/docs/.gitbook/assets/architecture.png b/docs/.gitbook/assets/architecture.png
new file mode 100644
index 0000000000000000000000000000000000000000..bc655b60f327b7d7d508885666064e0c713522b5
GIT binary patch
[binary image data for docs/.gitbook/assets/architecture.png omitted]
z(c^ychClta{FtVnDY0mw?9HtEWsHxHD@gx*$G#skd#tU6b+V7{(Dr6gp> zUh5t0P35l7PKQ}E+|mOGrpPx7_RLsCdKu0ZFSpUKS3jGFMN@!Hhe0<9bo~_(V zLt04wc5D43a7a58k-Hm&47W`h_w}nKdqrDW=C2;I@{TEak*VcBkT-j4z&{Y{OKN2$ zq`SMDXG;+Zg8{tv;ooRN%|FNg3@HB{W6s_<;zSrzA*gI38)0zcZP0~#H~(PHFHgzT ztuxAZK^}D(v1%nzGcMzMvIXr%71j^ZjbK({- zyi}kjboOtnZOzL}>pnG*nxm|E;SrvOz`!+nlNuKfXf)rypd)G+;~}Df7V?Hh8?XCa zhex<;969}xoV2iKKhz%JiBb*H5*Ds(EIHy|tVb8YvqJCigs4Ae`j2q?->q{*q5K8q zcVg?l#q>XRrM2<165_#Acp$CbgOb5r&AB^3iij{-MkM5!#>2-IIFu~}%1TS6s4|rV zTw(iu76Z!UR9}^%wQ4rYFeN!3d~L;x_o(+b^Z!czg*pG_5YynlUi5z`+NWxOzXmY~ U{S>BsDu7N?9j;cUYWeKH07gwlNdN!< literal 0 HcmV?d00001 diff --git a/docs/.gitbook/assets/basic-architecture-diagram (2).svg b/docs/.gitbook/assets/basic-architecture-diagram (2).svg new file mode 100644 index 0000000000..b707f49046 --- /dev/null +++ b/docs/.gitbook/assets/basic-architecture-diagram (2).svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/.gitbook/assets/basic-architecture-diagram (3).svg b/docs/.gitbook/assets/basic-architecture-diagram (3).svg new file mode 100644 index 0000000000..b707f49046 --- /dev/null +++ b/docs/.gitbook/assets/basic-architecture-diagram (3).svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/.gitbook/assets/feast-docs-overview-diagram-2 (2).svg b/docs/.gitbook/assets/feast-docs-overview-diagram-2 (2).svg new file mode 100644 index 0000000000..7f30963ec7 --- /dev/null +++ b/docs/.gitbook/assets/feast-docs-overview-diagram-2 (2).svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/.gitbook/assets/feast-docs-overview-diagram-2 (3).svg b/docs/.gitbook/assets/feast-docs-overview-diagram-2 (3).svg new file mode 100644 index 0000000000..7f30963ec7 --- /dev/null +++ b/docs/.gitbook/assets/feast-docs-overview-diagram-2 (3).svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/README.md b/docs/README.md index 9593e9f26e..aa0be6090d 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,4 +1,4 @@ -# What is Feast? +# Introduction Feast \(**Fea**ture **St**ore\) is a tool for managing and serving machine learning features. @@ -13,7 +13,7 @@ Feast aims to: ![](.gitbook/assets/feast-docs-overview-diagram-2.svg) -**TL;DR:** Feast decouples feature engineering from feature usage. Features that are added to Feast become available immediately for training and serving. Models can retrieve the same features used in training from a low latency online store in production. +Feast decouples feature engineering from feature usage. Features that are added to Feast become available immediately for training and serving. Models can retrieve the same features used in training from a low latency online store in production. This means that new ML projects start with a process of feature selection from a catalog instead of having to do feature engineering from scratch. 
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 345b49fe0d..f6139c9021 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -1,27 +1,33 @@ # Table of contents -* [What is Feast?](README.md) - -## Introduction - -* [Why Feast?](introduction/why-feast.md) -* [Getting Help](introduction/getting-help.md) -* [Roadmap](introduction/roadmap.md) +* [Introduction](README.md) +* [Why Feast?](why-feast.md) +* [Getting Started](getting-started/README.md) + * [Deploying Feast](getting-started/deploying-feast/README.md) + * [Docker Compose](getting-started/deploying-feast/docker-compose.md) + * [Kubernetes \(GKE\)](getting-started/deploying-feast/kubernetes.md) + * [Connecting to Feast](getting-started/connecting-to-feast-1/README.md) + * [Python SDK](getting-started/connecting-to-feast-1/python-sdk.md) + * [Feast CLI](getting-started/connecting-to-feast-1/connecting-to-feast.md) +* [Roadmap](roadmap.md) * [Changelog](https://github.com/feast-dev/feast/blob/master/CHANGELOG.md) +* [Community](getting-help.md) -## Concepts - -* [Concepts](concepts/concepts.md) - -## Installation +## User Guide -* [Overview](installation/overview.md) -* [Docker Compose](installation/docker-compose.md) -* [Google Kubernetes Engine \(GKE\)](installation/gke.md) +* [Concepts](user-guide/overview.md) +* [Architecture](user-guide/architecture.md) +* [Feature Sets](user-guide/feature-sets.md) +* [Entities](user-guide/entities.md) +* [Features](user-guide/features.md) +* [Sources](user-guide/sources.md) +* [Data ingestion](user-guide/data-ingestion.md) +* [Stores](user-guide/stores.md) +* [Feature retrieval](user-guide/feature-retrieval.md) ## Tutorials -* [Basic](https://github.com/feast-dev/feast/blob/master/examples/basic/basic.ipynb) +* [Basic Tutorial](https://github.com/feast-dev/feast/blob/master/examples/basic/basic.ipynb) * [Churn Prediction \(XGBoost\)](https://github.com/feast-dev/feast/blob/master/examples/feast-xgboost-churn-prediction-tutorial/Telecom%20Customer%20Churn%20Prediction%20%28with%20Feast%20and%20XGBoost%29.ipynb) ## Administration @@ -30,11 +36,13 @@ ## Reference -* [Python SDK](https://api.docs.feast.dev/python/) -* [Go SDK](https://godoc.org/github.com/feast-dev/feast/sdk/go) -* [gRPC Types](https://api.docs.feast.dev/grpc/feast.types.pb.html) -* [Core gRPC API](https://api.docs.feast.dev/grpc/feast.core.pb.html) -* [Serving gRPC API](https://api.docs.feast.dev/grpc/feast.serving.pb.html) +* [Configuration Reference](reference/configuration-reference.md) +* [API](reference/api/README.md) + * [Core gRPC API](https://api.docs.feast.dev/grpc/feast.core.pb.html) + * [Serving gRPC API](https://api.docs.feast.dev/grpc/feast.serving.pb.html) + * [gRPC Types](https://api.docs.feast.dev/grpc/feast.types.pb.html) + * [Go SDK](https://godoc.org/github.com/feast-dev/feast/sdk/go) + * [Python SDK](https://api.docs.feast.dev/python/) ## Contributing @@ -42,4 +50,5 @@ * [Development Guide](contributing/development-guide.md) * [Style Guide](contributing/style-guide.md) * [Release Process](contributing/release-process.md) +* [Adding a New Store](contributing/adding-a-new-store.md) diff --git a/docs/.gitbook/assets/feast_logo.png b/docs/assets/feast_logo.png similarity index 100% rename from docs/.gitbook/assets/feast_logo.png rename to docs/assets/feast_logo.png diff --git a/docs/contributing/adding-a-new-store-1.md b/docs/contributing/adding-a-new-store-1.md new file mode 100644 index 0000000000..56c06b9b4b --- /dev/null +++ b/docs/contributing/adding-a-new-store-1.md @@ -0,0 +1,87 @@ +# Adding a New Store + 
+The following guide will explain the process of adding a new store through the introduction of a storage connector.
+
+## 1. Storage API
+
+Feast has an external module where storage interfaces are defined: [Storage API](https://github.com/gojek/feast/tree/master/storage/api/src/main/java/feast/storage/api)
+
+Feast interacts with a store at three points:
+
+1. **During initialization:** Store configuration is loaded into memory by Feast Serving and synchronized with Feast Core.
+2. **During ingestion of feature data:** [Writer interfaces](https://github.com/gojek/feast/tree/master/storage/api/src/main/java/feast/storage/api/writer) are used by the Apache Beam ingestion jobs to populate stores \(historical or online\).
+3. **During retrieval of feature data:** [Retrieval interfaces](https://github.com/gojek/feast/tree/master/storage/api/src/main/java/feast/storage/api/retriever) are used by Feast Serving to read data from stores when creating training datasets or serving online data.
+
+All three of these components should be implemented in order to have a complete storage connector.
+
+## 2. Adding a Storage Connector
+
+### 2.1 Initialization and Configuration
+
+Stores are configured in Feast Serving. Feast Serving publishes its store configuration to Feast Core, after which Feast Core can start ingestion/population jobs to populate the store.
+
+Store configuration is always in the form of a map<String, String>. The keys and configuration for stores are defined in [protos](https://github.com/gojek/feast/blob/master/protos/feast/core/Store.proto); a new store must have its configuration added there.
+
+Then the store must be configured to be loaded through Feast Serving. The above configuration is loaded through [FeastProperties.java](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/serving/src/main/java/feast/serving/config/FeastProperties.java#L175).
+
+Once configuration is loaded, the store will be instantiated:
+
+* Feast Core: [StoreUtil.java](https://github.com/gojek/feast/blob/master/ingestion/src/main/java/feast/ingestion/utils/StoreUtil.java#L85) instantiates new stores for the purposes of feature ingestion.
+* Feast Serving: [ServingServiceConfig](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/serving/src/main/java/feast/serving/config/ServingServiceConfig.java#L56) instantiates new stores for the purposes of retrieval.
+
+{% hint style="info" %}
+In the future we plan to provide a plugin interface for adding stores.
+{% endhint %}
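+To illustrate the map<String, String> shape, below is a hedged sketch of what the bundled Redis store's configuration boils down to. The key names mirror the sample Redis store configuration shown elsewhere in these docs; the exact key set for a new store is whatever you define in the Store proto.
+
+```python
+# Illustrative only: the map<String, String> form of a store configuration.
+# Keys mirror the sample Redis config (host/port); note that every value is
+# a string, even for numeric settings.
+redis_store_config = {
+    "host": "localhost",
+    "port": "6379",
+}
+```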
+### 2.2 Feature Ingestion \(Writer\)
+
+Feast creates and manages ingestion/population jobs that stream in data from upstream data sources. Currently Feast only supports Kafka as a data source, meaning these jobs are all long running. Batch ingestion \(from users\) results in data being pushed to Kafka topics, after which it is picked up by these "population" jobs and written to stores.
+
+In order for ingestion to succeed, the destination store must be writable. This means that Feast must be able to create the appropriate tables/schemas in the store and also write data from the population job into the store.
+
+Currently Feast Core starts and manages these population jobs that ingest data into stores \(although we are planning to move this responsibility to the serving layer\). Feast Core starts an [Apache Beam](https://beam.apache.org/) job which synchronously runs migrations on the destination store and subsequently starts consuming [FeatureRows](https://github.com/gojek/feast/blob/master/protos/feast/types/FeatureRow.proto) from Kafka and writing them into stores using a [writer](https://github.com/gojek/feast/tree/master/storage/api/src/main/java/feast/storage/api/writer).
+
+Below is a "happy path" of a batch ingestion process which includes a blocking step at the Python SDK.
+
+![](https://user-images.githubusercontent.com/6728866/74807906-91e73c00-5324-11ea-8ba5-2b43c7c5282b.png)
+
+The complete ingestion flow is executed by a [FeatureSink](https://github.com/gojek/feast/blob/master/storage/api/src/main/java/feast/storage/api/writer/FeatureSink.java). Two methods should be implemented:
+
+* [prepareWrite\(\)](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/storage/api/src/main/java/feast/storage/api/writer/FeatureSink.java#L45): Sets up the storage backend for writing/ingestion. This method will be called once during pipeline initialisation. Typically this is used to apply schemas.
+* [writer\(\)](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/storage/api/src/main/java/feast/storage/api/writer/FeatureSink.java#L53): Retrieves an Apache Beam PTransform that is used to write data to this store.
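+For a sense of what these population jobs consume, the snippet below pushes a FeatureRow protobuf onto a feature set's Kafka topic. This is a hedged sketch: it assumes the `kafka-python` package, that the generated proto module is importable as `feast.types.FeatureRow_pb2`, and that the topic follows the `feast-<feature set>-features` naming used in these docs.
+
+```python
+# Hedged sketch: produce a FeatureRow to the feature set's Kafka topic, where
+# a population job will pick it up and write it to the configured stores.
+# The proto import path and topic name are assumptions; check your version.
+from kafka import KafkaProducer
+from feast.types.FeatureRow_pb2 import FeatureRow
+
+producer = KafkaProducer(bootstrap_servers="localhost:9092")
+
+row = FeatureRow()  # populate entity and feature fields for your feature set
+producer.send("feast-driver-features", row.SerializeToString())
+producer.flush()
+```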
+### 2.3 Feature Serving \(Retriever\)
+
+Feast Serving can serve both historical/batch features and online features. Depending on the store that is being added, you should implement either a historical/batch store or an online store.
+
+#### 2.3.1 Historical Serving
+
+The historical serving interface is defined through the [HistoricalRetriever](https://github.com/gojek/feast/blob/master/storage/api/src/main/java/feast/storage/api/retriever/HistoricalRetriever.java) interface. Historical retrieval is an asynchronous process. The client submits a request for a dataset to be produced, and polls until it is ready.
+
+![High-level flow for batch retrieval](https://user-images.githubusercontent.com/6728866/74797157-702a8c80-5305-11ea-8901-bf6f4eb075f9.png)
+
+The current implementation of batch retrieval starts and ends with a file \(dataset\) in a Google Cloud Storage bucket. The user ingests an entity dataset. This dataset is loaded into a store \(BigQuery\), joined to features in a point-in-time correct way, then exported again to the bucket.
+
+We have also implemented a [batch retrieval method](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/sdk/python/feast/client.py#L509) in the Python SDK. Depending on the means through which this new store will export data, this client may have to change. At the very least it would change if Google Cloud Storage isn't used as the staging bucket.
+
+The means through which you implement the export/import of data into the store will depend on your store.
+
+#### 2.3.2 Online Serving
+
+In the case of online serving it is necessary to implement an [OnlineRetriever](https://github.com/gojek/feast/blob/master/storage/api/src/main/java/feast/storage/api/retriever/OnlineRetriever.java). This online retriever will read rows directly and synchronously from an online database. The exact encoding strategy you use to store your data in the store would be defined in the FeatureSink. The OnlineRetriever is expected to read and decode those rows.
+
+## 3. Storage Connector Examples
+
+Feast currently provides support for the following storage types:
+
+Historical storage
+
+* [BigQuery](https://github.com/gojek/feast/tree/master/storage/connectors/bigquery)
+
+Online storage
+
+* [Redis](https://github.com/gojek/feast/tree/master/storage/connectors/redis)
+* [Redis Cluster](https://github.com/gojek/feast/tree/master/storage/connectors/rediscluster)
+
diff --git a/docs/contributing/adding-a-new-store.md b/docs/contributing/adding-a-new-store.md
new file mode 100644
index 0000000000..56c06b9b4b
--- /dev/null
+++ b/docs/contributing/adding-a-new-store.md
@@ -0,0 +1,87 @@
+# Adding a New Store
+
+The following guide will explain the process of adding a new store through the introduction of a storage connector.
+
+## 1. Storage API
+
+Feast has an external module where storage interfaces are defined: [Storage API](https://github.com/gojek/feast/tree/master/storage/api/src/main/java/feast/storage/api)
+
+Feast interacts with a store at three points:
+
+1. **During initialization:** Store configuration is loaded into memory by Feast Serving and synchronized with Feast Core.
+2. **During ingestion of feature data:** [Writer interfaces](https://github.com/gojek/feast/tree/master/storage/api/src/main/java/feast/storage/api/writer) are used by the Apache Beam ingestion jobs to populate stores \(historical or online\).
+3. **During retrieval of feature data:** [Retrieval interfaces](https://github.com/gojek/feast/tree/master/storage/api/src/main/java/feast/storage/api/retriever) are used by Feast Serving to read data from stores when creating training datasets or serving online data.
+
+All three of these components should be implemented in order to have a complete storage connector.
+
+## 2. Adding a Storage Connector
+
+### 2.1 Initialization and Configuration
+
+Stores are configured in Feast Serving. Feast Serving publishes its store configuration to Feast Core, after which Feast Core can start ingestion/population jobs to populate the store.
+
+Store configuration is always in the form of a map<String, String>. The keys and configuration for stores are defined in [protos](https://github.com/gojek/feast/blob/master/protos/feast/core/Store.proto); a new store must have its configuration added there.
+
+Then the store must be configured to be loaded through Feast Serving. The above configuration is loaded through [FeastProperties.java](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/serving/src/main/java/feast/serving/config/FeastProperties.java#L175).
+
+Once configuration is loaded, the store will be instantiated:
+
+* Feast Core: [StoreUtil.java](https://github.com/gojek/feast/blob/master/ingestion/src/main/java/feast/ingestion/utils/StoreUtil.java#L85) instantiates new stores for the purposes of feature ingestion.
+* Feast Serving: [ServingServiceConfig](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/serving/src/main/java/feast/serving/config/ServingServiceConfig.java#L56) instantiates new stores for the purposes of retrieval.
+
+{% hint style="info" %}
+In the future we plan to provide a plugin interface for adding stores.
+{% endhint %}
+
+### 2.2 Feature Ingestion \(Writer\)
+
+Feast creates and manages ingestion/population jobs that stream in data from upstream data sources. Currently Feast only supports Kafka as a data source, meaning these jobs are all long running.
+Batch ingestion \(from users\) results in data being pushed to Kafka topics, after which it is picked up by these "population" jobs and written to stores.
+
+In order for ingestion to succeed, the destination store must be writable. This means that Feast must be able to create the appropriate tables/schemas in the store and also write data from the population job into the store.
+
+Currently Feast Core starts and manages these population jobs that ingest data into stores \(although we are planning to move this responsibility to the serving layer\). Feast Core starts an [Apache Beam](https://beam.apache.org/) job which synchronously runs migrations on the destination store and subsequently starts consuming [FeatureRows](https://github.com/gojek/feast/blob/master/protos/feast/types/FeatureRow.proto) from Kafka and writing them into stores using a [writer](https://github.com/gojek/feast/tree/master/storage/api/src/main/java/feast/storage/api/writer).
+
+Below is a "happy path" of a batch ingestion process which includes a blocking step at the Python SDK.
+
+![](https://user-images.githubusercontent.com/6728866/74807906-91e73c00-5324-11ea-8ba5-2b43c7c5282b.png)
+
+The complete ingestion flow is executed by a [FeatureSink](https://github.com/gojek/feast/blob/master/storage/api/src/main/java/feast/storage/api/writer/FeatureSink.java). Two methods should be implemented:
+
+* [prepareWrite\(\)](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/storage/api/src/main/java/feast/storage/api/writer/FeatureSink.java#L45): Sets up the storage backend for writing/ingestion. This method will be called once during pipeline initialisation. Typically this is used to apply schemas.
+* [writer\(\)](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/storage/api/src/main/java/feast/storage/api/writer/FeatureSink.java#L53): Retrieves an Apache Beam PTransform that is used to write data to this store.
+
+### 2.3 Feature Serving \(Retriever\)
+
+Feast Serving can serve both historical/batch features and online features. Depending on the store that is being added, you should implement either a historical/batch store or an online store.
+
+#### 2.3.1 Historical Serving
+
+The historical serving interface is defined through the [HistoricalRetriever](https://github.com/gojek/feast/blob/master/storage/api/src/main/java/feast/storage/api/retriever/HistoricalRetriever.java) interface. Historical retrieval is an asynchronous process. The client submits a request for a dataset to be produced, and polls until it is ready.
+
+![High-level flow for batch retrieval](https://user-images.githubusercontent.com/6728866/74797157-702a8c80-5305-11ea-8901-bf6f4eb075f9.png)
+
+The current implementation of batch retrieval starts and ends with a file \(dataset\) in a Google Cloud Storage bucket. The user ingests an entity dataset. This dataset is loaded into a store \(BigQuery\), joined to features in a point-in-time correct way, then exported again to the bucket.
+
+We have also implemented a [batch retrieval method](https://github.com/gojek/feast/blob/a1937c374a4e39b7a75d828e7b7c3b87a64d9d6e/sdk/python/feast/client.py#L509) in the Python SDK. Depending on the means through which this new store will export data, this client may have to change. At the very least it would change if Google Cloud Storage isn't used as the staging bucket.
+
+The means through which you implement the export/import of data into the store will depend on your store.
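+To make the asynchronous flow concrete, here is a hedged sketch of batch retrieval from the Python SDK side: submit a request, then poll until the exported dataset is ready. The client method and job-handle names are modelled on the 0.4-era SDK and are assumptions, as is the entity file name.
+
+```python
+# Hedged sketch of the submit-and-poll batch retrieval flow. Assumptions:
+# get_batch_features() and the job's to_dataframe() follow the 0.4-era
+# Python SDK; "entity_rows.csv" is a hypothetical file of entity keys plus
+# event timestamps.
+import pandas as pd
+from feast import Client
+
+client = Client(core_url="localhost:6565", serving_url="localhost:6566")
+
+entity_rows = pd.read_csv("entity_rows.csv")
+job = client.get_batch_features(["driver:1:city"], entity_rows)
+
+# The joined dataset is staged in (by default) a Google Cloud Storage bucket;
+# the job handle polls until the export is ready, then loads it.
+df = job.to_dataframe()
+```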
+#### 2.3.2 Online Serving
+
+In the case of online serving it is necessary to implement an [OnlineRetriever](https://github.com/gojek/feast/blob/master/storage/api/src/main/java/feast/storage/api/retriever/OnlineRetriever.java). This online retriever will read rows directly and synchronously from an online database. The exact encoding strategy you use to store your data in the store would be defined in the FeatureSink. The OnlineRetriever is expected to read and decode those rows.
+
+## 3. Storage Connector Examples
+
+Feast currently provides support for the following storage types:
+
+Historical storage
+
+* [BigQuery](https://github.com/gojek/feast/tree/master/storage/connectors/bigquery)
+
+Online storage
+
+* [Redis](https://github.com/gojek/feast/tree/master/storage/connectors/redis)
+* [Redis Cluster](https://github.com/gojek/feast/tree/master/storage/connectors/rediscluster)
+
diff --git a/docs/contributing/contributing.md b/docs/contributing/contributing.md
index 805721e3d2..2be097d6ff 100644
--- a/docs/contributing/contributing.md
+++ b/docs/contributing/contributing.md
@@ -1,12 +1,10 @@
# Contribution Process

-## 1. Contribution process
+We use [RFCs](https://en.wikipedia.org/wiki/Request_for_Comments) and [GitHub issues](https://github.com/gojek/feast/issues) to communicate development ideas. The simplest way to contribute to Feast is to leave comments in our [RFCs](https://drive.google.com/drive/u/0/folders/1Lj1nIeRB868oZvKTPLYqAvKQ4O0BksjY) in the [Feast Google Drive](https://drive.google.com/drive/u/0/folders/0AAe8j7ZK3sxSUk9PVA) or our GitHub issues. You will need to join our [Google Group](../getting-help.md) in order to get access.

-We use [RFCs](https://en.wikipedia.org/wiki/Request_for_Comments) and [GitHub issues](https://github.com/feast-dev/feast/issues) to communicate development ideas. The simplest way to contribute to Feast is to leave comments in our [RFCs](https://drive.google.com/drive/u/0/folders/1Lj1nIeRB868oZvKTPLYqAvKQ4O0BksjY) in the [Feast Google Drive](https://drive.google.com/drive/u/0/folders/0AAe8j7ZK3sxSUk9PVA) or our GitHub issues.
+We follow a process of [lazy consensus](http://community.apache.org/committers/lazyConsensus.html). If you believe you know what the project needs then just start development. If you are unsure about which direction to take with development then please communicate your ideas through a GitHub issue or through our [Slack Channel](../getting-help.md) before starting development.

-Please communicate your ideas through a GitHub issue or through our Slack Channel before starting development.
-
-Please [submit a PR ](https://github.com/feast-dev/feast/pulls)to the master branch of the Feast repository once you are ready to submit your contribution. Code submission to Feast \(including submission from project maintainers\) require review and approval from maintainers or code owners.
+Please [submit a PR](https://github.com/gojek/feast/pulls) to the master branch of the Feast repository once you are ready to submit your contribution. Code submission to Feast \(including submission from project maintainers\) requires review and approval from maintainers or code owners.

PRs that are submitted by the general public need to be identified as `ok-to-test`. Once enabled, [Prow](https://github.com/kubernetes/test-infra/tree/master/prow) will run a range of tests to verify the submission, after which community members will help to review the pull request.
@@ -14,514 +12,3 @@ PRs that are submitted by the general public need to be identified as `ok-to-tes Please sign the [Google CLA](https://cla.developers.google.com/) in order to have your code merged into the Feast repository. {% endhint %} -## 2. Development guide - -### 2.1 Overview - -The following guide will help you quickly run Feast in your local machine. - -The main components of Feast are: - -* **Feast Core:** Handles feature registration, starts and manages ingestion jobs and ensures that Feast internal metadata is consistent. -* **Feast Ingestion Jobs:** Subscribes to streams of FeatureRows and writes these as feature - - values to registered databases \(online, historical\) that can be read by Feast Serving. - -* **Feast Serving:** Service that handles requests for features values, either online or batch. - -### 2.**2 Requirements** - -#### 2.**2.1 Development environment** - -The following software is required for Feast development - -* Java SE Development Kit 11 -* Python version 3.6 \(or above\) and pip -* [Maven ](https://maven.apache.org/install.html)version 3.6.x - -Additionally, [grpc\_cli](https://github.com/grpc/grpc/blob/master/doc/command_line_tool.md) is useful for debugging and quick testing of gRPC endpoints. - -#### 2.**2.2 Services** - -The following components/services are required to develop Feast: - -* **Feast Core:** Requires PostgreSQL \(version 11 and above\) to store state, and requires a Kafka \(tested on version 2.x\) setup to allow for ingestion of FeatureRows. -* **Feast Serving:** Requires Redis \(tested on version 5.x\). - -These services should be running before starting development. The following snippet will start the services using Docker. - -```bash -# Start Postgres -docker run --name postgres --rm -it -d --net host -e POSTGRES_DB=postgres -e POSTGRES_USER=postgres \ --e POSTGRES_PASSWORD=password postgres:12-alpine - -# Start Redis -docker run --name redis --rm -it --net host -d redis:5-alpine - -# Start Zookeeper (needed by Kafka) -docker run --rm \ - --net=host \ - --name=zookeeper \ - --env=ZOOKEEPER_CLIENT_PORT=2181 \ - --detach confluentinc/cp-zookeeper:5.2.1 - -# Start Kafka -docker run --rm \ - --net=host \ - --name=kafka \ - --env=KAFKA_ZOOKEEPER_CONNECT=localhost:2181 \ - --env=KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \ - --env=KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \ - --detach confluentinc/cp-kafka:5.2.1 -``` - -### 2.3 Testing and development - -#### 2.3.1 Running unit tests - -```text -$ mvn test -``` - -#### 2.3.2 Running integration tests - -_Note: integration suite isn't yet separated from unit._ - -```text -$ mvn verify -``` - -#### 2.3.3 Running components locally - -The `core` and `serving` modules are Spring Boot applications. These may be run as usual for [the Spring Boot Maven plugin](https://docs.spring.io/spring-boot/docs/current/maven-plugin/index.html): - -```text -$ mvn --projects core spring-boot:run - -# Or for short: -$ mvn -pl core spring-boot:run -``` - -Note that you should execute `mvn` from the Feast repository root directory, as there are intermodule dependencies that Maven will not resolve if you `cd` to subdirectories to run. - -#### 2.3.4 Running from IntelliJ - -Compiling and running tests in IntelliJ should work as usual. 
- -Running the Spring Boot apps may work out of the box in IDEA Ultimate, which has built-in support for Spring Boot projects, but the Community Edition needs a bit of help: - -The Spring Boot Maven plugin automatically puts dependencies with `provided` scope on the runtime classpath when using `spring-boot:run`, such as its embedded Tomcat server. The "Play" buttons in the gutter or right-click menu of a `main()` method [do not do this](https://stackoverflow.com/questions/30237768/run-spring-boots-main-using-ide). - -A solution to this is: - -1. Open `View > Tool Windows > Maven` -2. Drill down to e.g. `Feast Core > Plugins > spring-boot:run`, right-click and `Create 'feast-core [spring-boot'…` -3. In the dialog that pops up, check the `Resolve Workspace artifacts` box -4. Recommended: add `-Dspring-boot.run.fork=false` to the `Command line` field to get Debug working too -5. Click `OK`. You should now be able to select this run configuration for the Play button in the main toolbar, keyboard shortcuts, etc. - -It is recommend to have IntelliJ delegate building to Maven, if this is not enabled out of the box when you import the project, for greater assurance that build behavior is consistent with CI / production builds. This is set in Preferences at `Build, Execution, Deployment > Build Tools > Maven > Runner > Delegate IDE build/run actions to Maven`. - -### 2.**4** Validating your setup - -The following section is a quick walk-through to test whether your local Feast deployment is functional for development purposes. - -**2.4.1 Assumptions** - -* PostgreSQL is running in `localhost:5432` and has a database called `postgres` which - - can be accessed with credentials user `postgres` and password `password`. Different database configurations can be supplied here \(`/core/src/main/resources/application.yml`\) - -* Redis is running locally and accessible from `localhost:6379` -* \(optional\) The local environment has been authentication with Google Cloud Platform and has full access to BigQuery. This is only necessary for BigQuery testing/development. - -#### 2.4.2 Clone Feast - -```bash -git clone https://github.com/feast-dev/feast.git && cd feast && \ -export FEAST_HOME_DIR=$(pwd) -``` - -#### 2.4.3 Starting Feast Core - -To run Feast Core locally using Maven: - -```bash -# Feast Core can be configured from the following .yml file -# $FEAST_HOME_DIR/core/src/main/resources/application.yml -mvn --projects core spring-boot:run -``` - -Test whether Feast Core is running - -```text -grpc_cli call localhost:6565 ListStores '' -``` - -The output should list **no** stores since no Feast Serving has registered its stores to Feast Core: - -```text -connecting to localhost:6565 - -Rpc succeeded with OK status -``` - -#### 2.4.4 Starting Feast Serving - -Feast Serving is configured through the `$FEAST_HOME_DIR/serving/src/main/resources/application.yml`. Each Serving deployment must be configured with a store. The default store is Redis \(used for online serving\). - -The configuration for this default store is located in a separate `.yml` file. The default location is `$FEAST_HOME_DIR/serving/sample_redis_config.yml`: - -```text -name: serving -type: REDIS -redis_config: - host: localhost - port: 6379 -subscriptions: - - name: "*" - project: "*" - version: "*" -``` - -Once Feast Serving is started, it will register its store with Feast Core \(by name\) and start to subscribe to a feature sets based on its subscription. 
- -Start Feast Serving GRPC server on localhost:6566 with store name `serving` - -```text -mvn --projects serving spring-boot:run -``` - -Test connectivity to Feast Serving - -```text -grpc_cli call localhost:6566 GetFeastServingInfo '' -``` - -```text -connecting to localhost:6566 -version: "0.4.2-SNAPSHOT" -type: FEAST_SERVING_TYPE_ONLINE - -Rpc succeeded with OK status -``` - -Test Feast Core to see whether it is aware of the Feast Serving deployment - -```text -grpc_cli call localhost:6565 ListStores '' -``` - -```text -connecting to localhost:6565 -store { - name: "serving" - type: REDIS - subscriptions { - name: "*" - version: "*" - project: "*" - } - redis_config { - host: "localhost" - port: 6379 - } -} - -Rpc succeeded with OK status -``` - -In order to use BigQuery as a historical store, it is necessary to start Feast Serving with a different store type. - -Copy `$FEAST_HOME_DIR/serving/sample_redis_config.yml` to the following location `$FEAST_HOME_DIR/serving/my_bigquery_config.yml` and update the configuration as below: - -```text -name: bigquery -type: BIGQUERY -bigquery_config: - project_id: YOUR_GCP_PROJECT_ID - dataset_id: YOUR_GCP_DATASET -subscriptions: - - name: "*" - version: "*" - project: "*" -``` - -Then inside `serving/src/main/resources/application.yml` modify the following key `feast.store.config-path` to point to the new store configuration. - -After making these changes, restart Feast Serving: - -```text -mvn --projects serving spring-boot:run -``` - -You should see two stores registered: - -```text -store { - name: "serving" - type: REDIS - subscriptions { - name: "*" - version: "*" - project: "*" - } - redis_config { - host: "localhost" - port: 6379 - } -} -store { - name: "bigquery" - type: BIGQUERY - subscriptions { - name: "*" - version: "*" - project: "*" - } - bigquery_config { - project_id: "my_project" - dataset_id: "my_bq_dataset" - } -} -``` - -#### 2.4.5 Registering a FeatureSet - -Before registering a new FeatureSet, a project is required. - -```text -grpc_cli call localhost:6565 CreateProject ' - name: "your_project_name" -' -``` - -When a feature set is successfully registered, Feast Core will start an **ingestion** job that listens for new features in the feature set. - -{% hint style="info" %} -Note that Feast currently only supports source of type `KAFKA`, so you must have access to a running Kafka broker to register a FeatureSet successfully. It is possible to omit the `source` from a Feature Set, but Feast Core will still use Kafka behind the scenes, it is simply abstracted away from the user. -{% endhint %} - -Create a new FeatureSet in Feast by sending a request to Feast Core: - -```text -# Example of registering a new driver feature set -# Note the source value, it assumes that you have access to a Kafka broker -# running on localhost:9092 - -grpc_cli call localhost:6565 ApplyFeatureSet ' -feature_set { - spec { - project: "your_project_name" - name: "driver" - version: 1 - - entities { - name: "driver_id" - value_type: INT64 - } - - features { - name: "city" - value_type: STRING - } - - source { - type: KAFKA - kafka_source_config { - bootstrap_servers: "localhost:9092" - topic: "your-kafka-topic" - } - } - } -} -' -``` - -Verify that the FeatureSet has been registered correctly. - -```text -# To check that the FeatureSet has been registered correctly. 
-# You should also see logs from Feast Core of the ingestion job being started -grpc_cli call localhost:6565 GetFeatureSet ' - project: "your_project_name" - name: "driver" -' -``` - -Or alternatively, list all feature sets - -```text -grpc_cli call localhost:6565 ListFeatureSets ' - filter { - project: "your_project_name" - feature_set_name: "driver" - feature_set_version: "1" - } -' -``` - -#### 2.4.6 Ingestion and Population of Feature Values - -```text -# Produce FeatureRow messages to Kafka so it will be ingested by Feast -# and written to the registered stores. -# Make sure the value here is the topic assigned to the feature set -# ... producer.send("feast-driver-features" ...) -# -# Install Python SDK to help writing FeatureRow messages to Kafka -cd $FEAST_HOMEDIR/sdk/python -pip3 install -e . -pip3 install pendulum - -# Produce FeatureRow messages to Kafka so it will be ingested by Feast -# and written to the corresponding store. -# Make sure the value here is the topic assigned to the feature set -# ... producer.send("feast-test_feature_set-features" ...) -python3 - <` property, and commit. -1. Push. For a new release branch, open a PR against master. -1. When CI passes, merge. (Remember _not_ to delete the new release branch). -1. Tag the merge commit with the release version, using a `v` prefix. Push the tag. -1. Bump to the next working version and append `-SNAPSHOT` in `pom.xml`. -1. Commit the POM and open a PR. -1. Create a [GitHub release](https://github.com/feast-dev/feast/releases) which includes a summary of important changes as well as any artifacts associated with the release. Make sure to include the same change log as added in [CHANGELOG.md]. Use `Feast vX.Y.Z` as the title. -1. Create one final PR to the master branch and also update its [CHANGELOG.md]. +2. Update the [CHANGELOG.md](https://github.com/feast-dev/feast/blob/master/CHANGELOG.md). See the [Creating a change log](release-process.md#creating-a-change-log) guide. +3. In the root `pom.xml`, remove `-SNAPSHOT` from the `` property, and commit. +4. Push. For a new release branch, open a PR against master. +5. When CI passes, merge. \(Remember _not_ to delete the new release branch\). +6. Tag the merge commit with the release version, using a `v` prefix. Push the tag. +7. Bump to the next working version and append `-SNAPSHOT` in `pom.xml`. +8. Commit the POM and open a PR. +9. Create a [GitHub release](https://github.com/feast-dev/feast/releases) which includes a summary of important changes as well as any artifacts associated with the release. Make sure to include the same change log as added in [CHANGELOG.md](https://github.com/feast-dev/feast/blob/master/CHANGELOG.md). Use `Feast vX.Y.Z` as the title. +10. Create one final PR to the master branch and also update its [CHANGELOG.md](https://github.com/feast-dev/feast/blob/master/CHANGELOG.md). When a tag that matches a Semantic Version string is pushed, CI will automatically build and push the relevant artifacts to their repositories or package managers \(docker images, Python wheels, etc\). JVM artifacts are promoted from Sonatype OSSRH to Maven Central, but it sometimes takes some time for them to be available. -[CHANGELOG.md]: https://github.com/feast-dev/feast/blob/master/CHANGELOG.md - ### Creating a change log -We use an [open source change log generator](https://hub.docker.com/r/ferrarimarco/github-changelog-generator/) to generate change logs. The process still requires a little bit of manual effort. -1. 
Create a GitHub token as [per these instructions ](https://github.com/github-changelog-generator/github-changelog-generator#github-token). The token is used as an input argument (`-t`) to the changelog generator. -2. The change log generator configuration below will look for unreleased changes on a specific branch. The branch will be `master` for a major/minor release, or a release branch (`v0.4-branch`) for a patch release. You will need to set the branch using the `--release-branch` argument. -3. You should also set the `--future-release` argument. This is the version you are releasing. The version can still be changed at a later date. -4. Update the arguments below and run the command to generate the change log to the console. -``` +We use an [open source change log generator](https://hub.docker.com/r/ferrarimarco/github-changelog-generator/) to generate change logs. The process still requires a little bit of manual effort. 1. Create a GitHub token as [per these instructions ](https://github.com/github-changelog-generator/github-changelog-generator#github-token). The token is used as an input argument \(`-t`\) to the changelog generator. 2. The change log generator configuration below will look for unreleased changes on a specific branch. The branch will be `master` for a major/minor release, or a release branch \(`v0.4-branch`\) for a patch release. You will need to set the branch using the `--release-branch` argument. 3. You should also set the `--future-release` argument. This is the version you are releasing. The version can still be changed at a later date. 4. Update the arguments below and run the command to generate the change log to the console. + +```text docker run -it --rm ferrarimarco/github-changelog-generator \ --user feast-dev \ --project feast \ @@ -55,8 +50,10 @@ docker run -it --rm ferrarimarco/github-changelog-generator \ --max-issues 1 \ -o ``` -5. Review each change log item. - - Make sure that sentences are grammatically correct and well formatted (although we will try to enforce this at the PR review stage). - - Make sure that each item is categorized correctly. You will see the following categories: `Breaking changes`, `Implemented enhancements`, `Fixed bugs`, and `Merged pull requests`. Any unlabeled PRs will be found in `Merged pull requests`. It's important to make sure that any `breaking changes`, `enhancements`, or `bug fixes` are pulled up out of `merged pull requests` into the correct category. Housekeeping, tech debt clearing, infra changes, or refactoring do not count as `enhancements`. Only enhancements a user benefits from should be listed in that category. - - Make sure that the "Full Changelog" link is actually comparing the correct tags (normally your released version against the previously version). - - Make sure that release notes and breaking changes are present. + +1. Review each change log item. + * Make sure that sentences are grammatically correct and well formatted \(although we will try to enforce this at the PR review stage\). + * Make sure that each item is categorized correctly. You will see the following categories: `Breaking changes`, `Implemented enhancements`, `Fixed bugs`, and `Merged pull requests`. Any unlabeled PRs will be found in `Merged pull requests`. It's important to make sure that any `breaking changes`, `enhancements`, or `bug fixes` are pulled up out of `merged pull requests` into the correct category. Housekeeping, tech debt clearing, infra changes, or refactoring do not count as `enhancements`. 
Only enhancements a user benefits from should be listed in that category.
+  * Make sure that the "Full Changelog" link is actually comparing the correct tags \(normally your released version against the previous version\).
+  * Make sure that release notes and breaking changes are present.
+
diff --git a/docs/contributing/style-guide.md b/docs/contributing/style-guide.md
index 322f1e3d03..1cf1c69671 100644
--- a/docs/contributing/style-guide.md
+++ b/docs/contributing/style-guide.md
@@ -1,6 +1,8 @@
 # Style Guide

-## 1. Java
+## 1. Language Specific Style Guides
+
+### 1.1 Java

 We conform to the [Google Java Style Guide](https://google.github.io/styleguide/javaguide.html). Maven can helpfully take care of that for you before you commit:
@@ -17,11 +19,25 @@ $ mvn verify -Dspotless.check.skip

 If you're using IntelliJ, you can import [these code style settings](https://github.com/google/styleguide/blob/gh-pages/intellij-java-google-style.xml) if you'd like to use the IDE's reformat function as you develop.

-## 2. Go
+### 1.2 Go

 Make sure you apply `go fmt`.

-## 3. Python
+### 1.3 Python

 We use [Python Black](https://github.com/psf/black) to format our Python code prior to submission.
+
+## 2. Formatting and Linting
+
+Code can be formatted automatically by running the following command from the project root directory:
+
+```text
+make format
+```
+
+Code that is submitted through a PR or a direct push will then be validated with the following command:
+
+```text
+make lint
+```
+
diff --git a/docs/getting-help.md b/docs/getting-help.md
new file mode 100644
index 0000000000..32b21461b1
--- /dev/null
+++ b/docs/getting-help.md
@@ -0,0 +1,36 @@
+# Community
+
+## Chat
+
+* Come and say hello in [\#Feast](https://join.slack.com/t/kubeflow/shared_invite/zt-cpr020z4-PfcAue_2nw67~iIDy7maAQ) over in the Kubeflow Slack.
+
+## GitHub
+
+* Feast's GitHub repo can be [found here](https://github.com/feast-dev/feast/).
+* Found a bug or need a feature? [Create an issue on GitHub](https://github.com/feast-dev/feast/issues/new).
+
+## Community Call
+
+We have a community call every 2 weeks, alternating between two times:
+
+* 11 am \(UTC + 8\)
+* 5 pm \(UTC + 8\)
+
+Please join the [feast-dev](getting-help.md#feast-development) mailing list to receive the calendar invitation.
+
+## Mailing list
+
+### Feast discussion
+
+* Google Group: [https://groups.google.com/d/forum/feast-discuss](https://groups.google.com/d/forum/feast-discuss)
+* Mailing List: [feast-discuss@googlegroups.com](mailto:feast-discuss@googlegroups.com)
+
+### Feast development
+
+* Google Group: [https://groups.google.com/d/forum/feast-dev](https://groups.google.com/d/forum/feast-dev)
+* Mailing List: [feast-dev@googlegroups.com](mailto:feast-dev@googlegroups.com)
+
+## Google Drive
+
+The Feast community also maintains a [Google Drive](https://drive.google.com/drive/u/0/folders/0AAe8j7ZK3sxSUk9PVA) with documents like RFCs, meeting notes, and roadmaps. Please join one of the above mailing lists \(feast-dev or feast-discuss\) to gain access to the drive.
+
diff --git a/docs/getting-started/README.md b/docs/getting-started/README.md
new file mode 100644
index 0000000000..9945ab197d
--- /dev/null
+++ b/docs/getting-started/README.md
@@ -0,0 +1,14 @@
+# Getting Started
+
+If you want to learn more about Feast concepts and its architecture, see the user guide.
+
+If you would like to connect to an existing Feast deployment, then click on `Connecting to Feast`:
+
+{% page-ref page="connecting-to-feast-1/" %}
+
+If you would like to deploy a new installation of Feast, then click on `Deploying Feast`:
+
+{% page-ref page="deploying-feast/" %}
+
+If you are connected to a running Feast deployment, then have a look at our [example tutorials](https://github.com/gojek/feast/tree/master/examples).
+
diff --git a/docs/getting-started/connecting-to-feast-1/README.md b/docs/getting-started/connecting-to-feast-1/README.md
new file mode 100644
index 0000000000..7cc7e5d8c5
--- /dev/null
+++ b/docs/getting-started/connecting-to-feast-1/README.md
@@ -0,0 +1,21 @@
+# Connecting to Feast
+
+## Python SDK
+
+* Define, register, and manage entities and features
+* Ingest data into Feast
+* Build and retrieve training datasets
+* Retrieve online features
+
+{% page-ref page="python-sdk.md" %}
+
+## Feast CLI
+
+* Define, register, and manage entities and features from the terminal
+* Ingest data into Feast
+* Manage ingestion jobs
+
+{% page-ref page="connecting-to-feast.md" %}
+
diff --git a/docs/getting-started/connecting-to-feast-1/connecting-to-feast.md b/docs/getting-started/connecting-to-feast-1/connecting-to-feast.md
new file mode 100644
index 0000000000..a2cafda0db
--- /dev/null
+++ b/docs/getting-started/connecting-to-feast-1/connecting-to-feast.md
@@ -0,0 +1,38 @@
+# Feast CLI
+
+The Feast CLI is installed through pip:
+
+```bash
+pip install feast
+```
+
+Configure the CLI to connect to your Feast Core deployment:
+
+```text
+feast config set core_url your.feast.deployment
+```
+
+{% hint style="info" %}
+By default, all configuration is stored in `~/.feast/config`
+{% endhint %}
+
+The CLI is a wrapper around the [Feast Python SDK](python-sdk.md):
+
+```text
+$ feast
+
+Usage: feast [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --help  Show this message and exit.
+
+Commands:
+  config        View and edit Feast properties
+  feature-sets  Create and manage feature sets
+  ingest        Ingest feature data into a feature set
+  projects      Create and manage projects
+  version       Displays version and connectivity information
+```
+
diff --git a/docs/getting-started/connecting-to-feast-1/python-sdk.md b/docs/getting-started/connecting-to-feast-1/python-sdk.md
new file mode 100644
index 0000000000..2fea0c31af
--- /dev/null
+++ b/docs/getting-started/connecting-to-feast-1/python-sdk.md
@@ -0,0 +1,20 @@
+# Python SDK
+
+The Feast SDK can be installed directly using pip:
+
+```bash
+pip install feast
+```
+
+Users should then be able to connect to a Feast deployment as follows:
+
+```python
+from feast import Client
+
+# Connect to an existing Feast Core deployment
+client = Client(core_url='feast.example.com:6565')
+
+# Ensure that your client is connected by printing out some feature sets
+client.list_feature_sets()
+```
+
diff --git a/docs/getting-started/deploying-feast/README.md b/docs/getting-started/deploying-feast/README.md
new file mode 100644
index 0000000000..4a9aac11ba
--- /dev/null
+++ b/docs/getting-started/deploying-feast/README.md
@@ -0,0 +1,16 @@
+# Deploying Feast
+
+## Docker Compose
+
+* Fastest way to get Feast up and running.
+* Provides a pre-installed Jupyter Notebook with sample code.
+
+{% page-ref page="docker-compose.md" %}
+
+## Kubernetes \(GKE\)
+
+* Recommended way to install Feast for production use.
+* The guide has dependencies on BigQuery and Google Cloud Storage.
+
+{% page-ref page="kubernetes.md" %}
+
diff --git a/docs/getting-started/deploying-feast/docker-compose.md b/docs/getting-started/deploying-feast/docker-compose.md
new file mode 100644
index 0000000000..a8895a0252
--- /dev/null
+++ b/docs/getting-started/deploying-feast/docker-compose.md
@@ -0,0 +1,112 @@
+# Docker Compose
+
+### Overview
+
+This guide will bring Feast up using Docker Compose. This will allow you to:
+
+* Create, register, and manage feature sets
+* Ingest feature data into Feast
+* Retrieve features for online serving
+* Retrieve features for batch serving \(only if using Google Cloud Platform\)
+
+This guide is split into three parts:
+
+1. Setting up your environment
+2. Starting Feast with **online serving support only** \(does not require GCP\)
+3. Starting Feast with support for **both online and batch** serving \(requires GCP\)
+
+{% hint style="info" %}
+The Docker Compose setup uses the Direct Runner for the Apache Beam jobs that populate data stores. Running Beam with the Direct Runner means it does not need a dedicated runner like Flink or Dataflow, but this comes at the cost of performance. We recommend the use of a dedicated runner when running Feast with very large workloads.
+{% endhint %}
+
+### 0. Requirements
+
+* [Docker Compose](https://docs.docker.com/compose/install/) must be installed.
+* The following list of TCP ports must be free:
+  * 6565, 6566, 8888, and 9094.
+  * Alternatively, it is possible to modify port mappings in `/docker-compose/docker-compose.yml`.
+* \(for batch serving only\) A [GCP service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) that has access to [Google Cloud Storage](https://cloud.google.com/storage) and [BigQuery](https://cloud.google.com/bigquery).
+* \(for batch serving only\) [Google Cloud SDK ](https://cloud.google.com/sdk/install)installed, authenticated, and configured to the project you will use.
+
+## 1. Set up environment
+
+Clone the [Feast repository](https://github.com/feast-dev/feast/) and navigate to the `docker-compose` sub-directory:
+
+```bash
+git clone https://github.com/feast-dev/feast.git && \
+cd feast && export FEAST_HOME_DIR=$(pwd) && \
+cd infra/docker-compose
+```
+
+Make a copy of the `.env.sample` file:
+
+```bash
+cp .env.sample .env
+```
+
+## 2. Docker Compose for Online Serving Only
+
+### 2.1 Start Feast \(without batch retrieval support\)
+
+If you do not require batch serving, then it's possible to simply bring up Feast:
+
+```bash
+docker-compose up -d
+```
+
+A Jupyter Notebook environment is now available to use Feast:
+
+[http://localhost:8888/tree/feast/examples](http://localhost:8888/tree/feast/examples)
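+
+To verify connectivity from your own machine \(instead of the bundled notebook\), you can use the Python SDK. This is a minimal sketch that assumes the default port mappings in `docker-compose.yml` are unchanged:
+
+```python
+from feast import Client
+
+# Feast Core and Feast Online Serving are exposed on localhost by default
+client = Client(core_url="localhost:6565", serving_url="localhost:6566")
+
+# An empty list is expected on a fresh deployment
+print(client.list_feature_sets())
+```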
+
+## 3. Docker Compose for Online and Batch Serving
+
+{% hint style="info" %}
+Batch serving requires Google Cloud Platform \(GCP\) to function, specifically Google Cloud Storage and BigQuery.
+{% endhint %}
+
+### 3.1 Set up Google Cloud Platform
+
+Create a [service account ](https://cloud.google.com/iam/docs/creating-managing-service-accounts)from the GCP console and copy it to the `infra/docker-compose/gcp-service-accounts` folder:
+
+```bash
+cp my-service-account.json ${FEAST_HOME_DIR}/infra/docker-compose/gcp-service-accounts
+```
+
+Create a Google Cloud Storage bucket. Make sure that your service account above has read/write permissions to this bucket:
+
+```bash
+gsutil mb gs://my-feast-staging-bucket
+```
+
+### 3.2 Configure .env
+
+Configure the `.env` file based on your environment. At the very least you have to modify:

+| Parameter | Description |
+| :--- | :--- |
+| FEAST\_CORE\_GCP\_SERVICE\_ACCOUNT\_KEY | This should be your service account file name, for example `key.json`. |
+| FEAST\_BATCH\_SERVING\_GCP\_SERVICE\_ACCOUNT\_KEY | This should be your service account file name, for example `key.json`. |
+| FEAST\_JUPYTER\_GCP\_SERVICE\_ACCOUNT\_KEY | This should be your service account file name, for example `key.json`. |
+| FEAST\_JOB\_STAGING\_LOCATION | The Google Cloud Storage bucket that Feast will use to stage data exports and batch retrieval requests, for example `gs://your-gcs-bucket/staging`. |
+
+### 3.3 Configure .bq-store.yml
+
+We will also need to configure the `bq-store.yml` file inside `infra/docker-compose/serving/` to set up the BigQuery storage configuration as well as the feature sets that the store subscribes to. At a minimum you will need to set:
+
+| Parameter | Description |
+| :--- | :--- |
+| bigquery\_config.project\_id | This is your [GCP project ID](https://cloud.google.com/resource-manager/docs/creating-managing-projects). |
+| bigquery\_config.dataset\_id | This is the name of the BigQuery dataset that tables will be created in. Each feature set will have one table in BigQuery. |
+
+### 3.4 Start Feast \(with batch retrieval support\)
+
+Start Feast:
+
+```bash
+docker-compose up -d
+```
+
+A Jupyter Notebook environment is now available to use Feast:
+
+[http://localhost:8888/tree/feast/examples](http://localhost:8888/tree/feast/examples)
+
diff --git a/docs/getting-started/deploying-feast/kubernetes.md b/docs/getting-started/deploying-feast/kubernetes.md
new file mode 100644
index 0000000000..a7432836f3
--- /dev/null
+++ b/docs/getting-started/deploying-feast/kubernetes.md
@@ -0,0 +1,211 @@
+# Kubernetes \(GKE\)
+
+### Overview
+
+This guide will install Feast into a Kubernetes cluster on GCP. It assumes that all of your services will run within a single Kubernetes cluster. Once Feast is installed you will be able to:
+
+* Define and register features.
+* Load feature data from both batch and streaming sources.
+* Retrieve features for model training.
+* Retrieve features for online serving.
+
+{% hint style="info" %}
+This guide requires [Google Cloud Platform](https://cloud.google.com/) for installation.
+
+* [BigQuery](https://cloud.google.com/bigquery/) is used for storing historical features.
+* [Google Cloud Storage](https://cloud.google.com/storage/) is used for intermediate data storage.
+{% endhint %}
+
+## 0. Requirements
+
+1. [Google Cloud SDK ](https://cloud.google.com/sdk/install)installed, authenticated, and configured to the project you will use.
+2. [Kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) installed.
+3. [Helm](https://helm.sh/3) \(2.16.0 or greater\) installed on your local machine, with Tiller installed in your cluster. Helm 3 has not been tested yet.
+
+## 1.
Set up GCP + +First define the environmental variables that we will use throughout this installation. Please customize these to reflect your environment. + +```bash +export FEAST_GCP_PROJECT_ID=my-gcp-project +export FEAST_GCP_REGION=us-central1 +export FEAST_GCP_ZONE=us-central1-a +export FEAST_BIGQUERY_DATASET_ID=feast +export FEAST_GCS_BUCKET=${FEAST_GCP_PROJECT_ID}_feast_bucket +export FEAST_GKE_CLUSTER_NAME=feast +export FEAST_SERVICE_ACCOUNT_NAME=feast-sa +``` + +Create a Google Cloud Storage bucket for Feast to stage batch data exports: + +```bash +gsutil mb gs://${FEAST_GCS_BUCKET} +``` + +Create the service account that Feast will run as: + +```bash +gcloud iam service-accounts create ${FEAST_SERVICE_ACCOUNT_NAME} + +gcloud projects add-iam-policy-binding ${FEAST_GCP_PROJECT_ID} \ + --member serviceAccount:${FEAST_SERVICE_ACCOUNT_NAME}@${FEAST_GCP_PROJECT_ID}.iam.gserviceaccount.com \ + --role roles/editor + +gcloud iam service-accounts keys create key.json --iam-account \ +${FEAST_SERVICE_ACCOUNT_NAME}@${FEAST_GCP_PROJECT_ID}.iam.gserviceaccount.com +``` + +## 2. Set up a Kubernetes \(GKE\) cluster + +{% hint style="warning" %} +Provisioning a GKE cluster can expose your services publicly. This guide does not cover securing access to the cluster. +{% endhint %} + +Create a GKE cluster: + +```bash +gcloud container clusters create ${FEAST_GKE_CLUSTER_NAME} \ + --machine-type n1-standard-4 +``` + +Create a secret in the GKE cluster based on your local key `key.json`: + +```bash +kubectl create secret generic feast-gcp-service-account --from-file=key.json +``` + +For this guide we will use `NodePort` for exposing Feast services. In order to do so, we must find an External IP of at least one GKE node. This should be a public IP. + +```bash +export FEAST_IP=$(kubectl describe nodes | grep ExternalIP | awk '{print $2}' | head -n 1) +export FEAST_CORE_URL=${FEAST_IP}:32090 +export FEAST_ONLINE_SERVING_URL=${FEAST_IP}:32091 +export FEAST_BATCH_SERVING_URL=${FEAST_IP}:32092 +``` + +Add firewall rules to open up ports on your Google Cloud Platform project: + +```bash +gcloud compute firewall-rules create feast-core-port --allow tcp:32090 +gcloud compute firewall-rules create feast-online-port --allow tcp:32091 +gcloud compute firewall-rules create feast-batch-port --allow tcp:32092 +gcloud compute firewall-rules create feast-redis-port --allow tcp:32101 +gcloud compute firewall-rules create feast-kafka-ports --allow tcp:31090-31095 +``` + +## 3. Set up Helm + +Run the following command to provide Tiller with authorization to install Feast: + +```bash +kubectl apply -f - < + + + Option + Description + Default + + + + + FEAST_CONFIG + + Location of Feast configuration file + /.feast/config + + + + CONFIG_FEAST_ENV_VAR_PREFIX + + +

+      Default prefix to Feast environmental variable options.
+      Does not apply to FEAST_CONFIG.

+ + FEAST_ + + + + PROJECT + + Default Feast project to use + default + + + + CORE_URL + + URL used to connect to Feast Core + localhost:6565 + + + + CORE_ENABLE_SSL + + Enables TLS/SSL on connections to Feast Core + False + + + + CORE_AUTH_ENABLED + + Enable user authentication when connecting to a Feast Core instance + False + + + + CORE_AUTH_TOKEN + + Provide a static JWT token to authenticate with Feast Core + Null + + + + CORE_SERVER_SSL_CERT + + Path to certificate(s) used by Feast Client to authenticate TLS connection + to Feast Core (not to authenticate you as a client). + Null + + + + SERVING_URL + + URL used to connect to Feast Serving + localhost:6566 + + + + SERVING_ENABLE_SSL + + Enables TLS/SSL on connections to Feast Serving + False + + + + SERVING_SERVER_SSL_CERT + + Path to certificate(s) used by Feast Client to authenticate TLS connection + to Feast Serving (not to authenticate you as a client). + None + + + + GRPC_CONNECTION_TIMEOUT_DEFAULT + + Default gRPC connection timeout to both Feast Serving and Feast Core (in + seconds) + 3 + + + + GRPC_CONNECTION_TIMEOUT_APPLY + + Default gRPC connection timeout when sending an ApplyFeatureSet command + to Feast Core (in seconds) + 600 + + + + BATCH_FEATURE_REQUEST_WAIT_S + + Time to wait for batch feature requests before timing out. + 600 + + + +### Usage + +#### Configuration File + +Feast Configuration File \(`~/.feast/config`\) + +```text +[general] +project = default +core_url = localhost:6565 +``` + +#### Environmental Variables + +```bash +FEAST_CORE_URL=my_feast:6565 FEAST_PROJECT=default feast projects list +``` + +#### Feast SDK + +```python +client = Client( + core_url="localhost:6565", + project="default" +) +``` + diff --git a/docs/roadmap.md b/docs/roadmap.md new file mode 100644 index 0000000000..a047cf4125 --- /dev/null +++ b/docs/roadmap.md @@ -0,0 +1,31 @@ +# Roadmap + +## Feast 0.5 \(Technical Release\) + +[Discussion](https://github.com/gojek/feast/issues/527) + +### New functionality + +1. Streaming statistics and validation \(M1 from [Feature Validation RFC](https://docs.google.com/document/d/1TPmd7r4mniL9Y-V_glZaWNo5LMXLshEAUpYsohojZ-8/edit)\) +2. Support for Redis Clusters \([\#478](https://github.com/gojek/feast/issues/478), [\#502](https://github.com/gojek/feast/issues/502)\) +3. Add feature and feature set labels, i.e. key/value registry metadata \([\#463](https://github.com/gojek/feast/issues/463)\) +4. Job management API \([\#302](https://github.com/gojek/feast/issues/302)\) + +### Technical debt, refactoring, or housekeeping + +1. Clean up and document all configuration options \([\#525](https://github.com/gojek/feast/issues/525)\) +2. Externalize storage interfaces \([\#402](https://github.com/gojek/feast/issues/402)\) +3. Reduce memory usage in Redis \([\#515](https://github.com/gojek/feast/issues/515)\) +4. Support for handling out of order ingestion \([\#273](https://github.com/gojek/feast/issues/273)\) +5. Remove feature versions and enable automatic data migration \([\#386](https://github.com/gojek/feast/issues/386)\) \([\#462](https://github.com/gojek/feast/issues/462)\) +6. Tracking of batch ingestion by with dataset\_id/job\_id \([\#461](https://github.com/gojek/feast/issues/461)\) +7. Write Beam metrics after ingestion to store \(not prior\) \([\#489](https://github.com/gojek/feast/issues/489)\) + +## Feast 0.6 \(Feature Release\) + +### New functionality + +1. User authentication & authorization \([\#504](https://github.com/gojek/feast/issues/504)\) +2. 
Batch statistics and validation \(M2 from [Feature Validation RFC](https://docs.google.com/document/d/1TPmd7r4mniL9Y-V_glZaWNo5LMXLshEAUpYsohojZ-8/edit)\)
+3. Online feature/entity status metadata \([\#658](https://github.com/gojek/feast/pull/658)\)
+
diff --git a/docs/user-guide/architecture.md b/docs/user-guide/architecture.md
new file mode 100644
index 0000000000..5de6fe66eb
--- /dev/null
+++ b/docs/user-guide/architecture.md
@@ -0,0 +1,41 @@
+# Architecture
+
+![Feast high-level flow](../.gitbook/assets/blank-diagram-4.svg)
+
+### **Feast Core**
+
+Feast Core is the central management service of a Feast deployment. Its role is to:
+
+* Allow users to create [entities](entities.md) and [features](features.md) through the creation and management of [feature sets](feature-sets.md).
+* Start and manage [ingestion jobs](data-ingestion.md). These jobs populate [stores](stores.md) from [sources](sources.md) based on the feature sets that are defined and the subscription\(s\) that a [store](stores.md) has.
+
+{% hint style="info" %}
+Job management may move out of Feast Core to Feast Serving in the future.
+{% endhint %}
+
+### **Feast Ingestion**
+
+Before users ingest data into Feast, they should register one or more feature sets. These [feature sets](feature-sets.md) tell Feast where to find their data, how to ingest it, and also describe the characteristics of the data for validation purposes. Once a feature set is registered, Feast will start an Apache Beam job in order to populate a store with data from a source.
+
+In order for stores to be populated with data, users must publish the data to a [source](sources.md). Currently, Feast only supports Apache Kafka as a source. Feast users \(or pipelines\) ingest batch data through the [Feast SDK](../getting-started/connecting-to-feast-1/connecting-to-feast.md) using its `ingest()` method. The SDK publishes the data straight to Kafka.
+
+Streaming systems can also ingest data into Feast. This is done by publishing to the correct Kafka topic in the expected format. Feast expects data to be in [FeatureRow.proto](https://api.docs.feast.dev/grpc/feast.types.pb.html#FeatureRow) format. The topic and brokers can be found on the feature set schema using the [Python SDK](../getting-started/connecting-to-feast-1/python-sdk.md).
+
+### **Stores**
+
+Stores are nothing more than databases used to store feature data. Feast loads data into stores through an ingestion process, after which the data can be served through the Feast Serving API. Stores are documented in the following section.
+
+{% page-ref page="stores.md" %}
+
+### **Feast Serving**
+
+`Feast Serving` is the data access layer through which end users and production systems retrieve feature data. Each `Serving` instance is backed by a [store](stores.md).
+
+Since Feast supports multiple store types \(online, historical\), it is common to have two instances of Feast Serving deployed, one for online serving and one for historical serving. However, Feast allows for any number of `Feast Serving` deployments, meaning it is possible to deploy a `Feast Serving` deployment per production system, with its own stores and population jobs.
+
+`Serving` deployments can subscribe to a subset of feature data, meaning they do not have to consume all features known to a `Feast Core` deployment.
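+
+For illustration, a store-side subscription might look like the following sketch. It reuses the subscription format from the store configuration examples earlier in these docs; the store name and Redis settings here are placeholders:
+
+```yaml
+name: online
+type: REDIS
+redis_config:
+  host: localhost
+  port: 6379
+subscriptions:
+  # Subscribe to all versions of all feature sets in all projects
+  - name: "*"
+    version: "*"
+    project: "*"
+```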
+
+Feature retrieval \(and feature references\) are documented in more detail in subsequent sections.
+
+{% page-ref page="feature-retrieval.md" %}
+
diff --git a/docs/user-guide/data-ingestion.md b/docs/user-guide/data-ingestion.md
new file mode 100644
index 0000000000..464a51bbfd
--- /dev/null
+++ b/docs/user-guide/data-ingestion.md
@@ -0,0 +1,126 @@
+# Data ingestion
+
+Users do not have to point Feast at an external data source in order to load data into it. Feast also allows users to exclude the source when registering a feature set. The following is a valid feature set specification:
+
+```python
+feature_set = FeatureSet(
+    name="stream_feature",
+    entities=[Entity("entity_id", ValueType.INT64)],
+    features=[Feature("feature_value1", ValueType.STRING)],
+)
+```
+
+If a user does not provide a source of data, then they are required to publish data to Feast themselves. This process is called ingestion.
+
+### Ingesting data
+
+The following example demonstrates how data ingestion works. For a full tutorial please see the [Telco Customer Churn Prediction Notebook](https://github.com/gojek/feast/blob/master/examples/feast-xgboost-churn-prediction-tutorial/Telecom%20Customer%20Churn%20Prediction%20%28with%20Feast%20and%20XGBoost%29.ipynb).
+
+1. Connect to Feast Core and load in a Pandas DataFrame.
+
+```python
+from feast import FeatureSet, Client, Entity, ValueType
+import pandas as pd
+
+# Connect to Feast core
+client = Client(core_url="feast-core.example.com")
+
+# Load in customer data
+df = pd.read_csv("customer.csv")
+```
+
+2. Create, infer, and register a feature set from the DataFrame. This is a one-off step that is required to initially register a feature set with Feast.
+
+```python
+# Create an empty feature set
+customer_churn_fs = FeatureSet("customer_churn")
+
+# Infer the schema of the feature set from the Pandas DataFrame
+customer_churn_fs.infer_fields_from_df(
+    df,
+    entities=[Entity(name='customer_id',
+                     dtype=ValueType.STRING)]
+    )
+
+# Register the feature set with Feast
+client.apply(customer_churn_fs)
+```
+
+3. We can also test that the feature set was correctly registered with Feast by retrieving it again and printing it out.
+
+```python
+customer_churn_fs = client.get_feature_set('customer_churn')
+print(client.get_feature_set('customer_churn'))
+```
+
+```yaml
+{
+  "spec": {
+    "name": "customer_churn",
+    "entities": [
+      {
+        "name": "customer_id",
+        "valueType": "STRING"
+      }
+    ],
+    "features": [
+      {
+        "name": "churn",
+        "valueType": "INT64"
+      },
+      {
+        "name": "contract_month_to_month",
+        "valueType": "INT64"
+      },
+      {
+        "name": "streamingmovies",
+        "valueType": "INT64"
+      },
+      {
+        "name": "paperlessbilling",
+        "valueType": "INT64"
+      },
+      {
+        "name": "contract_two_year",
+        "valueType": "INT64"
+      },
+      {
+        "name": "partner",
+        "valueType": "INT64"
+      }
+    ],
+    "maxAge": "0s",
+    "source": {
+      "type": "KAFKA",
+      "kafkaSourceConfig": {
+        "bootstrapServers": "10.202.250.99:31190",
+        "topic": "feast"
+      }
+    },
+    "project": "default"
+  },
+  "meta": {
+    "createdTimestamp": "2020-03-15T07:47:52Z",
+    "status": "STATUS_READY"
+  }
+}
+```
+
+Once we are happy that the schema is correct, we can start to ingest the DataFrame into Feast:
+
+```python
+client.ingest(customer_churn_fs, df)
+```
+
+```text
+100%|██████████| 7032/7032 [00:02<00:00, 2771.19rows/s]
+Ingestion complete!
+
+Ingestion statistics:
+Success: 7032/7032 rows ingested
+```
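+
+When constructing a DataFrame for ingestion yourself, note that Feast expects an event timestamp column alongside the entity and feature columns \(the `datetime` column used throughout these docs\). A minimal sketch, assuming the `customer_churn` feature set and `client` from above; the row values are illustrative only:
+
+```python
+import pandas as pd
+
+# One row per (timestamp, entity) combination
+new_rows = pd.DataFrame({
+    "datetime": pd.to_datetime(["2020-03-15 07:00:00"] * 2, utc=True),
+    "customer_id": ["C1001", "C1002"],
+    "churn": [0, 1],
+})
+
+client.ingest(customer_churn_fs, new_rows)
+```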
+
+{% hint style="warning" %}
+Feast ingestion maintains the order of the data it is given. This means that data written later will replace data written earlier in stores. This is important to note when ingesting data that will end up in a production system.
+{% endhint %}
+
diff --git a/docs/user-guide/entities.md b/docs/user-guide/entities.md
new file mode 100644
index 0000000000..f2383e440c
--- /dev/null
+++ b/docs/user-guide/entities.md
@@ -0,0 +1,43 @@
+# Entities
+
+An entity is any domain object that can be modeled and about which information can be stored. Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events, which have relevance to the modeled system.
+
+More formally, an entity is an instance of an entity type. An entity type is the class of entities, where entities are the instances.
+
+* Examples of entity types in the context of ride-hailing and food delivery: `customer`, `order`, `driver`, `restaurant`, `dish`, `area`.
+* A specific driver, for example a driver with ID `D011234`, would be an entity of the entity type `driver`.
+
+An entity is the object on which features are observed. For example, we could have a feature `total_trips_24h` on the driver `D011234` with a feature value of `11`.
+
+In the context of Feast, entities are important because they are used as keys when looking up feature values. Entities are also used when joining feature values between different feature sets in order to build one large data set to train a model, or to serve a model.
+
+Example entity properties:
+
+{% code title="customer\_id.yaml" %}
+```yaml
+# Entity name
+name: customer_id
+
+# Entity value type
+value_type: INT64
+
+```
+{% endcode %}
+
+Entities can be created through the [Feast SDK](../getting-started/connecting-to-feast-1/connecting-to-feast.md) as follows:
+
+```python
+from feast import Entity, ValueType, FeatureSet
+
+# Create a customer entity
+customer = Entity("customer_id", ValueType.INT64)
+
+# Create a feature set with only a single entity
+customer_feature_set = FeatureSet("customer_fs", entities=[customer])
+
+# Register the feature set with Feast
+client.apply(customer_feature_set)
+```
+
+Please see the [EntitySpec](https://api.docs.feast.dev/grpc/feast.core.pb.html#EntitySpec) for the entity specification API.
+
diff --git a/docs/user-guide/feature-retrieval.md b/docs/user-guide/feature-retrieval.md
new file mode 100644
index 0000000000..e49db24fe9
--- /dev/null
+++ b/docs/user-guide/feature-retrieval.md
@@ -0,0 +1,116 @@
+# Feature retrieval
+
+## 1. Overview
+
+Feature retrieval \(or serving\) is the process of retrieving either historical features or online features from Feast, for the purposes of training or serving a model.
+
+Feast attempts to unify the process of retrieving features in both the historical and online case. It does this through the creation of feature references. One of the major advantages of using Feast is that you have a single semantic reference to a feature. These feature references can then be stored alongside your model and loaded into a serving layer where they can be used for online feature retrieval.
+
+## 2. Feature references
+
+In Feast, each feature can be uniquely addressed through a feature reference. A feature reference is composed of the following components:
+
+* Feature Set
+* Feature
+
+These components can be used to create a string-based feature reference as follows:
+
+`<feature-set>:<feature>`
+
+Feast will attempt to infer the `feature-set` name if it is not provided, but a feature reference must always provide a `feature` name.
+
+```python
+# Feature references
+features = [
+    'partner',
+    'daily_transactions',
+    'customer_feature_set:dependents',
+    'customer_feature_set:has_phone_service',
+]
+
+target = 'churn'
+```
+
+Feature references only apply to a single `project`. Features cannot be retrieved across projects in a single request.
+
+## 3. Historical feature retrieval
+
+Historical feature retrieval can be done through either the Feast SDK or directly through the Feast Serving gRPC API. Below is an example of historical retrieval from the [Churn Prediction Notebook](https://github.com/gojek/feast/blob/master/examples/feast-xgboost-churn-prediction-tutorial/Telecom%20Customer%20Churn%20Prediction%20%28with%20Feast%20and%20XGBoost%29.ipynb).
+
+```python
+# Add the target variable to our feature list
+features = self._features + [self._target]
+
+# Retrieve training dataset from Feast. The "entity_df" is a dataframe that contains
+# timestamps and entity keys. In this case, it is a dataframe with two columns:
+# one timestamp column and one customer id column
+dataset = client.get_batch_features(
+    feature_refs=features,
+    entity_rows=entity_df
+)
+
+# Materialize the dataset object to a Pandas DataFrame.
+# Alternatively it is possible to use a file reference if the data is too large
+df = dataset.to_dataframe()
+```
+
+In the above example, Feast does a point-in-time correct query. For each timestamp and entity key combination that is provided by `entity_df`, Feast determines the values of all the features in the `features` list at that respective point in time, joins those feature values to that specific entity value and timestamp, and repeats this process for all timestamps.
+
+This is called a point-in-time correct join.
+
+Feast allows users to retrieve features from any feature sets and join them together in a single response dataset. The only requirement is that the user provides the correct entities in order to look up the features.
+
+### **Point-in-time-correct Join**
+
+Below is another example of how a point-in-time-correct join works. We have two dataframes. The first is the `entity dataframe` that contains timestamps, entities, and labels. The user would like to have driver features joined onto this `entity dataframe` from the `driver dataframe` to produce an `output dataframe` that contains both labels and features. They would then like to train their model on this output.
+
+![Input 1: Entity DataFrame](https://lh3.googleusercontent.com/ecS5sqj3FHLFSm06XF11NmTQSru-bQ4Az3Kuko_vg5YlBxXjHadlsGwmo7d7wUx4fA1ssdZvxrESDKfkGWjj3HNJg_jIqXY0avz2JzCcEOXLBLmtXNEY8k2u3f4QusHdDWdqRARQHYE)
+
+![Input 2: Driver DataFrame](https://lh3.googleusercontent.com/LRtCOzmcfhLWzpyndbRKZSVPanLLzfULoHx2YxY6N3i1gQd2Eh6MS1igahOe8ydA7zQulIFJEaQ0IXFXOsdkKRobOC6ThSOnT4hACbCl1jeM4O2JDVC_kvw8lwTCezVUD3d6ZUYj31Q)
+
+Typically the `input 1` DataFrame would be provided by the user, and the `input 2` DataFrame would already be ingested into Feast.
To join these two, the user would call Feast as follows:
+
+```python
+# Feature references
+features = [
+    'conv_rate',
+    'acc_rate',
+    'avg_daily_trips',
+    'trip_completed'
+]
+
+dataset = client.get_batch_features(
+    feature_refs=features,  # this is a list of feature references
+    entity_rows=entity_df   # this is the entity dataframe above
+)
+
+# This prints out the dataframe below
+print(dataset.to_dataframe())
+```
+
+![Output: Joined DataFrame](https://lh5.googleusercontent.com/Gm-4Ru68KyIQ2tQtaVTDFngqO7pMtlMP1YAQO-bqln6_Mo2XAPdbij6w5ACnHAmQ053XUPu6G-c2aYRVJxPqPTMN_BcH6PY0-E1kCwXQAdW1CcQo5tc0g5ilcuVAtqsHcJB1R5mBdLo)
+
+Feast is able to intelligently join feature data with different timestamps to a single basis table in a point-in-time-correct way. This allows users to join daily batch data with high-frequency event data transparently. They simply need to know the feature names.
+
+Point-in-time-correct joins also prevent feature leakage by accurately reconstructing the state of the world at each point in time, instead of just joining features based on the nearest timestamps.
+
+## Online feature retrieval
+
+Online feature retrieval works in much the same way as batch retrieval, with one important distinction: online stores only maintain the current state of features. No historical data is served.
+
+```python
+features = [
+    'conv_rate',
+    'acc_rate',
+    'avg_daily_trips',
+]
+
+data = client.get_online_features(
+    feature_refs=features,    # Contains only feature references
+    entity_rows=entity_rows,  # Contains only entities (driver ids)
+)
+```
+
+Online serving with Feast is built to be very low latency. Feast Serving provides a [gRPC API](https://api.docs.feast.dev/grpc/feast.serving.pb.html) that is backed by [Redis](https://redis.io/). We also provide support for [Python](https://api.docs.feast.dev/python/), [Go](https://godoc.org/github.com/gojek/feast/sdk/go), and Java clients.
+
diff --git a/docs/user-guide/feature-sets.md b/docs/user-guide/feature-sets.md
new file mode 100644
index 0000000000..48ad81a9e5
--- /dev/null
+++ b/docs/user-guide/feature-sets.md
@@ -0,0 +1,54 @@
+# Feature Sets
+
+Feature sets are both a schema and a means of identifying data sources for features.
+
+Data typically comes in the form of flat files, dataframes, tables in a database, or events on a stream. Thus the data occurs with multiple columns/fields in multiple rows/events.
+
+Feature sets are a way of defining the unique properties of these data sources, how Feast should interpret them, and how Feast should source them. Feature sets allow for groups of fields in these data sources to be [ingested](data-ingestion.md) and [stored](stores.md) together. Feature sets allow for efficient storage and logical namespacing of data within [stores](stores.md).
+
+{% hint style="info" %}
+Feature sets are a grouping of features based on how they are loaded into Feast. They ensure that data is efficiently stored during ingestion. Feature sets are not a grouping of features for retrieval. During retrieval it is possible to retrieve feature values from any number of feature sets, as the sketch below shows.
+{% endhint %}
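+
+For example, features registered under two different feature sets can be fetched in a single retrieval call. A minimal sketch, assuming a connected `client`, an `entity_df` with the appropriate entity columns and timestamps, and hypothetical feature set names:
+
+```python
+# Mix feature references from two feature sets in one request
+features = [
+    'customer_transactions:daily_transactions',
+    'customer_profile:age',
+]
+
+dataset = client.get_batch_features(
+    feature_refs=features,
+    entity_rows=entity_df,
+)
+```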
+
+### Customer Transactions Example
+
+Below is an example of a basic `customer transactions` feature set that has been exported to YAML:
+
+{% tabs %}
+{% tab title="customer\_transactions\_feature\_set.yaml" %}
+```yaml
+name: customer_transactions
+entities:
+- name: customer_id
+  valueType: INT64
+features:
+- name: daily_transactions
+  valueType: FLOAT
+- name: total_transactions
+  valueType: FLOAT
+```
+{% endtab %}
+{% endtabs %}
+
+The dataframe below \(`customer_data.csv`\) contains the features and entities of the above feature set:
+
+| datetime | customer\_id | daily\_transactions | total\_transactions |
+| :--- | :--- | :--- | :--- |
+| 2019-01-01 01:00:00 | 20001 | 5.0 | 14.0 |
+| 2019-01-01 01:00:00 | 20002 | 2.6 | 43.0 |
+| 2019-01-01 01:00:00 | 20003 | 4.1 | 154.0 |
+| 2019-01-01 01:00:00 | 20004 | 3.4 | 74.0 |
+
+In order to ingest feature data into Feast for this specific feature set:
+
+```python
+# Load dataframe
+customer_df = pd.read_csv("customer_data.csv")
+
+# Create feature set from YAML (using YAML is optional)
+cust_trans_fs = FeatureSet.from_yaml("customer_transactions_feature_set.yaml")
+
+# Load feature data into Feast for this specific feature set
+client.ingest(cust_trans_fs, customer_df)
+```
+
diff --git a/docs/user-guide/features.md b/docs/user-guide/features.md
new file mode 100644
index 0000000000..224ca798a5
--- /dev/null
+++ b/docs/user-guide/features.md
@@ -0,0 +1,38 @@
+# Features
+
+A feature is an individual measurable property or characteristic of a phenomenon being observed. Features are the most important concepts within a feature store. Feature data is used both as input to models during training and when models are served in production.
+
+In the context of Feast, features are values that are associated with either one or more entities over time. In Feast, these values are either primitives or lists of primitives. Each feature can also have additional information attached to it.
+
+The following is a YAML representation of a feature specification. This specification would form part of a larger specification within a [feature set](feature-sets.md).
+
+{% code title="total\_trips\_feature.yaml" %}
+```yaml
+# Feature name
+name: total_trips_24h
+
+# Feature value type
+value_type: INT64
+```
+{% endcode %}
+
+Features can be created through the [Feast SDK](../getting-started/connecting-to-feast-1/connecting-to-feast.md) as follows:
+
+```python
+from feast import Entity, Feature, ValueType, FeatureSet
+
+# Create a driver entity
+driver = Entity("driver_id", ValueType.INT64)
+
+# Create a total trips 24h feature
+total_trips_24h = Feature("total_trips_24h", ValueType.INT64)
+
+# Create a feature set with a single entity and a single feature
+driver_fs = FeatureSet("driver_fs", entities=[driver], features=[total_trips_24h])
+
+# Register the feature set with Feast
+client.apply(driver_fs)
+```
+
+Please see the [FeatureSpec](https://api.docs.feast.dev/grpc/feast.core.pb.html#FeatureSpec) for the complete feature specification API.
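+
+To confirm that the feature set was registered, you can fetch it back from Feast Core, as in the ingestion example earlier in this guide. A minimal sketch, assuming the `client` and `driver_fs` from above:
+
+```python
+# Retrieve the registered feature set and print its specification
+print(client.get_feature_set("driver_fs"))
+```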
+
diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md
new file mode 100644
index 0000000000..0d45825368
--- /dev/null
+++ b/docs/user-guide/overview.md
@@ -0,0 +1,58 @@
+# Concepts
+
+## Using Feast
+
+Feast acts as the interface between ML models and data. Feast enables your team to:
+
+1. Create feature specifications to manage features and load in data that should be managed
+2. Retrieve historical features for training models
+3. Retrieve online features for serving models
+
+{% hint style="info" %}
+Feast currently does not apply feature transformations to data.
+{% endhint %}
+
+### 1. Creating and managing features
+
+Feature creators model the data within their organization into Feast through the creation of [feature sets](feature-sets.md).
+
+Feature sets are specifications that contain both schema and data source information. They allow Feast to know how to interpret your data, and optionally where to find it. Feature sets allow users to define domain [entities](entities.md) along with the [features](features.md) that are available on these entities. Feature sets also allow users to define schemas that describe the properties of the data, which in turn can be used for validation purposes.
+
+Once a feature set has been registered, Feast will create the relevant schemas to store feature data within its feature [stores](stores.md). These stores are then automatically populated by jobs that ingest data from data [sources](sources.md), making it possible for Feast to provide access to features for training and serving. It is also possible for users to [ingest](data-ingestion.md) data into Feast instead of using an external source.
+
+Read more about [feature sets](feature-sets.md).
+
+### 2. Retrieving historical features during training
+
+Both online and historical retrieval are executed through an API call to `Feast Serving` using [feature references](feature-retrieval.md). In the case of historical serving it is necessary to provide Feast with the entities and timestamps that feature data will be joined to. Feast eagerly produces a point-in-time correct dataset based on the features that have been requested. These features can come from any number of feature sets.
+
+Stores supported: [BigQuery](https://cloud.google.com/bigquery)
+
+### 3. Retrieving online features during serving
+
+Feast also allows users to call `Feast Serving` for online feature data. Feast only stores the latest values during online serving for each feature, as opposed to historical serving where all historical values are stored. Online serving allows for very low latency requests to feature data at very high throughput.
+
+Stores supported: [Redis](https://redis.io/), [Redis Cluster](https://redis.io/topics/cluster-tutorial)
+
+## Concept Hierarchy
+
+![](../.gitbook/assets/image%20%283%29.png)
+
+Feast resources are arranged in the above hierarchy, with projects grouping one or more [feature sets](feature-sets.md), which in turn group multiple [features](features.md) or [entities](entities.md).
+
+The logical grouping of these resources is important for namespacing as well as retrieval. At retrieval time it is necessary to reference individual features through feature references. These references uniquely identify a feature or entity within a Feast deployment.
+
+## Concepts
+
+[Entities](entities.md) are objects in an organization that model a specific construct. Examples of these include customers, transactions, and drivers.
+
+[Features](features.md) are measurable properties that are observed on entities. Features are used as inputs to models.
+
+[Feature Sets](feature-sets.md) are schemas that define logical groupings of entities, features, data sources, and other related metadata.
+
+[Stores](stores.md) are databases that maintain feature data that gets served to models during training or inference.
+
+[Sources](sources.md) are either internal or external data sources where feature data can be found.
+
+[Ingestion](data-ingestion.md) is the process of loading data into Feast.
+
diff --git a/docs/user-guide/sources.md b/docs/user-guide/sources.md
new file mode 100644
index 0000000000..35fcb41a11
--- /dev/null
+++ b/docs/user-guide/sources.md
@@ -0,0 +1,32 @@
+# Sources
+
+A `source` is a data source that can be used to find feature data. Users define sources as part of [feature sets](feature-sets.md). Once a feature set is registered with a source, Feast will automatically start to populate its stores with data from this source.
+
+{% hint style="info" %}
+Feast currently supports only [Kafka](https://kafka.apache.org/) as a source.
+{% endhint %}
+
+An example of a user-provided source can be seen in the following code snippet:
+
+```python
+feature_set = FeatureSet(
+    name="stream_feature",
+    entities=[
+        Entity("entity_id", ValueType.INT64)
+    ],
+    features=[
+        Feature("feature_value1", ValueType.STRING)
+    ],
+    source=KafkaSource(
+        brokers="mybroker:9092",
+        topic="my_feature_topic"
+    )
+)
+```
+
+Once this feature set is registered, Feast will start an ingestion job that retrieves data from this source and starts to populate all [stores](stores.md) that subscribe to it.
+
+In most cases a feature set \(and by extension its source\) will be used to populate both an online store and a historical store. This allows users to both train and serve their model with the same feature data.
+
+Feast will ensure that the source complies with the schema of the feature set. The event data has to be [Protobuf](https://developers.google.com/protocol-buffers) encoded and must contain the necessary [FeatureRow](https://api.docs.feast.dev/grpc/feast.types.pb.html#FeatureRow) structure.
+
diff --git a/docs/user-guide/stores.md b/docs/user-guide/stores.md
new file mode 100644
index 0000000000..c15cbb3db0
--- /dev/null
+++ b/docs/user-guide/stores.md
@@ -0,0 +1,37 @@
+# Stores
+
+In Feast, a store describes a database that is populated with feature data in order to be served to models.
+
+Feast supports two classes of stores:
+
+* Historical stores
+* Online stores
+
+In order to populate these stores, Feast Core creates a long-running ingestion job that streams in data from all feature sources to all stores that subscribe to those feature sets.
+
+![](../.gitbook/assets/image%20%282%29.png)
+
+## Historical Stores
+
+Historical stores maintain a complete history of feature data for the feature sets they are subscribed to.
+
+Feast currently only supports [Google BigQuery](https://cloud.google.com/bigquery) as a historical store, but we have [developed a storage API](https://github.com/gojek/feast/issues/482) that makes adding a new store possible.
+
+Each historical store models its data differently, but in the case of a relational store \(like BigQuery\), each feature set maps directly to a table. Each feature and entity within a feature set maps directly to a column within a table.
+
+Data from historical stores can be used to train a model. In order to retrieve data from a historical store it is necessary to connect to a Feast Serving deployment and request historical features. Please see [feature retrieval](feature-retrieval.md) for more details.
+
+{% hint style="danger" %}
+Data is persisted in historical stores like BigQuery in log format. Repeated ingestions will duplicate the data that is persisted in the store. Feast will automatically deduplicate data during retrieval, but it does not currently remove data from the stores themselves.
+{% endhint %}
+
+## Online Stores
+
+Online stores maintain only the latest values for a specific feature. Feast currently supports Redis as an online store. Online stores are meant for very high throughput writes from ingestion jobs and very low latency access to features during online serving.
+
+Please continue to the [feature retrieval](feature-retrieval.md) section for more details on retrieving data from online storage.
+
+## Subscriptions
+
+Stores are populated by ingestion jobs \(Apache Beam\) that retrieve feature data from sources based on subscriptions. These subscriptions are typically defined by the administrators of the Feast deployment. In most cases a store would simply subscribe to all features, but in some cases it may subscribe to a subset in order to improve performance or efficiency.
+
diff --git a/docs/why-feast.md b/docs/why-feast.md
new file mode 100644
index 0000000000..75048ace2c
--- /dev/null
+++ b/docs/why-feast.md
@@ -0,0 +1,32 @@
+# Why Feast?
+
+## Lack of feature reuse
+
+**Problem:** The process of engineering features is one of the most time-consuming activities in building an end-to-end ML system. Despite this, many teams continue to redevelop the same features from scratch for every new project. Often these features never leave the notebooks or pipelines they are built in.
+
+**Solution:** A centralized feature store allows organizations to build up a foundation of features that can be reused across projects. Teams are then able to utilize features developed by other teams, and as more features are added to the store it becomes easier and cheaper to build models.
+
+## Serving features is hard
+
+**Problem:** Serving up-to-date features at scale is hard. Raw data can come from a variety of sources, from data lakes to streams to data warehouses to flat files. Data scientists need the ability to produce massive datasets of features from this data in order to train their models offline. These models then need access to real-time feature data at low latency and high throughput when they are served in production.
+
+**Solution:** Feast is built to ingest data from a variety of sources, supporting both streaming and batch sources. Once data is loaded into Feast as features, it becomes available through both a batch serving API and a real-time \(online serving\) API. These APIs allow data scientists and ML engineers to easily retrieve feature data during development, training, and production. Feast also comes with Java, Go, and Python SDKs to make this experience easy.
+
+## **Models need point-in-time correctness**
+
+**Problem:** Most data sources are not built with ML use cases in mind and by extension don't provide point-in-time correct lookups of feature data. One of the reasons why features are often re-engineered is that ML practitioners need to ensure that their models are trained on a dataset that accurately models the state of the world when the model runs in production.
+
+**Solution:** Feast allows end users to create point-in-time correct datasets across multiple entities. Feast ensures that there is no data leakage, that cross feature set joins are valid, and that models are not fed expired data.
+
+## Definitions of features vary
+
+**Problem:** Teams define features differently and there is no easy access to the documentation of a feature.
+
+**Solution:** Feast becomes the single source of truth for all feature data for all models within an organization.
Teams are able to capture documentation, metadata, and metrics about features. This allows teams to communicate clearly about features, test feature data, and determine if a feature is useful for a particular model.
+
+## **Inconsistency between training and serving**
+
+**Problem:** Training requires access to historical data, whereas models that serve predictions need the latest values. Inconsistencies arise when data is siloed into many independent systems requiring separate tooling. Often teams use Python for creating batch features offline, but these features are redeveloped with different libraries and languages when moving to serving or streaming systems.
+
+**Solution:** Feast provides consistency by managing and unifying the ingestion of data from batch and streaming sources into both the feature warehouse and the feature serving stores. Feast becomes the bridge between your model and your data, both for training and serving. This ensures that there is consistency in the feature data that your model receives.