Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add hbase online store support in feast #2590

Merged
merged 9 commits into from
Apr 25, 2022

Conversation

aurobindoc
Copy link
Contributor

What this PR does / why we need it:
This will add support for hbase online store in feast

Which issue(s) this PR fixes:
Fixes #2206

@aurobindoc aurobindoc changed the title Add hbase online store support in feast feat: Add hbase online store support in feast Apr 21, 2022
@codecov-commenter
Copy link

codecov-commenter commented Apr 21, 2022

Codecov Report

Merging #2590 (5774d6b) into master (c439611) will decrease coverage by 22.58%.
The diff coverage is 9.72%.

@@             Coverage Diff             @@
##           master    #2590       +/-   ##
===========================================
- Coverage   82.30%   59.72%   -22.59%     
===========================================
  Files         155      157        +2     
  Lines       12747    12849      +102     
===========================================
- Hits        10492     7674     -2818     
- Misses       2255     5175     +2920     
Flag Coverage Δ
integrationtests ?
unittests 59.72% <9.72%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/cli.py 38.20% <ø> (ø)
sdk/python/feast/repo_config.py 79.58% <ø> (-9.95%) ⬇️
sdk/python/feast/infra/utils/hbase_utils.py 2.94% <2.94%> (ø)
.../online_stores/contrib/hbase_online_store/hbase.py 5.10% <5.10%> (ø)
...tion/feature_repos/universal/online_store/hbase.py 50.00% <50.00%> (ø)
.../online_stores/contrib/hbase_repo_configuration.py 100.00% <100.00%> (ø)
.../integration/online_store/test_online_retrieval.py 16.84% <0.00%> (-83.16%) ⬇️
sdk/python/tests/utils/online_read_write_test.py 19.35% <0.00%> (-80.65%) ⬇️
.../integration/online_store/test_universal_online.py 16.83% <0.00%> (-78.01%) ⬇️
...fline_store/test_universal_historical_retrieval.py 23.88% <0.00%> (-76.12%) ⬇️
... and 80 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c439611...5774d6b. Read the comment docs.

Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! This looks great. Mostly nits

@@ -0,0 +1,80 @@
# Hbase Online Store
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit generally, should be HBase instead of Hbase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to HBase

@@ -0,0 +1,80 @@
# Hbase Online Store
Hbase is not included in current [Feast](https://github.com/feast-dev/feast) roadmap, this project intends to add Hbase
support for Online Store.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: want to move this to the previous line? seems to cut off unnaturally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -0,0 +1,232 @@
# Created by aurobindo.m on 18/04/22
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

want to remove these comments from the files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


def batch(self, table_name: str):
"""
Returns a 'Batch' instance that can be used for mass data manipulation in the habse table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in habse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to hbase


for entity_key in entity_keys:
row_key = serialize_entity_key(entity_key).hex()
val = hbase.row(table_name, row_key=row_key)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason you aren't using hbase.rows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miss from my side. Changed rto rows and then iterated over the rows to fetch the results. This would reduce too many calls to hbase. Good Suggestion!

online_store:
type: hbase
host: 127.0.0.1
port: 9090
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new line at end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

HbaseOnlineStoreCreator,
)

FULL_REPO_CONFIGS = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind rebasing against master? there's now a postgres folder in here too. maybe make this hbase_repo_configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@adchia adchia self-assigned this Apr 21, 2022
Signed-off-by: aurobindoc <aurobindoc1991@gmail.com>
Signed-off-by: aurobindoc <aurobindoc1991@gmail.com>
Signed-off-by: aurobindoc <aurobindoc1991@gmail.com>
Signed-off-by: aurobindoc <aurobindoc1991@gmail.com>
Signed-off-by: aurobindoc <aurobindoc1991@gmail.com>
Signed-off-by: aurobindoc <aurobindoc1991@gmail.com>
Signed-off-by: aurobindoc <aurobindoc1991@gmail.com>
Signed-off-by: aurobindoc <aurobindoc1991@gmail.com>
Deploying infrastructure for driver_hourly_stats_view
```

### Migrate Latest Data to Online Feature Store (HBase)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, materialize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines +115 to +125
if isinstance(timestamp, datetime):
values_dict[HbaseConstants.DEFAULT_EVENT_TS] = struct.pack(
">L", int(calendar.timegm(timestamp.timetuple()))
)
else:
values_dict[HbaseConstants.DEFAULT_EVENT_TS] = timestamp
if created_ts is not None:
if isinstance(created_ts, datetime):
values_dict[HbaseConstants.DEFAULT_CREATED_TS] = struct.pack(
">L", int(calendar.timegm(created_ts.timetuple()))
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this field have some specific significance/format for HBase? And calendar is the canonical way to generate this value?

Copy link
Contributor Author

@aurobindoc aurobindoc Apr 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this field have some specific significance/format for HBase?

In Hbase, every data is stored as byte array. To store datetime columns in HBase, I choose to convert the timestamp object to epoch integer and then convert the integer to byte array.

calendar is the canonical way to generate this value?

There are a couple of ways to convert timestamp to epoch.

  • calendar.timegm()
  • time.mktime()

time.mktime() assumes that the passed tuple is in local time, calendar.timegm() assumes it's in GMT/UTC. Depending on the interpretation the tuple represents a different time, so the functions return different values (seconds since the epoch are UTC based)

"""Given the feature name, add the column family to get the column name."""
if isinstance(feature, bytes):
feature = feature.decode("utf-8")
return HbaseConstants.DEFAULT_COLUMN_FAMILY + ":" + feature
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does a column family represent here? Is there a reason to use the default one for all features? Would that result in squashing, if multiple feature views had the same feature name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In HBase, every column should belong to a column family. In our use case, we use featre names as column, thus we add a default column family for all the features.

If multiple feature views had the same feature name?

Please note that every feature view creates a different table in HBase. We use <project_name>_<feature_view> as the table name in HBase. Thus, there is no chance of collision even if multiple feature views had the same feature name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, that addresses my concern.

)


class HbaseOnlineStoreCreator(OnlineStoreCreator):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a Hbase to the list of services that have containerized tests in CONTRIBUTING.md?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Signed-off-by: aurobindoc <aurobindoc1991@gmail.com>
@achals
Copy link
Member

achals commented Apr 25, 2022

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, aurobindoc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit c9eda79 into feast-dev:master Apr 25, 2022
achals pushed a commit that referenced this pull request May 13, 2022
# [0.21.0](v0.20.0...v0.21.0) (2022-05-13)

### Bug Fixes

* Addresses ZeroDivisionError when materializing file source with same timestamps ([#2551](#2551)) ([1e398d9](1e398d9))
* Asynchronously refresh registry for the feast ui command ([#2672](#2672)) ([1b09ca2](1b09ca2))
* Build platform specific python packages with ci-build-wheel ([#2555](#2555)) ([b10a4cf](b10a4cf))
* Delete data sources from registry when using the diffing logic ([#2669](#2669)) ([fc00ca8](fc00ca8))
* Enforce kw args featureservice ([#2575](#2575)) ([160d7b7](160d7b7))
* Enforce kw args in datasources ([#2567](#2567)) ([0b7ec53](0b7ec53))
* Feature logging to Redshift is broken ([#2655](#2655)) ([479cd51](479cd51))
* Feature service to templates ([#2649](#2649)) ([1e02066](1e02066))
* Feature with timestamp type is incorrectly interpreted by Go FS ([#2588](#2588)) ([e3d9588](e3d9588))
* Fix `__hash__` methods ([#2556](#2556)) ([ebb7dfe](ebb7dfe))
* Fix AWS bootstrap template ([#2604](#2604)) ([c94a69c](c94a69c))
* Fix broken proto conversion methods for data sources ([#2603](#2603)) ([00ed65a](00ed65a))
* Fix case where on demand feature view tab is broken if no custom tabs are passed.  ([#2682](#2682)) ([01d3568](01d3568))
* Fix DynamoDB fetches when there are entities that are not found ([#2573](#2573)) ([7076fe0](7076fe0))
* Fix Feast UI parser to work with new APIs ([#2668](#2668)) ([8d76751](8d76751))
* Fix java server after odfv update ([#2602](#2602)) ([0ca6297](0ca6297))
* Fix materialization with ttl=0 bug ([#2666](#2666)) ([ab78702](ab78702))
* Fix push sources and add docs / tests pushing via the python feature server ([#2561](#2561)) ([e8e418e](e8e418e))
* Fixed data mapping errors for Snowflake ([#2558](#2558)) ([53c2ce2](53c2ce2))
* Forcing ODFV udfs to be __main__ module and fixing false positive duplicate data source warning ([#2677](#2677)) ([2ce33cd](2ce33cd))
* Include the ui/build directory, and remove package data ([#2681](#2681)) ([0384f5f](0384f5f))
* Infer features for feature services when they depend on feature views without schemas ([#2653](#2653)) ([87c194c](87c194c))
* Pin dependencies to nearest major version ([#2647](#2647)) ([bb72b7c](bb72b7c))
* Pin pip<22.1 to get around breaking change in pip==22.1 ([#2678](#2678)) ([d3e01bc](d3e01bc))
* Punt deprecation warnings and clean up some warnings. ([#2670](#2670)) ([f775d2e](f775d2e))
* Reject undefined features when using `get_historical_features` or `get_online_features` ([#2665](#2665)) ([36849fb](36849fb))
* Remove ci extra from the feature transformation server dockerfile ([#2618](#2618)) ([25613b4](25613b4))
* Remove incorrect call to logging.basicConfig ([#2676](#2676)) ([8cbf51c](8cbf51c))
* Small typo in CLI ([#2578](#2578)) ([f372981](f372981))
* Switch from `join_key` to `join_keys` in tests and docs ([#2580](#2580)) ([d66c931](d66c931))
* Teardown trino container correctly after tests ([#2562](#2562)) ([72f1558](72f1558))
* Update build_go_protos to use a consistent python path ([#2550](#2550)) ([f136f8c](f136f8c))
* Update data source timestamp inference error message to make sense ([#2636](#2636)) ([3eaf6b7](3eaf6b7))
* Update field api to add tag parameter corresponding to labels in Feature. ([#2610](#2610)) ([689d20b](689d20b))
* Update java integration tests and add more logging ([#2637](#2637)) ([10e23b4](10e23b4))
* Update on demand feature view api ([#2587](#2587)) ([38cd7f9](38cd7f9))
* Update RedisCluster to use redis-py official implementation ([#2554](#2554)) ([ce5606f](ce5606f))
* Use cwd when getting module path ([#2577](#2577)) ([b550e59](b550e59))
* Use ParquetDataset for Schema Inference ([#2686](#2686)) ([4f85e3e](4f85e3e))
* Use timestamp type when converting unixtimestamp feature type to arrow ([#2593](#2593)) ([c439611](c439611))

### Features

* Add hbase online store support in feast ([#2590](#2590)) ([c9eda79](c9eda79))
* Adding SSL options for Postgres ([#2644](#2644)) ([0e809c2](0e809c2))
* Allow Feast UI to be spun up with CLI command: feast ui ([#2667](#2667)) ([44ca9f5](44ca9f5))
* Allow to pass secrets and environment variables to transformation service ([#2632](#2632)) ([ffa33ad](ffa33ad))
* CLI command 'feast serve' should start go-based server if flag is enabled ([#2617](#2617)) ([f3ff812](f3ff812))
* Create stream and batch feature view abstractions ([#2559](#2559)) ([d1f76e5](d1f76e5))
* Postgres supported as Registry, Online store, and Offline store ([#2401](#2401)) ([ed2f979](ed2f979))
* Support entity fields in feature view `schema` parameter by dropping them ([#2568](#2568)) ([c8fcc35](c8fcc35))
* Write logged features to an offline store (Python API) ([#2574](#2574)) ([134dc5f](134dc5f))
* Write logged features to Offline Store (Go - Python integration) ([#2621](#2621)) ([ccad832](ccad832))

### Reverts

* Revert "chore: Deprecate value type (#2611)" (#2643) ([4fbdfb1](4fbdfb1)), closes [#2611](#2611) [#2643](#2643)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support HBase
5 participants