Provide ability to set more options when outputting joined data to bq table. #1654

Closed
mavysavydav opened this issue Jun 18, 2021 · 4 comments · Fixed by #1661

@mavysavydav
Collaborator

mavysavydav commented Jun 18, 2021

Is your feature request related to a problem? Please describe.

We need to be able to set table expiries, the destination table name (project.dataset.tableName), and an option for create vs. replace.

Describe the solution you'd like

Ideally the user can optionally pass in a config object like QueryJobConfig. QueryJobConfig has many options, such as destination, which lets one set the destination table and its name, but it doesn't seem to handle create/replace (it has create_disposition, which offers some limited control) or expiry.

Describe alternatives you've considered

N/A. Willem and a few of us briefly discussed this, and he played around with the idea of passing SQL queries in and out, so one could modify the query and pass it back in.

@woop
Member

woop commented Jun 18, 2021

@mavysavydav the problem statement makes sense. get_historical_features() is a general (not implementation-specific) method, but the retrieval jobs are specific to the implementation. So roughly, the options are:

1.  job = get_historical_features(..., kwargs)

    where those options get passed all the way to the offline store and the offline store then knows how to use them

2.  job = get_historical_features(...)
    job.set_config(QueryJobConfig)
    job.to_bigquery()

3.  job = get_historical_features(...)
    job.to_bigquery(..., config=QueryJobConfig)

4.  job = get_historical_features(...)
    sql = job.to_sql()
    job_config = bigquery.QueryJobConfig(default_dataset="bigquery-public-data.stackoverflow")
    client.query(sql, job_config=job_config)

My preference is (3) or (4) right now, but don't feel super strongly.
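
For illustration only, here is a rough sketch of how (3) and (4) could share one code path. Everything in it is an assumption: `SketchRetrievalJob`, `QueryClient`, and `RecordingClient` are hypothetical names, not Feast's actual classes, and the stub client stands in for a real `bigquery.Client` so the snippet runs without credentials.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Protocol, Tuple


class QueryClient(Protocol):
    """Anything exposing a BigQuery-style query(sql, job_config=...) method."""

    def query(self, sql: str, job_config: Optional[Any] = None) -> Any: ...


@dataclass
class SketchRetrievalJob:
    """Hypothetical stand-in for a BigQuery retrieval job (not Feast's class)."""

    sql: str

    def to_sql(self) -> str:
        # Option 4: hand the generated query back so the caller can run it.
        return self.sql

    def to_bigquery(self, client: QueryClient, config: Optional[Any] = None) -> Any:
        # Option 3: forward an optional job config to the client unchanged.
        return client.query(self.sql, job_config=config)


class RecordingClient:
    """Stub client that records calls instead of contacting BigQuery."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Optional[Any]]] = []

    def query(self, sql: str, job_config: Optional[Any] = None) -> str:
        self.calls.append((sql, job_config))
        return "job-submitted"
```

With a real client, `config` would presumably be a `bigquery.QueryJobConfig` instance; the sketch treats it as opaque so that the job class does not need to know which options the caller cares about.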

@codyjlin
Contributor

🙋 working on this now! Please assign me.

@mavysavydav
Collaborator Author

yea agreed, 3 or 4 is good. cody will look into it

@mavysavydav
Collaborator Author

mavysavydav commented Jun 22, 2021

Solution 4 would work well with in-house wrappers. The wrapper could still present a clean API for users even though it changes the SQL to add things like expiry. Expiry does seem like something Feast would want to support natively, though maybe we'll wait for more needs to emerge before supporting it natively.
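
A minimal sketch of what such an in-house wrapper might do with the SQL from solution 4, assuming the query is a plain SELECT and that BigQuery's `CREATE TABLE ... OPTIONS (expiration_timestamp = ...) AS` DDL is acceptable for the use case. `wrap_as_table` is a hypothetical helper, not part of Feast, and it does not validate or escape the table name.

```python
from datetime import datetime, timezone
from typing import Optional


def wrap_as_table(
    select_sql: str,
    table: str,
    expires_at: Optional[datetime] = None,
    replace: bool = False,
) -> str:
    """Wrap a SELECT statement in BigQuery CREATE TABLE ... AS DDL.

    Hypothetical helper for illustration; `table` is used as given.
    """
    create = "CREATE OR REPLACE TABLE" if replace else "CREATE TABLE"
    options = ""
    if expires_at is not None:
        if expires_at.tzinfo is None:
            raise ValueError("expires_at must be timezone-aware")
        # Render the expiry as a UTC TIMESTAMP literal.
        stamp = expires_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
        options = f'\nOPTIONS (expiration_timestamp = TIMESTAMP "{stamp}")'
    # Drop surrounding whitespace and a trailing semicolon so the SELECT can be embedded.
    body = select_sql.strip().rstrip(";")
    return f"{create} `{table}`{options}\nAS\n{body}"
```

The wrapper would call something like `wrap_as_table(job.to_sql(), "project.dataset.tableName", expires_at=..., replace=True)` and submit the result through its own BigQuery client, which keeps create/replace and expiry out of the general `get_historical_features()` signature.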
