Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include adapter response info in execution results #2961

Merged
merged 4 commits into from
Dec 22, 2020

Conversation

kwigley
Copy link
Contributor

@kwigley kwigley commented Dec 16, 2020

resolves #2747

Description

This PR adds an unstructured dictionary to results to track info parsed from query execution.

  • rename get_status -> get_response in SQLConnectionManager this is preference honestly, this may cause breaking changes in other adapters
  • execute now returns a tuple of adapter response data (this was previously a status string) and the agate table
  • updated the execution result we store in the context
  • write this new unstructured dictionary to a field named adapter_response in run_results.json and sources.json

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Dec 16, 2020
@kwigley kwigley changed the title Add adapter specific query stats [WIP] Add adapter specific query stats Dec 16, 2020
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great!

A few thoughts about naming:

  • we already use the word status in the result object, to indicate dbt's interpretation of the response, so we probably shouldn't repeat it here as well
  • the dict is only populated if the query succeeds
  • today, we populate the dict only for materializations (models, seeds), i.e. DDL/DML statements, though someday we may want to report bytes processed by BigQuery tests (which are just select statements)

With that in mind, I'm thinking about this dict as a way of reporting the database's "API response" to dbt's successful "request" (materialization DDL/DML). An adapter_response is comprised of a code, optional attributes, and then a message that (based on the adapter's logic) is either recorded verbatim or synthesized from those attributes.

"adapter_response": {
    "message": "INSERT 0 2",
    "code": "INSERT",
    "rows_affected": 2
}
"adapter_response": {
    "message": "MERGE (2.0 rows, 16.0 Bytes processed)",
    "code": "MERGE",
    "rows_affected": 2,
    "bytes_processed": 16
}
"adapter_response": {
    "message": "CREATE VIEW",
    "code": "CREATE VIEW"
}
"adapter_response": {
    "message": "OK",
    "code": "OK"
}
"adapter_response": {
    "message": "SUCCESS 1",
    "code": "SUCCESS",
    "rows_affected": 1
}

What do you think?

plugins/bigquery/dbt/adapters/bigquery/connections.py Outdated Show resolved Hide resolved
plugins/bigquery/dbt/adapters/bigquery/connections.py Outdated Show resolved Hide resolved
return cursor.statusmessage
def get_status(cls, cursor) -> ExecutionStatus:
message = str(cursor.statusmessage)
rows = cursor.rowcount
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a weird number: it looks like CREATE VIEW returns a value of -1? Not sure if we should intervene here, given that we're pulling it right off the connection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should intervene, well actually I don't feel comfortable intervening since I don't know all the possibilities that could be returned by cursor API.

plugins/postgres/dbt/adapters/postgres/connections.py Outdated Show resolved Hide resolved
Comment on lines 202 to 203
message=str(result.status),
adapter_query_status=adapter_query_status
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This effectively duplicates the message in run_results.json, nested within the result dict and also within the adapter_query_status dict. Are we doing this to avoid a breaking change for plugins / users that expect to print message as a key to RunResult?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is correct. The message field on the run result could also contain error message so I think to be consistent it makes sense to keep this field. We could, however, remove the message field nested in the adapter response when we write out run results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, however, remove the message field nested in the adapter response when we write out run results.

ok cool, I think this is the way to go, since message is more of a "dbt thing" and may or may not map to a specific adapter response value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good! I'll get this in and mark up for dev review

return "{} {}".format(state, cursor.rowcount)
return ExecutionStatus(
message="{} {}".format(state, cursor.rowcount),
rows=cursor.rowcount,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snowflake always reports back SUCCESS 1 for create view + create table statements, even if the resulting view or table has many more rows. I don't think it's really our place to intervene here, though, since we're pulling this value right off the connection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above..

@kwigley
Copy link
Contributor Author

kwigley commented Dec 17, 2020

@jtcohen6 woo! thanks so much for taking a look at this! I hadn't quite fleshed out what is involved here, but your comments definitely help guide what is left to do! 100% agree with your naming suggestions

@kwigley kwigley force-pushed the feature/add-adapter-query-stats branch from 0370f26 to 122bd07 Compare December 21, 2020 18:24
@kwigley kwigley force-pushed the feature/add-adapter-query-stats branch from 122bd07 to aa3bdfe Compare December 21, 2020 18:35
@kwigley kwigley changed the title [WIP] Add adapter specific query stats Include adapter response info in execution results Dec 22, 2020
@kwigley kwigley marked this pull request as ready for review December 22, 2020 18:05
elapsed=(time.time() - pre)
)

return connection, cursor

@abc.abstractclassmethod
def get_status(cls, cursor: Any) -> str:
def get_response(cls, cursor: Any) -> Union[AdapterResponse, str]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that other adapter plugins may need to account for, this might cause a breaking change.

Example: this is override in dbt-spark, will need to update this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call this out in the release notes. I'd like to reorganize + clean up the changelog before final release, anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it! I'll add a breaking section next time I break something :) please should tap me if you want a hand cleaning the changelog up

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! run results just get better and better

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BigQuery: bytes processed should be logged for all dbt commands that perform queries
3 participants