-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alias database to allow matching between rows and Manifest #85
Conversation
Currently the docs generation is broken because we need to supply the database name when fetching the relations: if dct['table_database'] is None: dct['table_database'] = dct['table_schema'] However, when we get the manifest we don't get the database: {CatalogKey(database='', schema='fokko', name='logistical_configuration_data'): ['model.dbtlake.logistical_configuration_data']} Therefore the keys never line up, and we can't match the Catalogs: https://github.com/fishtown-analytics/dbt/blob/9d0eab630511723cd0bc328f6f11d3ffe6c8f879/core/dbt/task/generate.py#L108 We get from the describe relations: CatalogKey(database='fokko', schema='fokko', name='logistical_configuration_data') Due to the logic above.
6a65a33
to
cabc03b
Compare
cc @jtcohen6 |
This is a good find, and the approach looks valid. That said, would it perhaps make sense to change the I would feel very comfortable with that fix, whereas I feel a bit concerned about the knock-on effects downstream of setting an actual value as an alias. I don't think either this PR or my suggestion will actually conflict with #83 (though I haven't tested). I'm pretty confident #83 totally misses this issue. As an aside: I feel like a broken record here, but we really need a better test story for plugins. This kind of issue just shouldn't happen, and our test suite isn't even at a point where we can reasonably try to add a test for this. I guess we could modify the db-integration-tests branch we use for spark to support reading from a json file and validating some structural things, but that's a lot to ask on a PR. |
Thanks for the insights. I'll give it a try to set it in the I don't think the database can ever be set since it excluded in the accepted connection keys: https://github.com/fishtown-analytics/dbt-spark/blob/master/dbt/adapters/spark/connections.py#L53 |
That I think in 0.17.0 there will be more things that could have problems with it, because we use I'd prefer to add a special flag to core for disabling fields in adapters or even support this specific adapter behavior where |
@beckjake I have a draft PR open (#91) that attempts to follow your recommendations above. I'm still running into issues with catalog generation. @Fokko I opened a separate issue (#90) re: owner / table stats not showing up. I think this has been broken for a while, and we should absolutely fix it. I also opened an issue re: the less-than-ideal |
Currently the docs generation is broken because we need to supply
the database name when fetching the relations:
However, when we get the manifest we don't get the database:
Therefore the keys never line up, and we can't match the Catalogs:
https://github.com/fishtown-analytics/dbt/blob/9d0eab630511723cd0bc328f6f11d3ffe6c8f879/core/dbt/task/generate.py#L108
We get from the describe relations:
Due to the logic above. I think
ALIASing
this is the easiest way out. Making the database non-optional in core would be another option, that would be cleaner in the long run. Please advise.