Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve the case of schemas and databases when listing relations (#2403) #2411

Merged
merged 2 commits into from
May 8, 2020

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented May 6, 2020

resolves #2403

Description

There's hopefully nothing too wild or controversial in here - this is a substantial change, but it should fix some correctness issues in how dbt lists schemas. I think in the long run this will be more pleasant for adapter authors than having to figure out how to render the database/schema in each of those, and just defer it to the relations.

Change list_relations_without_caching (both the macro and the method) to take a single argument, a Relation with no identifier and identifier set to not include. Macros can use this to get a database-specific representation of the relation. The logic around creating missing schemas was updated similarly - dbt will faithfully issue "create schema" statements for whatever you give it, rather than coercing everything to lowercase and wrapping it in the given quoting.

In the process and for the same reasons, I updated drop_schema and create_schema to take the same kind of argument.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label May 6, 2020
@beckjake beckjake changed the title [WIP] Always preserve the case of values in the SchemaSearchMap (#2403) Preserve the case of schemas and databases when listing relations (#2403) May 7, 2020
@beckjake beckjake marked this pull request as ready for review May 7, 2020 19:42
The argument is a Relation object with no identifier field, configured with the appropriate quoting information

Unique quoted/unquoted representations will be treated as distinct
The logic for generating what schemas to search for relations is now distinct from the catalog search logic.
Schema creation/dropping takes a similar relation argument
Add tests
@beckjake beckjake requested a review from drewbanin May 7, 2020 20:50
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment here around community plugins (ie. Spark), but otherwise this LGTM. Let me know what you think about that comment and if there's anything we should do differently here. If not, this is approved on my end

self.cache.update_schemas(schema_map.schemas_searched())
cache_update: Set[Tuple[str, Optional[str]]] = set()
for relation in cache_schemas:
if relation.database is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to mention here that some plugins (namely Spark) use database as an alias for schema, and only have the one level of hierarchy. I think it's ok to add this logic here -- we can override it in a plugin if we need to -- but the idea that all relations will have a database and schema is not quite so invariant from dbt Core's perspective as this code indicates

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark will already need to be updated, as it overrides _get_cache_schemas and list_relations_without_caching.

I think this change will actually make it easier for spark to handle it: it sets database = schema but the include policy for relations is set to database=False - there's probably a couple rough edges to clear up, but my goal is that spark's default include policy + behavior will play pretty nicely with this and we can delete the create/drop schema implementations, at least.

core/dbt/task/runnable.py Outdated Show resolved Hide resolved
Co-authored-by: Drew Banin <drew@fishtownanalytics.com>
@beckjake beckjake merged commit c9eec4f into dev/octavius-catto May 8, 2020
@beckjake beckjake deleted the fix/always-preserve-case branch May 8, 2020 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dbt seed not working in 0.17.0-b1
2 participants