Introduce a filter hook mechanism for engine.schema_cache #131
-
20 minutes is still quite a lot. SnowDDL should generally be faster than other tools due to built-in parallelism. Could you run SnowDDL with …? Did you try to increase …?

Can you potentially use a separate owner role to stop SnowDDL from reading objects from other databases? SnowDDL should skip all databases which are not owned by the current role. That includes everything inside those databases: schemas and schema objects. Code: https://github.com/littleK0i/SnowDDL/blob/master/snowddl/cache/schema_cache.py#L30-L31

With …

The main problem with filtering is that objects are tightly connected to each other. For example, schema roles are required for grants, but in order to create schema roles we need schemas. If some schemas are skipped, roles and grants should be skipped as well. It would be relatively hard to implement a filtering feature reliably outside of the most basic use cases.

Also, people will start to rely on the idea of "applying config partially", which naturally leads to situations where parts of the config are incomplete or broken, and it is not obvious.

Let's see if we can figure out something else. Btw, I might be available for consulting starting from ~21 Oct. Converting everything automatically and cross-validating with a separate test Snowflake account might be easier than doing a slow and painful step-by-step migration for months.
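To illustrate the owner-role skip described above, here is a minimal sketch (not SnowDDL's actual implementation; the function name and row shape are assumptions for illustration): given the rows returned by `SHOW DATABASES`, which include an `owner` column, keep only the databases owned by the role SnowDDL runs as.

```python
# Illustrative sketch, NOT SnowDDL's actual code: skip every database
# that is not owned by the current role, so its schemas and schema
# objects are never read during cache loading.

def databases_owned_by_role(show_databases_rows, current_role):
    """Filter SHOW DATABASES rows down to databases owned by current_role.

    show_databases_rows: iterable of dicts with "name" and "owner" keys,
    mirroring the columns of Snowflake's SHOW DATABASES output.
    """
    return [
        row["name"]
        for row in show_databases_rows
        # Snowflake identifiers are case-insensitive by default,
        # so compare owner names case-insensitively.
        if row["owner"].upper() == current_role.upper()
    ]
```

With this kind of check, everything inside a non-owned database is skipped wholesale, which sidesteps the cross-object dependency problem described above.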
-
As discussed in messy environment management, we are facing a difficult situation bringing SnowDDL into our current production sites.

To skip the unnecessary databases and focus on a few dozen databases in the beginning phase, we write customized configs and dump a list of the unnecessary databases. The `__custom/*.py` configs make sure we only match the structures of that small number of databases. But in the current design of SnowDDL, even if we restrict the object types to "database,schema,table", it still walks through all the tables, including the ones we try to skip in `__custom/*.py`. All of that is loaded into `engine.schema_cache`.

It would be easy to add some if statements into the for loop to filter the result of `show databases like ...`, but that is invasive and hard to maintain.

There is also a discussion in flexible object filtering, where an "expression" is introduced to include/exclude specific databases. It would help, but still has limitations for more complicated cases, like pattern matching on "database/schema/table" names.
So how about we introduce a filter hook mechanism in the load phase of `schema_cache`, to make it possible to programmatically customize the process of loading objects from Snowflake?
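To make the proposal concrete, here is a minimal sketch of what such a filter hook could look like: a user-supplied predicate invoked for each discovered object during cache loading. All names here (`SchemaCacheSketch`, `load`, the `(object_type, full_name)` shape) are assumptions for illustration, not SnowDDL's API.

```python
from typing import Callable, Optional

# (object_type, fully qualified name) -> True to keep, False to skip
FilterHook = Callable[[str, str], bool]

class SchemaCacheSketch:
    """Hypothetical schema cache with a pluggable load-phase filter."""

    def __init__(self, filter_hook: Optional[FilterHook] = None):
        # Default hook keeps everything, preserving current behaviour
        # when no hook is supplied.
        self.filter_hook = filter_hook or (lambda obj_type, name: True)
        self.objects = []

    def load(self, discovered):
        """discovered: iterable of (object_type, full_name) pairs,
        e.g. as produced by SHOW commands during cache loading."""
        for obj_type, full_name in discovered:
            if self.filter_hook(obj_type, full_name):
                self.objects.append((obj_type, full_name))
```

Because the hook is an arbitrary callable, it can express regex or glob patterns over database/schema/table names, e.g. `SchemaCacheSketch(lambda t, n: not n.startswith("TMP_"))`, which goes beyond a static include/exclude expression.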