Organise modules by domain #197

Merged

nfx merged 12 commits into main from refactor/permissions on Sep 14, 2023

Conversation

@nfx (Collaborator) commented Sep 14, 2023

This PR doesn't change any code; it just moves modules and classes into their respective domains/components, roughly mapped onto folders. This replaces the previous diamond-shaped cross-module dependency layout, which was harder to evolve.

The main tenets are:

- any component can depend on `mixins` (a staging area for the SDK) and `framework` (common modules).
- tight coupling is allowed within a single component. An example is local group migration, which exposes a single high-level migration toolkit (`from databricks.labs.ucx.workspace_access import GroupMigrationToolkit`) and keeps implementation details private (see the first sketch after this list). Another example is the toolkit for migrating DDL from the Hive Metastore (HMS) to Unity Catalog (UC).
- all execution is triggered either from `install` (which also handles updates) or from `runtime` (the entrypoint for jobs). Multiple toolkits may be combined into a single Databricks Workflow (aka Job) - e.g. pulling the permissions and tables inventory is necessary for the assessment step, but later the Workflow may branch out (see the second sketch after this list).
- if two classes are used together in 90% of cases, they have to be defined in a single file (aka a Python module). We don't define `types` or `managers` modules/packages, as those are an example of diamond-shaped dependencies, which are more difficult to evolve over time.
- methods and fields have to be private by default; make them public only when necessary.
- classes must order their members as follows: `__init__`, overridden methods, public methods, other methods. The most important logic has to come first (illustrated in the first sketch below).
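
As an illustration of the second, fifth, and sixth tenets, here is a minimal sketch of a component that exposes one public toolkit and orders its class members as described. Apart from `GroupMigrationToolkit`, the names (`_InventorySnapshot`, `migrate_groups`, `workspace_client`) are hypothetical, not the actual ucx API:

```python
# In databricks/labs/ucx/workspace_access/__init__.py, only the public toolkit
# would be re-exported: `from .toolkit import GroupMigrationToolkit`.


class _InventorySnapshot:
    """Private helper; the leading underscore keeps it out of the public API."""

    def __init__(self, groups: list[str]):
        self._groups = groups


class GroupMigrationToolkit:
    """Public, high-level entry point for the component."""

    # 1. __init__ comes first.
    def __init__(self, workspace_client):
        self._ws = workspace_client  # fields are private by default
        self._snapshot = None

    # 2. Overridden (here: dunder) methods come next.
    def __repr__(self):
        return f"GroupMigrationToolkit({self._ws!r})"

    # 3. Public methods carry the most important logic.
    def migrate_groups(self) -> None:
        self._snapshot = self._take_snapshot()
        # ... apply inventoried permissions to the renamed groups ...

    # 4. Private helpers go last.
    def _take_snapshot(self) -> _InventorySnapshot:
        return _InventorySnapshot(groups=[])
```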
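And a second sketch for the third tenet: a single `runtime` entrypoint that dispatches named tasks, so several toolkits can be combined into one Databricks Workflow. Task names and bodies are illustrative only, not the actual ucx runtime:

```python
# runtime.py (sketch): one entrypoint for all job tasks.
import sys


def crawl_permissions():
    """Assessment step: snapshot workspace permissions into the inventory."""


def crawl_tables():
    """Assessment step: snapshot table metadata into the inventory."""


_TASKS = {
    "crawl_permissions": crawl_permissions,
    "crawl_tables": crawl_tables,
}

if __name__ == "__main__":
    # Each task in the Workflow invokes this module with its task name,
    # so the same entrypoint serves every toolkit.
    _TASKS[sys.argv[1]]()
```
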
@nfx added the pr/do-not-merge label (this pull request is not ready to merge) on Sep 14, 2023
@nfx changed the title Organise modules by domain → [WIP] Organise modules by domain on Sep 14, 2023
@nfx removed the pr/do-not-merge label on Sep 14, 2023
@nfx changed the title [WIP] Organise modules by domain → Organise modules by domain on Sep 14, 2023
@nfx (Collaborator, Author) commented Sep 14, 2023

tests passed

@nfx merged commit c832375 into main on Sep 14, 2023
1 check passed
@nfx deleted the refactor/permissions branch on September 14, 2023 at 13:46
william-conti pushed a commit that referenced this pull request Sep 15, 2023
@nfx mentioned this pull request on Sep 18, 2023
nfx added a commit that referenced this pull request Sep 18, 2023
# Version changelog

## 0.1.0

Features

* Added interactive installation wizard
([#184](#184),
[#117](#117)).
* Added schedule of jobs as part of `install.sh` flow and created some
documentation ([#187](#187)).
* Added debug notebook companion to troubleshoot the installation
([#191](#191)).
* Added support for Hive Metastore Table ACLs inventory from all
databases ([#78](#78),
[#122](#122),
[#151](#151)).
* Created `$inventory.tables` from Scala notebook
([#207](#207)).
* Added local group migration support for ML-related objects
([#56](#56)).
* Added local group migration support for SQL warehouses
([#57](#57)).
* Added local group migration support for all compute-related resources
([#53](#53)).
* Added local group migration support for security-related objects
([#58](#58)).
* Added local group migration support for workflows
([#54](#54)).
* Added local group migration support for workspace-level objects
([#59](#59)).
* Added local group migration support for dashboards, queries, and
alerts ([#144](#144)).

Stability

* Added `codecov.io` publishing
([#204](#204)).
* Added more tests to group.py
([#148](#148)).
* Added tests for group state
([#133](#133)).
* Added tests for inventorizer and typed
([#125](#125)).
* Added tests for WorkspaceListing
([#110](#110)).
* Added `make_*_permissions` fixtures
([#159](#159)).
* Added reusable fixtures module
([#119](#119)).
* Added testing for permissions
([#126](#126)).
* Added inventory table manager tests
([#153](#153)).
* Added `product_info` to track as SDK integration
([#76](#76)).
* Added failsafe permission get operations
([#65](#65)).
* Always install the latest `pip` version in `./install.sh`
([#201](#201)).
* Always store inventory in `hive_metastore` and make only
`inventory_database` configurable
([#178](#178)).
* Changed default logging level from `TRACE` to `DEBUG`
([#124](#124)).
* Consistently use `WorkspaceClient` from `databricks.sdk`
([#120](#120)).
* Converted pipeline code to use fixtures
([#166](#166)).
* Exclude mixins from coverage
([#130](#130)).
* Fixed codecov.io reporting
([#212](#212)).
* Fixed configuration path in job task install code
([#210](#210)).
* Fixed a bug with dependency definitions
([#70](#70)).
* Fixed failing `test_jobs`
([#140](#140)).
* Fixed the issues with experiment listing
([#64](#64)).
* Fixed integration testing configuration
([#77](#77)).
* Make project runnable on nightly testing infrastructure
([#75](#75)).
* Migrated cluster policies to new fixtures
([#174](#174)).
* Migrated clusters to the new fixture framework
([#162](#162)).
* Migrated instance pool to the new fixture framework
([#161](#161)).
* Migrated to `databricks.labs.ucx` package
([#90](#90)).
* Migrated token authorization to new fixtures
([#175](#175)).
* Migrated experiment fixture to standard one
([#168](#168)).
* Migrated jobs test to a fixture-based one
([#167](#167)).
* Migrated model fixture to the standard fixtures
([#169](#169)).
* Migrated warehouse fixture to standard one
([#170](#170)).
* Organise modules by domain
([#197](#197)).
* Prefetch all account-level and workspace-level groups
([#192](#192)).
* Programmatically create a dashboard
([#121](#121)).
* Properly integrate Python `logging` facility
([#118](#118)).
* Refactored code to use Databricks SDK for Python
([#27](#27)).
* Refactored configuration and removed global provider state
([#71](#71)).
* Removed `pydantic` dependency
([#138](#138)).
* Removed redundant `pyspark`, `databricks-connect`, `delta-spark`, and
`pandas` dependencies
([#193](#193)).
* Removed redundant `typer[all]` dependency and its usages
([#194](#194)).
* Renamed `MigrationGroupsProvider` to `GroupMigrationState`
([#81](#81)).
* Replaced `ratelimit` and `tenacity` dependencies with simpler
implementations ([#195](#195)).
* Reorganised integration tests to align more with unit tests
([#206](#206)).
* Run `build` workflow also on `main` branch
([#211](#211)).
* Run integration test with a single group
([#152](#152)).
* Simplify `SqlBackend` and table creation logic
([#203](#203)).
* Updated `migration_config.yml`
([#179](#179)).
* Updated legal information
([#196](#196)).
* Use `make_secret_scope` fixture
([#163](#163)).
* Use fixture factory for `make_table`, `make_schema`, and
`make_catalog` ([#189](#189)).
* Use new fixtures for notebooks and folders
([#176](#176)).
* Validate toolkit notebook test
([#183](#183)).

Contributing

* Added a note on external dependencies
([#139](#139)).
* Added ability to run SQL queries on Spark when in Databricks Runtime
([#108](#108)).
* Added some ground rules for contributing
([#82](#82)).
* Added contributing instructions link from main readme
([#109](#109)).
* Added info about environment refreshes
([#155](#155)).
* Clarified documentation
([#137](#137)).
* Enabled merge queue
([#146](#146)).
* Improved `CONTRIBUTING.md` guide
([#135](#135),
[#145](#145)).
FastLee pushed a commit that referenced this pull request Sep 19, 2023