-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add UUID column to ImportMixin #11098
Conversation
superset/migrations/versions/b56500de1855_add_uuid_column_to_import_mixin_py.py
Outdated
Show resolved
Hide resolved
superset/migrations/versions/b56500de1855_add_uuid_column_to_import_mixin_py.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks simpler than I thought it would be. In the interest of minimizing the number of database migrations, do we want to alter Dashboard.json_metadata
in the same migration?
I'm open to more smaller PRs too. Unclear to me how much work migrating Dashboard.json_metadata
is, but it shouldn't be too bad.
I'm not sure... on one hand, doing it in this PR would reduce the number of migrations, which is great. On the other hand, having PRs with DB migrations being as small as possible makes it easier to cherry pick them. Do you have any preferences? I think updating the column |
One problem with db migrations is that they cannot be cherry-picked out of order. Or they can if they both reference the same parent and ultimately need a converging migration. All this is fairly confusing. Less migrations is probably better. Either way I'd advise release managers to avoid cherry-picking anything with a migration. |
Right. The case I was thinking of was when you want to cherry-pick feature If instead you separate the feature But in this case it doesn't matter, because the migration changing Thanks, Max! |
|
||
# add uniqueness constraint | ||
with op.batch_alter_table(model.__tablename__) as batch_op: | ||
batch_op.create_unique_constraint("uq_uuid", ["uuid"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we planning to do any lookups by uuid? Should we add an index on those columns if so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think not in the near future, we're using it just to ensure consistent relationships.
|
12dddb0
to
1d4554b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor perf comment/question.
sa.Column( | ||
"uuid", | ||
UUIDType(binary=False), | ||
primary_key=False, | ||
default=uuid.uuid4, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had compatibility issues when using sqlalchemy_utils.UUIDType
on different databases some time ago (I believe I was mixing Postgres and Sqlite at the time). I believe the resolution back then was to use binary=False
like you've done, but I believe that eliminates the performance benefits of using a UUIDType
over a traditional CHAR
/VARCHAR
implementation. DId you try running it with binary=True
, did that cause CI trouble on Sqlite vs Postgres vs MySQL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, @villebro! I haven't tried running with binary=True
, I'll give it a try as soon as I fix the unit tests that are not passing.
Codecov Report
@@ Coverage Diff @@
## master #11098 +/- ##
=========================================
Coverage ? 61.60%
=========================================
Files ? 829
Lines ? 39195
Branches ? 3688
=========================================
Hits ? 24145
Misses ? 14869
Partials ? 181
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did another pass and everything LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a great step forward for imports/exports! ❤️
Base = declarative_base() | ||
|
||
|
||
class ImportMixin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@betodealmeida this migration is very slow, it is worth to mention in the changelog e.g. for our staging env it took ~30 min and often it means extra downtime for the service
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, @bkyryliuk! I'll add it today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkyryliuk @betodealmeida I've managed to rewrite the uuid generation with native SQL queries and sped up the migration process by more than 100x. The whole migration job can now complete in under 5 minutes for our Superset db of more than 200k slices
and 1 million table_columns
. Do you mind taking a look and maybe testing it on your Superset deployments as well?
* Add note about #11098 * Update UPDATING.md Better description Co-authored-by: Jesse Yang <jesse.yang@airbnb.com> Co-authored-by: Jesse Yang <jesse.yang@airbnb.com>
* Add UUID column to ImportMixin * Fix default value * Fix lint * Fix order of downgrade * Add logging when downgrade fails * Migrate position_json to contain UUIDs, and add schedule tables * Save UUID when adding charts to dashboard * Fix heads * Rename migration file * Fix dashboard serialization * Fix migration script with Postgres * Fix unique contraint name * Handle UUID when exporting dashboard * Fix Dataset PUT * Add UUID JSON serialization * Fix tests * Simplify logic * Try binary=True
…1256) * Add note about apache#11098 * Update UPDATING.md Better description Co-authored-by: Jesse Yang <jesse.yang@airbnb.com> Co-authored-by: Jesse Yang <jesse.yang@airbnb.com>
SUMMARY
This PR is a simple rewrite of #7829, adding a UUID column to the
ImportMixin
. This initial work will be used to improve the import/export functionality in Superset by producing artifacts that are not dependent on the primary keys of a particular database.TEST PLAN
Confirmed that columns are created and populated.
ADDITIONAL INFORMATION