feat: add UUID column to ImportMixin #11098

betodealmeida · 2020-09-29T03:20:15Z

SUMMARY

This PR is a simple rewrite of #7829, adding a UUID column to the ImportMixin. This initial work will be used to improve the import/export functionality in Superset by producing artifacts that are not dependent on the primary keys of a particular database.
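
As an illustration of why primary keys make poor export identifiers, here is a minimal sketch. The two in-memory "databases" and the `export_chart` / `find_chart` helpers are hypothetical and are not Superset code; they only show that the same chart can have different primary keys in two deployments while sharing one UUID.

```python
import uuid

# Two hypothetical deployments holding the same chart under different primary keys.
chart_uuid = uuid.uuid4()
source_db = {82: {"slice_name": "slice 1", "uuid": chart_uuid}}
target_db = {7: {"slice_name": "slice 1", "uuid": chart_uuid}}


def export_chart(db, primary_key):
    """Serialize a chart without leaking the local primary key."""
    row = db[primary_key]
    return {"slice_name": row["slice_name"], "uuid": str(row["uuid"])}


def find_chart(db, exported):
    """Return the local primary key of the chart matching an exported artifact."""
    for primary_key, row in db.items():
        if str(row["uuid"]) == exported["uuid"]:
            return primary_key
    return None


artifact = export_chart(source_db, 82)
print(find_chart(target_db, artifact))  # the target's own key for the same chart
```

The exported artifact carries no integer id, so the importing side resolves the chart through the UUID alone.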

TEST PLAN

$ superset db upgrade 
$ superset db downgrade e5ef6828ac4e

Confirmed that columns are created and populated.

ADDITIONAL INFORMATION

superset/migrations/versions/b56500de1855_add_uuid_column_to_import_mixin_py.py

superset/models/helpers.py

mistercrunch

Overall looks simpler than I thought it would be. In the interest of minimizing the number of database migrations, do we want to alter Dashboard.json_metadata in the same migration?

I'm open to more, smaller PRs too. It's unclear to me how much work migrating Dashboard.json_metadata is, but it shouldn't be too bad.

betodealmeida · 2020-09-29T04:14:51Z

> Overall looks simpler than I thought it would be. In the interest of minimizing the number of database migrations, do we want to alter Dashboard.json_metadata in the same migration?

I'm not sure... on one hand, doing it in this PR would reduce the number of migrations, which is great. On the other hand, having PRs with DB migrations being as small as possible makes it easier to cherry pick them.

Do you have any preferences? I think updating the column Dashboard.position_json to use UUIDs is simple, but I'm worried about updating the logic that touches it. We'd also have to update the examples.

mistercrunch · 2020-09-29T04:25:08Z

One problem with db migrations is that they cannot be cherry-picked out of order. Or they can if they both reference the same parent and ultimately need a converging migration. All this is fairly confusing. Fewer migrations are probably better. Either way, I'd advise release managers to avoid cherry-picking anything with a migration.

betodealmeida · 2020-09-29T04:42:18Z

> One problem with db migrations is that they cannot be cherry-picked out of order. Or they can if they both reference the same parent and ultimately need a converging migration. All this is fairly confusing. Fewer migrations are probably better. Either way, I'd advise release managers to avoid cherry-picking anything with a migration.

Right. The case I was thinking of was when you want to cherry-pick feature B, and it has a database migration Mb that comes after another migration Ma that also needs to be cherry-picked. If the PR implementing Ma and the feature A is big it's harder to cherry-pick it, but you're forced to do it because of the Alembic DAG.

If instead you separate the feature A into one PR, and the migration Ma into another, someone interested in cherry-picking feature B can just cherry-pick the actual migration Ma and skip the PR implementing A, assuming that the DB migration is non-disruptive (e.g., adding a column like we're doing here).

But in this case it doesn't matter, because the migration changing position_json is disruptive, and can't be separated from the changes in the logic to read the new schema. So I'll go ahead and implement the migration of position_json in this PR to consolidate the migrations.
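
The dependency argument above can be sketched as a walk over parent pointers. This is a simplified illustration, not Alembic's implementation: it assumes a linear chain where every revision has a single parent, the revision names `Ma` and `Mb` mirror the ones used in this comment, and `base` and `required_migrations` are made up for the example.

```python
# Hypothetical revision graph: each migration names its parent (down_revision).
# "Ma" and "Mb" mirror the names used in the comment above; "base" is made up.
DOWN_REVISION = {
    "base": None,
    "Ma": "base",
    "Mb": "Ma",
}


def required_migrations(target, already_applied):
    """Walk parents from `target` and list every migration that must be
    applied first, oldest first, ending with `target` itself."""
    chain = []
    revision = target
    while revision is not None and revision not in already_applied:
        chain.append(revision)
        revision = DOWN_REVISION[revision]
    return list(reversed(chain))


# A release branch that only has "base" cannot take Mb without Ma.
print(required_migrations("Mb", already_applied={"base"}))  # ['Ma', 'Mb']
```

If `Ma` ships in its own small PR, the first element of that list is cheap to cherry-pick; if it is bundled with feature A, the whole feature comes along.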

Thanks, Max!

eschutho · 2020-09-29T22:00:45Z

superset/migrations/versions/b56500de1855_add_uuid_column_to_import_mixin_py.py

+
+        # add uniqueness constraint
+        with op.batch_alter_table(model.__tablename__) as batch_op:
+            batch_op.create_unique_constraint("uq_uuid", ["uuid"])


Are we planning to do any lookups by uuid? If so, should we add an index on those columns?

betodealmeida · 2020-09-30

Good point. Not in the near future, I think; we're using it just to ensure consistent relationships.
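
On the index question, a unique constraint is normally backed by a unique index, so lookups by `uuid` may already be covered. The sketch below checks this on SQLite only, using a hypothetical `slices` table created directly rather than through the migration; other databases should be verified separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the constraint matters for this sketch.
conn.execute(
    "CREATE TABLE slices (id INTEGER PRIMARY KEY, uuid TEXT, "
    "CONSTRAINT uq_uuid UNIQUE (uuid))"
)

# PRAGMA index_list rows start with (seq, name, unique, ...).
indexes = conn.execute("PRAGMA index_list('slices')").fetchall()
unique_indexes = [row[1] for row in indexes if row[2] == 1]
print(unique_indexes)

# The query planner reports whether a lookup by uuid can use that index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM slices WHERE uuid = ?", ("abc",)
).fetchall()
print(plan[0][-1])
```

The last column of the plan row is the human-readable detail; it mentions the automatically created index when the lookup can use it.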

betodealmeida · 2020-09-29T22:15:35Z

-- BEFORE migration
sqlite> SELECT position_json FROM dashboards WHERE id=8;
{
    "CHART-Hkx6154FEm": {
        "children": [],
        "id": "CHART-Hkx6154FEm",
        "meta": {
            "chartId": 82,
            "height": 30,
            "sliceName": "slice 1",
            "width": 4
        },
        "type": "CHART"
    },
    "GRID_ID": {
        "children": [
            "ROW-SyT19EFEQ"
        ],
        "id": "GRID_ID",
        "type": "GRID"
    },
    "ROOT_ID": {
        "children": [
            "GRID_ID"
        ],
        "id": "ROOT_ID",
        "type": "ROOT"
    },
    "ROW-SyT19EFEQ": {
        "children": [
            "CHART-Hkx6154FEm"
        ],
        "id": "ROW-SyT19EFEQ",
        "meta": {
            "background": "BACKGROUND_TRANSPARENT"
        },
        "type": "ROW"
    },
    "DASHBOARD_VERSION_KEY": "v2"
}
-- AFTER migration
sqlite> SELECT position_json FROM dashboards WHERE id=8;
{
    "CHART-Hkx6154FEm": {
        "children": [],
        "id": "CHART-Hkx6154FEm",
        "meta": {
            "chartId": 82,
            "height": 30,
            "sliceName": "slice 1",
            "width": 4,
            "uuid": "706c8c3c-175b-4606-9016-4ef7e2ebff09"
        },
        "type": "CHART"
    },
    "GRID_ID": {
        "children": [
            "ROW-SyT19EFEQ"
        ],
        "id": "GRID_ID",
        "type": "GRID"
    },
    "ROOT_ID": {
        "children": [
            "GRID_ID"
        ],
        "id": "ROOT_ID",
        "type": "ROOT"
    },
    "ROW-SyT19EFEQ": {
        "children": [
            "CHART-Hkx6154FEm"
        ],
        "id": "ROW-SyT19EFEQ",
        "meta": {
            "background": "BACKGROUND_TRANSPARENT"
        },
        "type": "ROW"
    },
    "DASHBOARD_VERSION_KEY": "v2"
}
sqlite> SELECT * FROM slices WHERE uuid=REPLACE('706c8c3c-175b-4606-9016-4ef7e2ebff09', '-', '');
2020-09-23 12:21:27.651468|2020-09-23 12:43:27.098505|82|Unicode Cloud|table||word_cloud|{"granularity_sqla": "dttm", "groupby": [], "limit": "100", "metric": {"aggregate": "SUM", "column": {"column_name": "value"}, "expressionType": "SIMPLE", "label": "Value"}, "rotation": "square", "row_limit": 50000, "series": "short_phrase", "since": "100 years ago", "size_from": "10", "size_to": "70", "until": "now", "viz_type": "word_cloud", "remote_id": 33, "datasource_name": "unicode_test", "schema": null, "database_name": "examples", "import_time": 1600890207}|2|2|||[examples].[unicode_test](id:4)|4||706c8c3c175b460690164ef7e2ebff09
sqlite>
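
The before/after difference shown above amounts to adding a `uuid` key to the `meta` of each CHART node. A minimal sketch of that transformation follows; `add_chart_uuids` and the `uuid_by_chart_id` mapping are hypothetical names for illustration and are not taken from the migration script.

```python
import json


def add_chart_uuids(position_json, uuid_by_chart_id):
    """Return a copy of a position_json payload with `meta.uuid` set on
    every CHART node whose chartId is known."""
    positions = json.loads(position_json)
    for node in positions.values():
        # Entries such as DASHBOARD_VERSION_KEY are plain strings, not nodes.
        if not isinstance(node, dict) or node.get("type") != "CHART":
            continue
        chart_id = node.get("meta", {}).get("chartId")
        if chart_id in uuid_by_chart_id:
            node["meta"]["uuid"] = uuid_by_chart_id[chart_id]
    return json.dumps(positions, indent=4)


before = json.dumps(
    {
        "CHART-Hkx6154FEm": {
            "children": [],
            "id": "CHART-Hkx6154FEm",
            "meta": {"chartId": 82, "height": 30, "sliceName": "slice 1", "width": 4},
            "type": "CHART",
        },
        "DASHBOARD_VERSION_KEY": "v2",
    }
)
after = add_chart_uuids(before, {82: "706c8c3c-175b-4606-9016-4ef7e2ebff09"})
print(json.loads(after)["CHART-Hkx6154FEm"]["meta"]["uuid"])
```

Nodes of other types and the version key pass through unchanged, which matches the GRID, ROOT and ROW entries in the output above.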

villebro

A minor perf comment/question.

villebro · 2020-10-04T07:44:53Z

superset/migrations/versions/b56500de1855_add_uuid_column_to_import_mixin.py

+                sa.Column(
+                    "uuid",
+                    UUIDType(binary=False),
+                    primary_key=False,
+                    default=uuid.uuid4,
+                )


I've had compatibility issues when using sqlalchemy_utils.UUIDType on different databases some time ago (I believe I was mixing Postgres and SQLite at the time). I believe the resolution back then was to use binary=False like you've done, but I believe that eliminates the performance benefits of using a UUIDType over a traditional CHAR/VARCHAR implementation. Did you try running it with binary=True, and if so, did that cause CI trouble on SQLite vs Postgres vs MySQL?

betodealmeida · 2020-10-05

Thanks for the feedback, @villebro! I haven't tried running with binary=True; I'll give it a try as soon as I fix the unit tests that are not passing.
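
For a sense of the storage difference behind this question, the sketch below compares the raw, hex and dashed forms of one UUID using only the standard library. It says nothing about how `sqlalchemy_utils.UUIDType` maps these forms onto each database backend; that part would need to be checked against the library itself.

```python
import uuid

value = uuid.UUID("706c8c3c-175b-4606-9016-4ef7e2ebff09")

as_binary = value.bytes  # 16 raw bytes
as_hex = value.hex       # 32 hex characters, no dashes
as_text = str(value)     # 36 characters, with dashes

print(len(as_binary), len(as_hex), len(as_text))  # 16 32 36

# All three forms round-trip to the same UUID.
assert uuid.UUID(bytes=as_binary) == uuid.UUID(hex=as_hex) == uuid.UUID(as_text)
```

The 32-character hex form is the one visible in the `slices` row printed earlier in this conversation.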

codecov-io · 2020-10-06T19:23:03Z

Codecov Report

❗ No coverage uploaded for pull request base (master@94d4d55).
The diff coverage is 14.81%.

@@            Coverage Diff            @@
##             master   #11098   +/-   ##
=========================================
  Coverage          ?   61.60%           
=========================================
  Files             ?      829           
  Lines             ?    39195           
  Branches          ?     3688           
=========================================
  Hits              ?    24145           
  Misses            ?    14869           
  Partials          ?      181

| Flag | Coverage Δ |
|---|---|
| #javascript | `62.30% <ø> (?)` |
| #python | `61.18% <14.81%> (?)` |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...ns/b56500de1855_add_uuid_column_to_import_mixin.py | `0.00% <0.00%> (ø)` |
| superset/models/slice.py | `88.02% <ø> (ø)` |
| superset/views/core.py | `74.26% <0.00%> (ø)` |
| superset/dashboards/dao.py | `94.38% <100.00%> (ø)` |
| superset/datasets/schemas.py | `94.28% <100.00%> (ø)` |
| superset/models/helpers.py | `87.44% <100.00%> (ø)` |
| superset/utils/core.py | `89.50% <100.00%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 94d4d55...921960a.

mistercrunch

I did another pass and everything LGTM

villebro

LGTM, a great step forward for imports/exports! ❤️

betodealmeida · 2020-10-07T22:39:22Z

For some reason this PR broke master (9785667 errored); I'm working on a fix in #11196.

bkyryliuk · 2020-10-08T16:28:39Z

superset/migrations/versions/b56500de1855_add_uuid_column_to_import_mixin.py

+Base = declarative_base()
+
+
+class ImportMixin:


@betodealmeida this migration is very slow, which is worth mentioning in the changelog. For example, on our staging env it took ~30 min, and that often means extra downtime for the service.

betodealmeida · 2020-10-08

Good point, @bkyryliuk! I'll add it today.

ktmud · 2020-10-10

@bkyryliuk @betodealmeida I've managed to rewrite the uuid generation with native SQL queries and sped up the migration process by more than 100x. The whole migration job can now complete in under 5 minutes for our Superset db of more than 200k slices and 1 million table_columns. Do you mind taking a look and maybe testing it on your Superset deployments as well?
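
To illustrate the general idea of moving the backfill into the database, here is a minimal sketch contrasting a per-row update with a single set-based statement. It is not the code from #11209: the `slices` table is a hypothetical stand-in, the SQL functions are SQLite-specific, and `randomblob()` produces random 128-bit values rather than RFC 4122 version-4 UUIDs.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slices (id INTEGER PRIMARY KEY, uuid TEXT)")
conn.executemany("INSERT INTO slices (id) VALUES (?)", [(i,) for i in range(1, 1001)])


def backfill_row_by_row(conn):
    """One UPDATE statement per row, with the value generated in Python."""
    ids = [row[0] for row in conn.execute("SELECT id FROM slices")]
    for row_id in ids:
        conn.execute(
            "UPDATE slices SET uuid = ? WHERE id = ?", (uuid.uuid4().hex, row_id)
        )


def backfill_in_sql(conn):
    """A single UPDATE; the database generates a random value for each row.
    Note: randomblob() yields random 128-bit values, not version-4 UUIDs."""
    conn.execute("UPDATE slices SET uuid = lower(hex(randomblob(16)))")


backfill_in_sql(conn)
distinct, total = conn.execute(
    "SELECT COUNT(DISTINCT uuid), COUNT(*) FROM slices"
).fetchone()
print(distinct, total)
```

The set-based version issues one statement regardless of table size, which is where most of the saving comes from when each per-row update would otherwise be a separate round trip.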

* Add note about #11098
* Update UPDATING.md

Better description

Co-authored-by: Jesse Yang <jesse.yang@airbnb.com>

* Add UUID column to ImportMixin
* Fix default value
* Fix lint
* Fix order of downgrade
* Add logging when downgrade fails
* Migrate position_json to contain UUIDs, and add schedule tables
* Save UUID when adding charts to dashboard
* Fix heads
* Rename migration file
* Fix dashboard serialization
* Fix migration script with Postgres
* Fix unique contraint name
* Handle UUID when exporting dashboard
* Fix Dataset PUT
* Add UUID JSON serialization
* Fix tests
* Simplify logic
* Try binary=True

pull-request-size bot added the size/L label Sep 29, 2020

betodealmeida requested review from villebro and mistercrunch September 29, 2020 03:20

mistercrunch added the risk:db-migration label (PRs that require a DB migration) Sep 29, 2020

mistercrunch reviewed Sep 29, 2020

superset/migrations/versions/b56500de1855_add_uuid_column_to_import_mixin_py.py (outdated)

mistercrunch reviewed Sep 29, 2020

superset/migrations/versions/b56500de1855_add_uuid_column_to_import_mixin_py.py (outdated)

mistercrunch reviewed Sep 29, 2020

superset/models/helpers.py

mistercrunch reviewed Sep 29, 2020

eschutho reviewed Sep 29, 2020

betodealmeida added 8 commits September 29, 2020 19:05

* Add UUID column to ImportMixin (da65260)
* Fix default value (17a3971)
* Fix lint (f117619)
* Fix order of downgrade (c927029)
* Add logging when downgrade fails (8fa4db7)
* Migrate position_json to contain UUIDs, and add schedule tables (fc5a8cb)
* Save UUID when adding charts to dashboard (232b171)
* Fix heads (1d4554b)

betodealmeida force-pushed the PM-846 branch from 12dddb0 to 1d4554b September 30, 2020 02:06

betodealmeida changed the title ~~Add UUID column to ImportMixin~~ feat: add UUID column to ImportMixin Sep 30, 2020

betodealmeida added 4 commits September 29, 2020 19:13

* Rename migration file (6bcfdcc)
* Fix dashboard serialization (bcbbb0b)
* Fix migration script with Postgres (239ba38)
* Fix unique contraint name (0a4ea63)

villebro reviewed Oct 4, 2020

Handle UUID when exporting dashboard

a8a55d6

betodealmeida added 4 commits October 5, 2020 11:40

* Fix Dataset PUT (f171518)
* Add UUID JSON serialization (12d56ad)
* Fix tests (5ec2537)
* Simplify logic (aad4f87)

Try binary=True

921960a

betodealmeida requested review from mistercrunch and villebro October 6, 2020 21:13

mistercrunch approved these changes Oct 7, 2020

villebro approved these changes Oct 7, 2020

betodealmeida merged commit 9785667 into apache:master Oct 7, 2020

betodealmeida mentioned this pull request Oct 8, 2020

fix: skip unit test that is failing in master for test-postgres-hive #11196

Merged

bkyryliuk reviewed Oct 8, 2020

ktmud mentioned this pull request Oct 9, 2020

perf: speed up uuid column generation #11209

Merged

betodealmeida mentioned this pull request Oct 13, 2020

docs: add note about migration in #11098 to Changelog #11256

Merged

betodealmeida added a commit to betodealmeida/incubator-superset that referenced this pull request Oct 13, 2020

Add note about apache#11098

739d09f

mistercrunch added the 🏷️ bot (A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels) and 🚢 1.0.0 labels Mar 12, 2024
