fix!: migrate `argo_archived_workflows.workflow` to `jsonb` #13779

MasonM · 2024-10-17T21:26:29Z

Motivation

As explained in #13601 (comment), I believe #12912 introduced a performance regression when listing workflows for PostgreSQL users due to TOAST issues. Reverting that PR would solve that particular issue, but it could re-introduce the memory issues mentioned in the PR description. Instead, this mitigates the impact by converting the workflow column to be of type jsonb.

Modifications

Initially workflow was of type text, and was migrated to json in #2152. I'm not sure why jsonb wasn't chosen, but based on this comment in the linked issue, I think it was simply an oversight. It's possible compatibility with older PostgreSQL versions was a concern. Support for jsonb was introduced in PostgreSQL 9.4, and PostgreSQL 9.3 became end-of-life on November 8, 2018. I don't think we should be concerned about breaking compatibility with EOL PostgreSQL versions, especially versions that have been EOL for nearly 6 years.

Here's what the relevant docs say about choosing between json and jsonb:

The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

I'm pretty sure we don't care about key order or whitespace. We do care somewhat about insertion speed, but archived workflows are read much more frequently than written, so a slight reduction in write speed that gives a large improvement in read speed is a good tradeoff. As the benchmarks below show, this gives a ~75% performance improvement for calls to ListWorkflows().

The biggest problem with this change is the time needed for the migration. With my local DB populated with 100,000 records, it takes ~70s (see below). The good news is this is fully backwards-compatible, and users can opt to manually run alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb in their DB before upgrading. We'll probably want to add a prominent warning about this.

Verification

Here's steps to test this:

Use test: simple DB CLI for local development #13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template:

$ time go run ./hack/db fake-archived-workflows  --template "@very-large-workflow.yaml" --rows 100000
Using seed 1935828722624432788
Clusters: [default]
Namespaces: [argo]
Inserted 100000 rows

real    18m35.316s
user    3m2.447s
sys     0m44.972s

Run the benchmarks using test: basic benchmarks for workflow archive DB operations #13767:
```
make BenchmarkWorkflowArchive > postgres_before_10000_workflows.txt
```

Run the migration the DB CLI:

$ time go run ./hack/db migrate
INFO[0000] Migrating database schema                     clusterName=default dbType=postgres
INFO[0000] applying database change                      change="alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb" changeSchemaVersion=60
2024/10/17 18:07:42     Session ID:     00001
        Query:          alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb
        Stack:
                fmt.(*pp).handleMethods@/usr/local/go/src/fmt/print.go:673
                fmt.(*pp).printArg@/usr/local/go/src/fmt/print.go:756
                fmt.(*pp).doPrint@/usr/local/go/src/fmt/print.go:1208
                fmt.Append@/usr/local/go/src/fmt/print.go:289
                log.(*Logger).Print.func1@/usr/local/go/src/log/log.go:261
                log.(*Logger).output@/usr/local/go/src/log/log.go:238
                log.(*Logger).Print@/usr/local/go/src/log/log.go:260
                github.com/argoproj/argo-workflows/v3/persist/sqldb.ansiSQLChange.apply@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/ansi_sql_change.go:11
                github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:295
                github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:284
                github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.Exec@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:273
                main.NewMigrateCommand.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:50
                github.com/spf13/cobra.(*Command).execute@/home/vscode/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985
                github.com/spf13/cobra.(*Command).ExecuteC@/home/vscode/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117
                github.com/spf13/cobra.(*Command).Execute@/home/vscode/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041
                main.main@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:39
                runtime.main@/usr/local/go/src/runtime/proc.go:272
                runtime.goexit@/usr/local/go/src/runtime/asm_amd64.s:1700
        Rows affected:  0
        Error:          upper: slow query
        Time taken:     69.12755s
        Context:        context.Background

real    1m10.087s
user    0m1.541s
sys     0m0.410s

Note: the upper: slow query error is harmless and doesn't impact the migration.

Re-run the benchmarks:

make BenchmarkWorkflowArchive > postgres_after_10000_workflows.txt

Compare results using benchstat:

$ benchstat postgres_before_10000_workflows3.txt postgres_after_10000_workflows2.txt
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                     │ postgres_before_10000_workflows3.txt │  postgres_after_10000_workflows2.txt  │
                                                     │                sec/op                │    sec/op     vs base                 │
WorkflowArchive/ListWorkflows-12                                              183.83m ± ∞ ¹   24.69m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12                          192.71m ± ∞ ¹   25.87m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12                                              13.04m ± ∞ ¹   11.75m ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                                                        77.31m         19.58m        -74.68%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                     │ postgres_before_10000_workflows3.txt │  postgres_after_10000_workflows2.txt  │
                                                     │                 B/op                 │     B/op       vs base                │
WorkflowArchive/ListWorkflows-12                                              497.2Ki ± ∞ ¹   497.5Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12                          503.1Ki ± ∞ ¹   503.9Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12                                             8.972Ki ± ∞ ¹   8.899Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
geomean                                                                       130.9Ki         130.7Ki        -0.20%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                     │ postgres_before_10000_workflows3.txt │ postgres_after_10000_workflows2.txt  │
                                                     │              allocs/op               │  allocs/op    vs base                │
WorkflowArchive/ListWorkflows-12                                               8.373k ± ∞ ¹   8.370k ± ∞ ¹       ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12                           8.410k ± ∞ ¹   8.406k ± ∞ ¹       ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12                                               212.0 ± ∞ ¹    212.0 ± ∞ ¹       ~ (p=1.000 n=1) ³
geomean                                                                        2.462k         2.462k        -0.03%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
³ all samples are equal

With PostgreSQL, the `argo_archived_workflows.workflow` column has been of type `json` ever since 8a1e611, which was released as v2.5.0. Therefore, the `::json` casts do nothing, and prevent users from improving performance by migrating to JSONB using the following query: ```sql alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb ``` Without the changes in this PR, running the above will massively slow down the queries, because casting JSONB to JSON is expensive. With the changes in this PR, it improves performance by ~80%, which I determined by running the benchmarks added in argoproj#13767 against a DB with 100,000 randomly generated workflows generated by argoproj#13715: ``` $ benchstat postgres_before_10000_workflows.txt postgres_after_10000_workflows.txt goos: linux goarch: amd64 pkg: github.com/argoproj/argo-workflows/v3/test/e2e cpu: 12th Gen Intel(R) Core(TM) i5-12400 │ postgres_before_10000_workflows.txt │ postgres_after_10000_workflows.txt │ │ sec/op │ sec/op vs base │ WorkflowArchive/ListWorkflows-12 185.81m ± ∞ ¹ 25.06m ± ∞ ¹ ~ (p=1.000 n=1) ² WorkflowArchive/ListWorkflows_with_label_selector-12 186.35m ± ∞ ¹ 25.99m ± ∞ ¹ ~ (p=1.000 n=1) ² WorkflowArchive/CountWorkflows-12 42.39m ± ∞ ¹ 11.78m ± ∞ ¹ ~ (p=1.000 n=1) ² geomean 113.6m 19.72m -82.64% ¹ need >= 6 samples for confidence interval at level 0.95 ² need >= 4 samples to detect a difference at alpha level 0.05 ``` The only downside to migrating to JSONB is it can take a long time if you've got a ton of workflows (~72s on my local DB with 100,000 workflows). I'll enter a separate PR for the migration, but I'm entering this change separately so it can hopefully go out in 3.6.0. Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>

…j#13601 As explained in argoproj#13601 (comment), I believe argoproj#12912 introduced a performance regression when listing workflows for PostgreSQL users. Reverting that PR could re-introduce the memory issues mentioned in the PR description, so instead this mitigates the impact by converting the `workflow` column to be of type `jsonb`. Initially `workflow` was of type `text`, and was migrated to `json` in argoproj#2152. I'm not sure why `jsonb` wasn't chosen, but [based on this comment in the linked issue](argoproj#2133 (comment)), I think it was simply an oversight. Here's the relevant docs (https://www.postgresql.org/docs/current/datatype-json.html): > The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage. > > Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept. > > In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys. I'm pretty sure we don't care about key order or whitespace. We do care somewhat about insertion speed, but archived workflows are read much more frequently than written, so a slight reduction in write speed that gives a large improvement in read speed is a good tradeoff. Here's steps to test this: 1. Use argoproj#13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template: ``` $ time go run ./hack/db fake-archived-workflows --template "@very-large-workflow.yaml" --rows 100000 Using seed 1935828722624432788 Clusters: [default] Namespaces: [argo] Inserted 100000 rows real 18m35.316s user 3m2.447s sys 0m44.972s ``` 2. Run the benchmarks using argoproj#13767: ``` make BenchmarkWorkflowArchive > postgres_before_10000_workflows.txt ``` 3. Run the migration the DB CLI: ``` $ time go run ./hack/db migrate INFO[0000] Migrating database schema clusterName=default dbType=postgres INFO[0000] applying database change change="alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb" changeSchemaVersion=60 2024/10/17 18:07:42 Session ID: 00001 Query: alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb Stack: fmt.(*pp).handleMethods@/usr/local/go/src/fmt/print.go:673 fmt.(*pp).printArg@/usr/local/go/src/fmt/print.go:756 fmt.(*pp).doPrint@/usr/local/go/src/fmt/print.go:1208 fmt.Append@/usr/local/go/src/fmt/print.go:289 log.(*Logger).Print.func1@/usr/local/go/src/log/log.go:261 log.(*Logger).output@/usr/local/go/src/log/log.go:238 log.(*Logger).Print@/usr/local/go/src/log/log.go:260 github.com/argoproj/argo-workflows/v3/persist/sqldb.ansiSQLChange.apply@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/ansi_sql_change.go:11 github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:295 github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:284 github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.Exec@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:273 main.NewMigrateCommand.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:50 github.com/spf13/cobra.(*Command).execute@/home/vscode/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985 github.com/spf13/cobra.(*Command).ExecuteC@/home/vscode/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117 github.com/spf13/cobra.(*Command).Execute@/home/vscode/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041 main.main@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:39 runtime.main@/usr/local/go/src/runtime/proc.go:272 runtime.goexit@/usr/local/go/src/runtime/asm_amd64.s:1700 Rows affected: 0 Error: upper: slow query Time taken: 69.12755s Context: context.Background real 1m10.087s user 0m1.541s sys 0m0.410s ``` 2. Re-run the benchmarks: ``` make BenchmarkWorkflowArchive > postgres_after_10000_workflows.txt ``` 4. Compare results using [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat): ``` $ benchstat postgres_before_10000_workflows3.txt postgres_after_10000_workflows2.txt goos: linux goarch: amd64 pkg: github.com/argoproj/argo-workflows/v3/test/e2e cpu: 12th Gen Intel(R) Core(TM) i5-12400 │ postgres_before_10000_workflows3.txt │ postgres_after_10000_workflows2.txt │ │ sec/op │ sec/op vs base │ WorkflowArchive/ListWorkflows-12 183.83m ± ∞ ¹ 24.69m ± ∞ ¹ ~ (p=1.000 n=1) ² WorkflowArchive/ListWorkflows_with_label_selector-12 192.71m ± ∞ ¹ 25.87m ± ∞ ¹ ~ (p=1.000 n=1) ² WorkflowArchive/CountWorkflows-12 13.04m ± ∞ ¹ 11.75m ± ∞ ¹ ~ (p=1.000 n=1) ² geomean 77.31m 19.58m -74.68% ¹ need >= 6 samples for confidence interval at level 0.95 ² need >= 4 samples to detect a difference at alpha level 0.05 │ postgres_before_10000_workflows3.txt │ postgres_after_10000_workflows2.txt │ │ B/op │ B/op vs base │ WorkflowArchive/ListWorkflows-12 497.2Ki ± ∞ ¹ 497.5Ki ± ∞ ¹ ~ (p=1.000 n=1) ² WorkflowArchive/ListWorkflows_with_label_selector-12 503.1Ki ± ∞ ¹ 503.9Ki ± ∞ ¹ ~ (p=1.000 n=1) ² WorkflowArchive/CountWorkflows-12 8.972Ki ± ∞ ¹ 8.899Ki ± ∞ ¹ ~ (p=1.000 n=1) ² geomean 130.9Ki 130.7Ki -0.20% ¹ need >= 6 samples for confidence interval at level 0.95 ² need >= 4 samples to detect a difference at alpha level 0.05 │ postgres_before_10000_workflows3.txt │ postgres_after_10000_workflows2.txt │ │ allocs/op │ allocs/op vs base │ WorkflowArchive/ListWorkflows-12 8.373k ± ∞ ¹ 8.370k ± ∞ ¹ ~ (p=1.000 n=1) ² WorkflowArchive/ListWorkflows_with_label_selector-12 8.410k ± ∞ ¹ 8.406k ± ∞ ¹ ~ (p=1.000 n=1) ² WorkflowArchive/CountWorkflows-12 212.0 ± ∞ ¹ 212.0 ± ∞ ¹ ~ (p=1.000 n=1) ³ geomean 2.462k 2.462k -0.03% ¹ need >= 6 samples for confidence interval at level 0.95 ² need >= 4 samples to detect a difference at alpha level 0.05 ³ all samples are equal ``` Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>

agilgur5 · 2024-10-17T22:44:52Z

I've marked the title as breaking here (fix!) as it is a schema change and requires a long migration, as you mentioned

Fixes #13601

I also don't think this fully fixes #13601 since it does not affect MySQL, is significantly faster than JSON in Postgres, but can still take multiple seconds, and doesn't resolve the description of the issue either, rather another related performance issue brought up in the thread

MasonM · 2024-10-17T22:58:05Z

@agilgur5 Okay, I removed references to #13601 from the title and description. The reason I referenced it is because nearly all the comments in that issue (and the user reports) are about general performance issues with PostgreSQL, which this PR addresses, but you're right it doesn't address MySQL or the index optimization mentioned (CREATE INDEX argo_archived_workflows_i5 ON argo_archived_workflows (clustername, startedat).

I did try benchmarking the index optimization, but the performance improvement was very small, which is why I didn't include it here:

$ benchstat before.txt after.txt 
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                     │  before.txt  │              after.txt               │
                                                     │    sec/op    │    sec/op     vs base                │
WorkflowArchive/ListWorkflows-12                       185.9m ± ∞ ¹   188.9m ± ∞ ¹       ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12   199.3m ± ∞ ¹   185.0m ± ∞ ¹       ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12                      13.67m ± ∞ ¹   13.24m ± ∞ ¹       ~ (p=1.000 n=1) ²
geomean                                                79.71m         77.35m        -2.96%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

MasonM · 2024-10-18T04:33:26Z

Also

but can still take multiple seconds

I'm pretty sure that's solely due to the JSON cast issue fixed in #13777 (and this PR). I spent over a week trying to reproduce that and I can't. I think it's best to focus on the issues we can reproduce first.

agilgur5 · 2024-10-18T04:51:41Z

I did try benchmarking the index optimization, but the performance improvement was very small, which is why I didn't include it here:

It would still be better and less complex than a whole subquery IMO

I'm pretty sure that's solely due to the JSON cast issue fixed in #13777

You believe the cast itself adding multiple seconds to a query? Certainly possible, but that would be even more surprising than the index misses or existing JSON inefficiencies with this issue in the past 😅

I think it's best to focus on the issues we can reproduce first.

Sure, I just don't think we should necessarily close out the issue until we've ironed out all the remaining loose items, especially as I made the issue intentionally to cover all the remaining loose items after the previous ones

MasonM · 2024-10-18T05:04:37Z

You believe the cast itself adding multiple seconds to a query? Certainly possible, but that would be even more surprising than the index misses or existing JSON inefficiencies with this issue in the past 😅

It slows it down by over 2x on my machine. But you don't have to take my word for it, because you can reproduce this yourself by following in the instructions in the PR. Just revert the changes made to persist/sqldb/workflow_archive.go.

Here's the benchstat output I get:

 $ benchstat before.txt after.txt 
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                     │  before.txt  │               after.txt               │
                                                     │    sec/op    │    sec/op     vs base                 │
WorkflowArchive/ListWorkflows-12                       181.4m ± ∞ ¹   417.0m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12   181.6m ± ∞ ¹   417.0m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12                      44.88m ± ∞ ¹   11.59m ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                                113.9m         126.3m        +10.90%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

Sure, I just don't think we should necessarily close out the issue until we've ironed out all the remaining loose items, especially as I made the issue intentionally to cover all the remaining loose items after the previous ones

Okay, that's reasonable.

agilgur5 · 2024-10-18T05:32:29Z

It slows it down by over 2x on my machine.

The cast alone? 👀 👀 As far as I could tell, here and in #13777 you mentioned both removing the cast and moving to JSONB

But you don't have to take my word for it, because you can reproduce this yourself by following in the instructions in the PR

I'm pretty sure

You had also said "pretty sure" before, so I thought you hadn't narrowed that down yet

Joibel · 2024-10-18T09:11:28Z

I am not sure how I feel about this change. Whilst I completely agree with the goal of speed, it changes the behaviour of archiving in subtle but surprising ways.

We would end up have two different behaviours depending upon whether you're using mysql or postgres. If you want a 'perfect archive' (which for some users may be something they care about, perhaps compliance reasons), you must use mysql.

I can't see a mechanism whereby we would end up with bugs with duplicate keys, but archived workflows in postgres would hide them after this change.

We should document this change because it's reasonable to expect the contents of an archived workflow to exactly match that of that workflow before archiving, and that won't be the case. Have you considered an option (which most people wouldn't use) to store and present the textual version, but query via jsonb?

kodieg · 2024-10-18T09:21:03Z

According to this documentation mysql performs some kind of json normalization during json insertion: https://dev.mysql.com/doc/refman/8.4/en/json.html

I checked this also on https://www.db-fiddle.com/. Maybe someone with mysql access could also verify it. However, it seems that currently mysql does not perform perfect archive and this is different behaviour from what current implementation in postgres does.

MasonM · 2024-10-19T00:14:41Z

The cast alone? 👀 👀 As far as I could tell, here and in #13777 you mentioned both removing the cast and moving to JSONB

The comment you linked (#13601 (comment)) said "I tested converting json to jsonb and observed times approximately 9s." I'm assuming this was measured with the cast present (i.e. the current behavior), since that's the only scenario where I could reproduce such a slowdown. There may be other scenarios, but there's too many variables involved for me to narrow it down further.

I am not sure how I feel about this change. Whilst I completely agree with the goal of speed, it changes the behaviour of archiving in subtle but surprising ways.

We would end up have two different behaviours depending upon whether you're using mysql or postgres. If you want a 'perfect archive' (which for some users may be something they care about, perhaps compliance reasons), you must use mysql.

As @kodieg correctly pointed out, MySQL is already stripping whitespace and eliminating duplicate keys, so this PR actually makes the behavior more consistent, not less. You can verify this yourself by following these steps:

Save the following file as test.sql:

insert into argo_archived_workflows (uid, instanceid, name, namespace, phase, clustername, workflow) values ('0b3ec561-bc78-400c-9772-ce11c69b3d65', '', 'dg92nw8qr4k', 'argo', 'Succeeded', 'default', '{ "key2": 1, "key2":  2, "key1": 1}');
select workflow from argo_archived_workflows where uid='0b3ec561-bc78-400c-9772-ce11c69b3d65';

Run make start PROFILE=mysql

Run make mysql-cli < test.sql . Output:

$ make mysql-cli < test.sql 
GIT_COMMIT=c9f0376594aed5ae0ebc53d3bf2970300406860f GIT_BRANCH=feat-cli-db-generator2 GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=true VERSION=latest
KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/arm64
RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false  STATIC_FILES=false ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true
kubectl exec -ti `kubectl get pod -l app=mysql -o name|cut -c 5-` -- mysql -u mysql -ppassword argo
Unable to use a TTY - input is not a terminal or the right kind of file
mysql: [Warning] Using a password on the command line interface can be insecure.
workflow
{"key1": 1, "key2": 2}

Run make start PROFILE=postgres

Run make postgres-cli < test.sql once on the main branch and once with this migration applied.
Output on main:

$ make postgres-cli < test.sql 
GIT_COMMIT=c9f0376594aed5ae0ebc53d3bf2970300406860f GIT_BRANCH=feat-cli-db-generator2 GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=true VERSION=latest
KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/arm64
RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false  STATIC_FILES=false ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true
kubectl exec -ti `kubectl get pod -l app=postgres -o name|cut -c 5-` -- psql -U postgres
Unable to use a TTY - input is not a terminal or the right kind of file
INSERT 0 1
              workflow               
-------------------------------------
 { "key2": 1, "key2":  2, "key1": 1}
(1 row)

Output with migration applied:

$ make postgres-cli < test.sql 
GIT_COMMIT=c9f0376594aed5ae0ebc53d3bf2970300406860f GIT_BRANCH=feat-cli-db-generator2 GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=true VERSION=latest
KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/arm64
RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false  STATIC_FILES=false ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true
kubectl exec -ti `kubectl get pod -l app=postgres -o name|cut -c 5-` -- psql -U postgres
Unable to use a TTY - input is not a terminal or the right kind of file
INSERT 0 1
        workflow        
------------------------
 {"key1": 1, "key2": 2}
(1 row)

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>

Joibel · 2024-10-21T15:03:47Z

Ok, so if we're going to get this in for 3.6 it should have some notes as to its effect in docs/upgrading.md, and how to do it prior to upgrade.

agilgur5 · 2024-10-21T15:55:39Z

and how to do it prior to upgrade.

Migrations happen automatically during upgrades, so it only needs a note that it will happen, not how to do it.

Joibel · 2024-10-21T16:08:58Z

and how to do it prior to upgrade.

Migrations happen automatically during upgrades, so it only needs a note that it will happen, not how to do it.

Ah, sorry, that was unclear. I was meaning referring to this in the original issue description:

The good news is this is fully backwards-compatible, and users can opt to manually run alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb in their DB before upgrading. We'll probably want to add a prominent warning about this.

Joibel

I will document this in a separate PR

This adds some additional details about argoproj#13779, primarily to assist users with a very large number of archived workflows who may not be comfortable with extended downtime. I tested the manual migration strategy locally after populating my local database with 100,000 rows using argoproj#13715: ``` $ time make postgres-cli < migrate_before.sql GIT_COMMIT=f25960ac6f63bc3cf1a3b8a9813cf43c61921262 GIT_BRANCH=fix-migrate-jsonb GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=true VERSION=latest KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/amd64 RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false STATIC_FILES=false ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true kubectl exec -ti svc/postgres -- psql -U postgres Unable to use a TTY - input is not a terminal or the right kind of file ALTER TABLE CREATE FUNCTION CREATE TRIGGER UPDATE 100000 real 1m16.118s user 0m2.639s sys 0m0.548s $ time make postgres-cli < migrate_after.sql GIT_COMMIT=49ff7a44ba307416282a1f5cd3b844d19bce7f88 GIT_BRANCH=main GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=false VERSION=latest KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/amd64 RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false STATIC_FILES=true ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true kubectl exec -ti svc/postgres -- psql -U postgres Unable to use a TTY - input is not a terminal or the right kind of file BEGIN LOCK TABLE DROP TRIGGER ALTER TABLE ALTER TABLE ALTER TABLE COMMIT real 0m1.503s user 0m2.786s sys 0m0.540s ``` Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>

MasonM · 2024-10-25T03:00:39Z

@Joibel Sorry for the delay. I was researching ways that users with a very large number of workflows could do this migration with minimal downtime. I entered #13816 with my findings.

MasonM added 2 commits October 17, 2024 13:05

This was referenced Oct 17, 2024

fix: remove JSON cast when querying archived workflows #13777

Merged

3.5: Further optimize Archive List API call / DB query #13601

Open

MasonM changed the title ~~fix: migrate argo_archived_workflows.workflow to JSONB. Fixes #13601~~ fix: migrate argo_archived_workflows.workflow to jsonb. Fixes #13601 Oct 17, 2024

MasonM marked this pull request as ready for review October 17, 2024 22:36

agilgur5 changed the title ~~fix: migrate argo_archived_workflows.workflow to jsonb. Fixes #13601~~ fix!: migrate argo_archived_workflows.workflow to jsonb. Fixes #13601 Oct 17, 2024

MasonM changed the title ~~fix!: migrate argo_archived_workflows.workflow to jsonb. Fixes #13601~~ fix!: migrate argo_archived_workflows.workflow to jsonb Oct 17, 2024

agilgur5 added the area/workflow-archive label Oct 17, 2024

MasonM added 2 commits October 18, 2024 17:25

Merge branch 'main' into fix-migrate-jsonb

85d094d

fix: minor adjustments to comment

f25960a

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>

Joibel approved these changes Oct 24, 2024

View reviewed changes

Joibel merged commit 92bf975 into argoproj:main Oct 24, 2024
28 checks passed

Joibel mentioned this pull request Oct 24, 2024

docs: add 3.6 upgrading notes for postgres archive and ARGO_TEMPLATE #13803

Merged

agilgur5 added this to the v3.6.0 milestone Oct 24, 2024

MasonM mentioned this pull request Oct 25, 2024

docs: additional details on jsonb migration for 3.6 #13816

Merged

MasonM mentioned this pull request Oct 26, 2024

fix: further optimize archive workflow listing. Fixes #13601 #13819

Open

agilgur5 added the area/api Argo Server API label Oct 26, 2024

MasonM mentioned this pull request Nov 10, 2024

API query error: unsupported Unicode escape sequence (SQLSTATE 22P05) #13711

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix!: migrate `argo_archived_workflows.workflow` to `jsonb` #13779

fix!: migrate `argo_archived_workflows.workflow` to `jsonb` #13779

MasonM commented Oct 17, 2024 •

edited

Loading

agilgur5 commented Oct 17, 2024 •

edited

Loading

MasonM commented Oct 17, 2024

MasonM commented Oct 18, 2024 •

edited

Loading

agilgur5 commented Oct 18, 2024 •

edited

Loading

MasonM commented Oct 18, 2024

agilgur5 commented Oct 18, 2024

Joibel commented Oct 18, 2024

kodieg commented Oct 18, 2024

MasonM commented Oct 19, 2024

Joibel commented Oct 21, 2024

agilgur5 commented Oct 21, 2024

Joibel commented Oct 21, 2024

Joibel left a comment

MasonM commented Oct 25, 2024

fix!: migrate argo_archived_workflows.workflow to jsonb #13779

fix!: migrate argo_archived_workflows.workflow to jsonb #13779

Conversation

MasonM commented Oct 17, 2024 • edited Loading

Motivation

Modifications

Verification

agilgur5 commented Oct 17, 2024 • edited Loading

MasonM commented Oct 17, 2024

MasonM commented Oct 18, 2024 • edited Loading

agilgur5 commented Oct 18, 2024 • edited Loading

MasonM commented Oct 18, 2024

agilgur5 commented Oct 18, 2024

Joibel commented Oct 18, 2024

kodieg commented Oct 18, 2024

MasonM commented Oct 19, 2024

Joibel commented Oct 21, 2024

agilgur5 commented Oct 21, 2024

Joibel commented Oct 21, 2024

Joibel left a comment

Choose a reason for hiding this comment

MasonM commented Oct 25, 2024

fix!: migrate `argo_archived_workflows.workflow` to `jsonb` #13779

fix!: migrate `argo_archived_workflows.workflow` to `jsonb` #13779

MasonM commented Oct 17, 2024 •

edited

Loading

agilgur5 commented Oct 17, 2024 •

edited

Loading

MasonM commented Oct 18, 2024 •

edited

Loading

agilgur5 commented Oct 18, 2024 •

edited

Loading