Skip to content

Commit

Permalink
fix: remove JSON cast when querying archived workflows
Browse files Browse the repository at this point in the history
With PostgreSQL, the `argo_archived_workflows.workflow` column has been
of type `json` ever since 8a1e611,
which was released as v2.5.0. Therefore, the `::json` casts do nothing,
and prevent users from improving performance by migrating to JSONB using the following query:
```sql
alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb
```

Without the changes in this PR, running the above will massively slow
down the queries, because casting JSONB to JSON is expensive. With the
changes in this PR, it improves performance by ~80%, which I determined
by running the benchmarks added in
argoproj#13767 against a DB with
100,000 randomly generated workflows generated by
argoproj#13715:
```
$ benchstat postgres_before_10000_workflows.txt postgres_after_10000_workflows.txt
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                     │ postgres_before_10000_workflows.txt │  postgres_after_10000_workflows.txt   │
                                                     │               sec/op                │    sec/op     vs base                 │
WorkflowArchive/ListWorkflows-12                                             185.81m ± ∞ ¹   25.06m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12                         186.35m ± ∞ ¹   25.99m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12                                             42.39m ± ∞ ¹   11.78m ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                                                       113.6m         19.72m        -82.64%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
```

The only downside to migrating to JSONB is it can take a long time
if you've got a ton of workflows (~72s on my local DB with 100,000
workflows). I'll enter a separate PR for the migration, but I'm entering
this change separately so it can hopefully go out in 3.6.0.

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
  • Loading branch information
MasonM committed Oct 17, 2024
1 parent 415007c commit ded3409
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion persist/sqldb/workflow_archive.go
Original file line number Diff line number Diff line change
Expand Up @@ -452,7 +452,7 @@ func selectArchivedWorkflowQuery(t dbType) (*db.RawExpr, error) {
case MySQL:
return db.Raw("name, namespace, uid, phase, startedat, finishedat, coalesce(JSON_EXTRACT(workflow,'$.metadata.labels'), '{}') as labels,coalesce(JSON_EXTRACT(workflow,'$.metadata.annotations'), '{}') as annotations, coalesce(JSON_UNQUOTE(JSON_EXTRACT(workflow,'$.status.progress')), '') as progress, coalesce(JSON_UNQUOTE(JSON_EXTRACT(workflow,'$.metadata.creationTimestamp')), '') as creationtimestamp, JSON_UNQUOTE(JSON_EXTRACT(workflow,'$.spec.suspend')) as suspend, coalesce(JSON_UNQUOTE(JSON_EXTRACT(workflow,'$.status.message')), '') as message, coalesce(JSON_UNQUOTE(JSON_EXTRACT(workflow,'$.status.estimatedDuration')), '0') as estimatedduration, coalesce(JSON_EXTRACT(workflow,'$.status.resourcesDuration'), '{}') as resourcesduration"), nil
case Postgres:
return db.Raw("name, namespace, uid, phase, startedat, finishedat, coalesce((workflow::json)->'metadata'->>'labels', '{}') as labels, coalesce((workflow::json)->'metadata'->>'annotations', '{}') as annotations, coalesce((workflow::json)->'status'->>'progress', '') as progress, coalesce((workflow::json)->'metadata'->>'creationTimestamp', '') as creationtimestamp, (workflow::json)->'spec'->>'suspend' as suspend, coalesce((workflow::json)->'status'->>'message', '') as message, coalesce((workflow::json)->'status'->>'estimatedDuration', '0') as estimatedduration, coalesce((workflow::json)->'status'->>'resourcesDuration', '{}') as resourcesduration"), nil
return db.Raw("name, namespace, uid, phase, startedat, finishedat, coalesce(workflow->'metadata'->>'labels', '{}') as labels, coalesce(workflow->'metadata'->>'annotations', '{}') as annotations, coalesce(workflow->'status'->>'progress', '') as progress, coalesce(workflow->'metadata'->>'creationTimestamp', '') as creationtimestamp, workflow->'spec'->>'suspend' as suspend, coalesce(workflow->'status'->>'message', '') as message, coalesce(workflow->'status'->>'estimatedDuration', '0') as estimatedduration, coalesce(workflow->'status'->>'resourcesDuration', '{}') as resourcesduration"), nil
}
return nil, fmt.Errorf("unsupported db type %s", t)
}

0 comments on commit ded3409

Please sign in to comment.