Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statistics: merge the partition-level stats to global-level stats #22667

Merged
merged 15 commits into from
Feb 5, 2021

Conversation

Reminiscent
Copy link
Contributor

What problem does this PR solve?

sub-PR for the PR#22472

Problem Summary:
When we use analyze table statement for the partition table in dynamic-only mode, we will try to merge partition-level stats to get the global-level stats for the partition table.

What is changed and how it works?

Proposal: Partition Table Statistics (in Chinese)

What's Changed. How it Works:

  1. When we build the analyze task, we build every task for every partition. And record the table ID to which table the partition belongs to.
  2. If there are some tasks related to the partition table, we will record the partition table ID and index ID.
  3. We use the partition table ID to get the corresponding partition-level stats from the storage. And merge them to build the global-level stats.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

  • Build the global statistics for the partition table

@Reminiscent Reminiscent requested review from a team as code owners February 2, 2021 08:12
@Reminiscent Reminiscent requested review from XuHuaiyu and removed request for a team February 2, 2021 08:12
@Reminiscent
Copy link
Contributor Author

wait for PR#22603 and PR#22433 merge

@Reminiscent Reminiscent changed the title planner, statistics: merge the partition-level stats to global-level stats statistics: merge the partition-level stats to global-level stats Feb 2, 2021
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
Comment on lines 314 to 321
globalStats.Cms[i] = allCms[i][0].Copy()
for j := uint64(1); j < partitionNum; j++ {
err := globalStats.Cms[i].MergeCMSketch(allCms[i][j])
if err != nil {
logutil.BgLogger().Debug("failed to merge the CMSketch when we merge the partition-level stats to global-level stats")
return
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can merge this part into an additional function to make here clear?

return
}
tableInfo := partitionTable.Meta()
partitionStats, err := h.TableStatsFromStorage(tableInfo, partitionID, false, nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can put this in the cache. But I think it's a tradeoff. Because we just use the partition-level stats to build the global-level stats. Hardly used in other situations. This will take up a lot of cache memory to store these stats that are likely to be used only once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partition-level stats can be divided into two kinds. One kind is not involved in current analyze and we just read stats from storage to participate in the merging process. The other kind is newly calculated and saved to storage in current analyze. Reading the second kind stats from storage seems unnecessary since we have the stats in memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with @qw4990 , we think maybe we can do this optimization later. It is a low-frequency operation. What's your opinion @xuyifangreeneyes

@qw4990 qw4990 requested review from xuyifangreeneyes and removed request for XuHuaiyu February 2, 2021 12:16
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
statistics/table.go Outdated Show resolved Hide resolved
statistics/handle/handle.go Outdated Show resolved Hide resolved
Copy link
Contributor

@qw4990 qw4990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some basic test cases for this PR, for example:

  1. create a partition table and analyze it and check if MergePartitionStats2GlobalStats is triggered;
  2. generate an error when merging partition-stats and then check its statistics is not polluted(since when the error arise, all operations should be rollbacked);

@Reminiscent
Copy link
Contributor Author

/run-all-tests

Copy link
Contributor

@qw4990 qw4990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 4, 2021
Copy link
Contributor

@rebelice rebelice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Feb 4, 2021
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Feb 4, 2021
err = errors.Errorf("TODO: The merge function of the histogram structure has not been implemented yet")
if err != nil {
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NDV should be merged here. You can add it when you merge NDV later.

@qw4990
Copy link
Contributor

qw4990 commented Feb 5, 2021

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 5, 2021
@qw4990
Copy link
Contributor

qw4990 commented Feb 5, 2021

/merge

@ti-srebot
Copy link
Contributor

Your auto merge job has been accepted, waiting for:

  • 22705

@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot
Copy link
Contributor

@Reminiscent merge failed.

@Reminiscent
Copy link
Contributor Author

/run-unit-test

@qw4990
Copy link
Contributor

qw4990 commented Feb 5, 2021

/run-all-tests

1 similar comment
@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot ti-srebot merged commit 2d717c1 into pingcap:master Feb 5, 2021
@Reminiscent Reminiscent deleted the buildGlobalStats-1 branch August 5, 2021 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics sig/execution SIG execution status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants