Remove telemetry from gasKV and cacheKV store operation's get methods #10072

Closed
4 tasks
ValarDragon opened this issue Sep 2, 2021 · 0 comments · Fixed by #10077
Labels
C: telemetry Issues and features pertaining to SDK telemetry. T: Performance Performance improvements


ValarDragon commented Sep 2, 2021

Summary

The current telemetry work in the GasKV and CacheKV store get operations causes significant overhead, which is antithetical to the purpose of these stores.

These telemetry overheads are (seemingly) for the use case of finding all I/O time. However, the approach being used here does not make sense to me (it's effectively measuring 'time spent in getting from cache', not 'time from underlying store'), and is actually causing the overhead that it's measuring!

If these are viewed as necessary, then they should only happen if the node has telemetry enabled.

Problem Definition

In our Osmosis epoch time benchmarks, where the CacheKV store's cache size is huge, at least a third of the time in the CacheKV store's Get function is spent in telemetry-related operations. By my (back-of-the-envelope) estimate, our hashmap get operation probably takes at least double the time of the normal CacheKV operations used in the SDK, so the telemetry share in typical usage is likely even higher.

This telemetry overhead is then paid again in the GasKV store that calls the CacheKV store, as both perform the same telemetry operation on each get.

This means that most of the time in fetching data from a cache (even in the case of a miss) is spent in telemetry.
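To illustrate the double-payment described above, here is a minimal sketch, not the SDK's actual code. `measureSince`, `cacheStore`, `gasStore`, and `mapStore` are hypothetical stand-ins, and the stand-in hook only counts invocations. It shows that a single read through a gas store wrapping a cache store triggers the deferred timing call twice.

```go
package main

import (
	"fmt"
	"time"
)

// measureCalls counts how often the stand-in telemetry hook runs.
var measureCalls int

// measureSince is a hypothetical stand-in for a telemetry timing call.
func measureSince(start time.Time, keys ...string) {
	_ = time.Since(start)
	measureCalls++
}

// KVStore is a minimal read-only store interface for this sketch.
type KVStore interface {
	Get(key string) []byte
}

// mapStore plays the role of the underlying store.
type mapStore map[string][]byte

func (m mapStore) Get(key string) []byte { return m[key] }

// cacheStore wraps a parent store and records timing on every Get.
type cacheStore struct {
	cache  map[string][]byte
	parent KVStore
}

func (c *cacheStore) Get(key string) []byte {
	defer measureSince(time.Now(), "store", "cachekv", "get")
	if v, ok := c.cache[key]; ok {
		return v
	}
	v := c.parent.Get(key)
	c.cache[key] = v
	return v
}

// gasStore wraps another store and also records timing on every Get.
type gasStore struct {
	parent KVStore
}

func (g *gasStore) Get(key string) []byte {
	defer measureSince(time.Now(), "store", "gaskv", "get")
	return g.parent.Get(key)
}

func main() {
	base := mapStore{"a": []byte("1")}
	store := &gasStore{parent: &cacheStore{cache: map[string][]byte{}, parent: base}}
	store.Get("a") // cache miss
	store.Get("a") // cache hit
	fmt.Println("telemetry calls for 2 reads:", measureCalls)
}
```

Both the hit and the miss pay for two timing calls, regardless of whether the underlying store is ever touched.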

Proposal

I don't view time spent in caches as a helpful telemetry metric. I think telemetry for cache timings is also harmful, as it undermines much of the hardware cache efficiency work going on here. If you want telemetry for your caches, it should be done at a higher level (e.g. overall time with and without the cache). IMO you should not be benchmarking low-level things like this, as it artificially disturbs things like the stack, registers, and cache lines, which are important for data caches.
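As a rough sketch of what "a higher level" could mean, the following times a whole read workload once against a bare store and once through a cache wrapper, instead of timing each Get. All names here (`timeWorkload`, `cacheStore`, `mapStore`) are hypothetical. The in-memory map used as the underlying store is only a placeholder, so the printed durations are not meaningful in themselves.

```go
package main

import (
	"fmt"
	"time"
)

// KVStore is a minimal read-only store interface for this sketch.
type KVStore interface {
	Get(key string) []byte
}

// mapStore is a placeholder for the (normally much slower) underlying store.
type mapStore map[string][]byte

func (m mapStore) Get(key string) []byte { return m[key] }

// cacheStore is an uninstrumented cache wrapper around a parent store.
type cacheStore struct {
	cache  map[string][]byte
	parent KVStore
}

func (c *cacheStore) Get(key string) []byte {
	if v, ok := c.cache[key]; ok {
		return v
	}
	v := c.parent.Get(key)
	c.cache[key] = v
	return v
}

// timeWorkload reads every key `rounds` times and returns the total wall
// time plus the number of bytes read, so that only one timer is used for
// the whole workload.
func timeWorkload(s KVStore, keys []string, rounds int) (time.Duration, int) {
	start := time.Now()
	total := 0
	for i := 0; i < rounds; i++ {
		for _, k := range keys {
			total += len(s.Get(k))
		}
	}
	return time.Since(start), total
}

func main() {
	base := mapStore{"a": []byte("1"), "b": []byte("22")}
	keys := []string{"a", "b"}

	plain, n1 := timeWorkload(base, keys, 1000)
	cached, n2 := timeWorkload(&cacheStore{cache: map[string][]byte{}, parent: base}, keys, 1000)

	fmt.Println("without cache:", plain)
	fmt.Println("with cache:   ", cached)
	fmt.Println("same bytes read:", n1 == n2)
}
```

The hot path stays free of timers and defers; the comparison is made once per workload rather than once per read.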

  • Remove the GasKVStore get telemetry operation
  • Remove the CacheKVStore get telemetry operation

The alternative would be to 'gate' these defers so that they only run when telemetry is enabled, and to propagate that setting throughout the state machine. Personally, I think these telemetry ops should be removed.
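For reference, the gating alternative might look roughly like the sketch below. This is an assumption about shape only: `telemetryEnabled`, `measureSince`, and `cacheStore` are hypothetical names, and how the flag would actually be threaded through the state machine is left open.

```go
package main

import (
	"fmt"
	"time"
)

// telemetryEnabled is a hypothetical switch; in practice it would have to be
// propagated from node configuration down to each store.
var telemetryEnabled bool

// recorded counts how many measurements the stand-in hook has taken.
var recorded int

// measureSince is a hypothetical stand-in for a telemetry timing call.
func measureSince(start time.Time, keys ...string) {
	_ = time.Since(start)
	recorded++
}

// cacheStore is a minimal cache used only for this sketch.
type cacheStore struct {
	cache map[string][]byte
}

// Get registers the deferred timing call only when telemetry is enabled,
// so disabled nodes pay just for the branch.
func (c *cacheStore) Get(key string) []byte {
	if telemetryEnabled {
		defer measureSince(time.Now(), "store", "cachekv", "get")
	}
	return c.cache[key]
}

func main() {
	s := &cacheStore{cache: map[string][]byte{"a": []byte("1")}}

	s.Get("a") // telemetry off: nothing recorded

	telemetryEnabled = true
	s.Get("a") // telemetry on: one measurement recorded

	fmt.Println("recorded measurements:", recorded)
}
```

Even gated, the measurement still times the cache lookup rather than the underlying store, which is the objection raised in the summary.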

pprof source timings for the Osmosis epoch benchmarks, which use cacheKVStore very heavily. (My understanding is that the runtime.NewObject comes from the defer, but I don't actually know that for certain.)

Screenshot 2021-09-02 at 3 06 48 PM

Screenshot 2021-09-02 at 3 50 42 PM


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@ValarDragon ValarDragon added the T: Performance Performance improvements label Sep 2, 2021
@clevinson clevinson added C: telemetry Issues and features pertaining to SDK telemetry. backport/0.42.x (Stargate) labels Sep 3, 2021
@mergify mergify bot closed this as completed in #10077 Sep 5, 2021
mergify bot pushed a commit that referenced this issue Sep 5, 2021
## Description

Closes: #10072 

Significantly speeds up all the GasKvStore and CacheKVStore operations. Get/Set for these stores no longer even appear in my Osmosis benchmarks, saving ~8% of those benchmarks' time. (Only one of the internal methods for setCacheValue still appears; I will try to open a separate PR for reducing those memory allocations if #10026 is merged.)

Talked to @alexanderbez about this on Discord, and he seemed in agreement with the approach of removing telemetry from these stores.

This does technically change telemetry, but I don't know if this is / should be considered a breaking change?

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] added `!` to the type prefix if API or client breaking change
- [x] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [x] provided a link to the relevant issue or specification
- [x] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [x] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] added a changelog entry to `CHANGELOG.md`
- [x] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [ ] updated the relevant documentation or specification - is there something I should be updating?
- [x] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed 
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)
mergify bot pushed a commit that referenced this issue Sep 5, 2021

(cherry picked from commit 78b151d)

# Conflicts:
#	CHANGELOG.md
mergify bot pushed a commit that referenced this issue Sep 16, 2021

(cherry picked from commit 78b151d)

# Conflicts:
#	CHANGELOG.md
robert-zaremba added a commit that referenced this issue Sep 17, 2021
)

* perf: Remove telemetry from wrappings of store (#10077)

(cherry picked from commit 78b151d)

# Conflicts:
#	CHANGELOG.md

* fix changelog

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Robert Zaremba <robert@zaremba.ch>
evan-forbes pushed a commit to evan-forbes/cosmos-sdk that referenced this issue Oct 12, 2021
cosmos#10170)

* perf: Remove telemetry from wrappings of store (cosmos#10077)

(cherry picked from commit 78b151d)

# Conflicts:
#	CHANGELOG.md

* fix changelog

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Robert Zaremba <robert@zaremba.ch>
evan-forbes pushed a commit to evan-forbes/cosmos-sdk that referenced this issue Nov 1, 2021
cosmos#10170)

* perf: Remove telemetry from wrappings of store (cosmos#10077)

(cherry picked from commit 78b151d)

# Conflicts:
#	CHANGELOG.md

* fix changelog

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Robert Zaremba <robert@zaremba.ch>
JeancarloBarrios pushed a commit to agoric-labs/cosmos-sdk that referenced this issue Sep 28, 2024
cosmos#10170)

* perf: Remove telemetry from wrappings of store (cosmos#10077)

(cherry picked from commit 78b151d)

# Conflicts:
#	CHANGELOG.md

* fix changelog

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Robert Zaremba <robert@zaremba.ch>