Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gnovm): limit value string size #2570

Draft
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

deelawn
Copy link
Contributor

@deelawn deelawn commented Jul 10, 2024

This closes #1736.

The idea is to limit the total number of characters that can be printed for each value when calling Machine.String(); the limit is currently 1024 per value. This should work for all value types that have the potential to generate a very long string -- strings, maps, slices, structs, recursive declared types, etc.

There will be at least follow up PR that puts a limit on the number of items that Machine.String() can print; items here refers to the number of values, blocks, statements, etc. The subsequent PR will cover both these cases:

  • infinite recursion
  • many items on the machine stack in case the cause isn't infinite recursion, if possible
Contributors' checklist...
  • Added new tests, or not needed, or not feasible
  • Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
  • Updated the official documentation or not needed
  • No breaking changes were made, or a BREAKING CHANGE: xxx message was included in the description
  • Added references to related issues and PRs
  • Provided any useful hints for running manual tests
  • Added new benchmarks to generated graphs, if any. More info here.

@github-actions github-actions bot added the 📦 🤖 gnovm Issues or PRs gnovm related label Jul 10, 2024
Copy link

codecov bot commented Jul 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.15%. Comparing base (25f1c92) to head (8230940).
Report is 171 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2570      +/-   ##
==========================================
- Coverage   60.15%   60.15%   -0.01%     
==========================================
  Files         561      561              
  Lines       74999    74901      -98     
==========================================
- Hits        45117    45053      -64     
+ Misses      26505    26471      -34     
  Partials     3377     3377              
Flag Coverage Δ
contribs/gnodev 61.40% <ø> (ø)
contribs/gnofaucet 14.46% <ø> (ø)
gno.land 64.75% <ø> (ø)
misc/genstd 80.54% <ø> (ø)
misc/logos 20.23% <ø> (ø)
tm2 61.95% <ø> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added the 🧾 package/realm Tag used for new Realms or Packages. label Jul 15, 2024
@deelawn deelawn marked this pull request as ready for review July 16, 2024 20:52
@deelawn deelawn requested review from jaekwon, moul, piux2 and a team as code owners July 16, 2024 20:52
Copy link
Member

@thehowl thehowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few takes:

  • the limit should not apply if using println:
    • off-chain, it should just print the thing that the user wants to print
    • on-chain, printing say 16 bytes should count as 1 unit of gas, if anything. (we use the debug output on-chain for msgrun).
  • for Machine.String, it makes sense to use it always.

do you want to add berty's txtar in the tests?

@@ -43,7 +43,7 @@ func main() {
// > \- [@gnouser](/r/demo/users:gnouser), [2009-02-13 11:31pm (UTC)](/r/demo/boards:test_board/2/3) \[[reply](/r/demo/boards?help&__func=CreateReply&bid=1&threadid=2&postid=3&body.type=textarea)] \[[x](/r/demo/boards?help&__func=DeletePost&bid=1&threadid=2&postid=3)]
//
// > Second reply of the second post
// > \- [@gnouser](/r/demo/users:gnouser), [2009-02-13 11:31pm (UTC)](/r/demo/boards:test_board/2/4) \[[reply](/r/demo/boards?help&__func=CreateReply&bid=1&threadid=2&postid=4&body.type=textarea)] \[[x](/r/demo/boards?help&__func=DeletePost&bid=1&threadid=2&postid=4)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

println should be special-cased if it's just printing a string directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by special-cased?

Copy link
Member

@thehowl thehowl Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I wrote afterwards in the review comment - it's kind of unexpected that println doesn't print what you want if the value is too long.

@deelawn
Copy link
Contributor Author

deelawn commented Jul 30, 2024

a few takes:

  • the limit should not apply if using println:

    • off-chain, it should just print the thing that the user wants to print
    • on-chain, printing say 16 bytes should count as 1 unit of gas, if anything. (we use the debug output on-chain for msgrun).
  • for Machine.String, it makes sense to use it always.

do you want to add berty's txtar in the tests?

I think no limit for offchain printing makes sense, and this could be determined by inspecting one or more machine fields. However, on chain printing could result in some very large values -- imagine printing large avl trees or long slices of nested data structures -- so I'm not sure we should remove the limit. Maybe there can be an exception or increased limit when the machine writer originates from a run message. You make a good suggestion that println should consume gas and it might be a good trade off -- make println consume gas and have a much higher limit than Machine.String() would.

At least so far, running gno.land locally, it can take up to a minute to print out arbitrary values as strings depending on how much data is being written. The machine is in an execution state for this entire duration, meaning no other transactions can be processed until the current printing is finished. Once a realm has some large data structure, or another realm has a large exported data structure, it is currently relatively inexpensive to start a loop and print that data structure repeatedly. I was able to achieve this using Berty's test case to build a large tree, then call a function to loop and print, first appending a few elements in the tree to trigger a rebalancing so not every element is a refvalue.

I'll write some more concrete test cases and commit them here to exhibit some of the behavior I've described above.

@thehowl
Copy link
Member

thehowl commented Jul 30, 2024

At least so far, running gno.land locally, it can take up to a minute to print out arbitrary values as strings depending on how much data is being written. The machine is in an execution state for this entire duration, meaning no other transactions can be processed until the current printing is finished. Once a realm has some large data structure, or another realm has a large exported data structure, it is currently relatively inexpensive to start a loop and print that data structure repeatedly. I was able to achieve this using Berty's test case to build a large tree, then call a function to loop and print, first appending a few elements in the tree to trigger a rebalancing so not every element is a refvalue.

+1

Another idea, for the on-chain case, is to have a buffer for the debug result which panics if it goes beyond a certain amount, say 10M. (And maybe have this with a lower limit on qeval).

I'd say that generating 10MB of a deeply nested value would take very little time, freeing up the machine after a few milliseconds.

The benefit to having a limit, either of a fixed amount of bytes or gas-based, is that a user cannot work this limitation around by putting the print of a large value in a for loop (which would still hog the machine's resources).

@deelawn
Copy link
Contributor Author

deelawn commented Aug 1, 2024

Okay, so then here is what I propose:

  1. For bytes printed using print or println
    1. 10MB character limit (does not apply to VM running off-chain)
    2. Consume 1 unit of gas per byte
  2. For bytes printed via Machine.String() after a panic
    1. Limit machine value strings to (1MB / num_machine_values) for each value
    2. No gas will be charged for this
    3. The limit is smaller because the machine prints a lot of other data besides its values
    4. This PR will follow to limit the overall amount that the machine can print from its String method: feat: limit size of the machine string #2385

@thehowl any thoughts? Let me know specifically if there is something to be adjusted. Otherwise it can always be adjusted later. I'll also seek feedback from other core team members before I begin the modifications.

@thehowl
Copy link
Member

thehowl commented Aug 2, 2024

Sounds great!

One consideration: how does this interact with printing values as a result of maketx call? IIRC they still use the same functions as the one being changed here.

The behaviour should probably be consistent with println. I think if we add a 10MB limit to the output it's fine already (without needing to add the gas); after all, if we also have gas then the 10MB limit can never be reached (as max gas is always < 10M).

@ltzmaxwell
Copy link
Contributor

  1. Limit machine value strings to (1MB / num_machine_values) for each value
  2. No gas will be charged for this

When dealing with on-chain operations, is it necessary to dump the entire machine state using m.String()? It seems that m.Stacktrace() might be sufficient for most purposes, especially since m.String() is typically used for in-depth VM debugging(off-chain).

@deelawn deelawn marked this pull request as draft August 16, 2024 14:52
@deelawn
Copy link
Contributor Author

deelawn commented Aug 16, 2024

Moved to draft; I will get back to this.

@thehowl thehowl self-assigned this Oct 15, 2024
Copy link

This PR is stale because it has been open 3 months with no activity. Remove stale label or comment or this will be closed in 3 months.

@github-actions github-actions bot added Stale and removed Stale labels Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📦 🤖 gnovm Issues or PRs gnovm related 🧾 package/realm Tag used for new Realms or Packages.
Projects
Status: No status
Status: Backlog
Development

Successfully merging this pull request may close these issues.

3 participants