-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: parse chain-id from big genesis file could be slow #18204
Conversation
Solution: - use streaming json parsing for that
WalkthroughThis pull request introduces a new feature to parse the chain ID from a genesis JSON file using a streaming JSON parser. The changes include a new function Changes
TipsChat with CodeRabbit Bot (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files ignored due to filter (2)
- go.mod
- go.sum
Files selected for processing (4)
- CHANGELOG.md (1 hunks)
- server/util.go (2 hunks)
- types/chain_id.go (1 hunks)
- types/chain_id_test.go (1 hunks)
Files skipped from review due to trivial changes (2)
- CHANGELOG.md
- types/chain_id_test.go
Additional comments (Suppressed): 2
server/util.go (2)
42-42: The import
genutiltypes "github.com/cosmos/cosmos-sdk/x/genutil/types"
has been removed. Ensure that this does not affect other parts of the code that might be using it.487-496: The new code introduces a more efficient way of parsing the chain ID from the genesis file. It uses a streaming JSON parsing approach instead of loading the entire file into memory. This is a good performance improvement, especially for large genesis files. However, the error handling could be improved. Instead of using
panic
, consider returning the error to the caller function. This would allow the caller to decide how to handle the error and would make the code more robust and easier to debug.- panic(err) + return nil, err- panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err)) + return nil, fmt.Errorf("failed to parse chain-id from genesis file: %w", err)
types/chain_id.go
Outdated
decoder := jstream.NewDecoder(reader, 1).EmitKV() | ||
var chain_id string | ||
err := decoder.Decode(func(mv *jstream.MetaValue) bool { | ||
if kv, ok := mv.Value.(jstream.KV); ok { | ||
if kv.Key == ChainIDFieldName { | ||
chain_id, _ = kv.Value.(string) | ||
return false | ||
} | ||
} | ||
return true | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function ParseChainIDFromGenesis
uses the jstream
library to parse the JSON genesis file. It's a good approach to use a streaming JSON parser for large files as it uses constant memory. However, there's a potential issue with error handling. In line 19, the error from the type assertion kv.Value.(string)
is ignored. If kv.Value
is not a string, this will not cause a panic due to the use of the "comma ok" idiom, but chain_id
will be an empty string and no error will be reported at this stage. This could lead to misleading errors, as the function will later return an error stating that the "chain-id was not found in the genesis file", even though it was found but was not a string. Consider handling this error to provide more accurate feedback.
- chain_id, _ = kv.Value.(string)
+ chain_id, ok = kv.Value.(string)
+ if !ok {
+ return errors.New("chain-id is not a string")
+ }
types/chain_id.go
Outdated
if len(chain_id) > 0 { | ||
return chain_id, nil | ||
} | ||
if err == nil { | ||
return "", errors.New("chain-id not found in genesis file") | ||
} | ||
return "", err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling here could be improved for clarity. If chain_id
is empty and err
is not nil
, the function will return an error stating that the "chain-id was not found in the genesis file", even though the error might be unrelated to the presence of the chain-id. Consider checking err
first and returning it if it's not nil
, before checking if chain_id
is empty.
- if len(chain_id) > 0 {
- return chain_id, nil
- }
- if err == nil {
- return "", errors.New("chain-id not found in genesis file")
- }
- return "", err
+ if err != nil {
+ return "", err
+ }
+ if len(chain_id) == 0 {
+ return "", errors.New("chain-id not found in genesis file")
+ }
+ return chain_id, nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- types/chain_id.go (1 hunks)
Additional comments (Suppressed): 2
types/chain_id.go (2)
- 16-26: The error handling issue identified in the previous review has not been addressed. If
kv.Value
is not a string,chain_id
will be an empty string and no error will be reported at this stage. This could lead to misleading errors, as the function will later return an error stating that the "chain-id was not found in the genesis file", even though it was found but was not a string. Consider handling this error to provide more accurate feedback.- chain_id, _ = kv.Value.(string) + chain_id, ok = kv.Value.(string) + if !ok { + return errors.New("chain-id is not a string") + }
- 27-33: The error handling here could be improved for clarity. If
chain_id
is empty anderr
is notnil
, the function will return an error stating that the "chain-id was not found in the genesis file", even though the error might be unrelated to the presence of the chain-id. Consider checkingerr
first and returning it if it's notnil
, before checking ifchain_id
is empty.- if len(chain_id) > 0 { - return chain_id, nil - } - if err == nil { - return "", errors.New("chain-id not found in genesis file") - } - return "", err + if err != nil { + return "", err + } + if len(chain_id) == 0 { + return "", errors.New("chain-id not found in genesis file") + } + return chain_id, nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- types/chain_id.go (1 hunks)
Additional comments (Suppressed): 4
types/chain_id.go (4)
16-40: The function
ParseChainIDFromGenesis
uses a streaming JSON parser to extract the chain-id from the genesis file. This approach is efficient for large JSON files as it uses constant memory. However, the function assumes that the chain-id is a string. If the chain-id is not a string or not found in the genesis file, an error is returned. This could potentially break the application if the genesis file structure changes or if the chain-id is not a string. Therefore, it's important to ensure that the genesis file always contains a string chain-id and that the application can handle the error cases appropriately.16-40: The function
ParseChainIDFromGenesis
uses thejstream
library to parse the JSON stream. While this library is efficient for large JSON files, it's important to ensure that it's maintained and reliable. If the library is not well-maintained, it could introduce bugs or security vulnerabilities into the application. Therefore, it's recommended to evaluate the library's reliability and maintenance status before using it.16-40: The function
ParseChainIDFromGenesis
uses a callback function in theDecode
method to process the JSON stream. This approach is efficient as it allows the function to stop processing as soon as the chain-id is found. However, the callback function returns a boolean value to indicate whether to continue processing the stream. This could be confusing as it's not immediately clear what the return value represents. It's recommended to add a comment explaining the purpose of the return value to improve code readability.err := decoder.Decode(func(mv *jstream.MetaValue) bool { if kv, ok := mv.Value.(jstream.KV); ok { if kv.Key == ChainIDFieldName { chain_id, ok = kv.Value.(string) // Return false to stop processing the stream return false } } // Return true to continue processing the stream return true })
- 33-39: The error handling in the function
ParseChainIDFromGenesis
could be improved. Currently, the function checks if the chain-id is a string and if it's found in the genesis file. If not, it returns an error. However, these checks are done after theDecode
method is called. If theDecode
method returns an error, these checks are unnecessary. It's recommended to check for errors immediately after theDecode
method is called to improve performance and code readability.err := decoder.Decode(func(mv *jstream.MetaValue) bool { if kv, ok := mv.Value.(jstream.KV); ok { if kv.Key == ChainIDFieldName { chain_id, ok = kv.Value.(string) return false } } return true }) + if err != nil { + return "", err + } if len(chain_id) > 0 { return chain_id, nil } if !ok { return "", errors.New("chain-id is not a string") } - if err == nil { - return "", errors.New("chain-id not found in genesis file") - } - return "", err + return "", errors.New("chain-id not found in genesis file")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- types/chain_id.go (1 hunks)
Additional comments (Suppressed): 4
types/chain_id.go (4)
16-16: The
jstream.NewDecoder(reader, 1).EmitKV()
is used to create a new decoder with a buffer size of 1. This is a good practice for memory management, especially when dealing with large files. However, it's worth noting that a smaller buffer size might lead to more frequent I/O operations, which could potentially impact performance. If performance becomes an issue, consider increasing the buffer size.22-26: The code checks if the current value is a key-value pair and if the key is
ChainIDFieldName
. If both conditions are met, it attempts to cast the value to a string and stops the decoding. This is a good practice as it stops the decoding as soon as the required data is found, saving unnecessary computation. However, it's worth noting that if thechain_id
is not found, the decoding will continue until the end of the file, which could be time-consuming for large files.30-39: The error handling here is comprehensive and provides clear error messages for different scenarios: when the
chain_id
is not a string, when it's not found in the genesis file, and when there's an error during decoding. This is a good practice as it helps with debugging and error tracking.14-14: The TODO comment suggests considering the use of
encoding/json/v2
when it's ready. This is a good practice as it shows forward-thinking and planning for future improvements. However, it's important to keep track of these TODO comments and address them in a timely manner to avoid accumulating technical debt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good catch!
Should we wait for upstream to merge your changes or you are going to maintain this fork? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work and thanks @yihuang for the improvement to use streaming JSON. I've added some suggestions, please take a look
types/chain_id_test.go
Outdated
|
||
func BenchmarkParseChainID(b *testing.B) { | ||
// an arbitrary genesis file from a cronos devnet. | ||
json := ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please commit this JSON to a file under a directory "testdata" so example "testdata/streaming_chain_id.json" and then read it in by file then kindly run b.ResetTimer() even before each sub-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
types/chain_id_test.go
Outdated
b.Run("new", func(b *testing.B) { | ||
b.ResetTimer() | ||
for i := 0; i < b.N; i++ { | ||
_, err := ParseChainIDFromGenesis(strings.NewReader(json)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please assert that the parsed chainID is not empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
types/chain_id_test.go
Outdated
"vesting": {} | ||
} | ||
} | ||
` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add: b.ReportAllocs()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
types/chain_id_test.go
Outdated
package types | ||
|
||
import ( | ||
strings "strings" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- types/testdata/streaming_chain_id.json
Files selected for processing (2)
- types/chain_id.go (1 hunks)
- types/chain_id_test.go (1 hunks)
Files skipped from review due to trivial changes (1)
- types/chain_id_test.go
Additional comments: 1
types/chain_id.go (1)
- 14-57: The function
ParseChainIDFromGenesis
is well implemented. It uses a streaming JSON parsing approach which is more efficient in terms of memory usage and speed when dealing with large JSON files. The function also handles errors properly and validates thechain_id
before returning it. However, the function's comment mentions that the user must ensure thechain_id
is put beforeapp_state
or other big entries to enjoy the efficiency. This could be a potential issue if the user is not aware of this requirement. It would be better to handle this within the function itself or at least make it clear in the function's documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- types/chain_id.go (1 hunks)
Additional comments: 3
types/chain_id.go (3)
15-19: The function comment has been updated to include the information that the function will return an error if the
chain_id
field is not found in the JSON file. This is a good improvement as it provides more clarity on the function's behavior.20-60: The function
ParseChainIDFromGenesis
is well implemented. It uses a streaming JSON parser to parse the JSON file until thechain_id
field is found, then aborts early. This approach is efficient, especially for large files. The function also handles errors properly and validates thechain_id
using thevalidateChainID
function.62-71: The function
validateChainID
checks if thechain_id
is not empty and if its length does not exceed the maximum allowed length. This is a good practice to ensure the validity of thechain_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- CHANGELOG.md (1 hunks)
Additional comments: 1
CHANGELOG.md (1)
- 70-70: The change log entry correctly summarizes the changes made in the PR. It provides a clear and concise description of the change and links to the relevant PR for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! Can you fix linting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- types/chain_id_test.go (1 hunks)
Additional comments: 4
types/chain_id_test.go (4)
1-11: The imports look fine and are relevant to the tests being performed. The use of the
embed
package to include a JSON file for benchmarking is a good practice.13-14: The
go:embed
directive is used correctly to include theparse_chain_id.json
file for benchmarking. The variableBenchmarkGenesis
is appropriately named and will hold the contents of the JSON file.16-108: The
TestParseChainIDFromGenesis
function is well-structured and covers a wide range of test cases, including success, nested, non-existent, invalid type, invalid JSON, empty chain_id, whitespace chain_id, and chain_id too long. The use of table-driven tests is a good practice in Go, and the test cases are comprehensive and cover all possible edge cases. The error messages are checked to ensure they match the expected error messages, which is a good practice for testing error handling.110-130: The
BenchmarkParseChainID
function is well-structured and benchmarks the newParseChainIDFromGenesis
function against the oldAppGenesisFromReader
function. The use ofb.ReportAllocs()
is a good practice to measure memory allocation. Theb.ResetTimer()
function is used correctly to exclude setup time from the benchmark. The benchmark function is correctly structured to run the functionb.N
times, which is the standard way to write benchmarks in Go.
You already made the benchmark to compare both in the /old and /new. You'll just need to paste the contents of /old into old.txt then those of /new into new.txt, delete the differences in names then run benchstat old.txt bench.txt and paste the results into your commit message and this PR at the top. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- types/chain_id_test.go (1 hunks)
Files skipped from review due to trivial changes (1)
- types/chain_id_test.go
sounds good, updated PR description, thanks. |
types/chain_id.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe move this to genutil? Imho this makes more sense to have it there as its genesis related than in types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- x/genutil/types/testdata/parse_chain_id.json
Files selected for processing (3)
- server/util.go (1 hunks)
- x/genutil/types/chain_id.go (1 hunks)
- x/genutil/types/chain_id_test.go (1 hunks)
Additional comments: 4
server/util.go (1)
- 488-497: The new code introduces a more efficient way of parsing the
chain_id
from the genesis file. Instead of loading the entire file into memory, it uses a streaming JSON parser to parse thechain_id
and aborts early. This approach significantly reduces memory usage and processing time. However, it's important to note that this approach assumes that thechain_id
field is placed before any large entries in the JSON file for maximum efficiency.While the use of
panic
is acceptable in this context as it's used to halt the program execution when the genesis file cannot be opened or parsed, it's generally recommended to handle errors gracefully and provide informative error messages to the user. In this case, the error messages are informative and provide context about the error.- appGenesis, err := genutiltypes.AppGenesisFromFile(filepath.Join(homeDir, "config", "genesis.json")) - if err != nil { - panic(err) - } - chainID = appGenesis.ChainID + reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json")) + if err != nil { + panic(err) + } + defer reader.Close() + chainID, err = genutiltypes.ParseChainIDFromGenesis(reader) + if err != nil { + panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err)) + }x/genutil/types/chain_id.go (2)
21-60: The function
ParseChainIDFromGenesis
is well-written and handles errors properly. It uses a streaming JSON parser to parse thechain_id
and aborts early, which is a good practice for performance optimization. The function also validates thechain_id
using thevalidateChainID
function, which is a good practice for data validation.62-71: The function
validateChainID
is well-written and handles errors properly. It checks if thechain_id
is empty or exceeds the maximum length, which are good practices for data validation.x/genutil/types/chain_id_test.go (1)
- 110-130: The benchmark tests for
ParseChainIDFromGenesis
andAppGenesisFromReader
are well-structured. They measure the time and memory allocations for each function, which is useful for comparing their performance. However, theBenchmarkGenesis
string is not shown in this hunk, so it's unclear what kind of data is being used for the benchmark. It would be helpful to include a comment describing the structure and size of theBenchmarkGenesis
string for context.Here's a suggestion to improve the error message checking in the test cases:
- require.Contains(t, err.Error(), tc.expError) + require.Equal(t, tc.expError, err.Error())Committable suggestion (Beta)
func TestParseChainIDFromGenesis(t *testing.T) { testCases := []struct { name string json string expChainID string expError string }{ { "success", `{ "state": { "accounts": { "a": {} } }, "chain_id": "test-chain-id" }`, "test-chain-id", "", }, { "nested", `{ "state": { "accounts": { "a": {} }, "chain_id": "test-chain-id" } }`, "", "missing chain-id in genesis file", }, { "not exist", `{ "state": { "accounts": { "a": {} } }, "chain-id": "test-chain-id" }`, "", "missing chain-id in genesis file", }, { "invalid type", `{ "chain-id": 1, }`, "", "invalid character '}' looking for beginning of object key string", }, { "invalid json", `[ " ': }`, "", "expected {, got [", }, { "empty chain_id", `{"chain_id": ""}`, "", "genesis doc must include non-empty chain_id", }, { "whitespace chain_id", `{"chain_id": " "}`, "", "genesis doc must include non-empty chain_id", }, { "chain_id too long", `{"chain_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}`, "", "chain_id in genesis doc is too long", }, } for _, tc := range testCases { t.Run(tc.name, func(t *testing.T) { chain_id, err := types.ParseChainIDFromGenesis(strings.NewReader(tc.json)) if tc.expChainID == "" { require.Error(t, err) require.Equal(t, tc.expError, err.Error()) } else { require.NoError(t, err) require.Equal(t, tc.expChainID, chain_id) } }) } }
func TestParseChainIDFromGenesis(t *testing.T) { | ||
testCases := []struct { | ||
name string | ||
json string | ||
expChainID string | ||
expError string | ||
}{ | ||
{ | ||
"success", | ||
`{ | ||
"state": { | ||
"accounts": { | ||
"a": {} | ||
} | ||
}, | ||
"chain_id": "test-chain-id" | ||
}`, | ||
"test-chain-id", | ||
"", | ||
}, | ||
{ | ||
"nested", | ||
`{ | ||
"state": { | ||
"accounts": { | ||
"a": {} | ||
}, | ||
"chain_id": "test-chain-id" | ||
} | ||
}`, | ||
"", | ||
"missing chain-id in genesis file", | ||
}, | ||
{ | ||
"not exist", | ||
`{ | ||
"state": { | ||
"accounts": { | ||
"a": {} | ||
} | ||
}, | ||
"chain-id": "test-chain-id" | ||
}`, | ||
"", | ||
"missing chain-id in genesis file", | ||
}, | ||
{ | ||
"invalid type", | ||
`{ | ||
"chain-id": 1, | ||
}`, | ||
"", | ||
"invalid character '}' looking for beginning of object key string", | ||
}, | ||
{ | ||
"invalid json", | ||
`[ " ': }`, | ||
"", | ||
"expected {, got [", | ||
}, | ||
{ | ||
"empty chain_id", | ||
`{"chain_id": ""}`, | ||
"", | ||
"genesis doc must include non-empty chain_id", | ||
}, | ||
{ | ||
"whitespace chain_id", | ||
`{"chain_id": " "}`, | ||
"", | ||
"genesis doc must include non-empty chain_id", | ||
}, | ||
{ | ||
"chain_id too long", | ||
`{"chain_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}`, | ||
"", | ||
"chain_id in genesis doc is too long", | ||
}, | ||
} | ||
|
||
for _, tc := range testCases { | ||
t.Run(tc.name, func(t *testing.T) { | ||
chain_id, err := types.ParseChainIDFromGenesis(strings.NewReader(tc.json)) | ||
if tc.expChainID == "" { | ||
require.Error(t, err) | ||
require.Contains(t, err.Error(), tc.expError) | ||
} else { | ||
require.NoError(t, err) | ||
require.Equal(t, tc.expChainID, chain_id) | ||
} | ||
}) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test cases for TestParseChainIDFromGenesis
are well-structured and cover a variety of scenarios, including successful parsing, missing chain_id
, invalid JSON, and chain_id
that is too long, empty, or contains only whitespace. This is a good practice as it ensures that the function behaves as expected in different situations. However, there is a minor issue with the error message checking. The test cases are checking for a substring of the error message, which might lead to false positives if the error messages are changed in the future. It would be better to compare the entire error message to ensure accuracy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- CHANGELOG.md (1 hunks)
Additional comments: 1
CHANGELOG.md (1)
- 70-70: The changelog entry correctly summarizes the changes made in the PR. It mentions the use of a streaming JSON parser to parse the
chain_id
from the genesis file, which is the main change in this PR.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (3)
- CHANGELOG.md (1} hunks)
- server/util.go (1} hunks)
- x/genutil/types/chain_id.go (1} hunks)
Additional comments: 5
CHANGELOG.md (1)
- 70-70: The change log entry correctly summarizes the changes made in the PR. It mentions the use of a streaming JSON parser to parse the
chain_id
from the genesis file, which aligns with the PR summary.x/genutil/types/chain_id.go (4)
1-11: The import statements look fine. No unused imports detected.
13-14: The constant
ChainIDFieldName
is well defined and can be reused in other parts of the codebase.15-58: The function
ParseChainIDFromGenesis
is well implemented. It uses a streaming JSON parser to parse the genesis JSON file until thechain_id
field is found, which is a good approach for handling large files. The function also handles errors appropriately and validates thechain_id
using thevalidateChainID
function.60-69: The function
validateChainID
is well implemented. It checks if thechain_id
is not empty and if its length does not exceed the maximum allowed length. The function also handles errors appropriately.
reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json")) | ||
if err != nil { | ||
panic(err) | ||
} | ||
defer reader.Close() | ||
|
||
chainID = appGenesis.ChainID | ||
chainID, err = genutiltypes.ParseChainIDFromGenesis(reader) | ||
if err != nil { | ||
panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code introduces a more efficient way of parsing the chain_id
from the genesis JSON file. Instead of loading the entire file into memory, it uses a streaming JSON parser to parse the file until the chain_id
field is found. This approach significantly reduces memory usage and processing time. However, the use of panic
for error handling is not recommended. It would be better to return the error to the caller and let it decide how to handle the error. This would make the code more robust and easier to debug.
- panic(err)
+ return "", fmt.Errorf("failed to open genesis file: %w", err)
- panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
+ return "", fmt.Errorf("failed to parse chain-id from genesis file: %w", err)
Committable suggestion (Beta)
reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json")) | |
if err != nil { | |
panic(err) | |
} | |
defer reader.Close() | |
chainID = appGenesis.ChainID | |
chainID, err = genutiltypes.ParseChainIDFromGenesis(reader) | |
if err != nil { | |
panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err)) | |
} | |
chainID := cast.ToString(appOpts.Get(flags.FlagChainID)) | |
if chainID == "" { | |
// fallback to genesis chain-id | |
reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json")) | |
if err != nil { | |
return "", fmt.Errorf("failed to open genesis file: %w", err) | |
} | |
defer reader.Close() | |
chainID, err = genutiltypes.ParseChainIDFromGenesis(reader) | |
if err != nil { | |
return "", fmt.Errorf("failed to parse chain-id from genesis file: %w", err) | |
} | |
} | |
snapshotStore, err := GetSnapshotStore(appOpts) |
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Julien Robert <julien@rbrt.fr> (cherry picked from commit 793ca9a) # Conflicts: # CHANGELOG.md
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Julien Robert <julien@rbrt.fr>
Description
Currently, to parse the
chain_id
from genesis file, the whole json file is parsed into memory, sometimes theapp_state
is huge which makes the operation very costly, the solution here is to parse until thechain_id
field and abort early, so as long as thechain_id
field is put on the top of the genesis json file, it's much more efficient.Benchmark result is on a small typical genesis file, on a big file, the difference should be huge.
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
!
to the type prefix if API or client breaking changeCHANGELOG.md
make lint
andmake test
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...
!
in the type prefix if API or client breaking changeSummary by CodeRabbit
These changes aim to improve the system's performance and reliability, providing a more user-friendly experience.