This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

lightning: fix parquet parser for decimal type #1272

Merged (5 commits) on Jun 24, 2021

Conversation

glorv
Collaborator

@glorv glorv commented Jun 23, 2021

What problem does this PR solve?

Fix parsing of parquet files when the corresponding column type is decimal and the value is stored as FIXED_LEN_BYTE_ARRAY or BINARY.
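
For context, the Parquet format stores a DECIMAL in a FIXED_LEN_BYTE_ARRAY or BINARY column as the big-endian two's-complement bytes of the unscaled integer. The following is a minimal sketch of that decoding step, not the code from this PR; `decodeUnscaled` is a hypothetical name chosen for illustration.

```go
package main

import "math/big"

// decodeUnscaled (hypothetical helper) interprets b as a big-endian
// two's-complement integer, the encoding the Parquet format uses for
// DECIMAL values in FIXED_LEN_BYTE_ARRAY and BINARY columns.
func decodeUnscaled(b []byte) *big.Int {
	// SetBytes treats b as an unsigned big-endian integer.
	n := new(big.Int).SetBytes(b)
	if len(b) > 0 && b[0]&0x80 != 0 {
		// The sign bit is set, so subtract 2^(8*len(b)) to recover
		// the negative value.
		n.Sub(n, new(big.Int).Lsh(big.NewInt(1), uint(8*len(b))))
	}
	return n
}
```

Under these assumptions, the bytes `0x30 0x39` decode to 12345 and `0xFF 0x85` decode to -123. The column's scale is then applied separately to place the decimal point.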

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

Related changes

Release note

  • Fix parquet parsing for decimal types

Collaborator

@kennytm kennytm left a comment

rest LGTM

pkg/lightning/mydump/parquet_parser_test.go
if dotIndex == 0 {
res.WriteByte('0')
} else {
res.Write([]byte(val[:dotIndex]))
Collaborator

Suggested change
res.Write([]byte(val[:dotIndex]))
res.WriteString(val[:dotIndex])

}
if scale > 0 {
res.WriteByte('.')
res.Write([]byte(val[dotIndex:]))
Collaborator

Suggested change
res.Write([]byte(val[dotIndex:]))
res.WriteString(val[dotIndex:])
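
To show the two suggestions in context, here is a self-contained sketch of the decimal-point insertion with `WriteString` applied. It assumes `res` is a `strings.Builder` and `val` is a string of unsigned decimal digits; the function name `insertDecimalPoint` and the zero-padding step are assumptions made for illustration, since the surrounding code is not shown in this thread.

```go
package main

import "strings"

// insertDecimalPoint (hypothetical helper) places a decimal point in val so
// that exactly `scale` digits follow it. val holds the unsigned decimal
// digits of the unscaled value.
func insertDecimalPoint(val string, scale int) string {
	// Assumption: left-pad with zeros so val has at least `scale` digits.
	if len(val) < scale {
		val = strings.Repeat("0", scale-len(val)) + val
	}
	dotIndex := len(val) - scale

	var res strings.Builder
	if dotIndex == 0 {
		// No integer digits remain, so emit a leading zero.
		res.WriteByte('0')
	} else {
		// WriteString avoids the explicit []byte conversion.
		res.WriteString(val[:dotIndex])
	}
	if scale > 0 {
		res.WriteByte('.')
		res.WriteString(val[dotIndex:])
	}
	return res.String()
}
```

With this sketch, `insertDecimalPoint("12345", 2)` yields `123.45` and `insertDecimalPoint("5", 3)` yields `0.005`.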

Contributor

@sleepymole sleepymole left a comment

rest LGTM

sec = v / 1e9
nsec = v % 1e9
}
// TODO: how to deal with TimeZone
Contributor

Is there more we need to do about time zones?

Collaborator Author

The time/timestamp value read from parquet is either in UTC or in the local time zone, depending on the IsAdjustedToUTC setting. Lightning currently doesn't support setting a time zone (it always uses the target cluster's time zone), so I'm not sure how we should deal with the unknown time zone offset when IsAdjustedToUTC = false (at least the default is true). @kennytm PTAL

Collaborator

Make the produced string 2006-01-02 15:04:05.999999Z when isAdjustedToUTC is true, and 2006-01-02 15:04:05.999999 when it is false.

(why do we only support 3 fractional digits instead of 6?)
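
A rough sketch of the suggested output format, assuming the raw value is a count of microseconds since the Unix epoch. `formatTimestamp` is a hypothetical helper, not the function in this PR, and the "Z" suffix is appended by hand rather than placed in the layout string.

```go
package main

import "time"

// formatTimestamp (hypothetical helper) renders a microsecond timestamp with
// up to 6 fractional digits. A trailing "Z" marks values that Parquet reports
// as adjusted to UTC; without it the string is a plain wall-clock time whose
// zone is unknown.
func formatTimestamp(micros int64, isAdjustedToUTC bool) string {
	sec := micros / 1e6
	nsec := (micros % 1e6) * 1e3
	// UTC() keeps the wall-clock fields independent of the machine's zone.
	// The ".999999" layout element drops trailing zeros in the fraction.
	s := time.Unix(sec, nsec).UTC().Format("2006-01-02 15:04:05.999999")
	if isAdjustedToUTC {
		s += "Z"
	}
	return s
}
```

For example, `formatTimestamp(1624492800123456, true)` gives `2021-06-24 00:00:00.123456Z`, while the same value with `isAdjustedToUTC` set to false gives the string without the suffix.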

Collaborator Author

done

@lonng
Contributor

lonng commented Jun 24, 2021

@glorv It would be better to file an issue to record this bug.

@ti-chi-bot
Member

@gozssky: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.

@ti-chi-bot ti-chi-bot added the status/LGT1 LGTM1 label Jun 24, 2021
@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • 3pointer
  • kennytm

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added status/LGT2 LGTM2 and removed status/LGT1 LGTM1 labels Jun 24, 2021
@glorv
Collaborator Author

glorv commented Jun 24, 2021

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: f871acc

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #1275.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #1276.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #1277.

6 participants