This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

storage: S3 test refactoring and fix some bugs #548

Merged
merged 6 commits on Oct 9, 2020

Conversation

kennytm
Collaborator

@kennytm kennytm commented Oct 5, 2020

What problem does this PR solve?

What is changed and how it works?

  1. Upgrade fake-gcs-server from 1.17.0 to 1.19.0 to include Support Windows fsouza/fake-gcs-server#228 (note: the latest version is 1.21.1, but that upgrades the gRPC dependency to 1.31.1, which may be too new for other dependencies such as etcd and the TiDB cluster; we chose 1.19.0 to avoid upgrading the deps too much).

  2. Removed the storage.s3Handlers type, because s3iface.S3API exists. We also removed the mockS3Handler type in s3_test.go in favor of a real mocking framework.

  3. Naively applying step 2 caused a dependency cycle between pkg/mock and pkg/storage. Therefore we moved s3_test.go into its own package storage_test, which leads to some package-visibility problems. We have performed the following changes:

    • RangeInfo is now public.
    • The TestS3Others test, which relied on defineS3Flags but performed no actual tests, is removed.
    • The checkS3 hack is removed, in favor of options passed to the ExternalStorage constructor (see below).
  4. Following the change in step 3, we generalized the storage.Create function to accept an ExternalStorageOptions struct replacing the sendCreds boolean. The options are:

    • SendCredentials — same as the old sendCreds boolean
    • SkipCheckPath — same as enabling the checkS3 hack, and generalized to Local and GCS
    • HTTPClient — used to unify the newGCSStorage and newGCSStorageWithHTTPClient functions, and generalized to S3.
  5. While running the tests, the following bugs are also fixed in this PR:

    • S3Storage.FileExists was returning true, nil if the HeadObject request returned any non-AWS error. It now returns false, err.
    • s3StorageReader.Read returned 0, nil when reaching EOF, because we used io.ReadFull rather than (io.Reader).Read (for reasons unclear). This caused io.ErrNoProgress when wrapping the reader in a bufio.Reader, which affected Lightning's S3 data source. Changed to the latter so it properly returns 0, io.EOF.
    • WalkDir was previously implemented using the ListObjects API, with the NextMarker field as the continuation token. Unfortunately, this field is missing on AWS S3 when Delimiter is empty (it is always populated on minio and Aliyun OSS), so when there are over 1000 objects, WalkDir would enter an infinite loop! We now compute the next Marker from the last Key in the Contents instead, as recommended in the docs.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change
    • storage.Create is soft-deprecated and storage.NewS3Storage is hard-deprecated. They can still be used but are discouraged. In particular, TiCDC should stop using storage.NewS3Storage; there is no reason not to use storage.ParseBackend + storage.New.
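The recommended path can be sketched roughly as follows. This is a sketch only: the exact signatures of ParseBackend, New, and the option struct fields are assumptions based on the description above, not verified against BR's code.

```go
// Sketch: assumes BR's pkg/storage API as described in this PR.
ctx := context.Background()

// Parse the URL into a backend descriptor instead of constructing
// an S3 storage directly.
backend, err := storage.ParseBackend("s3://my-bucket/prefix", &storage.BackendOptions{})
if err != nil {
	return err
}

// Construct the ExternalStorage with the new explicit options struct.
store, err := storage.New(ctx, backend, &storage.ExternalStorageOptions{
	SendCredentials: true,
})
if err != nil {
	return err
}
_ = store
```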

Side effects

Related changes

  • Need to cherry-pick to the release branch

Release Note

  • (Lightning, not relevant to BR) Fixed several errors when using AWS S3 as data source, including: (1) not supporting reading directories with over 1000 objects; (2) reading schemas always failed with "failed to extract table schema" and the error "multiple Read calls return no data or error".

this requires s3_test to be placed into a separate storage_test package
to avoid circular dependency.

this makes the `checkS3` hack unusable. instead, we introduced a new
option struct to the constructor to disable the check.
@kennytm
Collaborator Author

kennytm commented Oct 5, 2020

The io.ReadFull was introduced in 7581dac and 3554ada because

s3 api may not return enough data, so we need to loop fetch enough

...

TODO: currently, if read to the end, s3 will return io.EOF

Previously it was just Read in #446.

However, according to the interface comment of Reader.Read, this is exactly the intended behavior:

Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered.

...

Implementations of Read are discouraged from returning a zero byte count with a nil error, except when len(p) == 0. Callers should treat a return of 0 and nil as indicating that nothing happened; in particular it does not indicate EOF.

So @glorv do you have the actual error which leads to the Read loop in the first place, so we could add it to the test case?

@tirsen
Contributor

tirsen commented Oct 5, 2020

I can confirm that this fixes our issue.

@kennytm kennytm changed the title storage: S3 test refactoring and some fix some minor bugs storage: S3 test refactoring and fix some minor bugs Oct 5, 2020
@kennytm kennytm changed the title storage: S3 test refactoring and fix some minor bugs storage: S3 test refactoring and fix some bugs Oct 5, 2020
@kennytm kennytm requested a review from glorv October 5, 2020 20:00
@overvenus
Member

LGTM

@ti-srebot ti-srebot added the status/LGT1 LGTM1 label Oct 9, 2020
@glorv
Collaborator

glorv commented Oct 9, 2020

The io.ReadFull was introduced in 7581dac and 3554ada because

s3 api may not return enough data, so we need to loop fetch enough
...
TODO: currently, if read to the end, s3 will return io.EOF

Previously it was just Read in #446.

However, according to the interface comment of Reader.Read, this is exactly the intended behavior:

Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered.
...
Implementations of Read are discouraged from returning a zero byte count with a nil error, except when len(p) == 0. Callers should treat a return of 0 and nil as indicating that nothing happened; in particular it does not indicate EOF.

So @glorv do you have the actual error which leads to the Read loop in the first place, so we could add it to the test case?

The root cause is that some code in parquet-go uses reader.Read instead of io.ReadFull to read into a buffer but does not check whether the buffer is fully filled. This should be a bug; I filed a PR to fix it.

@glorv
Collaborator

glorv commented Oct 9, 2020

LGTM

@ti-srebot ti-srebot added status/LGT2 LGTM2 and removed status/LGT1 LGTM1 labels Oct 9, 2020
@kennytm kennytm merged commit 9066e09 into pingcap:master Oct 9, 2020
@kennytm kennytm deleted the s3iface branch October 9, 2020 05:49
@ti-srebot
Contributor

cherry pick to release-4.0 failed

coocood added a commit to coocood/br that referenced this pull request Oct 14, 2020
3pointer pushed a commit to 3pointer/br that referenced this pull request Oct 16, 2020
* storage: replace s3Handlers by the standard s3iface.S3API interface

* storage: publish RangeInfo for testing

* storage: use gomock instead of mockS3Handlers for proper mocking

this requires s3_test to be placed into a separate storage_test package
to avoid circular dependency.

this makes the `checkS3` hack unusable. instead, we introduced a new
option struct to the constructor to disable the check.

* storage/s3: fix FileExists always `return true, nil` on non-AWS error

* storage/s3: do not use ReadFull to implement Read

* storage/s3: fix incorrect usage of NextMarker in WalkDir
5 participants