GH-3091: Add verification guide and .rat-excludes.txt for release #3101

15 changes: 15 additions & 0 deletions .rat-excludes.txt
@@ -0,0 +1,15 @@
.gitignore
.rat-excludes.txt
PULL_REQUEST_TEMPLATE.md
strings-2.parquet$
nested_array.avsc$
map_with_nulls.avsc$
map.avsc$
list_with_nulls.avsc$
fixedToInt96.avsc$
array.avsc$
allFromParquetOldBehavior.avsc$
allFromParquetNewBehavior.avsc$
all.avsc$
stringBehavior.avsc$
logicalType.avsc$
58 changes: 58 additions & 0 deletions dev/README.md
@@ -91,3 +91,61 @@ Merge hash: 485658a5
Would you like to pick 485658a5 into another branch? (y/n):
```
For now just say n as we have 1 branch

# Release Verification
Contributor

I agree the "release verification" should be moved to the parquet-site repo instead. Then we can even have a link to this section in the VOTE email template.

I'm not sure why we need to check for the license headers separately in the tarball. It is already in the build process, so we should not have license header issues in the repo. What I usually do instead is compare the content of the tarball with a freshly cloned repo set to the release RC tag. There should be no differences.

Member Author

> I'm not sure why we need to check for the license headers separately in the tarball. It is already in the build process so we shall not have license header issues in the repo.

I see, we can probably drop the explicit license header check. In general I've seen all projects do it as part of their release verification process, and I would say it falls under the "verify that they meet all requirements of ASF policy on releases as described below" point in the ASF release guide. But it is true that, as long as nothing has changed since the build already ran those checks, repeating them feels unnecessary.

How do you perform the comparison between the content of the tarball with a freshly cloned repo set to the release RC tag? Do we want to add that as a step?

I will move the PR to the parquet-site one, I might take a couple of days as I am slightly busy at the moment.

Contributor

> How do you perform the comparison between the content of the tarball with a freshly cloned repo set to the release RC tag? Do we want to add that as a step?

I use meld as the diff tool, but I don't think it should be added. GNU diff can probably be configured to work on directory trees.
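
A minimal sketch of such a comparison with plain `diff`, assuming the tarball has already been extracted and the RC tag checked out into two local directories. The `compare_trees` helper and the directory names in the usage note are hypothetical, not part of the Parquet tooling:

```shell
# Recursively compare two source trees, ignoring the clone's .git directory.
# Prints only the names of files that differ or exist on one side only;
# exits 0 when the trees match and non-zero otherwise.
compare_trees() {
  diff -r -q -x .git "$1" "$2"
}
```

For example, `compare_trees apache-parquet-1.15.0 parquet-java` would be run after extracting the tarball into `apache-parquet-1.15.0` and cloning the repository at the RC tag into `parquet-java`.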

> I will move the PR to the parquet-site one, I might take a couple of days as I am slightly busy at the moment.

I don't think we need to hurry. Please reference the parquet-site PR here so anyone can follow up.

Thanks a lot for working on this!

Contributor

@Fokko Fokko Dec 13, 2024

I agree that the parquet-site is more suitable for these steps.

Regarding the license headers. It is part of the verification; having the rat check is just one way. All code must have an ASv2 license header. It would also be good to do manual checks when a new version is being released, as the RAT check might also miss something.

Thanks for working on this, this is really great 🙌


The Apache Parquet release approval process follows the guidelines defined in the
[Apache Software Foundation Release Approval](https://www.apache.org/legal/release-policy.html#release-approval) policy.

For a release vote to pass, a minimum of three positive binding votes and more
positive binding votes than negative binding votes MUST be cast.
Releases may not be vetoed. Votes cast by PMC members are binding; non-binding
votes are nonetheless greatly encouraged and are a sign of a healthy project.

To cast a vote, follow the steps below.

## Download source package, signature file, hash file and KEYS

The release candidate is published under `https://dist.apache.org/repos/dist/dev/parquet/`.
The RC folder name depends on the version and the release candidate number. For example, the files for
Apache Parquet 1.15.0 RC 1 are:
```
wget https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.15.0-rc1/apache-parquet-1.15.0.tar.gz
wget https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.15.0-rc1/apache-parquet-1.15.0.tar.gz.asc
wget https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.15.0-rc1/apache-parquet-1.15.0.tar.gz.sha512
wget https://dist.apache.org/repos/dist/release/parquet/KEYS
```
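
The same four URLs can be derived from the version and the RC number. This is a minimal sketch, assuming the folder layout shown above also holds for other release candidates. The variable names and the `rc_urls` helper are illustrative only:

```shell
# Assumed inputs: edit these for the release candidate under vote.
VERSION="1.15.0"
RC="1"

BASE_URL="https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-${VERSION}-rc${RC}"
ARTIFACT="apache-parquet-${VERSION}.tar.gz"

# Print the URLs of the tarball, its signature, its hash file and KEYS, one per line.
rc_urls() {
  for suffix in "" ".asc" ".sha512"; do
    printf '%s/%s%s\n' "$BASE_URL" "$ARTIFACT" "$suffix"
  done
  printf '%s\n' "https://dist.apache.org/repos/dist/release/parquet/KEYS"
}
```

Running `rc_urls | xargs -n 1 wget` would then fetch all four files into the current directory.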

## Verify signature and hash

GnuPG is recommended. It can be installed with:
- `yum install gnupg` or `apt-get install gnupg` on Linux-based environments.
- `brew install gnupg` on macOS environments.


```
gpg --import KEYS
gpg --verify apache-parquet-1.15.0.tar.gz.asc apache-parquet-1.15.0.tar.gz
sha512sum --check apache-parquet-1.15.0.tar.gz.sha512
```
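
`sha512sum` is part of GNU coreutils and may not be installed on macOS, where `shasum` is usually available instead. The following is a minimal sketch of a check that works with either tool, assuming the `.sha512` file uses the `sha512sum` output format (hash, then file name). The `verify_sha512` helper is hypothetical:

```shell
# Verify the hashes listed in a .sha512 file using whichever tool is installed.
# The files named inside the hash file must be in the current directory.
verify_sha512() {
  if command -v sha512sum >/dev/null 2>&1; then
    sha512sum -c "$1"
  else
    shasum -a 512 -c "$1"
  fi
}
```

For the example release candidate this would be called as `verify_sha512 apache-parquet-1.15.0.tar.gz.sha512`; a non-zero exit status indicates a mismatch.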

## Verify license header

Apache RAT is recommended for verifying license headers. It can be downloaded with the following commands.

```
wget https://archive.apache.org/dist/creadur/apache-rat-0.16.1/apache-rat-0.16.1-bin.tar.gz
tar zxvf apache-rat-0.16.1-bin.tar.gz
```

Run the check with the following command.
It outputs the list of files that do not include an ASF license header.
Substitute `$PARQUET_SRC_FOLDER` in the command with the path to your `parquet-java` source folder.

```
java -jar apache-rat-0.16.1/apache-rat-0.16.1.jar -a -d apache-parquet-1.15.0.tar.gz -E $PARQUET_SRC_FOLDER/.rat-excludes.txt
```

## Verify building and tests

See the [building section](../README.md#building) of the main README.
1 change: 1 addition & 0 deletions pom.xml
@@ -498,6 +498,7 @@
<consoleOutput>true</consoleOutput>
<excludes>
Member Author

I am pretty sure we can replace `excludes` with `excludesFile` (https://creadur.apache.org/rat/apache-rat-plugin/rat-mojo.html#excludesFile), but I am unsure why some of the regexes from the individual excludes don't seem to work with the excludes file when I ran:
`java -jar apache-rat-0.16.1/apache-rat-0.16.1.jar -a -d apache-parquet-1.15.0.tar.gz -E $PARQUET_SRC_FOLDER/.rat-excludes.txt`
I'll investigate how to consolidate those two lists.

<exclude>.github/PULL_REQUEST_TEMPLATE.md</exclude>
<exclude>.rat-excludes.txt</exclude>
<exclude>**/*.parquet</exclude>
<exclude>**/*.avro</exclude>
<exclude>**/*.json</exclude>