[libbeat][reader][parquet] - Updated Apache Arrow library from v11 to v12.0.1 #35640

Merged 16 commits, Jun 12, 2023
2 changes: 1 addition & 1 deletion CHANGELOG.next.asciidoc
@@ -65,7 +65,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- 'add_cloud_metadata' processor - add cloud.region field for GCE cloud provider
- 'add_cloud_metadata' processor - update azure metadata api version to get missing `cloud.account.id` field
- Make sure k8s watchers are closed when closing k8s meta processor. {pull}35630[35630]

- Upgraded apache arrow library used in x-pack/libbeat/reader/parquet from v11 to v12.0.1 in order to fix cross-compilation issues {pull}35640[35640]

*Auditbeat*

85 changes: 7 additions & 78 deletions NOTICE.txt
@@ -2757,12 +2757,12 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


--------------------------------------------------------------------------------
-Dependency : github.com/apache/arrow/go/v11
-Version: v11.0.0
+Dependency : github.com/apache/arrow/go/v12
+Version: v12.0.1-0.20230605094802-c153c6d36ccf
Licence type (autodetected): Apache-2.0
--------------------------------------------------------------------------------

-Contents of probable licence file $GOMODCACHE/github.com/apache/arrow/go/v11@v11.0.0/LICENSE.txt:
+Contents of probable licence file $GOMODCACHE/github.com/apache/arrow/go/v12@v12.0.1-0.20230605094802-c153c6d36ccf/LICENSE.txt:


Apache License
@@ -2969,77 +2969,6 @@ Contents of probable licence file $GOMODCACHE/github.com/apache/arrow/go/v11@v11

--------------------------------------------------------------------------------

-src/plasma/fling.cc and src/plasma/fling.h: Apache 2.0
-
-Copyright 2013 Sharvil Nanavati
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-
---------------------------------------------------------------------------------
-
-src/plasma/thirdparty/ae: Modified / 3-Clause BSD
-
-Copyright (c) 2006-2010, Salvatore Sanfilippo <antirez at gmail dot com>
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are met:
-
-* Redistributions of source code must retain the above copyright notice,
-this list of conditions and the following disclaimer.
-* Redistributions in binary form must reproduce the above copyright
-notice, this list of conditions and the following disclaimer in the
-documentation and/or other materials provided with the distribution.
-* Neither the name of Redis nor the names of its contributors may be used
-to endorse or promote products derived from this software without
-specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
-LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGE.
-
---------------------------------------------------------------------------------
-
-src/plasma/thirdparty/dlmalloc.c: CC0
-
-This is a version (aka dlmalloc) of malloc/free/realloc written by
-Doug Lea and released to the public domain, as explained at
-http://creativecommons.org/publicdomain/zero/1.0/ Send questions,
-comments, complaints, performance data, etc to dl@cs.oswego.edu
-
---------------------------------------------------------------------------------
-
-src/plasma/common.cc (some portions)
-
-Copyright (c) Austin Appleby (aappleby (AT) gmail)
-
-Some portions of this file are derived from code in the MurmurHash project
-
-All code is released to the public domain. For business purposes, Murmurhash is
-under the MIT license.
-
-https://sites.google.com/site/murmurhash/
-
---------------------------------------------------------------------------------
-
src/arrow/util (some portions): Apache 2.0, and 3-clause BSD

Some portions of this module are derived from code in the Chromium project,
@@ -12003,11 +11932,11 @@ OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR

--------------------------------------------------------------------------------
Dependency : github.com/dustin/go-humanize
-Version: v1.0.0
+Version: v1.0.1
Licence type (autodetected): MIT
--------------------------------------------------------------------------------

-Contents of probable licence file $GOMODCACHE/github.com/dustin/go-humanize@v1.0.0/LICENSE:
+Contents of probable licence file $GOMODCACHE/github.com/dustin/go-humanize@v1.0.1/LICENSE:

Copyright (c) 2005-2008 Dustin Sallings <dustin@spy.net>

@@ -43353,11 +43282,11 @@ SOFTWARE.

--------------------------------------------------------------------------------
Dependency : github.com/mattn/go-isatty
-Version: v0.0.16
+Version: v0.0.17
Licence type (autodetected): MIT
--------------------------------------------------------------------------------

-Contents of probable licence file $GOMODCACHE/github.com/mattn/go-isatty@v0.0.16/LICENSE:
+Contents of probable licence file $GOMODCACHE/github.com/mattn/go-isatty@v0.0.17/LICENSE:

Copyright (c) Yasuhiro MATSUMOTO <mattn.jp@gmail.com>

6 changes: 3 additions & 3 deletions go.mod
@@ -66,7 +66,7 @@ require (
github.com/dolmen-go/contextio v0.0.0-20200217195037-68fc5150bcd5
github.com/dop251/goja v0.0.0-20200831102558-9af81ddcf0e1
github.com/dop251/goja_nodejs v0.0.0-20171011081505-adff31b136e6
-github.com/dustin/go-humanize v1.0.0
+github.com/dustin/go-humanize v1.0.1
github.com/eapache/go-resiliency v1.2.0
github.com/eclipse/paho.mqtt.golang v1.3.5
github.com/elastic/elastic-agent-client/v7 v7.1.2
@@ -192,7 +192,7 @@ require (
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources v1.0.0
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v0.4.1
github.com/Azure/go-autorest/autorest/adal v0.9.14
-github.com/apache/arrow/go/v11 v11.0.0
+github.com/apache/arrow/go/v12 v12.0.1-0.20230605094802-c153c6d36ccf
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.12.7
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.11.17
github.com/aws/aws-sdk-go-v2/service/cloudformation v1.20.4
@@ -311,7 +311,7 @@ require (
github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
github.com/mailru/easyjson v0.7.6 // indirect
github.com/markbates/pkger v0.17.1 // indirect
-github.com/mattn/go-isatty v0.0.16 // indirect
+github.com/mattn/go-isatty v0.0.17 // indirect
github.com/mattn/go-runewidth v0.0.9 // indirect
github.com/matttproud/golang_protobuf_extensions v1.0.2-0.20181231171920-c182affec369 // indirect
github.com/minio/asm2plan9s v0.0.0-20200509001527-cdd76441f9d8 // indirect
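
A note on the pinned Arrow version in go.mod: `v12.0.1-0.20230605094802-c153c6d36ccf` is a Go pseudo-version, not a tagged release. By Go's pseudo-version convention, the `v12.0.1-0.` prefix indicates a commit made after the v12.0.0 tag, followed by the commit's UTC timestamp and a 12-character commit hash prefix. The sketch below only illustrates how such a string breaks down; it is not part of this PR. `pseudoVersion` and `parsePseudoVersion` are hypothetical names, and the parser assumes a well-formed pseudo-version rather than validating full semver.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// pseudoVersion is a hypothetical helper type (not part of the PR) holding
// the three parts of a Go module pseudo-version.
type pseudoVersion struct {
	Base      string    // version prefix, e.g. "v12.0.1-0"
	Timestamp time.Time // commit time in UTC
	Revision  string    // commit hash prefix
}

// parsePseudoVersion splits a string shaped like
// <base>.<yyyymmddhhmmss>-<revision> or <base>-<yyyymmddhhmmss>-<revision>.
// Assumption: the input is already a well-formed pseudo-version.
func parsePseudoVersion(v string) (pseudoVersion, error) {
	i := strings.LastIndex(v, "-")
	if i < 0 {
		return pseudoVersion{}, fmt.Errorf("no revision in %q", v)
	}
	rest, rev := v[:i], v[i+1:]

	// The timestamp is always the last 14 characters before the revision.
	const stampLen = 14
	if len(rest) <= stampLen {
		return pseudoVersion{}, fmt.Errorf("no timestamp in %q", v)
	}
	ts, err := time.Parse("20060102150405", rest[len(rest)-stampLen:])
	if err != nil {
		return pseudoVersion{}, fmt.Errorf("bad timestamp in %q: %w", v, err)
	}

	return pseudoVersion{
		Base:      strings.TrimRight(rest[:len(rest)-stampLen], ".-"),
		Timestamp: ts,
		Revision:  rev,
	}, nil
}

func main() {
	pv, err := parsePseudoVersion("v12.0.1-0.20230605094802-c153c6d36ccf")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pv.Base, pv.Timestamp.Format(time.RFC3339), pv.Revision)
}
```

Because the module path carries the major version (`/v12`), every import of the library has to change along with the `require` line, which is why the parquet reader and its test are touched by this PR.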
11 changes: 6 additions & 5 deletions go.sum
@@ -233,8 +233,8 @@ github.com/antlr/antlr4/runtime/Go/antlr/v4 v4.0.0-20230305170008-8188dc5388df/g
github.com/aokoli/goutils v1.0.1/go.mod h1:SijmP0QR8LtwsmDs8Yii5Z/S4trXFGFC2oO5g9DP+DQ=
github.com/apache/arrow/go/arrow v0.0.0-20191024131854-af6fa24be0db/go.mod h1:VTxUBvSJ3s3eHAg65PNgrsn5BtqCRPdmyXh6rAfdxN0=
github.com/apache/arrow/go/arrow v0.0.0-20200923215132-ac86123a3f01/go.mod h1:QNYViu/X0HXDHw7m3KXzWSVXIbfUvJqBFe6Gj8/pYA0=
-github.com/apache/arrow/go/v11 v11.0.0 h1:hqauxvFQxww+0mEU/2XHG6LT7eZternCZq+A5Yly2uM=
-github.com/apache/arrow/go/v11 v11.0.0/go.mod h1:Eg5OsL5H+e299f7u5ssuXsuHQVEGC4xei5aX110hRiI=
+github.com/apache/arrow/go/v12 v12.0.1-0.20230605094802-c153c6d36ccf h1:s5MDQXJmEalr0Urt0rPlX5UAE2BcHTiex/2Lt2O9p84=
+github.com/apache/arrow/go/v12 v12.0.1-0.20230605094802-c153c6d36ccf/go.mod h1:weuTY7JvTG/HDPtMQxEUp7pU73vkLWMLpY67QwZ/WWw=
github.com/apache/thrift v0.12.0/go.mod h1:cp2SuWMxlEZw2r+iP2GNCdIi4C1qmUzdZFSVb+bacwQ=
github.com/apache/thrift v0.13.0/go.mod h1:cp2SuWMxlEZw2r+iP2GNCdIi4C1qmUzdZFSVb+bacwQ=
github.com/apache/thrift v0.16.0/go.mod h1:PHK3hniurgQaNMZYaCLEqXKsYK8upmhPbmdP2FXSqgU=
@@ -496,8 +496,9 @@ github.com/dolmen-go/contextio v0.0.0-20200217195037-68fc5150bcd5/go.mod h1:cxc2
github.com/dop251/goja_nodejs v0.0.0-20171011081505-adff31b136e6 h1:RrkoB0pT3gnjXhL/t10BSP1mcr/0Ldea2uMyuBr2SWk=
github.com/dop251/goja_nodejs v0.0.0-20171011081505-adff31b136e6/go.mod h1:hn7BA7c8pLvoGndExHudxTDKZ84Pyvv+90pbBjbTz0Y=
github.com/dustin/go-humanize v0.0.0-20171111073723-bb3d318650d4/go.mod h1:HtrtbFcZ19U5GC7JDqmcUSB87Iq5E25KnS6fMYU6eOk=
-github.com/dustin/go-humanize v1.0.0 h1:VSnTsYCnlFHaM2/igO1h6X3HA71jcobQuxemgkq4zYo=
github.com/dustin/go-humanize v1.0.0/go.mod h1:HtrtbFcZ19U5GC7JDqmcUSB87Iq5E25KnS6fMYU6eOk=
+github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
+github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/eapache/go-resiliency v1.1.0/go.mod h1:kFI+JgMyC7bLPUVY133qvEBtVayf5mFgVsvEsIPBvNs=
github.com/eapache/go-resiliency v1.2.0 h1:v7g92e/KSN71Rq7vSThKaWIq68fL4YHvWyiUKorFR1Q=
github.com/eapache/go-resiliency v1.2.0/go.mod h1:kFI+JgMyC7bLPUVY133qvEBtVayf5mFgVsvEsIPBvNs=
@@ -1211,8 +1212,8 @@ github.com/mattn/go-isatty v0.0.10/go.mod h1:qgIWMr58cqv1PHHyhnkY9lrL7etaEgOFcME
github.com/mattn/go-isatty v0.0.11/go.mod h1:PhnuNfih5lzO57/f3n+odYbM4JtupLOxQOAqxQCu2WE=
github.com/mattn/go-isatty v0.0.12/go.mod h1:cbi8OIDigv2wuxKPP5vlRcQ1OAZbq2CE4Kysco4FUpU=
github.com/mattn/go-isatty v0.0.14/go.mod h1:7GGIvUiUoEMVVmxf/4nioHXj79iQHKdU27kJ6hsGG94=
-github.com/mattn/go-isatty v0.0.16 h1:bq3VjFmv/sOjHtdEhmkEV4x1AJtvUvOJ2PFAZ5+peKQ=
-github.com/mattn/go-isatty v0.0.16/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
+github.com/mattn/go-isatty v0.0.17 h1:BTarxUcIeDqL27Mc+vyvdWYSL28zpIhv3RoTdsLMPng=
+github.com/mattn/go-isatty v0.0.17/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
github.com/mattn/go-runewidth v0.0.2/go.mod h1:LwmH8dsx7+W8Uxz3IHJYH5QSwggIsqBzpuz5H//U1FU=
github.com/mattn/go-runewidth v0.0.3/go.mod h1:LwmH8dsx7+W8Uxz3IHJYH5QSwggIsqBzpuz5H//U1FU=
github.com/mattn/go-runewidth v0.0.9 h1:Lm995f3rfxdpd6TSmuVCHVb/QhupuXlYr8sCI/QdE+0=
8 changes: 4 additions & 4 deletions x-pack/libbeat/reader/parquet/parquet.go
@@ -10,10 +10,10 @@ import (
"fmt"
"io"

-"github.com/apache/arrow/go/v11/arrow/memory"
-"github.com/apache/arrow/go/v11/parquet"
-"github.com/apache/arrow/go/v11/parquet/file"
-"github.com/apache/arrow/go/v11/parquet/pqarrow"
+"github.com/apache/arrow/go/v12/arrow/memory"
+"github.com/apache/arrow/go/v12/parquet"
+"github.com/apache/arrow/go/v12/parquet/file"
+"github.com/apache/arrow/go/v12/parquet/pqarrow"
)

// BufferedReader parses parquet inputs from io streams.
68 changes: 36 additions & 32 deletions x-pack/libbeat/reader/parquet/parquet_test.go
@@ -5,7 +5,6 @@
package parquet

import (
-"bufio"
"bytes"
"encoding/json"
"fmt"
@@ -15,10 +14,10 @@ import (
"path/filepath"
"testing"

-"github.com/apache/arrow/go/v11/arrow"
-"github.com/apache/arrow/go/v11/arrow/array"
-"github.com/apache/arrow/go/v11/arrow/memory"
-"github.com/apache/arrow/go/v11/parquet/pqarrow"
+"github.com/apache/arrow/go/v12/arrow"
+"github.com/apache/arrow/go/v12/arrow/array"
+"github.com/apache/arrow/go/v12/arrow/memory"
+"github.com/apache/arrow/go/v12/parquet/pqarrow"
"github.com/stretchr/testify/assert"
)

@@ -171,20 +170,22 @@ func createRandomParquet(t testing.TB, fname string, numCols int, numRows int) m

func TestParquetWithFiles(t *testing.T) {
testCases := []struct {
-parquetFile string
-jsonFile    string
+parquetFile      string
+jsonFile         string
+maxRowsToCompare int
}{
-{
-parquetFile: "vpc_flow.gz.parquet",
-jsonFile: "vpc_flow.ndjson",
-},
{
parquetFile: "cloudtrail.parquet",
-jsonFile: "cloudtrail.ndjson",
+jsonFile: "cloudtrail.json",
},
{
parquetFile: "route53.parquet",
-jsonFile: "route53.ndjson",
+jsonFile: "route53.json",
},
+{
+parquetFile: "vpc_flow.gz.parquet",
+jsonFile: "vpc_flow.json",
+maxRowsToCompare: 4,
+},
}

@@ -198,43 +199,38 @@ func TestParquetWithFiles(t *testing.T) {
}
defer parquetFile.Close()

-jsonFile, err := os.Open(filepath.Join(testDataPath, tc.jsonFile))
-if err != nil {
-t.Fatalf("Failed to open json test file: %v", err)
-}
-defer jsonFile.Close()
-
-orderedJSON, rows := readJSONFromFile(t, jsonFile)
+orderedJSON, rows := readJSONFromFile(t, filepath.Join(testDataPath, tc.jsonFile))
cfg := &Config{
// we set ProcessParallel to true as this always has the best performance
ProcessParallel: true,
// batch size is set to 1 because we need to compare individual records one by one
BatchSize: 1,
}
-readAndCompareParquetFile(t, cfg, parquetFile, orderedJSON, rows)
+readAndCompareParquetFile(t, cfg, parquetFile, orderedJSON, rows, tc.maxRowsToCompare)
})
}
}

// readJSONFromFile reads the json file and returns the data as an ordered map (row number -> json string)
// along with the number of rows in the file
-func readJSONFromFile(t *testing.T, file *os.File) (map[int]string, int) {
+func readJSONFromFile(t *testing.T, filepath string) (map[int]string, int) {
+fileBytes, err := os.ReadFile(filepath)
+assert.NoError(t, err)
+var rawMessages []json.RawMessage
+err = json.Unmarshal(fileBytes, &rawMessages)
+assert.NoError(t, err)
data := make(map[int]string)
-scanner := bufio.NewScanner(file)
-row := 0
-for scanner.Scan() {
-data[row] = scanner.Text()
+var row int
+for _, rawMsg := range rawMessages {
+data[row] = string(rawMsg)
row++
}
-if err := scanner.Err(); err != nil {
-t.Fatalf("failed to read ndjson file: %v", err)
-}

return data, row
}

// readAndCompareParquetFile reads the parquet file and compares the data with the input data
-func readAndCompareParquetFile(t *testing.T, cfg *Config, file *os.File, data map[int]string, rows int) {
+func readAndCompareParquetFile(t *testing.T, cfg *Config, file *os.File, data map[int]string, rows int, maxRowsToCompare int) {
sReader, err := NewBufferedReader(file, cfg)
if err != nil {
t.Fatalf("failed to init stream reader: %v", err)
@@ -248,9 +244,17 @@ func readAndCompareParquetFile(t *testing.T, cfg *Config, file *os.File, data ma
if val != nil {
rowCount = readAndCompareParquetJSON(t, bytes.NewReader(val), data, rowCount)
}
+if maxRowsToCompare > 0 && rowCount == maxRowsToCompare {
+break
+}
}
+// if maxRowsToCompare == 0 then we compare the row count
+if maxRowsToCompare == 0 {
+// asserts of number of rows read is the same as the number of rows from the input file
+assert.Equal(t, rows, rowCount)
+} else {
+assert.EqualValues(t, rowCount, maxRowsToCompare)
+}
-// asserts of number of rows read is the same as the number of rows from the input file
-assert.Equal(t, rows, rowCount)
// closes the stream reader and asserts that there are no errors
err = sReader.Close()
assert.NoError(t, err)
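
The reworked `readJSONFromFile` helper in this diff moves the expected-output fixtures from newline-delimited JSON to a single JSON array, decoding into `[]json.RawMessage` so that each element keeps its raw bytes for comparison. A minimal standalone sketch of that decoding step follows, assuming the fixture is a top-level JSON array. `rowsFromJSONArray` is a hypothetical name used for illustration, not a function in this PR, and it returns an error instead of asserting through `testing.T`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rowsFromJSONArray (hypothetical helper) decodes a top-level JSON array and
// returns each element's raw bytes keyed by its position in the array,
// together with the number of elements.
func rowsFromJSONArray(data []byte) (map[int]string, int, error) {
	var raw []json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, 0, fmt.Errorf("decode JSON array: %w", err)
	}

	rows := make(map[int]string, len(raw))
	for i, msg := range raw {
		// RawMessage holds the element exactly as it appeared in the input.
		rows[i] = string(msg)
	}
	return rows, len(raw), nil
}

func main() {
	// Illustrative input only; the real fixtures live in the test data directory.
	input := []byte(`[{"a":1},{"b":[2,3]}]`)

	rows, n, err := rowsFromJSONArray(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i := 0; i < n; i++ {
		fmt.Println(i, rows[i])
	}
}
```

Unlike a line scanner, this approach does not depend on one record per line, so a fixture can be pretty-printed; the trade-off is that the whole file is read into memory before any row is compared.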