Compress large test event logs #2648

Closed
wants to merge 1,806 commits into from
Commits
1806 commits
5165273
Branch 0.5 doc update (#2175)
sameerz Apr 20, 2021
d3775a1
Branch 0.5 doc update (#2175)
sameerz Apr 20, 2021
9150a05
Merge pull request #2194 from pxLi/m-0.5-to-0.6
pxLi Apr 20, 2021
cbcebd6
fix cudf 0.19.0 download link (#2195)
pxLi Apr 20, 2021
dbc67cd
fix merge conflict for 0.5 doc
pxLi Apr 20, 2021
72e7814
Merge pull request #2197 from pxLi/fix-merge-conflict-from-0.5
pxLi Apr 20, 2021
a50f9dd
Update PandasUDF doc (#2089)
wjxiz1992 Apr 20, 2021
9831c1e
fix merge conflict for udf doc from 0.5
pxLi Apr 20, 2021
e6fe915
Merge pull request #2199 from pxLi/fix-merge-conflict-2198
pxLi Apr 20, 2021
08a5bc1
Init scripts to install cuda11 runtime [skip ci] (#2185)
NvTimLiu Apr 20, 2021
512ba88
Remove easy unused symbols (#2172)
gerashegalov Apr 20, 2021
931e6c5
Merge pull request #2201 from NVIDIA/branch-0.5
nvauto Apr 20, 2021
cc3be0f
Updating documentation for data format support (#2086)
sameerz Apr 20, 2021
31dc4f6
Merge pull request #2202 from NVIDIA/branch-0.5
nvauto Apr 20, 2021
0a73d48
JNI fixes for StringWordCount native UDF example (#2190)
jlowe Apr 20, 2021
a974994
Merge pull request #2205 from NVIDIA/branch-0.5
nvauto Apr 20, 2021
c2eaa9e
Use CPM to fetch libcudf dependency for native UDF example build (#2191)
jlowe Apr 20, 2021
d0964b5
Avoid redundant collection conversions (#2210)
gerashegalov Apr 21, 2021
575f645
Merge pull request #2211 from NVIDIA/branch-0.5
nvauto Apr 21, 2021
7e66f53
Fix index-based access to the head elements (#2192)
gerashegalov Apr 21, 2021
1193806
Merge pull request #2212 from NVIDIA/branch-0.5
nvauto Apr 21, 2021
a945b07
Fix shim301db build (#2206)
gerashegalov Apr 21, 2021
19c43d8
Merge pull request #2213 from NVIDIA/branch-0.5
nvauto Apr 21, 2021
e003b02
Revert "add nightly cache tests (#2083)" (#2208)
razajafri Apr 21, 2021
e79d730
Merge pull request #2214 from NVIDIA/branch-0.5
nvauto Apr 21, 2021
d01f283
fix batch size default values in the tuning guide (#2207)
rongou Apr 21, 2021
f00b94e
Merge pull request #2216 from NVIDIA/branch-0.5
nvauto Apr 21, 2021
eb3c302
Accelerate data transfer for `FlatMapGroupsInPandas` (#2178)
firestarman Apr 21, 2021
0555e4b
Remove restrictions in struct sort preventing calls to libcudf bounds…
gerashegalov Apr 21, 2021
20e2c05
Fix incorrect RegExpReplace children handling on Spark 3.1+ (#2218)
jlowe Apr 21, 2021
6fc711a
Merge pull request #2221 from NVIDIA/branch-0.5
nvauto Apr 21, 2021
f60d11d
Fixed a few issues with out of core sort (#2209)
revans2 Apr 21, 2021
3de0800
Merge pull request #2224 from NVIDIA/branch-0.5
nvauto Apr 21, 2021
57ae7ea
Remove groupId already specified in parent pom (#2222)
gerashegalov Apr 21, 2021
df8ae9f
Merge pull request #2228 from NVIDIA/branch-0.5
nvauto Apr 21, 2021
b61c405
Adding subquery aggregate tests from SPARK-31620 (#2223)
abellina Apr 21, 2021
b2700dc
Merge pull request #2229 from NVIDIA/branch-0.5
nvauto Apr 21, 2021
e3f30c6
Support explode outer (#2215)
sperlingxx Apr 22, 2021
fa7c791
ParquetCachedBatchSerializer broadcast AllConfs instead of SQLConf to…
razajafri Apr 22, 2021
0cdfb62
Merge pull request #2233 from NVIDIA/branch-0.5
nvauto Apr 22, 2021
0989eae
Support GPU broadcast exchange reuse to feed CPU BHJ when AQE is enab…
andygrove Apr 23, 2021
6274fad
Initial implementation of row count estimates in cost-based optimizer…
andygrove Apr 23, 2021
5e1908d
Merge pull request #2241 from NVIDIA/branch-0.5
nvauto Apr 23, 2021
9d8d6fb
Add cosine similarity native UDF example (#2226)
jlowe Apr 23, 2021
4c96c1b
Fix pivot bug for decimalType (#2245)
nartal1 Apr 23, 2021
db4ab0d
Merge pull request #2248 from NVIDIA/branch-0.5
nvauto Apr 23, 2021
e226562
Run nightly tests for ParquetCachedBatchSerializer (#2204)
razajafri Apr 23, 2021
77041a6
Merge pull request #2250 from NVIDIA/branch-0.5
nvauto Apr 23, 2021
b455d1b
Fix issue when out of core sorting nested data types (#2251)
revans2 Apr 23, 2021
794764e
Merge pull request #2253 from NVIDIA/branch-0.5
nvauto Apr 23, 2021
909ce87
Update UDF native example libcudf dependency to 0.20 (#2249)
jlowe Apr 24, 2021
0161f96
Add shuffle doc section on the periodicGC configuration (#2242)
abellina Apr 24, 2021
c8b7691
Merge pull request #2255 from NVIDIA/branch-0.5
nvauto Apr 24, 2021
ec607c9
Get the correct 'PIPESTATUS' in bash (#2240)
NvTimLiu Apr 24, 2021
f6ac3d3
Merge pull request #2256 from NVIDIA/branch-0.5
nvauto Apr 24, 2021
9f73867
update cudf version to 0.19.1 (#2237)
pxLi Apr 25, 2021
c5f3352
fix merge conflict from 0.5 to 0.6 #2258
pxLi Apr 25, 2021
29cda70
Merge pull request #2259 from pxLi/fix-merge-conflict-2258
pxLi Apr 25, 2021
7264ce7
Make CBO row count test more robust (#2261)
andygrove Apr 26, 2021
fda2437
Merge pull request #2262 from NVIDIA/branch-0.5
nvauto Apr 26, 2021
4f2d6e8
Fix distributed cache to read requested schema (#2235)
razajafri Apr 26, 2021
60be314
Merge pull request #2265 from NVIDIA/branch-0.5
nvauto Apr 26, 2021
13e199f
Allow specifying a superclass for non-GPU execs (#2247)
gerashegalov Apr 26, 2021
d7a6a21
Merge pull request #2266 from NVIDIA/branch-0.5
nvauto Apr 26, 2021
5de2f24
updated gcp docs with custom dataproc image instructions (#2254)
aroraakshit Apr 26, 2021
8b0fb7c
Merge pull request #2267 from NVIDIA/branch-0.5
nvauto Apr 26, 2021
37ef220
Add spark312 and spark320 versions of cache serializer (#2264)
jlowe Apr 26, 2021
740347f
Merge pull request #2268 from NVIDIA/branch-0.5
nvauto Apr 26, 2021
4a55c61
abstract multi-file clouder reader and apply to parquet (#2101)
wbo4958 Apr 27, 2021
bf28e5a
Collect GPU_OP_TIME and FETCH_TIME for GpuColumnarToRowExec (#2269)
andygrove Apr 28, 2021
73daca1
Remove download section for unreleased 0.4.2 (#2281)
jlowe Apr 28, 2021
2aed5f4
Merge pull request #2284 from NVIDIA/branch-0.5
nvauto Apr 28, 2021
29495f5
Require single batch for full outer join streaming (#2285)
revans2 Apr 28, 2021
9ebc516
Merge pull request #2290 from NVIDIA/branch-0.5
nvauto Apr 28, 2021
2f27a2c
update blossom-ci action repo addr (#2291)
pxLi Apr 29, 2021
12c255c
Update docs to warn against 450.80.02 driver with 10.x toolkit (#2289)
sameerz Apr 29, 2021
4293c26
Merge pull request #2294 from NVIDIA/branch-0.5
nvauto Apr 29, 2021
774200e
update cudf version to 0.19.2 (#2293)
pxLi Apr 29, 2021
4c13d5b
fix merge conflict from 0.5 to 0.6
pxLi Apr 29, 2021
f00b27d
Merge pull request #2297 from pxLi/fix-merge-conflict-2296
pxLi Apr 29, 2021
3f51e5b
Make optimizer pluggable (#2287)
andygrove Apr 29, 2021
fc63cc9
Add cosine similarity native UDF example to RAPIDS UDF docs (#2288)
jlowe Apr 29, 2021
7ae7103
Update changelog for v0.5.0 release [skip ci] (#2298)
NvTimLiu Apr 29, 2021
6685906
Merge pull request #2300 from NVIDIA/branch-0.5
nvauto Apr 29, 2021
aced77f
Update the documentation for behavior of reading early dates in LEGAC…
viadea Apr 29, 2021
aa20535
Update doc to reflect nanosleep problem with 460.32.03 (#2301)
sameerz Apr 29, 2021
5b689b9
Merge pull request #2303 from NVIDIA/branch-0.5
nvauto Apr 29, 2021
6f26c58
Do not cache the batches for release. (#2239)
firestarman Apr 29, 2021
a5d0c51
Update CHANGELOG.md (#2304)
sameerz Apr 29, 2021
96de5f2
Merge pull request #2305 from NVIDIA/branch-0.5
nvauto Apr 29, 2021
f250cd9
UCX: add endpoint by sockaddr connection + peer error handling. (#2131)
petro-rudenko Apr 29, 2021
1274224
Fixed indentation (#2308)
razajafri Apr 30, 2021
f8c30a7
Support for date_format (#2282)
nartal1 Apr 30, 2021
518a5b6
Fix ColumnarToRowIterator handling of empty batches (#2318)
jlowe May 1, 2021
67ee107
Disable hash partitioning on arrays (#2319)
jlowe May 1, 2021
f733ec0
Merge branch 'branch-0.5' into fix-merge
jlowe May 1, 2021
66eded7
Update doc to note that single quoted json strings are not ok (#2316)
sameerz May 1, 2021
7e49e2f
Merge pull request #2324 from jlowe/fix-merge
revans2 May 3, 2021
244dcb0
Support GpuHashJoin on Structs (#2173)
sperlingxx May 3, 2021
707fd8d
Restore output of unsupported column types (#2321)
gerashegalov May 3, 2021
5571652
Update changelog for 0.5.0 release (#2326)
sameerz May 3, 2021
545cf31
Merge pull request #2333 from NVIDIA/branch-0.5
nvauto May 3, 2021
8acac67
Allow batching the output of a join (#2310)
revans2 May 3, 2021
2967531
Remove ShuffleExchangeExec from the automatic allow list in tests (#2…
revans2 May 4, 2021
0300ad7
Update the databricks shim for struct joins. (#2337)
revans2 May 4, 2021
1486407
Commonize tagging for hash and sort aggregates (#2338)
jlowe May 5, 2021
dc9988c
Script for auto prioritizing audit (#2166)
razajafri May 5, 2021
c693135
Fix deprecated aggregation and GpuFlatMapGroupsInPandasExec scaladoc …
jlowe May 5, 2021
6f2d92b
Allow partial materialization of broadcast nested loop join and carte…
revans2 May 5, 2021
fdf585e
Remove support for Spark 3.0.0 (#2339)
jlowe May 5, 2021
717a76c
Parquet support for Structs (#2271)
razajafri May 6, 2021
eb639fd
Fix NPE when spilling with debug logging enabled (#2351)
jlowe May 6, 2021
bddfe2f
Restore hash partitioning of arrays (#2347)
jlowe May 6, 2021
b2284e8
Fix issue with fixed width size calculation for join (#2356)
revans2 May 6, 2021
b3690e9
Have braodcast exchange exec produce a contiguous table (#2357)
revans2 May 6, 2021
02d6939
Allow noop filter and joins to keep contig tables untouched. (#2358)
revans2 May 7, 2021
503c104
ParquetCachedBatchSerializer (PCBS) shouldn't write int96 (#2362)
razajafri May 7, 2021
f4afc0c
Support columnar pipeline for Grouped Agg UDF (#2263)
firestarman May 7, 2021
fc1216e
Fix Alias calling proper shim version (#2361)
tgravescs May 7, 2021
52ff5d7
Deal with rolling API Changes in CUDF (#2270)
revans2 May 7, 2021
9276d96
Fix noop filter check to handle nulls (#2366)
revans2 May 7, 2021
9979c38
Print full column type when type conversion check fails (#2369)
jlowe May 7, 2021
710296c
free rapids buffers safely if an exception is thrown (#2353)
rongou May 7, 2021
afd8979
Append my id to blossom-ci whitelist (#2373)
zhanga5 May 8, 2021
7c7832a
Disable the cudf window test on DB (#2376)
firestarman May 10, 2021
83516d3
Remove team member not working on the project (#2377)
sameerz May 10, 2021
5d629ea
batch small buffers when spilling via GDS (#2295)
rongou May 10, 2021
02c62dd
Update CBO to use fixed cost model and to respect estimated row count…
andygrove May 10, 2021
a2b8da1
Remove unneeded join key "optimization" (#2383)
revans2 May 10, 2021
44a3a27
added more test type (#2386)
razajafri May 11, 2021
08ed50c
Add support for Databricks 8.2 runtime (#2381)
tgravescs May 11, 2021
81d7314
add doc for GDS spilling (#2387)
rongou May 11, 2021
bb404ab
Show expected cudf type when type conversion check fails (#2380)
jlowe May 11, 2021
31daed3
Documentation for Decimal precision and scale handling (#2360)
viadea May 11, 2021
16045f3
Closed unclosed views to avoid memory leak (#2392)
razajafri May 11, 2021
d2f693a
Add elements in structs to round robin test as it can fail due to: SP…
abellina May 12, 2021
69dc8d3
change location of audit log (#2393)
razajafri May 12, 2021
bfcd156
`CaseWhen` supports `ArrayType` (#2388)
firestarman May 12, 2021
3fa0b29
Document tuning of spark.task.resource.gpu.amount (#2394)
jlowe May 12, 2021
9fe6d8c
Update Spark Operator getting started guide. (#2311)
viadea May 12, 2021
8fa50c3
Keep GpuCoalesceExce if a shuffle or custom reader is the parent exec…
abellina May 12, 2021
9cbc399
Refactor to avoid code duplication between GpuBroadcastToCpu and GpuB…
andygrove May 12, 2021
c445f52
Fix databricks 301 extra include and update ci/cd scripts for databri…
tgravescs May 12, 2021
376b962
Enable struct columns for GpuHashAggregateExec (#2274)
gerashegalov May 12, 2021
c9d42eb
Final activate databricks 8.2 runtime include and use property instea…
tgravescs May 13, 2021
797bed8
Simplify launching integration tests a pseudo-distributed mode (#2410)
gerashegalov May 14, 2021
d473970
Ignore group order in test_struct_groupby_count (#2413)
gerashegalov May 14, 2021
3f1b958
Provide optional pre-commit hook config (#2403)
gerashegalov May 14, 2021
bee6406
Change databricks zone (#2418)
NvTimLiu May 14, 2021
8aa3150
Support integral type range window (#2020)
wbo4958 May 14, 2021
b0e901b
Unify legacy and 3.1.x struct cast implementations (#2395)
gerashegalov May 14, 2021
ec021d1
Refactor the handling of scalar columnarEval results (#2363)
firestarman May 15, 2021
247b758
Support ElementAt (#2260)
wjxiz1992 May 15, 2021
f20d465
Fix rolling window API differences (#2419)
revans2 May 17, 2021
640cd7d
Update cudfjni to 21.06-SNAPSHOT (#2434)
pxLi May 18, 2021
d65476d
AnsiMode support for GetArrayItem GetMapValue and ElementAt for Spark…
wjxiz1992 May 18, 2021
a27cd29
Switch to single-level structs in hash aggregate tests and work aroun…
gerashegalov May 18, 2021
e36d222
Split up stream side for exploding joins (#2433)
revans2 May 19, 2021
3cb9606
Supports `GpuLiteral` of array type (#2313)
firestarman May 19, 2021
4bbe192
Add profiling tool (#2402)
nartal1 May 19, 2021
51c0dc2
support creating list ColumnVector for Literal(ArrayType(NullType)) (…
wbo4958 May 19, 2021
c7ebbaa
support lead/lag on arrays (#2435)
wbo4958 May 19, 2021
97e4508
Update tuning docs to add batch size recommendations. (#2451)
revans2 May 19, 2021
009b903
support creating array of array (#2299)
wbo4958 May 19, 2021
6874645
Implement cast of nested arrays (#2426)
gerashegalov May 20, 2021
2b781d8
Filter out the nulls after slicing the batches. (#2447)
firestarman May 20, 2021
2ec4fe1
Fall back to the CPU for literal array values on case/when (#2456)
revans2 May 20, 2021
c694fdc
support GpuConcat on ArrayType (#2379)
sperlingxx May 20, 2021
7b4d5af
fix GpuCreateNamedStruct not serializable issue (#2442)
wbo4958 May 20, 2021
e1b493c
Include memory access costs in cost models (cost-based optimizer) (#2…
andygrove May 20, 2021
b2acb0d
Change shuffle metadata messages to use UCX Active Messages (#2409)
abellina May 21, 2021
13d618d
Update plugin version to 21.06.0 (#2446)
pxLi May 21, 2021
ee4b369
skip test_window_aggs_for_rows_lead_lag_on_arrays (#2471)
wbo4958 May 21, 2021
49b0086
Fix for UCP Listener created spark.port.maxRetries times (#2476)
abellina May 21, 2021
9d6cb6e
fix uncertain plan capture in hash_aggregate_test.py (#2472)
sperlingxx May 24, 2021
31b168e
Fixing the failing test `test_window` on DB (#2484)
firestarman May 24, 2021
a813edf
Report gpuOpTime instead of totalTime for project, filter, limit, and…
andygrove May 24, 2021
22a0ad6
Adding additional functionalities to profiling tool (#2469)
nartal1 May 24, 2021
d631089
Remove the null replacement in `computePredicate` (#2486)
firestarman May 24, 2021
1cf741b
Add temporary logging for Dataproc round robin fallback issue (#2490)
jlowe May 24, 2021
3f64354
Window tests with smaller batches (#2482)
revans2 May 24, 2021
b304e67
Fix regression in cost-based optimizer when calculating cost for Wind…
andygrove May 25, 2021
5986fc3
improve window agg test for range numeric types (#2493)
wbo4958 May 25, 2021
e603a1a
Add comments for lazy binding in WindowInPandas (#2496)
firestarman May 25, 2021
8ede4ce
Add in basic support for scalar maps and allow nesting in named_struc…
revans2 May 25, 2021
d7dec67
Fix regression in TOTAL_TIME metrics for Databricks (#2499)
andygrove May 25, 2021
a5deba8
Remove temporary logging and adjust test column names (#2503)
jlowe May 25, 2021
844aa2a
Remove work around for nulls in semi-anti joins (#2502)
revans2 May 25, 2021
203769f
Use the APIs for creation from utf8 string. (#2506)
firestarman May 26, 2021
bb332c9
Update Dockerfile for native UDF (#2500)
NvTimLiu May 26, 2021
3b718f8
Update shuffle documentation for branch-21.06 and UCX 1.10.1 (#2475)
abellina May 26, 2021
c304962
Add code for generating dot file visualizations (#2449)
andygrove May 26, 2021
cefeb3f
Added in basic support for scalar structs to named_struct (#2509)
revans2 May 26, 2021
fda73c3
Qualification tool updates for datasets, udf, and misc fixes (#2505)
tgravescs May 26, 2021
6c4b947
Avoid listener race collecting wrong plan in assert_gpu_fallback_coll…
jlowe May 27, 2021
a5dd78c
Add EMR 6.3 documentation (#2463)
viadea May 27, 2021
3da9ba9
Improve debug print to include addresses and null counts (#2526)
revans2 May 27, 2021
1cd417e
Improve test coverage for sorting structs (#2507)
gerashegalov May 27, 2021
8bc4f79
Support concat with separator on GPU (#2479)
tgravescs May 27, 2021
0f4f532
Change Databricks 310 shim to be 311 to match reported spark.version …
tgravescs May 27, 2021
3a83a21
Make GenerateDot test more robust (#2528)
andygrove May 27, 2021
d90c836
Fix concat_ws test for databricks (#2533)
tgravescs May 28, 2021
02a0c36
Report opTime not totalTime for expand, range, and generate execs (#2…
andygrove May 28, 2021
27250f7
Add nested types and decimals to CoalesceExec (#2531)
gerashegalov May 28, 2021
8c94951
Add CentOS documentation and improve dockerfiles for UCX (#2537)
abellina May 28, 2021
bc2c839
Remove scaladoc on an internal method to avoid warning during build (…
razajafri May 28, 2021
59b126c
Add Struct support for ParquetWriter (#2514)
razajafri May 28, 2021
8bad0c7
disable cudf_udf tests for #2521 (#2539)
pxLi May 31, 2021
d3321f8
Update spark 312 shim, and Add spark 313-SNAPSHOT shim (#2540)
pxLi May 31, 2021
8ae4a2a
enable auto-merge from 21.06 to 21.08 (#2542)
pxLi May 31, 2021
abc8a5c
Refactor the code for conditional expressions (#2508)
firestarman Jun 1, 2021
a44211c
Refactor the code for conditional expressions (#2508)
firestarman Jun 1, 2021
061095f
Don't do an extra shuffle in some TopN cases (#2536)
revans2 Jun 1, 2021
c1ac13f
support interval.microseconds for range window TimeStampType (#2525)
wbo4958 Jun 1, 2021
f05b7bb
Add event logs for integration tests (#2520)
gerashegalov Jun 1, 2021
72cdec4
Add cloudera shim layer (#2423)
sririshindra Jun 2, 2021
b76fae5
Release profiling tool jar to maven central (#2559)
NvTimLiu Jun 2, 2021
769a788
Fixed code indentation in ParquetCachedBatchSerializer (#2538)
razajafri Jun 2, 2021
ea189f2
Remove FETCH_TIME, use COLLECT_TIME instead (#2515)
andygrove Jun 3, 2021
f70dbcb
align GDS reads/writes to 4 KiB (#2460)
rongou Jun 3, 2021
ea7dcdd
expose unspill config option (#2566)
rongou Jun 3, 2021
a7ebeb6
Cancel requests that are queued for a client/handler on error (#2553)
abellina Jun 3, 2021
1084682
Change RMM_ALLOC_FRACTION to represent percentage of available memory…
andygrove Jun 3, 2021
369127f
Remove -SNAPSHOT in documentation in preparation for release (#2569)
sameerz Jun 4, 2021
22f5421
Change test_single_sort_in_part to print source data frame on failure…
abellina Jun 4, 2021
1129641
Add Qualification tool support (#2574)
tgravescs Jun 4, 2021
daedfed
Fix package name (#2588)
andygrove Jun 4, 2021
72fbf97
Handle UCX connection timeouts from heartbeats more gracefully (#2587)
abellina Jun 4, 2021
e65e826
Profiling tool support for collection and analysis (#2590)
tgravescs Jun 4, 2021
35f910d
Implement test for qualification tool sql metric aggregates (#2591)
andygrove Jun 5, 2021
a61f67f
Change the README of the qualification and profiling tool to match th…
viadea Jun 5, 2021
bf3492c
Add the doc for -g option of the profiling tool. (#2603)
viadea Jun 5, 2021
4621fc1
Add filter support for qualification and profiling tool. (#2576)
nartal1 Jun 6, 2021
3e303c0
Rename rapids-4-spark-tools directory to tools (#2598)
nartal1 Jun 6, 2021
ca8104c
Profile/qualification tool error handling improvements and support sp…
tgravescs Jun 6, 2021
71db409
Revert "disable cudf_udf tests for #2521" (#2611)
pxLi Jun 7, 2021
800d8ea
rapids-4-spark-tools directory renamed to tools (#2614)
NvTimLiu Jun 7, 2021
99dec80
Profiling tool, add in job to stage, duration, executor cpu time, fix…
tgravescs Jun 7, 2021
a7a3d63
Correct an issue for README for tools and also correct s3 solution in…
viadea Jun 7, 2021
b94d5a2
Change aggregation of executor CPU and run time for Qualification too…
tgravescs Jun 8, 2021
ae33cc4
Ignore order for map udf test (#2627)
firestarman Jun 8, 2021
1229894
Profiling tool: Add support for health check. (#2632)
nartal1 Jun 8, 2021
a0c93a3
Enable processing of compressed Spark event logs (#2626)
gerashegalov Jun 8, 2021
ff2cf6c
Exclude failed jobs/queries from Qualification tool output (#2625)
tgravescs Jun 8, 2021
6ed5f5a
Profiling tool - Fix file writer for generating dot graphs, supportin…
tgravescs Jun 8, 2021
0a544ed
Update download.md and FAQ.md for 21.06.0 (#2577)
sameerz Jun 8, 2021
73be7db
Fix databricks for 3.1.1 (#2637)
revans2 Jun 8, 2021
bb6d2ab
Add physical plan to the dot file as the graph label (#2640)
gerashegalov Jun 9, 2021
8dc6bde
Profiling tool: Health check follow on (#2643)
nartal1 Jun 9, 2021
af82432
Compress large test event logs
nartal1 Jun 9, 2021
14 changes: 14 additions & 0 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# Copyright (c) 2020, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


/jenkins/ @jlowe @revans2 @tgravescs @GaryShen2008 @NvTimLiu
pom.xml @jlowe @revans2 @tgravescs @GaryShen2008 @NvTimLiu
25 changes: 25 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,25 @@
---
name: Bug report
about: Create a bug report to help us improve RAPIDS Accelerator for Apache Spark
title: "[BUG]"
labels: "? - Needs Triage, bug"
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**Steps/Code to reproduce bug**
Please provide a list of steps or a code sample to reproduce the issue.
Avoid posting private or sensitive data.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Environment details (please complete the following information)**
- Environment location: [Standalone, YARN, Kubernetes, Cloud(specify cloud provider)]
- Spark configuration settings related to the issue

**Additional context**
Add any other context about the problem here.
35 changes: 35 additions & 0 deletions .github/ISSUE_TEMPLATE/documentation-request.md
@@ -0,0 +1,35 @@
---
name: Documentation request
about: Report incorrect or needed documentation
title: "[DOC]"
labels: "? - Needs Triage, documentation"
assignees: ''

---

## Report incorrect documentation

**Location of incorrect documentation**
Provide links and line numbers if applicable.

**Describe the problems or issues found in the documentation**
A clear and concise description of what you found to be incorrect.

**Steps taken to verify documentation is incorrect**
List any steps you have taken:

**Suggested fix for documentation**
Detail proposed changes to fix the documentation if you have any.

---

## Report needed documentation

**Report needed documentation**
A clear and concise description of what documentation you believe is needed and why.

**Describe the documentation you'd like**
A clear and concise description of what you want to happen.

**Steps taken to search for needed documentation**
List any steps you have taken:
20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for RAPIDS Accelerator for Apache Spark
title: "[FEA]"
labels: "? - Needs Triage, feature request"
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I wish the RAPIDS Accelerator for Apache Spark would [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context, code examples, or references to existing implementations about the feature request here.
10 changes: 10 additions & 0 deletions .github/ISSUE_TEMPLATE/submit-question.md
@@ -0,0 +1,10 @@
---
name: Submit question
about: Ask a general question about RAPIDS Accelerator for Apache Spark
title: "[QST]"
labels: "? - Needs Triage, question"
assignees: ''

---

**What is your question?**
33 changes: 33 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,33 @@
<!--

Thank you for contributing to RAPIDS Accelerator for Apache Spark!

Here are some guidelines to help the review process go smoothly.

1. Please write a description in this text box of the changes that are being
made.

2. Please ensure that you have written unit tests for the changes made/features
added.

3. If you are closing an issue please use one of the automatic closing words as
noted here: https://help.github.com/articles/closing-issues-using-keywords/

4. If your pull request is not ready for review but you want to make use of the
continuous integration testing facilities please label it with `[WIP]`.

5. If your pull request is ready to be reviewed without requiring additional
work on top of it, then remove the `[WIP]` label (if present).

6. Once all work has been done and review has taken place please do not add
features or make changes out of the scope of those requested by the reviewer
(doing this just adds delays as already reviewed code ends up having to be
re-reviewed/it is hard to tell what is new etc!). Further, please avoid
rebasing your branch during the review process, as this causes the context
of any comments made by reviewers to be lost. If conflicts occur during
review then they should be resolved by merging into the branch used for
making the pull request.

Many thanks in advance for your cooperation!

-->
16 changes: 16 additions & 0 deletions .github/codecov.yaml
@@ -0,0 +1,16 @@
# Copyright (c) 2021, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

codecov:
  max_report_age: off
41 changes: 41 additions & 0 deletions .github/workflows/auto-merge.yml
@@ -0,0 +1,41 @@
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# A workflow to keep BASE branch up-to-date from HEAD branch
name: auto-merge HEAD to BASE

on:
  pull_request_target:
    branches:
      - branch-21.06
    types: [closed]

jobs:
  auto-merge:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
        with:
          ref: branch-21.06 # force to fetch from latest upstream instead of PR ref

      - name: auto-merge job
        uses: ./.github/workflows/auto-merge
        env:
          OWNER: NVIDIA
          REPO_NAME: spark-rapids
          HEAD: branch-21.06
          BASE: branch-21.08
          AUTOMERGE_TOKEN: ${{ secrets.AUTOMERGE_TOKEN }} # use to merge PR
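Note: the gating in this workflow can be restated as a small predicate. The following is only an illustrative sketch, not part of the PR. It assumes an event dictionary shaped like the GitHub `pull_request` webhook payload (`action`, `pull_request.merged`, `pull_request.base.ref`); the function name `should_auto_merge` and the parameter `watched_branch` are hypothetical.

```python
def should_auto_merge(event, watched_branch='branch-21.06'):
    """Return True only for a closed, merged PR whose base is the watched branch.

    Mirrors the workflow's three filters: `types: [closed]`,
    `branches: [branch-21.06]`, and `if: ...merged == true`.
    """
    pr = event.get('pull_request') or {}
    base_ref = (pr.get('base') or {}).get('ref')
    return (
        event.get('action') == 'closed'
        and base_ref == watched_branch
        and pr.get('merged') is True
    )


if __name__ == '__main__':
    merged = {'action': 'closed',
              'pull_request': {'merged': True, 'base': {'ref': 'branch-21.06'}}}
    print(should_auto_merge(merged))
```

A pull request that is closed without being merged fails the last condition, so the auto-merge job is skipped for it.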
22 changes: 22 additions & 0 deletions .github/workflows/auto-merge/Dockerfile
@@ -0,0 +1,22 @@
# Copyright (c) 2020, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM python:alpine

WORKDIR /
COPY automerge .
RUN pip install requests && chmod +x /automerge

# require envs: OWNER,REPO_NAME,HEAD,BASE,AUTOMERGE_TOKEN
ENTRYPOINT ["/automerge"]
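Note: the `automerge` entrypoint asserts each required environment variable one at a time, so a misconfigured container reports only the first missing name. A minimal sketch of a check that reports every missing name at once is shown below. It is an illustration under stated assumptions, not code from this PR: the helper `missing_envs` is hypothetical, and the variable names are taken from the assertions in the `automerge` script.

```python
import os

# Names taken from the assertions at the top of the automerge script.
REQUIRED_ENVS = ('OWNER', 'REPO_NAME', 'HEAD', 'BASE', 'AUTOMERGE_TOKEN')


def missing_envs(environ, required=REQUIRED_ENVS):
    """Return the required names that are unset or empty, in declaration order."""
    return [name for name in required if not environ.get(name)]


if __name__ == '__main__':
    missing = missing_envs(os.environ)
    if missing:
        print('missing required env: ' + ', '.join(missing))
    else:
        print('all required envs are set')
```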
20 changes: 20 additions & 0 deletions .github/workflows/auto-merge/action.yml
@@ -0,0 +1,20 @@
# Copyright (c) 2020, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: 'auto-merge action'
description: 'auto-merge HEAD to BASE'
runs:
  using: 'docker'
  image: 'Dockerfile'

122 changes: 122 additions & 0 deletions .github/workflows/auto-merge/automerge
@@ -0,0 +1,122 @@
#!/usr/bin/env python

# Copyright (c) 2020, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""An auto-merge tool

Create a PR to merge HEAD to BASE branch.
NOTE:
    The generated PR is merged automatically if there are no conflicts. Otherwise, manual intervention is required.
"""

import os
import sys
import time

import requests

# ENV
OWNER = os.environ.get('OWNER')
assert OWNER, 'env OWNER should not be empty'
REPO_NAME = os.environ.get('REPO_NAME')
assert REPO_NAME, 'env REPO_NAME should not be empty'
HEAD = os.environ.get('HEAD')
assert HEAD, 'env HEAD should not be empty'
BASE = os.environ.get('BASE')
assert BASE, 'env BASE should not be empty'
AUTOMERGE_TOKEN = os.environ.get('AUTOMERGE_TOKEN')
assert AUTOMERGE_TOKEN, 'env AUTOMERGE_TOKEN should not be empty'
# static
API_URL = 'https://api.github.com'
AUTH_HEADERS = {
    'Authorization': 'token ' + AUTOMERGE_TOKEN
}


def create():
    url = f'{API_URL}/repos/{OWNER}/{REPO_NAME}/pulls'
    params = {
        'title': f'[auto-merge] {HEAD} to {BASE} [skip ci] [bot]',
        'head': HEAD,
        'base': BASE,
        'body': f'auto-merge triggered by github actions on `{HEAD}` to create a PR keeping `{BASE}` up-to-date. If '
                'this PR is unable to be merged due to conflicts, it will remain open until manually fixed.',
        'maintainer_can_modify': True
    }
    r = requests.post(url, headers=AUTH_HEADERS, json=params)
    if r.status_code == 201:
        print('SUCCESS - create PR')
        pull = r.json()
        number = str(pull['number'])
        sha = str(pull['head']['sha'])
        return number, sha, False
    if r.status_code == 422:  # early-terminate if no commits between HEAD and BASE
        print('SUCCESS - No commits')
        print(r.json())
        return '', '', True
    # FAILURE
    print('FAILURE - create PR')
    print(f'status code: {r.status_code}')
    print(r.json())
    sys.exit(1)


def auto_merge(number, sha):
    url = f'{API_URL}/repos/{OWNER}/{REPO_NAME}/pulls/{number}/merge'
    params = {
        'sha': sha,
        'merge_method': 'merge'
    }
    r = requests.put(url, headers=AUTH_HEADERS, json=params)
    if r.status_code == 200:
        comment(number, '**SUCCESS** - auto-merge')
        print('SUCCESS - auto-merge')
        sys.exit(0)
    else:
        print('FAILURE - auto-merge')
        comment(number=number, content=f"""**FAILURE** - Unable to auto-merge. Manual operation is required.
```
{r.json()}
```
""")
        print(f'status code: {r.status_code}')
        print(r.json())
        sys.exit(1)


def comment(number, content):
    url = f'{API_URL}/repos/{OWNER}/{REPO_NAME}/issues/{number}/comments'
    params = {
        'body': content
    }
    r = requests.post(url, headers=AUTH_HEADERS, json=params)
    if r.status_code == 201:
        print('SUCCESS - create comment')
    else:
        print('FAILURE - create comment')
        print(f'status code: {r.status_code}')
        print(r.json())


def main():
    number, sha, term = create()
    if term:
        sys.exit(0)

    auto_merge(number, sha)


if __name__ == '__main__':
    main()
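Note: the status-code handling in `create()` has three outcomes (PR created, nothing to merge, failure). A minimal sketch of that dispatch as a pure function, which can be exercised without network access, might look like the following. It is an illustration only and not part of this PR; the name `classify_create_response` and the outcome labels are hypothetical.

```python
def classify_create_response(status_code, payload):
    """Map a create-PR response to (outcome, number, sha).

    201 -> the PR was created; 422 -> treated as "no commits between HEAD
    and BASE", as in the script; anything else -> failure.
    """
    if status_code == 201:
        return ('created', str(payload['number']), str(payload['head']['sha']))
    if status_code == 422:
        return ('no_commits', '', '')
    return ('failure', '', '')


if __name__ == '__main__':
    print(classify_create_response(201, {'number': 7, 'head': {'sha': 'abc123'}}))
    print(classify_create_response(422, {'message': 'Validation Failed'}))
    print(classify_create_response(500, {}))
```

One caveat worth keeping in mind: GitHub may return 422 for other validation failures as well (for example, when a pull request for the same head and base is already open), and the script as written reports all of these as the no-commits case.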