Qualification and Profiling tool handle Read formats and datatypes #2904
Merged
+974
−168
154 commits
0cf96a4
Support rolled and compressed logs for CSPs and Apache Spark, do some
tgravescs 8462be3
add test files
tgravescs 5e287df
Add in db sim eventlogs
tgravescs 421f082
add missing files
tgravescs e616293
fix line length
tgravescs 1514052
print metadata
tgravescs 04c1e27
catch more exceptions
tgravescs 1482249
recurse
tgravescs 1a80bb9
return actual node
tgravescs 546bc5a
Add in another column to sort to keep output consistent
tgravescs c1a7173
Add in printing read schema
tgravescs 5ea4fe9
refactor
tgravescs f0cb1a7
fix null pointer
tgravescs f446950
add app index col
tgravescs 1d9c904
fix
tgravescs e45cb23
change to use lit
tgravescs df38c5a
look for datasource v2
tgravescs 45e1290
Update to print v2
tgravescs 0df55be
finish parsing schema v2
tgravescs 5c6b367
sort it
tgravescs 7a6d3f9
fix parsing schema v2
tgravescs b816378
handle ...
tgravescs 2fb7b12
change to store string for now
tgravescs 9aa5a52
remove struct< from string
tgravescs 070792e
remove debug
tgravescs 4970797
remove debug messages
tgravescs 5c41d2c
remove log
tgravescs fc71e1b
parse v2 file format
tgravescs 274fa77
fix including format:
tgravescs c106f9a
rename
tgravescs 4a93de0
add in test files
tgravescs ffcd0a0
update docs to monitoring page
tgravescs 119bc01
Merge remote-tracking branch 'origin/branch-21.08' into datatypes
tgravescs 466b9c5
cleanup and use spark session hadoop configuration
tgravescs 0280263
cleanup
tgravescs cb73db8
Add in generation of support ops for tools
tgravescs b3d9e95
Merge branch 'datatypes' of github.com:tgravescs/spark-rapids into da…
tgravescs a5fc1fb
fixes
tgravescs d8c299b
add in text support and write comma
tgravescs 9744e12
update output to csv
tgravescs 25c7464
update reading qualification
tgravescs 61e126a
Merge branch 'datatypes' of github.com:tgravescs/spark-rapids into da…
tgravescs 655d8dd
update way qualification with schema
tgravescs 123d37f
format
tgravescs 8aee779
do per format
tgravescs 58b9f62
fixes
tgravescs 87094cc
Merge remote-tracking branch 'origin/branch-21.08' into datatypes
tgravescs 05ea957
fix
tgravescs abc7ea2
Merge remote-tracking branch 'origin/branch-21.08' into datatypesnew
tgravescs d69c768
redo for redesign
tgravescs bd930fd
update to new design
tgravescs da4f470
fixes
tgravescs e61dbdd
check values
tgravescs 8569133
fix syntax
tgravescs 0d39c6e
fixes
tgravescs 732565b
change the way we read csv asnd headers
tgravescs 0496eac
lower case
tgravescs a8cf6ae
fixes
tgravescs 351b22b
Add in auto generation of supported ops for tools
tgravescs f7ec113
Add type conversions
tgravescs 51e7adc
add file format to csv
tgravescs e6ebdd9
fix close
tgravescs ab70406
print incomplete
tgravescs aa49544
Add separate class for type checker
tgravescs 723dd2d
fix parameters
tgravescs 31926cb
fix and debug
tgravescs 43be9a1
calculate as percent
tgravescs ea83770
Calculate score with the read format and datatypes included
tgravescs 2daacff
fixes
tgravescs 79584e5
calculate percent rounded
tgravescs c9a460e
document and cleanup
tgravescs 5792f84
use ratio
tgravescs 78fee51
calculate taks duration
tgravescs e79a99b
take into accoutn configs
tgravescs 6841b36
update recordings
tgravescs d7e6619
use committed off
tgravescs 24f03b6
fix output
tgravescs 9fb48ef
Merge branch 'datatypesnew' of github.com:tgravescs/spark-rapids into…
tgravescs b05866d
fix log file otuput
tgravescs ad88de2
round
tgravescs 048f253
remove some unneeded rounding
tgravescs 1dbf970
fix syntax
tgravescs 19e80f8
Merge remote-tracking branch 'origin/branch-21.08' into datatypesnew
tgravescs b412282
Change finding datatypes
tgravescs 757a154
fix types
tgravescs 07701be
fixes and debug
tgravescs 25aa8ca
add debug
tgravescs 80f1f85
fix equals vs contains
tgravescs 41d2536
Change scores
tgravescs 0e98967
add option for outputting file formats
tgravescs f49adaf
fix string interpol
tgravescs a5cd06d
update output files
tgravescs c9a0659
cleanup
tgravescs 5862ff9
Add test for profile datasource
tgravescs 11e5cde
fix tests
tgravescs 5fd1cc9
add compare info
tgravescs 2742bb5
move case class
tgravescs e93fadd
write out compare
tgravescs f967425
write out header
tgravescs 0816bc2
add test for dsv2
tgravescs 66f4c16
Merge remote-tracking branch 'origin/branch-21.08' into datatypesnew
tgravescs 6fe165c
Merge branch 'datatypesnew' of github.com:tgravescs/spark-rapids into…
tgravescs bae200a
add check for if decimal enabled
tgravescs bb3c3f7
Merge branch 'datatypesnew' of github.com:tgravescs/spark-rapids into…
tgravescs d8249e8
fixes
tgravescs 92cdc52
Merge branch 'datatypesnew' of github.com:tgravescs/spark-rapids into…
tgravescs eaadbc4
Update decimal to use configured off
tgravescs 2844e45
simplify weighted score for read data types
tgravescs df2f055
add plugin checker suite
tgravescs 579b947
more tests
tgravescs a8bdca3
force re-read
tgravescs 6ee3131
change source for testing
tgravescs 7ac42c2
more tests
tgravescs 1ed5573
fix tests
tgravescs eab9112
fix other test
tgravescs ee39cee
make plugin type checker optional
tgravescs 57e7d3a
Update qualification test results
tgravescs cf9b0eb
update test and fixes
tgravescs a060f83
add test files
tgravescs 2ec2461
more tests and cleanup
tgravescs a584df9
Merge branch 'datatypesnew' of github.com:tgravescs/spark-rapids into…
tgravescs 3a10f38
move rounding
tgravescs 1358ea8
Merge branch 'datatypesnew' of github.com:tgravescs/spark-rapids into…
tgravescs b1e3d95
write to stdout as well
tgravescs 9417ddc
output
tgravescs 6ba5efc
remove ln from println
tgravescs 7219173
shrink output report spacing
tgravescs ecfd4c7
commonize size
tgravescs bd09305
add df bakc in header
tgravescs 9955d2e
configure stdout off for tests
tgravescs bec2335
update readme
tgravescs 3d213ff
update desc
tgravescs 394a8a6
Fix missing extra info with type checks
tgravescs 1d6aace
don't include supplement text in qualification tool csv file
tgravescs c91ee3d
Add csv output for just not supported format and types
tgravescs f90f9ec
update tests
tgravescs b00f95b
handle empty string
tgravescs ea717ff
rename test files
tgravescs 4e1006a
Change the way we report ns
tgravescs 0f0b47c
fixes
tgravescs 746b404
add more tests
tgravescs 05f8a7e
update expected results
tgravescs fb6f05b
fix tests
tgravescs 36b0472
dedup types more
tgravescs 7c775d0
fix typo
tgravescs 2ecdc74
update test
tgravescs 86049d7
fix bug processing jobs without sql
tgravescs b1094da
fix bug
tgravescs 5aafb32
add in complex and decimal eventlog
tgravescs 3caac11
add test for complex and ecimal eventlog
tgravescs 495e324
add in expectataion file
tgravescs c8be975
update readme
tgravescs 378134e
fix typo
tgravescs 3b39b09
Merge remote-tracking branch 'origin/branch-21.08' into datatypesnew
tgravescs
@@ -0,0 +1,4 @@
Format,Direction,BOOLEAN,BYTE,SHORT,INT,LONG,FLOAT,DOUBLE,DATE,TIMESTAMP,STRING,DECIMAL,NULL,BINARY,CALENDAR,ARRAY,MAP,STRUCT,UDT
CSV,read,CO,CO,CO,CO,CO,CO,CO,CO,CO,S,CO,NA,NS,NA,NA,NA,NA,NA
ORC,read,S,S,S,S,S,S,S,S,S*,S,CO,NA,NS,NA,NS,NS,NS,NS
Parquet,read,S,S,S,S,S,S,S,S,S*,S,CO,NA,NS,NA,PS*,PS*,PS*,NS
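To make the layout of this generated file easier to follow, here is a minimal Python sketch of how a consumer might load it and look up the code for one read format and data type. It is not the tool's actual implementation (the project itself is Scala); the names `load_read_support` and `lookup` are hypothetical. The reading of the codes (S = supported, CO = configured off, NS = not supported, NA = not applicable, PS = partially supported, a trailing `*` = supported with caveats) is an assumption based on the commit messages above, not something the diff itself states.

```python
import csv
import io

# The four lines added by this diff, copied verbatim.
SUPPORTED_READS_CSV = """\
Format,Direction,BOOLEAN,BYTE,SHORT,INT,LONG,FLOAT,DOUBLE,DATE,TIMESTAMP,STRING,DECIMAL,NULL,BINARY,CALENDAR,ARRAY,MAP,STRUCT,UDT
CSV,read,CO,CO,CO,CO,CO,CO,CO,CO,CO,S,CO,NA,NS,NA,NA,NA,NA,NA
ORC,read,S,S,S,S,S,S,S,S,S*,S,CO,NA,NS,NA,NS,NS,NS,NS
Parquet,read,S,S,S,S,S,S,S,S,S*,S,CO,NA,NS,NA,PS*,PS*,PS*,NS
"""


def load_read_support(text):
    """Return {format_lowercase: {TYPE: code}} for rows whose Direction is 'read'."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["Direction"].lower() != "read":
            continue
        fmt = row.pop("Format").lower()
        row.pop("Direction")
        table[fmt] = dict(row)  # remaining columns are the data types
    return table


def lookup(table, file_format, data_type):
    """Return the raw code from the file, or None if the format or type is not listed."""
    return table.get(file_format.lower(), {}).get(data_type.upper())


table = load_read_support(SUPPORTED_READS_CSV)
for fmt, dtype in [("Parquet", "array"), ("ORC", "decimal"), ("CSV", "string")]:
    # Codes are printed as-is; interpreting them (e.g. CO) is left to the caller.
    print(fmt, dtype, lookup(table, fmt, dtype))
```

Lookups lower-case the format and upper-case the type so that callers need not match the file's exact casing. An unknown format or type returns `None` rather than raising, which keeps the decision about how to treat unlisted formats with the caller.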
@revans2 it would be great if you could look at this part, since you did the type checking work. It outputs the file https://github.com/NVIDIA/spark-rapids/pull/2904/files#diff-fffb7b35c0ad8bb096be63eecf71179426428a95063a6aab8772086f10bd4711 when you run mvn verify.