Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabled CLI to read predefined and customized CSV files #480

Merged
merged 10 commits into from
Jan 3, 2022
Merged

Enabled CLI to read predefined and customized CSV files #480

merged 10 commits into from
Jan 3, 2022

Conversation

lziq
Copy link
Contributor

@lziq lziq commented Dec 13, 2021

Issue #366

Description of changes:

  1. Enabled CLI to read predefined and customized CSV files.
  2. Wrote a few test cases.
  3. Added documentation.

@lziq lziq self-assigned this Dec 13, 2021
@lziq lziq linked an issue Dec 13, 2021 that may be closed by this pull request
@alancai98 alancai98 self-requested a review December 20, 2021 18:36
docs/user/CLI.md Outdated Show resolved Hide resolved
docs/user/CLI.md Outdated Show resolved Hide resolved
examples/build.gradle Outdated Show resolved Hide resolved
docs/user/CLI.md Outdated
read_file('customized.csv', {'type':'customized', 'delimiter':',', 'header':true, \
'ignore_empty_line':true, 'ignore_surrounding_space':true, 'trim':true, \
'line_breaker: \n', 'escape':'\', 'quote':'"'})
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want to note what the arguments do or link to CSVParser's reference on these.

Also would be helpful to note that delimiter, escape, and quote arguments can only be a 1 character.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in the next commit.

docs/user/CLI.md Outdated
```
read_file('customized.csv', {'type':'customized', 'delimiter':' ', 'header':true})
```
The following command explicitly shows all the available options for a standard CSV file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: what is meant by "standard" here? Is there a standard CSV file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just means the default CSV format in CSVParser library.

cli/src/org/partiql/cli/functions/ReadFile.kt Outdated Show resolved Hide resolved
cli/test/org/partiql/cli/functions/ReadFileTest.kt Outdated Show resolved Hide resolved
lziq and others added 3 commits December 20, 2021 20:53
Co-authored-by: Alan Cai <caialan@amazon.com>
Co-authored-by: Alan Cai <caialan@amazon.com>
Co-authored-by: Alan Cai <caialan@amazon.com>
@codecov-commenter
Copy link

codecov-commenter commented Dec 21, 2021

Codecov Report

Merging #480 (f3af748) into main (aaa5936) will increase coverage by 0.01%.
The diff coverage is 89.28%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main     #480      +/-   ##
============================================
+ Coverage     82.43%   82.44%   +0.01%     
+ Complexity     1330     1329       -1     
============================================
  Files           171      171              
  Lines         10904    10923      +19     
  Branches       1785     1795      +10     
============================================
+ Hits           8989     9006      +17     
  Misses         1368     1368              
- Partials        547      549       +2     
Flag Coverage Δ
CLI 25.30% <88.88%> (+4.44%) ⬆️
EXAMPLES 75.14% <ø> (ø)
LANG 84.84% <100.00%> (+<0.01%) ⬆️
PTS ∅ <ø> (∅)
TEST_SCRIPT 79.68% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cli/src/org/partiql/cli/functions/ReadFile.kt 81.81% <88.88%> (+4.54%) ⬆️
...ng/src/org/partiql/lang/eval/io/DelimitedValues.kt 89.36% <100.00%> (+1.36%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aaa5936...f3af748. Read the comment docs.

Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Just have some minor comments and questions. Otherwise, looks good to me.

cli/src/org/partiql/cli/functions/ReadFile.kt Outdated Show resolved Hide resolved
Comment on lines 47 to 49
.let{ it.withIgnoreSurroundingSpaces(ignoreSurroundingSpace) }
.let{ it.withTrim(trim) }
.let { if (hasHeader) it.withFirstRecordAsHeader() else it }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sorry missed this in the initial review) nit: .let calls on these lines are redundant

Comment on lines 168 to 171
fun readPostgreCsvFile() {
writeFile("simple_postgre.csv", "id,name,balance\n1,Bob,10000.00")

val args = listOf("\"${dirPath("simple_postgre.csv")}\"", "{type:\"postgresql_csv\", header:true}").map { it.exprValue() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all the typos were addressed here. 'simple_postgre.csv' -> 'simple_postgresql.csv'

docs/user/CLI.md Outdated
Comment on lines 668 to 675
All the available options for customized CSV files are shown as following:
1. Ignore empty lines: `'ignore_empty_line':true`
2. Ignore spaces surrounding comma: `'ignore_surrounding_space':true`
3. Trim leading and trailing blanks: `'trim':true`
4. Set line breaker (only working with '\r', '\n' and '\r\n'): `'line_breaker: \n'`
5. Set escape sign (single character only): `'escape':'\'`
6. Set quote sign (single character only): `'quote':'"'`
7. Set delimiter sign (single character only): `'delimiter':','`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is there a reason for this set of customized CSV parsing options? I saw there were some other options CSVFormat supports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which other options? I think these are all the options to customize the format of a CSV file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the CSVFormat constructor and CSVFormat.Builder (documentation here), there are about 25 total configuration options like nullString, recordSeparator, quoteMode, skipHeaderRecord, etc.

Copy link
Contributor Author

@lziq lziq Jan 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think other options are not so important to configure CSV format. If we find we need to configure any one of them in the future, we can create a ticket for it and make corresponding enhancement.

@lziq
Copy link
Contributor Author

lziq commented Dec 23, 2021

Will create a new issue to add tests to read CSV files in other formats. Other comments were already applied.

Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@lziq lziq merged commit 5044be1 into main Jan 3, 2022
@lziq lziq deleted the COW-CLI branch January 3, 2022 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PartiQL should use idiomatic CSV parsing
3 participants