Allow to specify structure hints in schema inference #40068
}
catch (...)
{
    return false;
Why can there be an exception here? Is some `tryLogCurrentException` needed, or at least `LOG_TRACE`?
It can throw an exception when the given structure is a valid columns declaration list from the parser's point of view but contains invalid types. Examples: `x Tuple`, `x Decimal(999)`.
I don't think we need to log this exception, as this `tryParse*` function is supposed to work with invalid structures as well.
Then maybe it makes sense to add `tryGetColumnsDescription` (not `getColumnsDescription`) in this case?
Good idea, I will check if it's possible
I checked the `getColumnsDescription` implementation and I don't think it's worth rewriting it to implement `tryGetColumnsDescription`, because it would make the function more complex and in some places it would still require a try/catch construction. Let's keep it as is.
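For readers following this thread, the pattern under discussion can be sketched roughly as below: a throwing validator wrapped by a `try*` function that reports failure through its return value instead of logging or rethrowing. This is only an illustration, not the PR's code. Both function names are hypothetical, and the precision range of 1 to 76 is an assumption used to make `Decimal(999)` fail.

```cpp
#include <stdexcept>
#include <string>

/// Hypothetical throwing validator, standing in for the real type check.
/// Assumption for this sketch: valid Decimal precision is in [1, 76].
void validateDecimalPrecision(int precision)
{
    if (precision < 1 || precision > 76)
        throw std::invalid_argument("Wrong precision: " + std::to_string(precision));
}

/// Hypothetical try* wrapper: invalid input is an expected case here,
/// so the exception is swallowed and reported as `false` without logging.
bool tryValidateDecimalPrecision(int precision)
{
    try
    {
        validateDecimalPrecision(precision);
        return true;
    }
    catch (...)
    {
        return false;
    }
}
```

With this shape, `tryValidateDecimalPrecision(999)` returns `false` rather than propagating the exception, which matches the argument that a `tryParse*` function should tolerate invalid structures.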
@@ -104,6 +115,26 @@ NamesAndTypesList IRowSchemaReader::readSchema()
"Most likely setting input_format_max_rows_to_read_for_schema_inference is set to 0");

DataTypes data_types = readRowAndGetDataTypes();
/// If column names weren't set, use default names 'c1', 'c2', ...
if (column_names.empty())
In which cases is it empty or non-empty?
It's non-empty in two cases:
- The format contains names and they were explicitly set using the `setColumnNames` method. For example, in formats with the suffix `WithNames`: `setColumnNames(names);`
- Column names were set using the special setting `column_names_for_schema_inference`.
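The fallback described by the diff comment ("use default names 'c1', 'c2', ...") could look roughly like the sketch below. The helper `resolveColumnNames` is hypothetical and uses plain standard-library types rather than the reader's actual members. It only illustrates the empty versus non-empty branching.

```cpp
#include <cstddef>
#include <string>
#include <vector>

/// Hypothetical helper: keep names that were set explicitly (by the format
/// or by a setting), otherwise generate the defaults 'c1', 'c2', ...
std::vector<std::string> resolveColumnNames(std::vector<std::string> column_names, std::size_t num_columns)
{
    if (column_names.empty())
    {
        column_names.reserve(num_columns);
        for (std::size_t i = 0; i < num_columns; ++i)
            column_names.push_back("c" + std::to_string(i + 1));
    }
    return column_names;
}
```

Names that were already provided pass through unchanged. Only the empty case produces generated names.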
c4629a1 to 936c457
@kssenii Can we merge it?
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add new setting `schema_inference_hints` that allows specifying structure hints in schema inference for specific columns. Closes #39569
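Conceptually, a hint overrides the inferred type for the named column and leaves the other columns as inferred. The sketch below illustrates that idea only. `applyHints` is a hypothetical function, types are represented as plain strings for simplicity, and none of this is taken from the PR's implementation.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

/// Column name paired with a type name (types kept as strings in this sketch).
using NameAndType = std::pair<std::string, std::string>;

/// Hypothetical illustration: replace the inferred type of every column
/// that has a hint. Columns without a hint keep their inferred type.
std::vector<NameAndType> applyHints(
    std::vector<NameAndType> inferred,
    const std::map<std::string, std::string> & hints)
{
    for (auto & [name, type] : inferred)
    {
        auto it = hints.find(name);
        if (it != hints.end())
            type = it->second;
    }
    return inferred;
}
```

For example, with inferred columns `id Int64, name String` and a hint for `id` of `UInt32`, the result keeps `name String` and changes `id` to `UInt32`.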