Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor ConnectorMetadata to deprecate multi-layouts returning method #21933

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

hantangwangd
Copy link
Member

@hantangwangd hantangwangd commented Feb 14, 2024

Description

The support for multiple layouts has been removed in PR #12674, however the SPI method ConnectorMetadata.getTableLayouts() still return a List<ConnectorTableLayoutResult> which should always contains exactly one element. That's somehow unreasonable, and may lead in some confusion and inconvenience in the reading and subsequent development of the code. This PR try to deprecate this method, and replace it by getTableLayoutForConstraint() which explicitly return a ConnectorTableLayoutResult.

Test Plan

  • Make sure this refactor do not effect all existing test cases

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

General Changes
* Deprecate SPI method ConnectorMetadata.getTableLayouts(), replace by ConnectorMetadata.getTableLayoutForConstraint()

@hantangwangd
Copy link
Member Author

Hi @tdcmeehan, sorry for the misoperation that closed the last PR, so I open this new PR.

I would try to ensure that the results between getTableLayouts and getTableLayoutForConstaint are consistent, otherwise it might be confusing to connector authors.

Remember that this method is also used by connectors not even known to Presto (i.e. external libraries). So we need to accomodate people who are slow to adopt to the new API.

I'm a little not quite understand about your comment. Even if using getTableLayoutForConstaint in presto engine's metadata manager, it would be compatible with the connectors from external libraries who implemented the old SPI method getTableLayouts. And the default implementation in getTableLayoutForConstaint would invoke the old SPI method and validate that it should always return a list contains only one element, so could this ensure that the results are consistent between getTableLayouts and getTableLayoutForConstaint? Or is there anything I have overlooked?

@tdcmeehan
Copy link
Contributor

@hantangwangd I see that the default method for getTableLayoutForConstraint is performing the check that only a single layout is returned, however in theory someone could override both getTableLayouts and getTableLayoutForConstraint. I'd keep an additional check in MetadataManager just to keep it fool proof.

@hantangwangd
Copy link
Member Author

@tdcmeehan Thanks for the explanation. Do you mean we should check that getTableLayoutForConstaint would never return a null result in MetadataManager? In case someone has overridden both getTableLayouts and getTableLayoutForConstraint, the function getTableLayouts would not work unless the newly customized getTableLayoutForConstraint invoke it, and getTableLayoutForConstaint would explicitly return a single ConnectorTableLayoutResult .

@tdcmeehan
Copy link
Contributor

@hantangwangd my concern is someone implements both and due to a misunderstanding, they are not alerted that they have improperly coded their connector. To avoid this, I believe the engine itself should check that these two methods are equivalent, until getTableLayouts is fully deprecated.

@hantangwangd
Copy link
Member Author

Hi @tdcmeehan, as I understand, the main situation you concerned about is that someone who has correctly implemented the deprecated function getTableLayouts might find and improperly implement the new SPI function getTableLayoutForConstaint, in that case the behavior would change and go wrong. Is it the point?

If we make getTableLayouts's default implementation return a singleton list of getTableLayoutForConstraint and still invoke getTableLayouts inside our engine and validate it's result, we might override and ignore the wrongly implementation of the new SPI function getTableLayoutForConstaint in above case. What's your opinion? Which one do you prefer as a better way?

@tdcmeehan
Copy link
Contributor

If we make getTableLayouts's default implementation return a singleton list of getTableLayoutForConstraint and still invoke getTableLayouts inside our engine and validate it's result, we might override and ignore the wrongly implementation of the new SPI function getTableLayoutForConstaint in above case. What's your opinion? Which one do you prefer as a better way?

The engine should ensure that whatever is returned from both methods is identical. i.e.:

ConnectorTableLayoutResult layoutForConstraintsResult = getTableLayoutForConstaint(...)
List<ConnectorTableLayoutResult> layoutsResult = getTableLayouts(...)
verify(layoutsResult.size() == 1)
checkState(layoutsResult.get(0).equals(layoutForConstraintsResult)

@hantangwangd
Copy link
Member Author

The engine should ensure that whatever is returned from both methods is identical.

OK, I understand, that makes sense.

@hantangwangd
Copy link
Member Author

Hi @tdcmeehan, after check the code, I found it's a little complex to determine the equality of two ConnectorTableLayoutResult objects. It contains several fields related to SPI interface such as ColumnHandle, ConnectorTablePartitioningHandle and ConnectorTableLayoutHandle which would be implemented by various connectors. So that the equals logic of ConnectorTableLayoutResult would rely on specific connection implementations. This is somewhat risky, as the function equals is not mandatory to be overwritten. For example, hive do not overwrite function equals for its HiveTableLayoutHandle, and the same goes for LocalFileTableLayoutHandle and TpcdsTableLayoutHandle etc..

In that case, if some one implements both the methods which return different objects that are semantically identical, they still may fail the check depending on specific connector implementation. So what's your opinion about this? Would it be better for the connectors to take on their own responsibility to make the correct SPI function implementation?

@tdcmeehan
Copy link
Contributor

@hantangwangd I hadn't considered that. I think given that constraint, what's written here is the best we can do.

@hantangwangd hantangwangd merged commit c23d1bd into prestodb:master Mar 6, 2024
56 checks passed
@wanglinsong wanglinsong mentioned this pull request May 1, 2024
48 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants