Add the Faker connector #23691
@raunaqmorarka thanks a lot for the review, everything looks much nicer now!
Personally, I think this would be a great and very useful connector to add. It would enable numerous use cases around learning Trino, testing, benchmarking and other efforts. Specifically, it also goes beyond the rather narrow and stale setup from TPCH/TPCDS.
plugin/trino-faker/src/main/java/io/trino/plugin/faker/FakerPageSource.java
for (int i = 0; i < positions; i++) {
    generator.accept(blockBuilder, completedRows + i);
}
Ideally, what we want is:
generator.accept(blockBuilder, completedRows, positions);
That is, generate N positions in a batch.
This is a megamorphic call site, so looping over it per position can be slow.
I don't know what a megamorphic call site is, can you share some references? This would change the interfaces a lot, and I'm not confident I can do it correctly without understanding the optimization. Is this a blocker for this PR?
BTW I was benchmarking it like this, comparing to the sequence table function, and it was fast enough. I'm not sure what gains we can expect here.
The short version is that when you have a method call that can be dispatched to more than 2 possible implementations, the JIT may not do certain optimizations and the code may end up being much slower than what it would have been had there been just 1 or 2 possible implementations.
Longer version in http://www.insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/
To observe the perf impact, you have to execute this code for 3 different data types and compare the throughput with what you would get if you only executed it on one data type.
This can be tackled in a follow-up; it is not important enough to tackle in the current PR.
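For anyone picking up that follow-up, here is a minimal sketch of what the batched shape could look like. It is not the connector's code: BatchGenerator, generatePage and the three generators are hypothetical names, a plain List<Object> stands in for Trino's BlockBuilder, and the values are deterministic instead of random so the result is easy to check. The point is only that the interface call happens once per column per page, and each implementation runs its own tight loop.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchGeneratorSketch {
    /** Hypothetical batched generator: one interface call fills a whole run of positions. */
    interface BatchGenerator {
        void accept(List<Object> out, long firstRowId, int positions);
    }

    // Each implementation owns its loop, so the per-row work is not behind a shared call site.
    static final BatchGenerator BIGINT = (out, firstRowId, positions) -> {
        for (int i = 0; i < positions; i++) {
            out.add(firstRowId + i);
        }
    };

    static final BatchGenerator VARCHAR = (out, firstRowId, positions) -> {
        for (int i = 0; i < positions; i++) {
            out.add("row-" + (firstRowId + i));
        }
    };

    static final BatchGenerator BOOLEAN = (out, firstRowId, positions) -> {
        for (int i = 0; i < positions; i++) {
            out.add((firstRowId + i) % 2 == 0);
        }
    };

    /** Builds one list per column; the shared call site is hit once per column, not once per row. */
    static List<List<Object>> generatePage(List<BatchGenerator> generators, long completedRows, int positions) {
        List<List<Object>> columns = new ArrayList<>();
        for (BatchGenerator generator : generators) {
            List<Object> column = new ArrayList<>(positions);
            generator.accept(column, completedRows, positions);
            columns.add(column);
        }
        return columns;
    }

    public static void main(String[] args) {
        // Three different implementations pass through the same call site in generatePage.
        System.out.println(generatePage(List.of(BIGINT, VARCHAR, BOOLEAN), 10, 3));
    }
}
```

Whether this actually helps would still need a benchmark that runs at least three column types through the same page source, as described above.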
aa18f32 to eb329b1
51bd83c to 7039a53
@ScalarFunction
@Description("Generate a random string based on the Faker expression")
@SqlType(VARCHAR)
public Slice randomExpression(@SqlType(VARCHAR) Slice expression)
generateRandom ?
126a8ca to fa4b30c
@nineinchnick please work with @mosabua to update docs for this
Description
The purpose of this connector is similar to tpch/tpcds: generate random data to be used for all kinds of testing. The main difference is that this works with all Trino data types and any kind of schema. It also uses the Datafaker library, which allows generating more sophisticated random data in multiple languages.
For more details, including usage examples, see https://github.com/nineinchnick/trino-faker/
I'll add documentation if the general idea of adding this connector is approved.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: