[Gluten-4732][CH] delta-mergetree support update/delete/upsert/insert in a more native delta way #4733
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
Run Gluten Clickhouse CI
backends-clickhouse/src/main/scala/io/glutenproject/execution/GlutenMergeTreePartition.scala
}

def getFileFormat(meta: Metadata): DeltaMergeTreeFileFormat = {
  val fileFormat = new DeltaMergeTreeFileFormat(
Return `new DeltaMergeTreeFileFormat(...)` directly instead of assigning it to a local `val` first.
fixed
}

object ClickHouseTableV2 extends Logging {
  val deltaLog2Table = mutable.HashMap[DeltaLog, ClickHouseTableV2]()
Consider using `ConcurrentHashMap` here?
fixed
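The suggestion above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code that landed in the PR: `LogKey` and `TableStub` are hypothetical stand-ins for `DeltaLog` and `ClickHouseTableV2`, and `TableRegistry` is an illustrative name.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-ins for DeltaLog and ClickHouseTableV2.
final case class LogKey(path: String)
final case class TableStub(name: String)

object TableRegistry {
  // Unlike mutable.HashMap, ConcurrentHashMap is safe to read and write
  // from several threads without external locking.
  private val deltaLog2Table = new ConcurrentHashMap[LogKey, TableStub]()

  def register(key: LogKey, table: TableStub): Unit = {
    deltaLog2Table.put(key, table)
    ()
  }

  // ConcurrentHashMap.get returns null for a missing key; Option(...) turns that into None.
  def lookup(key: LogKey): Option[TableStub] =
    Option(deltaLog2Table.get(key))
}
```

Returning an `Option` from `lookup` keeps the `null` that Java maps return for missing keys from leaking into Scala callers.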
}

override def fileFormat(metadata: Metadata = metadata): FileFormat =
  ClickHouseTableV2.deltaLog2Table(this).getFileFormat(metadata)
It seems the `ClickHouseTableV2` cannot be obtained this way: if there has been no write operation in this Spark session, the map may not contain an entry for it, right?
fixed
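The concern raised above can be illustrated with a small sketch. It assumes the map is only populated on the write path, which is the reviewer's reading and not something the page confirms. `LogId`, `TableHandle`, and `LookupDemo` are hypothetical names, and building the entry on demand with `computeIfAbsent` is just one possible remedy, not necessarily the fix the author applied.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

// Hypothetical stand-ins for DeltaLog and ClickHouseTableV2.
final case class LogId(path: String)
final case class TableHandle(path: String)

object LookupDemo {
  private val plain = mutable.HashMap[LogId, TableHandle]()
  private val safe = new ConcurrentHashMap[LogId, TableHandle]()

  // apply() on a key that was never registered throws NoSuchElementException.
  def unsafeLookup(key: LogId): TableHandle = plain(key)

  // computeIfAbsent creates the entry on first access, so a read-only
  // session still gets a value even though nothing registered it earlier.
  def getOrCreate(key: LogId): TableHandle =
    safe.computeIfAbsent(key, (k: LogId) => TableHandle(k.path))
}
```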
    optionalBucketSet: Option[BitSet],
    optionalNumCoalescedBuckets: Option[Int],
    disableBucketedScan: Boolean): Seq[InputPartition] = {
  val tableV2 = ClickHouseTableV2.deltaLog2Table(deltaLog)
It seems the `ClickHouseTableV2` cannot be obtained this way: if there has been no write operation in this Spark session, the map may not contain an entry for it, right?
fixed
deltaScan.files
  .map(
    addFile => {
      val addFileAsKey = AddFileAsKey(addFile)
      ClickhouseSnapshot.fileStatusCache.get(addFileAsKey)
This `fileStatusCache` only caches the `AddMergeTreeParts`; it does not reduce the time spent listing files from the delta log, does it? `deltaScan.files` still seems to trigger a select action.
fixed
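The point in the question above, that a cache keyed per file speeds up only the per-file conversion and not the listing step, can be shown with a toy model. Everything here is a hypothetical stand-in: `listFiles` plays the role of `deltaScan.files`, `AddFileStub` and `PartStub` replace the real Delta and MergeTree types, and a plain map replaces the real cache. Whether the real `deltaScan.files` re-reads the delta log on every call is the reviewer's open question and is not settled by this sketch.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for AddFile and AddMergeTreeParts.
final case class AddFileStub(path: String)
final case class PartStub(name: String)

object CacheScopeDemo {
  var listCalls = 0     // how often the (assumed expensive) listing ran
  var convertCalls = 0  // how often a file was converted to a part

  private val cache = mutable.HashMap[AddFileStub, PartStub]()

  // Stand-in for deltaScan.files: runs on every scan, regardless of the cache.
  def listFiles(): Seq[AddFileStub] = {
    listCalls += 1
    Seq(AddFileStub("a"), AddFileStub("b"))
  }

  private def convert(f: AddFileStub): PartStub = {
    convertCalls += 1
    PartStub(f.path)
  }

  // Only the conversion result is cached; the listing is repeated each time.
  def scan(): Seq[PartStub] =
    listFiles().map(f => cache.getOrElseUpdate(f, convert(f)))
}
```

Calling `scan()` twice in this model converts each file once but runs the listing twice, so avoiding the listing cost would need a cache at the level of the file list itself.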
LGTM
… in a more native delta way (apache#4733)
* compile pass
* spark 3.2 works
* fix spark session restart issue
* fix cache problem
* add test case for spark.sql.sources.partitionOverwriteMode
* fix ut on guava stats
* fix file path problem
* fix filesForScan
* add keysample info
* fix uri 2
What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
(Fixes: #4732)
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)