Refactoring and optimisation of RestoreTableCommand #912

Closed

Maks-D (Contributor) commented Jan 26, 2022

  • RestoreTableCommand moved to the org.apache.spark.sql.delta.commands package
  • cache() of the filesToRemove and filesToAdd DataFrames removed (per the "Scala API for restoring delta table" PR, #863 (comment)). Without the cache, the computation is 2x faster (tested by restoring a table with 50k files)
  • added a more descriptive job description for the Spark UI
    [image]

Signed-off-by: Maksym Dovhal <maksym.dovhal@gmail.com>
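
For readers unfamiliar with how a job description reaches the Spark UI, here is a minimal sketch of the general technique. It is not the code from this PR: the object name `JobDescriptionSketch`, the helper `withJobDescription`, and the description string are hypothetical, and only `SparkContext.setJobDescription` and `getLocalProperty` are actual Spark APIs.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not taken from the PR: runs `body` with a
// human-readable description attached to every Spark job it triggers,
// then restores whatever description was set before.
object JobDescriptionSketch {
  def withJobDescription[T](spark: SparkSession, description: String)(body: => T): T = {
    val sc = spark.sparkContext
    // "spark.job.description" is the local property behind setJobDescription.
    val previous = sc.getLocalProperty("spark.job.description")
    sc.setJobDescription(description)
    try body
    finally sc.setJobDescription(previous) // null clears the property
  }
}

// Usage (the description text is an illustrative assumption):
// val spark = SparkSession.builder().master("local[1]").getOrCreate()
// JobDescriptionSketch.withJobDescription(spark, "RESTORE: computing files to add") {
//   spark.range(100).count()
// }
```

Restoring the previous value in `finally` keeps the description from leaking into unrelated jobs that run later on the same thread.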

 * RestoreTableCommand moved to org.apache.spark.sql.delta.commands package
 * cache() of filesToRemove DataFrame removed (according to delta-io#863 (comment))
 * cache() of filesToAdd will be applied only if spark.sql.files.ignoreMissingFiles = false (default value)

Signed-off-by: Maksym Dovhal <maksym.dovhal@gmail.com>
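
The conditional caching described in the commit message can be sketched roughly as follows. This is an illustration, not the actual RestoreTableCommand code: `RestoreCacheSketch`, `shouldCache`, and `prepareFilesToAdd` are hypothetical names. One plausible reading, which the thread does not state, is that filesToAdd is traversed more than once only when missing files are not ignored, so caching pays off only in that case.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch of "cache filesToAdd only when
// spark.sql.files.ignoreMissingFiles = false"; names are assumptions.
object RestoreCacheSketch {
  val IgnoreMissingFilesKey = "spark.sql.files.ignoreMissingFiles"

  // Pure decision rule: cache only when missing files are NOT ignored.
  def shouldCache(ignoreMissingFiles: Boolean): Boolean = !ignoreMissingFiles

  // Reads the session setting (falling back to "false", the default
  // named in the commit message) and caches the DataFrame if needed.
  def prepareFilesToAdd(spark: SparkSession, filesToAdd: DataFrame): DataFrame = {
    val ignoreMissing = spark.conf.get(IgnoreMissingFilesKey, "false").toBoolean
    if (shouldCache(ignoreMissing)) filesToAdd.cache() else filesToAdd
  }
}
```

Keeping the decision in a small pure function makes the rule easy to unit-test without starting a Spark session.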
Maks-D (Contributor, Author) commented Jan 30, 2022

@vkorukanti Could you take a look at the updates? I've updated the PR description accordingly.

vkorukanti (Collaborator) left a comment

LGTM. Minor comments. Thanks for measuring the impact of the cache.

Maks-D (Contributor, Author) commented Jan 31, 2022

@vkorukanti Thank you for the review. I've updated the PR according to the review comments.

allisonport-db pushed a commit that referenced this pull request Feb 4, 2022
 * RestoreTableCommand moved to org.apache.spark.sql.delta.commands package
 * cache() of filesToRemove DataFrame removed (according to #863 (comment))
 * cache() of filesToAdd will be applied only if spark.sql.files.ignoreMissingFiles = false (default value)

Signed-off-by: Maksym Dovhal <maksym.dovhal@gmail.com>

Closes #912

Signed-off-by: Venki Korukanti <venki.korukanti@databricks.com>
GitOrigin-RevId: b10707c96766f74423874f01898587f97c69c6b5
jbguerraz pushed a commit to jbguerraz/delta that referenced this pull request Jul 6, 2022
 * RestoreTableCommand moved to org.apache.spark.sql.delta.commands package
 * cache() of filesToRemove DataFrame removed (according to delta-io#863 (comment))
 * cache() of filesToAdd will be applied only if spark.sql.files.ignoreMissingFiles = false (default value)

Signed-off-by: Maksym Dovhal <maksym.dovhal@gmail.com>

Closes delta-io#912

Signed-off-by: Venki Korukanti <venki.korukanti@databricks.com>
GitOrigin-RevId: b10707c96766f74423874f01898587f97c69c6b5
@Maks-D Maks-D deleted the refactoring_of_RestoreTableCommand branch March 27, 2023 13:24