Add Flint Spark index management API #1636

Merged

dai-chen
Collaborator

@dai-chen commented May 17, 2023

Description

  1. Draft the Flint doc: https://github.com/dai-chen/sql-1/blob/add-flint-spark-api/flint/docs/index.md (everything is in one doc for now)
  2. Add the high-level Flint Spark API: FlintSpark and FlintSparkSkippingIndex
  3. Add a basic partition index (PartitionSkippingStrategy) and FlintClient.deleteIndex() for integration tests
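
To illustrate the idea behind a partition skipping index, here is a minimal sketch. It assumes the index records which partition values each source file belongs to, so a query with partition predicates can skip files that cannot match. Every name in it (PartitionSkippingSketch, FileEntry, selectFiles) is hypothetical and is not taken from the Flint code in this PR.

```scala
// Hypothetical sketch: none of these names come from the Flint code base.
object PartitionSkippingSketch {

  // Assumed per-file metadata: the source file path plus its partition values.
  final case class FileEntry(path: String, partition: Map[String, String])

  // Keep only the files whose partition values satisfy every equality predicate.
  // A file that lacks one of the predicate columns is skipped as well.
  def selectFiles(index: Seq[FileEntry], predicates: Map[String, String]): Seq[String] =
    index
      .filter { entry =>
        predicates.forall { case (column, value) =>
          entry.partition.get(column).contains(value)
        }
      }
      .map(_.path)

  def main(args: Array[String]): Unit = {
    val index = Seq(
      FileEntry("alb_logs/2023/04/01/part-0", Map("year" -> "2023", "month" -> "04", "day" -> "01")),
      FileEntry("alb_logs/2023/03/31/part-0", Map("year" -> "2023", "month" -> "03", "day" -> "31"))
    )
    // Only files in the April 2023 partitions remain after pruning.
    println(selectFiles(index, Map("year" -> "2023", "month" -> "04")))
  }
}
```

An empty predicate map keeps every file, which matches the expectation that a query without partition filters cannot skip anything.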

TODO

Implement partition index building once the Flint batch writer is ready

API Example

val flint = new FlintSpark(spark)

flint.skippingIndex()
    .onTable("alb_logs")
    .filterBy("time > '2023-04-01 00:00:00'")
    .addPartitionIndex("year", "month", "day")
    .addValueListIndex("elb_status_code")
    .addBloomFilterIndex("client_ip")
    .create()

More details are in the Flint doc and the integration tests (IT); the detailed design is in opensearch-project/opensearch-spark#2.
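
For readers unfamiliar with the fluent style used in the API example, the sketch below shows one way such a builder could be structured. It is a self-contained illustration with no Spark dependency, not the FlintSpark implementation: the types Builder, IndexedColumn and SkippingIndexSpec are assumptions, and accepting several columns in addValueListIndex and addBloomFilterIndex is also an assumption.

```scala
// Hypothetical sketch of a fluent builder; not the actual FlintSpark classes.
object SkippingIndexBuilderSketch {

  // One indexed column together with the kind of skipping structure requested.
  final case class IndexedColumn(kind: String, column: String)

  // Immutable description of the index, produced by create().
  final case class SkippingIndexSpec(
      table: String,
      filter: Option[String],
      columns: Seq[IndexedColumn])

  // Each method returns a new Builder, so calls can be chained safely.
  final case class Builder(
      table: Option[String] = None,
      filter: Option[String] = None,
      columns: Vector[IndexedColumn] = Vector.empty) {

    def onTable(name: String): Builder = copy(table = Some(name))
    def filterBy(predicate: String): Builder = copy(filter = Some(predicate))
    def addPartitionIndex(cols: String*): Builder = add("Partition", cols)
    def addValueListIndex(cols: String*): Builder = add("ValueList", cols)
    def addBloomFilterIndex(cols: String*): Builder = add("BloomFilter", cols)

    // Validate the accumulated state before producing the final description.
    def create(): SkippingIndexSpec = {
      require(table.isDefined, "onTable must be called before create")
      require(columns.nonEmpty, "at least one indexed column is required")
      SkippingIndexSpec(table.get, filter, columns)
    }

    private def add(kind: String, cols: Seq[String]): Builder =
      copy(columns = columns ++ cols.map(IndexedColumn(kind, _)))
  }

  def main(args: Array[String]): Unit = {
    val spec = Builder()
      .onTable("alb_logs")
      .addPartitionIndex("year", "month", "day")
      .addValueListIndex("elb_status_code")
      .addBloomFilterIndex("client_ip")
      .create()
    println(spec)
  }
}
```

Keeping the builder immutable means a partially configured builder can be reused for several indexes without one chain affecting another.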

Issues Resolved

opensearch-project/opensearch-spark#2

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen added the enhancement (New feature or request) and Flint labels May 17, 2023
@dai-chen self-assigned this May 17, 2023

codecov bot commented May 17, 2023

Codecov Report

Merging #1636 (c8da035) into feature/flint (614d27a) will not change coverage.
The diff coverage is n/a.

❗ Current head c8da035 differs from pull request most recent head 039ef8e. Consider uploading reports for the commit 039ef8e to get more accurate results

@@               Coverage Diff                @@
##             feature/flint    opensearch-project/sql#1636   +/-   ##
================================================
  Coverage            97.19%   97.19%           
  Complexity            4107     4107           
================================================
  Files                  371      371           
  Lines                10464    10464           
  Branches               706      706           
================================================
  Hits                 10170    10170           
  Misses                 287      287           
  Partials                 7        7           
Flag Coverage Δ
sql-engine 97.19% <ø> (ø)

Flags with carried forward coverage won't be shown.

@dai-chen changed the title from "Add Flint Spark API" to "Add Flint Spark index management API" May 18, 2023
dai-chen added 9 commits May 18, 2023 13:40
@dai-chen marked this pull request as ready for review May 22, 2023 21:44
@dai-chen requested a review from penghuo May 22, 2023 21:45
/**
* Flint configurations in Spark. TODO: shared with Flint data source config?
*/
val FLINT_INDEX_STORE_LOCATION = "spark.flint.indexstore.location"
Collaborator

Quick question: should we put all the Flint Spark related configuration in here?

Collaborator Author

Yes, I think we should put them all together here, or in a new FlintSparkConf class later.
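
As a rough illustration of the suggestion in this thread, the sketch below gathers configuration keys in one object with a typed accessor. Only the key name FLINT_INDEX_STORE_LOCATION and its value come from the diff. The object name FlintSparkConfSketch, the accessor indexStoreLocation, and the use of a plain Map in place of Spark's configuration are assumptions made to keep the example self-contained.

```scala
// Hypothetical sketch of a single home for Flint Spark configuration keys.
object FlintSparkConfSketch {

  // Key taken from the diff under review.
  val FLINT_INDEX_STORE_LOCATION = "spark.flint.indexstore.location"

  // Assumed accessor: a plain Map stands in for the Spark configuration here.
  // Returns None when the key is not set, so callers decide how to handle it.
  def indexStoreLocation(conf: Map[String, String]): Option[String] =
    conf.get(FLINT_INDEX_STORE_LOCATION)

  def main(args: Array[String]): Unit = {
    // The location value below is an illustrative placeholder only.
    val conf = Map(FLINT_INDEX_STORE_LOCATION -> "example-host:9200")
    println(indexStoreLocation(conf))
  }
}
```

Centralising keys this way means a later rename touches one definition instead of every call site.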

@dai-chen merged commit 7268b5e into opensearch-project:feature/flint May 23, 2023
@dai-chen deleted the add-flint-spark-api branch May 23, 2023 16:54