Revert EMR info in Flint metadata #142

Merged

dai-chen (Collaborator) commented Nov 8, 2023

Description

Revert the previous change in https://github.com/opensearch-project/opensearch-spark/pull/125/files#diff-3ba13ed226c6bfc748e11ae7c79ad2b47783ab2a16805dfdd142818fa9796c70 because the PPL plugin still assumes these 2 fields are present during deserialization.

Without these fields, a DROP statement fails with an exception:

$ curl localhost:9200/_plugins/_async_query -X POST -H "Content-Type: application/json" -d'
{
    "datasource" : "glue_1",
    "lang" : "sql",
    "query" : "DROP INDEX chen_all_columns ON mys3.default.http_logs"
}'
{
  "status": 400,
  "error": {
    "type": "IllegalArgumentException",
    "reason": "Invalid Request",
    "details": "Provided Index doesn\u0027t exist"
  }
}
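
To illustrate the failure mode described above, here is a minimal sketch of a reader that assumes both EMR fields are always present. It is hypothetical and is not the plugin's code: `StrictReaderSketch`, `readEmrInfo`, and the flat `Map` representation are assumptions made for illustration, and the two key names are taken from the test output in this description.

```scala
object StrictReaderSketch {
  // Hypothetical strict reader (not the plugin's implementation).
  // Map.apply throws NoSuchElementException when a key is missing,
  // so metadata written without the EMR fields cannot be read.
  def readEmrInfo(env: Map[String, String]): (String, String) =
    (env("SERVERLESS_EMR_VIRTUAL_CLUSTER_ID"), env("SERVERLESS_EMR_JOB_ID"))
}
```

Calling `readEmrInfo` with both keys set returns the pair of values, while calling it with an empty map throws `NoSuchElementException`. A reader written this way needs the writer to keep emitting both fields, which is why the revert restores them.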

Test

Mock the EMR environment variables and create a skipping index:

$ export SERVERLESS_EMR_VIRTUAL_CLUSTER_ID=test_app
$ export SERVERLESS_EMR_JOB_ID=test_job
$ spark-shell  ...

scala> System.getenv("SERVERLESS_EMR_VIRTUAL_CLUSTER_ID")
res0: String = test_app
scala> System.getenv("SERVERLESS_EMR_JOB_ID")
res1: String = test_job

scala> spark.sql("""
     | CREATE SKIPPING INDEX ON stream.lineitem_tiny
     | (l_shipdate VALUE_SET)
     | WITH (
     |   auto_refresh = true,
     |   checkpoint_location = "s3://checkpoint")
     | """)

The resulting index mapping includes the EMR info under properties.env:

{
  "flint_myglue_stream_lineitem_tiny_skipping_index": {
    "mappings": {
      "_meta": {
        "latestId": "ZmxpbnRfbXlnbHVlX3N0cmVhbV9saW5laXRlbV90aW55X3NraXBwaW5nX2luZGV4",
        "kind": "skipping",
        "indexedColumns": [
          {
            "columnType": "date",
            "kind": "VALUE_SET",
            "columnName": "l_shipdate"
          }
        ],
        "name": "flint_myglue_stream_lineitem_tiny_skipping_index",
        "options": {
          "auto_refresh": "true",
          "checkpoint_location": "s3://chen-emr-test/checkpoints/job-14"
        },
        "source": "myglue.stream.lineitem_tiny",
        "version": "0.1.0",
        "properties": {
          "env": {
            "SERVERLESS_EMR_VIRTUAL_CLUSTER_ID": "test_app",
            "SERVERLESS_EMR_JOB_ID": "test_job"
          }
        }
      },
      ......
    }
  }
}
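
As a rough sketch of how the `properties.env` block in the mapping above could be assembled from the process environment: the object and method names below (`EmrEnvSketch`, `collectEnv`, `properties`) are hypothetical and are not taken from the Flint code base. Only the two variable names and the `properties` / `env` nesting come from the test output.

```scala
object EmrEnvSketch {
  // Variable names as they appear in the test above.
  val EnvKeys: Seq[String] = Seq(
    "SERVERLESS_EMR_VIRTUAL_CLUSTER_ID",
    "SERVERLESS_EMR_JOB_ID")

  // Keep only the EMR keys that are actually set.
  // `env` defaults to the process environment and can be replaced in tests.
  def collectEnv(env: Map[String, String] = sys.env): Map[String, String] =
    EnvKeys.flatMap(key => env.get(key).map(value => key -> value)).toMap

  // Nest the result the way the mapping shows it: properties -> env -> {...}
  def properties(env: Map[String, String] = sys.env): Map[String, Map[String, String]] =
    Map("env" -> collectEnv(env))
}
```

Passing the environment as a parameter keeps the sketch testable without exporting variables. In a `spark-shell` started as in the test above, `EmrEnvSketch.properties()` would read the two exported values from `sys.env`.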

Issues Resolved

opensearch-project/sql#2424

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
dai-chen added the bug label Nov 8, 2023
dai-chen self-assigned this Nov 8, 2023
dai-chen marked this pull request as ready for review November 8, 2023 02:08
penghuo merged commit d5e6738 into opensearch-project:main Nov 8, 2023
4 of 5 checks passed
dai-chen deleted the rollback-flint-metadata-emr-env branch November 8, 2023 04:07