Need support for AWS Endpoint and forcePathStyle to support MINIO and/or local development #79

Open
alberttwong opened this issue Jul 15, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@alberttwong
Contributor

alberttwong commented Jul 15, 2024

environment:
docker compose with openjdk 11, minio, xtable, spark 3.4, hive 2.3.10, hadoop 2.10.2

If you change from using OS environment variables to defining the values in the YAML config:
root@spark:/opt/LakeView# cat delta.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: asU2Pb3XaNAc4JwkkWpNUQ== 
    apiSecret: IBaLVxloIzU36heBooOBsPp5MhD6ijjyIk88zvH2ggs=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
        region: us-east-1
        accessKey: admin
        accessSecret: password
        endpoint: http://minio:9000

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: <lake1>
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        # Add additional lakes and databases as needed

then I get this error:

root@spark:/opt/LakeView# java -jar LakeView-release-v0.10.0-all.jar -p '/opt/LakeView/delta.yaml' 
17:05:25.080 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
Exception in thread "main" java.lang.RuntimeException: Failed to load config
        at com.onehouse.config.ConfigLoader.loadConfigFromConfigFile(ConfigLoader.java:31)
        at com.onehouse.Main.loadConfig(Main.java:92)
        at com.onehouse.Main.start(Main.java:56)
        at com.onehouse.Main.main(Main.java:41)
Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "endpoint" (class com.onehouse.config.models.common.S3Config$S3ConfigBuilder), not marked as ignorable (3 known properties: "accessKey", "region", "accessSecret"])
 at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: com.onehouse.config.models.configv1.ConfigV1$ConfigV1Builder["fileSystemConfiguration"]->com.onehouse.config.models.common.FileSystemConfiguration$FileSystemConfigurationBuilder["s3Config"]->com.onehouse.config.models.common.S3Config$S3ConfigBuilder["endpoint"])
        at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
        at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
        at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2023)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:298)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeSetAndReturn(MethodProperty.java:158)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:293)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeSetAndReturn(MethodProperty.java:158)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:293)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
        at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4650)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2831)
        at com.fasterxml.jackson.databind.ObjectMapper.treeToValue(ObjectMapper.java:3295)
        at com.onehouse.config.ConfigLoader.loadConfigFromJsonNode(ConfigLoader.java:47)
        at com.onehouse.config.ConfigLoader.loadConfigFromConfigFile(ConfigLoader.java:29)
        ... 3 more
For comparison, the OS environment variables that were previously being used:

export AWS_SECRET_ACCESS_KEY=password
export AWS_ACCESS_KEY_ID=admin
export ENDPOINT=http://minio:9000
export AWS_REGION=us-east-1

Originally posted by @alberttwong in #78 (comment)
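
The endpoint and forcePathStyle options requested here map directly onto the AWS SDK for Java v2 client builder. As a minimal sketch (assuming SDK v2; the class name MinioClientSketch and the hard-coded values are illustrative, mirroring the YAML above, not LakeView's actual wiring):

import java.net.URI;

import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;

public class MinioClientSketch {
    public static S3AsyncClient create() {
        return S3AsyncClient.builder()
                .region(Region.US_EAST_1)
                .credentialsProvider(StaticCredentialsProvider.create(
                        // mirrors accessKey/accessSecret from the YAML above
                        AwsBasicCredentials.create("admin", "password")))
                // maps to the proposed "endpoint" YAML key
                .endpointOverride(URI.create("http://minio:9000"))
                // maps to the proposed "forcePathStyle" YAML key
                .forcePathStyle(true)
                .build();
    }
}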

@alberttwong
Contributor Author

alberttwong commented Jul 15, 2024

PR submitted. #85
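
For readers following along: the UnrecognizedPropertyException above is thrown because the S3Config model only declares three properties (region, accessKey, accessSecret). A sketch of the kind of model change such a PR would need (assuming a Lombok builder deserialized by Jackson, as the S3Config$S3ConfigBuilder in the stack trace suggests; the exact fields in #85 may differ):

import lombok.Builder;
import lombok.Value;
import lombok.extern.jackson.Jacksonized;

@Value
@Builder
@Jacksonized // lets Jackson deserialize through the generated builder
public class S3Config {
    String region;
    String accessKey;
    String accessSecret;
    String endpoint;        // new: S3-compatible endpoint, e.g. http://minio:9000
    Boolean forcePathStyle; // new: enable path-style addressing for MinIO
}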

@alberttwong
Contributor Author

Using the new PR:

root@spark:/opt/LakeView# java -jar LakeView-1.0-SNAPSHOT-all.jar -p '/opt/LakeView/delta.yaml' 
17:53:05.956 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
17:53:06.083 [main] INFO  com.onehouse.RuntimeModule - Spinning up 70 threads
17:53:06.373 [main] INFO  com.onehouse.metrics.MetricsServer - Starting metrics server
17:53:06.386 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Running metadata-extractor one time
17:53:06.386 [main] INFO  c.o.m.TableDiscoveryService - Starting table discover service, excluding []
17:53:06.387 [main] INFO  c.o.m.TableDiscoveryService - Discovering tables in s3://warehouse/people
17:53:06.555 [metadata-extractor-2] INFO  c.o.m.TableMetadataUploaderService - Uploading metadata of following tables: [Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=null)]
17:53:06.557 [metadata-extractor-1] INFO  c.o.m.TableMetadataUploaderService - Fetching checkpoint for tables: [Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273)]
17:53:06.943 [metadata-extractor-1] INFO  c.o.m.TableMetadataUploaderService - Initializing following tables [Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273)]
17:53:07.218 [metadata-extractor-2] INFO  c.o.m.TimelineCommitInstantsUploader - uploading instants in table: Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ARCHIVED
17:53:07.231 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - Processing 1 instants in table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline COMMIT_TIMELINE_TYPE_ARCHIVED sequentially in 1 batches
17:53:07.231 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - uploading batch 1 for table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ARCHIVED
17:53:07.618 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - uploading instants in table: Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ACTIVE
17:53:07.641 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - Processing 3 instants in table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline COMMIT_TIMELINE_TYPE_ACTIVE sequentially in 1 batches
17:53:07.641 [metadata-extractor-3] INFO  c.o.m.TimelineCommitInstantsUploader - uploading batch 2 for table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ACTIVE
17:53:07.993 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - Reached end of instants in COMMIT_TIMELINE_TYPE_ACTIVE for table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273)
17:53:07.995 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Run Completed
17:53:07.996 [main] INFO  com.onehouse.metrics.MetricsServer - Shutting down metrics server
root@spark:/opt/LakeView# ls
delta.yaml  LakeView-1.0-SNAPSHOT-all.jar
root@spark:/opt/LakeView# cat delta.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: XXXX== 
    apiSecret: YYYYYY=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
        region: us-east-1
        accessKey: admin
        accessSecret: password
        endpoint: http://minio:9000
        forcePathStyle: true

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: <lake1>
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        # Add additional lakes and databases as needed
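
For anyone wondering why both settings matter: MinIO serves buckets at path-style URLs (http://minio:9000/warehouse/...), whereas the AWS SDK defaults to virtual-hosted-style addressing (http://warehouse.minio:9000/...), which only works if DNS resolves bucket-name subdomains. Hence forcePathStyle: true alongside the custom endpoint for local development.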

@alberttwong alberttwong changed the title Need support for AWS Endpoint to support MINIO Need support for AWS Endpoint and forcePathStyle to support MINIO Jul 15, 2024
@alberttwong alberttwong changed the title Need support for AWS Endpoint and forcePathStyle to support MINIO Need support for AWS Endpoint and forcePathStyle to support MINIO and/or local development Jul 31, 2024
@andywalner andywalner added the enhancement New feature or request label Aug 7, 2024