Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to use AWS env variables but defaults to GCS #78

Open
alberttwong opened this issue Jul 15, 2024 · 1 comment
Open

Trying to use AWS env variables but defaults to GCS #78

alberttwong opened this issue Jul 15, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@alberttwong
Copy link
Contributor

alberttwong commented Jul 15, 2024

environment:
docker compose with openjdk 11, minio, xtable, spark 3.4, hive 2.3.10, hadoop 2.10.2

root@spark:/opt/LakeView# java -jar LakeView-release-v0.10.0-all.jar -p '/opt/LakeView/delta.yaml' 
17:01:44.376 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
17:01:44.496 [main] INFO  com.onehouse.RuntimeModule - Spinning up 70 threads
17:01:44.657 [main] INFO  com.onehouse.metrics.MetricsServer - Starting metrics server
17:01:44.674 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Running metadata-extractor one time
17:01:44.674 [main] INFO  c.o.m.TableDiscoveryService - Starting table discover service, excluding []
17:01:44.674 [main] INFO  c.o.m.TableDiscoveryService - Discovering tables in s3://warehouse/people
17:01:44.864 [metadata-extractor-1] ERROR c.o.m.TableDiscoveryService - Failed to discover tables in path: s3://warehouse/people
17:01:44.865 [metadata-extractor-1] ERROR c.o.m.TableDiscoveryService - com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
java.util.concurrent.CompletionException: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
        at com.google.cloud.storage.StorageException.translate(StorageException.java:118)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:287)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:430)
        at com.google.cloud.storage.StorageImpl.lambda$listBlobs$11(StorageImpl.java:397)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
        at com.google.cloud.storage.Retrying.run(Retrying.java:54)
        at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:394)
        at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:365)
        at com.onehouse.storage.GCSAsyncStorageClient.lambda$fetchObjectsByPage$1(GCSAsyncStorageClient.java:61)
        ... 7 common frames omitted
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 401 Unauthorized
GET https://storage.googleapis.com/storage/v1/b/warehouse/o?delimiter=/&prefix=people/&projection=full
{
  "code" : 401,
  "errors" : [ {
    "domain" : "global",
    "location" : "Authorization",
    "locationType" : "header",
    "message" : "Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).",
    "reason" : "required"
  } ],
  "message" : "Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)."
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:439)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:525)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:466)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:576)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:420)
        ... 15 common frames omitted
17:01:44.866 [metadata-extractor-1] INFO  c.o.m.TableMetadataUploaderService - Uploading metadata of following tables: []
17:01:44.867 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Run Completed
17:01:44.867 [main] INFO  com.onehouse.metrics.MetricsServer - Shutting down metrics server
root@spark:/opt/LakeView# cat delta.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: asU2Pb3XaNAc4JwkkWpNUQ== 
    apiSecret: IBaLVxloIzU36heBooOBsPp5MhD6ijjyIk88zvH2ggs=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
#        region: us-west-2
#        accessKey: admin
#        accessSecret: password

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: <lake1>
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        # Add additional lakes and databases as needed
export AWS_SECRET_ACCESS_KEY=password
export AWS_ACCESS_KEY_ID=admin
export ENDPOINT=http://minio:9000
export AWS_REGION=us-east-1
@alberttwong
Copy link
Contributor Author

If you change to not use OS variables and define them in YAML.

root@spark:/opt/LakeView# cat delta.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: asU2Pb3XaNAc4JwkkWpNUQ== 
    apiSecret: IBaLVxloIzU36heBooOBsPp5MhD6ijjyIk88zvH2ggs=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
        region: us-east-1
        accessKey: admin
        accessSecret: password
        endpoint: http://minio:9000

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: <lake1>
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        # Add additional lakes and databases as needed

then I get this error

root@spark:/opt/LakeView# java -jar LakeView-release-v0.10.0-all.jar -p '/opt/LakeView/delta.yaml' 
17:05:25.080 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
Exception in thread "main" java.lang.RuntimeException: Failed to load config
        at com.onehouse.config.ConfigLoader.loadConfigFromConfigFile(ConfigLoader.java:31)
        at com.onehouse.Main.loadConfig(Main.java:92)
        at com.onehouse.Main.start(Main.java:56)
        at com.onehouse.Main.main(Main.java:41)
Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "endpoint" (class com.onehouse.config.models.common.S3Config$S3ConfigBuilder), not marked as ignorable (3 known properties: "accessKey", "region", "accessSecret"])
 at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: com.onehouse.config.models.configv1.ConfigV1$ConfigV1Builder["fileSystemConfiguration"]->com.onehouse.config.models.common.FileSystemConfiguration$FileSystemConfigurationBuilder["s3Config"]->com.onehouse.config.models.common.S3Config$S3ConfigBuilder["endpoint"])
        at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
        at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
        at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2023)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:298)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeSetAndReturn(MethodProperty.java:158)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:293)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeSetAndReturn(MethodProperty.java:158)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:293)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
        at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4650)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2831)
        at com.fasterxml.jackson.databind.ObjectMapper.treeToValue(ObjectMapper.java:3295)
        at com.onehouse.config.ConfigLoader.loadConfigFromJsonNode(ConfigLoader.java:47)
        at com.onehouse.config.ConfigLoader.loadConfigFromConfigFile(ConfigLoader.java:29)
        ... 3 more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants