Unable to query multiple files on Azure using container level sas token #67

Open
2 tasks done
erik-farmer opened this issue Jul 24, 2024 · 0 comments

What happens?

When querying multiple files with a glob such as az://<account>.blob.core.windows.net/<container>/path/to/blobs/*.json, an exception is raised:

duckdb.duckdb.IOException: IO Error: AzureStorageFileSystem Read to az://<account>.blob.core.windows.net/<container>/path/to/blobs/*.json failed with NoAuthenticationInformation Reason Phrase: Server failed to authenticate the request. Please refer to the information in the www-authenticate header.

This exception is not raised when pointing to a specific blob (see the examples below).

The SAS token is created using the following guide:
https://learn.microsoft.com/en-us/azure/ai-services/translator/document-translation/how-to-guides/create-sas-tokens?tabs=Containers

All permissions were selected when creating the token (read, write, list, etc.).
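One way to double-check what the token actually grants is to inspect its query parameters. The following is a minimal sketch, assuming a service SAS where `sr` is the signed resource (`c` for a container), `sp` the signed permissions and `se` the expiry. The helper name `describe_sas` and the sample token are made up for illustration, and the signature in it is not valid.

```python
from urllib.parse import parse_qs


def describe_sas(sas_token: str) -> dict:
    """Return the signed resource, permissions and expiry found in a SAS token."""
    # A SAS token is a URL query string; a leading '?' is optional.
    fields = parse_qs(sas_token.lstrip("?"))
    return {
        "resource": fields.get("sr", [""])[0],
        "permissions": fields.get("sp", [""])[0],
        "expiry": fields.get("se", [""])[0],
    }


# Hypothetical token for illustration only.
example = "sv=2022-11-02&sr=c&sp=rwdl&se=2024-08-01T00:00:00Z&sig=placeholder"
info = describe_sas(example)
print(info)
# Listing blobs (which a glob needs) requires a container-level token with 'l'.
print("can list blobs:", info["resource"] == "c" and "l" in info["permissions"])
```

If `sr` is `b` or `sp` lacks `l`, the token would not be expected to support listing, which a `*.json` glob depends on.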

To Reproduce

Method 1

import duckdb
from adlfs.spec import AzureBlobFileSystem


fs = AzureBlobFileSystem(
    account_name='',
    container_name='',  # tried with and without this param
    sas_token='mySasToken',
)
print(fs.glob("<container_name>/"))  # works
print(fs.ls("<container_name>/"))  # works
connection = duckdb.connect()
connection.register_filesystem(fs)

data = connection.sql("""
SELECT *
FROM read_json('az://<account_name>.blob.core.windows.net/<container>/path/to/specificFile.json');
""")  # works

data = connection.sql("""
SELECT *
FROM read_json('az://<account_name>.blob.core.windows.net/<container>/path/to/multiple/files/*.json');
""")  # raises IOException

Method 2

import duckdb


duckdb.execute("""
INSTALL azure;
LOAD azure;
""")

duckdb.execute("""
CREATE SECRET secret1 (
TYPE AZURE,
CONNECTION_STRING 'mySasToken'
);
""")

connection = duckdb.connect()
data = connection.sql("""
SELECT *
FROM read_json('az://<account_name>.blob.core.windows.net/<container>/path/to/specificFile.json');
""")  # works

data = connection.sql("""
SELECT *
FROM read_json('az://<account_name>.blob.core.windows.net/<container>/path/to/multiple/files/*.json');
""")  # raises IOException
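A variant of Method 2 that may be worth trying, sketched under two assumptions that are not confirmed in this report. First, `duckdb.execute` runs on the module's default connection, so a secret created there may not be visible on a separate `duckdb.connect()` connection; the sketch keeps everything on one connection. Second, it assumes the azure extension accepts an Azure Storage connection string of the form `BlobEndpoint=...;SharedAccessSignature=...` rather than a bare SAS token. The helper names `build_secret_sql` and `query_glob` are hypothetical, and whether this changes the glob behavior is untested.

```python
def build_secret_sql(account_name: str, sas_token: str, secret_name: str = "secret1") -> str:
    """Build a CREATE SECRET statement that wraps a SAS token in a connection string."""
    connection_string = (
        f"BlobEndpoint=https://{account_name}.blob.core.windows.net;"
        f"SharedAccessSignature={sas_token.lstrip('?')}"
    )
    # Double any single quotes so the value stays a valid SQL string literal.
    escaped = connection_string.replace("'", "''")
    return f"CREATE SECRET {secret_name} (TYPE AZURE, CONNECTION_STRING '{escaped}');"


def query_glob(account_name: str, container: str, sas_token: str, pattern: str):
    """Create the secret and run the glob query on the same connection."""
    import duckdb  # imported here so build_secret_sql has no third-party dependency

    connection = duckdb.connect()
    connection.execute("INSTALL azure; LOAD azure;")
    connection.execute(build_secret_sql(account_name, sas_token))
    url = f"az://{account_name}.blob.core.windows.net/{container}/{pattern}"
    return connection.sql(f"SELECT * FROM read_json('{url}');")
```

Calling `query_glob('<account_name>', '<container>', 'mySasToken', 'path/to/multiple/files/*.json')` would then exercise the same glob as above with the secret and the query sharing one connection.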

OS:

arm64 (Apple M1)

DuckDB Version:

0.10.2

DuckDB Client:

Python

Full Name:

Erik Farmer

Affiliation:

PepsiCo

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
@hannes transferred this issue from duckdb/duckdb on Jul 29, 2024