Support for specifying token directly #20
Comments
Hi @aersam, thanks for reporting. There are some changes coming up to how DuckDB manages credentials; when that gets merged, I will look into adding this.
It would be nice to have. OneLake, which is based on Azure, uses tokens by default, and today DuckDB can't use them directly :(
Hello, just wondering if the issue is still open?
Yes, you have to renew it manually. The main use case is when you already have a token in Python or similar and want to use it; e.g., you could have a token from a user context in a Python backend and want to pass that along. In such cases the lifetime is not an issue: your Python library would handle it, and just before executing something you would update the DuckDB variable.
OK, one more question: the token comes from an SPN, a managed identity, a workload identity, or an environment variable, no?
Yeah, I agree with @quentingodeau; the implementation would be something along the lines of:
#include <azure/core/context.hpp>
#include <azure/core/credentials/credentials.hpp>
#include <utility>

// Sketch: hands a pre-acquired token back to the Azure SDK; it never refreshes it.
class RawTokenCredential : public Azure::Core::Credentials::TokenCredential {
public:
    explicit RawTokenCredential(Azure::Core::Credentials::AccessToken token)
        : Azure::Core::Credentials::TokenCredential("RawTokenCredential"), raw_token(std::move(token)) {
    }
    Azure::Core::Credentials::AccessToken GetToken(
        Azure::Core::Credentials::TokenRequestContext const& tokenRequestContext,
        Azure::Core::Context const& context) const override {
        return raw_token;  // the caller is responsible for replacing an expired token
    }

private:
    Azure::Core::Credentials::AccessToken raw_token;
};
But it is a little hacky and probably not desirable if one of the other credential provider methods can be used. Note that the Azure SDK does not provide this
Not very common, but sometimes required. I'd say it's just the more low-level approach for advanced use cases.
Also, there are so many ways to use Microsoft's Entra ID that I don't think you want to handle every edge case.
It is common; for example, today I can't write to Fabric OneLake using DuckDB.
@djouallah do you know how Fabric authenticates? Does it use app registration?
Just chiming in here: this is also standard usage at our company. Basically, we do something analogous to DeviceCodeCredential and then store the results in a custom class. The code is very similar to what samansmink suggested above, except it also keeps the refresh_token and refreshes the access token whenever needed. The goal is to authenticate with a username/password without having to either re-authenticate constantly or store the username/password somewhere. Creating a service principal or managed identity per user is too difficult to manage/govern. I'm not up to the task of writing it in DuckDB/C++ myself; we previously used Python and adlfs to authenticate this way. But if I can help with anything, e.g., testing, I'd be happy to do so.
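The "keep the token and refresh it whenever needed" pattern described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the commenter's actual code: `RefreshingTokenCache`, `CachedToken`, `fetch`, and `margin_seconds` are hypothetical names, and `fetch` stands in for whatever really exchanges a refresh token for a new access token.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CachedToken:
    value: str
    expires_on: float  # Unix timestamp, in seconds


class RefreshingTokenCache:
    """Hypothetical helper: returns a cached token and refreshes it near expiry."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # assumed to return (token, expires_on)
        margin_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._token: Optional[CachedToken] = None

    def get(self) -> str:
        now = self._clock()
        # Refresh when nothing is cached yet or the token is within the safety margin.
        if self._token is None or self._token.expires_on - now <= self._margin:
            value, expires_on = self._fetch()
            self._token = CachedToken(value, expires_on)
        return self._token.value
```

A caller would invoke `get()` right before each operation that needs the token, so a fresh value is passed along without re-authenticating every time.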
Sorry, I have been away a bit. I will try to see if I can find a way to automate some testing on this.
OK, but good that it's still on the radar. I'm currently missing support for user-assigned managed identities in DuckDB, which I could work around with direct token support.
It looks like that could be a small change, so likely something I could contribute a PR for. In my case I hit a couple of issues with the current auth setup in the extension:
Those feel like a long tail of edge cases, so likely not something worth having built-in support for, but something that would be nice to unblock by allowing custom access-token generation. Re: 'that feels like a hint that this is not a common path' -- in my experience it is actually fairly common to derive custom classes from TokenCredential, for example:
import sys
from typing import Any, Optional
from azure.core.credentials import AccessToken, TokenCredential

# mssparkutils is provided by the Synapse/Fabric notebook runtime.
class StorageCredential(TokenCredential):
    def get_token(self, *scopes: str, claims: Optional[str] = None, tenant_id: Optional[str] = None, **kwargs: Any) -> AccessToken:
        return AccessToken(mssparkutils.credentials.getToken("Storage"), sys.maxsize)
Couple of potential issues:
Is there a preference on how to solve those?
@mmaitre314 we currently don't have a mechanism in DuckDB to handle token expiry (yet), so that would probably be a place to start on this. Otherwise, I think we can just add this and document the fact that manual secret refreshing is required. That way this can work as a workaround until we have proper secret expiration.
One workaround which works with the extension as-is, albeit a convoluted one:
User-delegation keys/SAS can live for up to 7 days, and it looks like DuckDB allows refreshing them by re-creating the secret. Python sample code using a mix of Managed Identity and Interactive Browser credentials:
import duckdb
from datetime import datetime, timezone, timedelta
from azure.identity import ChainedTokenCredential, ManagedIdentityCredential, InteractiveBrowserCredential
from azure.storage.blob import BlobServiceClient, generate_container_sas

tenant_id = '11111111-2222-3333-4444-555555555555'
account_name = "myaccount"
container_name = "mycontainer"
blob_path = "path/to/blobs/*.parquet"

# Try Managed Identity first, then fall back to an interactive browser login.
credential = ChainedTokenCredential(ManagedIdentityCredential(), InteractiveBrowserCredential(tenant_id=tenant_id))

def create_user_delegation_sas() -> str:
    # Issue a container-level SAS, valid for one day, signed with a user-delegation key.
    start_time = datetime.now(timezone.utc)
    expiry_time = start_time + timedelta(days=1)
    client = BlobServiceClient(f"https://{account_name}.blob.core.windows.net", credential=credential)
    return generate_container_sas(
        account_name=account_name,
        container_name=container_name,
        user_delegation_key=client.get_user_delegation_key(key_start_time=start_time, key_expiry_time=expiry_time),
        resource_types="sco",
        permission="rl",
        start=start_time,
        expiry=expiry_time,
    )

# (Re)create the secret; running this again replaces the SAS with a fresh one.
duckdb.sql(f"""
CREATE OR REPLACE SECRET {account_name} (
    TYPE AZURE,
    CONNECTION_STRING 'DefaultEndpointsProtocol=https;AccountName={account_name};EndpointSuffix=core.windows.net;SharedAccessSignature={create_user_delegation_sas()}',
    SCOPE 'az://{account_name}.blob.core.windows.net/'
)
""")

duckdb.sql(f"SELECT COUNT(*) FROM 'az://{account_name}.blob.core.windows.net/{container_name}/{blob_path}'")
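Since the workaround above only creates the secret once, a long-running process would need to re-run the `CREATE OR REPLACE SECRET` statement before the SAS expires. The following is a minimal sketch of that bookkeeping, assuming the same connection-string layout as the sample; `SecretRefresher`, `build_secret_sql`, `make_sas`, and `execute` are hypothetical names introduced here for illustration and are not part of DuckDB or the Azure SDK.

```python
import time
from typing import Callable, Optional


def build_secret_sql(account_name: str, sas: str) -> str:
    # Same statement shape as the sample above; assumes the SAS contains no single quotes.
    return (
        f"CREATE OR REPLACE SECRET {account_name} (\n"
        "    TYPE AZURE,\n"
        f"    CONNECTION_STRING 'DefaultEndpointsProtocol=https;AccountName={account_name};"
        f"EndpointSuffix=core.windows.net;SharedAccessSignature={sas}',\n"
        f"    SCOPE 'az://{account_name}.blob.core.windows.net/'\n"
        ")"
    )


class SecretRefresher:
    """Hypothetical helper: re-creates the secret shortly before the SAS expires."""

    def __init__(
        self,
        account_name: str,
        make_sas: Callable[[], str],      # e.g. create_user_delegation_sas
        execute: Callable[[str], object],  # e.g. duckdb.sql
        lifetime_seconds: float,
        margin_seconds: float = 600.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._account_name = account_name
        self._make_sas = make_sas
        self._execute = execute
        self._lifetime = lifetime_seconds
        self._margin = margin_seconds
        self._clock = clock
        self._expires_at: Optional[float] = None

    def ensure_fresh(self) -> bool:
        """Run before a query; returns True if the secret was (re)created."""
        now = self._clock()
        if self._expires_at is None or self._expires_at - now <= self._margin:
            self._execute(build_secret_sql(self._account_name, self._make_sas()))
            self._expires_at = now + self._lifetime
            return True
        return False
```

With the sample above, one would presumably pass `make_sas=create_user_delegation_sas`, `execute=duckdb.sql`, and a `lifetime_seconds` matching the one-day SAS lifetime, then call `ensure_fresh()` before each query.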
Hi there
Thanks for this cool extension; it will enable lots of use cases for us.
If you acquire the token outside DuckDB, it would be nice to be able to do something like this:
This is especially useful if you use Managed Identity / Interactive Browser Credentials or the like.