Overly restrictive S3 limitations #107
Confirmation that both HTTPS URL forms yield the same result:
Assuming it is even possible to use the SDK without specifying the zone, doing so when multiple zones are valid opens up potential problems with frequently changing datasets, where one zone has a given new file and another zone doesn't have it yet. That would cause annoying, intermittent errors for users; people will think ERDDAP is flaky and complain (or just grumble to themselves). Not specifying the zone may also add a delay to each file access, because AWS has to determine which zone to use, and delays that involve signals travelling long distances can be significant. Presumably that delay would recur each time the dataset, or a file in it, is accessed. The huge underlying problem with S3 is latency, and this would only exacerbate it. Specifying the zone is more straightforward and efficient. Even if leaving the zone unspecified becomes an option, I would encourage admins to keep specifying it.
Both times above, I said Chris can pursue this if he wants, but you can also pursue it: search the AWS SDK for Java v2 docs for information about these issues and post what you find here. The documentation may also include best-practice recommendations. Best wishes.
I've recently been using ERDDAP version 2.23 (via Axiom's Docker images) in an attempt to load HF Radar files from S3; the data sections of these files are stored as CSV.
The bucket is here and is currently publicly readable:
https://hfradar.s3.amazonaws.com/
I've run into several surprising behaviors.
The ERDDAP docs say to use a particular availability zone (strictly speaking, an AWS region such as us-east-1), e.g. instead of https://hfradar.s3.amazonaws.com/SUBPATH/, use https://hfradar.s3.us-east-1.amazonaws.com/SUBPATH/ . However, the first form is usually what you get when copying from the AWS console, and it would also support multiple availability zones. As far as I can determine, the content served by both is the same, barring downtime or error conditions, but the former has the advantage of multi-AZ support. ERDDAP requires the region-specific form; if you do not provide it, you get an error message saying that no files were found. The docs do say to use the form with the zone specified, but the behavior is confusing given that the content will be the same in 99.9%+ of cases. I don't understand why the former form can't be supported, especially since it is likely to be more robust in a production environment because it supports multiple availability zones.
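Because the two URL forms differ only in the hostname, converting one into the other is mechanical once the bucket's region is known. The following is a minimal sketch, not ERDDAP code: the helper names `to_regional_url` and `lookup_bucket_region` are hypothetical, and the region lookup assumes S3 reports a bucket's region in the `x-amz-bucket-region` response header of a HEAD request to the bucket.

```python
"""Hypothetical helpers: rewrite a global-endpoint S3 URL into the
region-specific form, optionally discovering the region from S3 itself."""
import re
import urllib.error
import urllib.request
from typing import Optional

# Virtual-hosted-style URL with no region in the hostname, e.g.
# https://hfradar.s3.amazonaws.com/SUBPATH/
_GLOBAL = re.compile(r"^https://([a-z0-9.\-]+)\.s3\.amazonaws\.com(/.*)?$")


def to_regional_url(url: str, region: str) -> str:
    """Insert `region` into a global-endpoint S3 URL.

    URLs that do not match the global form (including ones that already
    name a region) are returned unchanged."""
    m = _GLOBAL.match(url)
    if m is None:
        return url
    bucket, path = m.group(1), m.group(2) or "/"
    return f"https://{bucket}.s3.{region}.amazonaws.com{path}"


def lookup_bucket_region(bucket: str, timeout: float = 10.0) -> Optional[str]:
    """Ask S3 which region hosts `bucket` (assumes the x-amz-bucket-region
    header is present on the HEAD response). Requires network access."""
    req = urllib.request.Request(f"https://{bucket}.s3.amazonaws.com/", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.headers.get("x-amz-bucket-region")
    except urllib.error.HTTPError as err:
        # Error responses (e.g. 403 for a bucket you may not list)
        # often still carry the region header.
        return err.headers.get("x-amz-bucket-region")
```

For example, `to_regional_url("https://hfradar.s3.amazonaws.com/SUBPATH/", lookup_bucket_region("hfradar"))` would produce the region-specific URL without the admin having to look the region up by hand, which is one way the global form could be accepted while still pinning requests to a single regional endpoint.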
Even though the bucket and its contents are publicly readable, ERDDAP will refuse to read the contents of the S3 bucket if you don't pass along credentials. Why is this step necessary for something that should already be readable without supplying AWS credentials?
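For comparison, a publicly readable bucket can be listed with plain, unsigned HTTPS requests to S3's ListObjectsV2 REST call (`?list-type=2`), with no credentials at all. The sketch below is illustrative Python, not ERDDAP's implementation; the function names are hypothetical, and it assumes the bucket policy grants anonymous `s3:ListBucket`, since otherwise S3 answers with 403.

```python
"""Sketch: list a public S3 bucket anonymously via the ListObjectsV2 REST API."""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_list_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (object keys, continuation token or None) for one result page."""
    root = ET.fromstring(xml_text)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS)
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
    return keys, (token if truncated == "true" else None)


def list_public_keys(bucket_url: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under `prefix`, following continuation tokens.

    `bucket_url` is the bucket root, e.g. https://hfradar.s3.us-east-1.amazonaws.com
    No Authorization header is sent, so no AWS credentials are involved."""
    token: Optional[str] = None
    while True:
        params = {"list-type": "2", "prefix": prefix}
        if token:
            params["continuation-token"] = token
        url = f"{bucket_url.rstrip('/')}/?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url, timeout=30) as resp:
            keys, token = parse_list_page(resp.read().decode("utf-8"))
        yield from keys
        if token is None:
            return
```

In the AWS SDK for Java v2, the analogous approach is to build the S3 client with `AnonymousCredentialsProvider`, which sends unsigned requests instead of requiring credentials.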