[QUERY] Workaround for Java Data Lake SDK 5000 max results limit? #20634

Closed
2 tasks done
al-v-in opened this issue Apr 13, 2021 · 5 comments · Fixed by #22453
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Docs Storage Storage Service (Queues, Blobs, Files)

Comments

@al-v-in

al-v-in commented Apr 13, 2021

Query/Question
The List function will return a maximum of 5000 results, as stated here: https://docs.microsoft.com/en-us/java/api/com.azure.storage.file.datalake.models.listpathsoptions?view=azure-java-stable

Assuming I have a directory that has more than 5000 files, how can I list them all?

The REST API for "list" (https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/list) has a continuation token to get around this, which doesn't seem to be exposed in the Java SDK. Is that correct?

If it's true that I can't use the Java Data Lake SDK for this, what should I use instead? The REST API directly?

Why is this not a Bug or a feature Request?
Because before I make a feature request I want to make sure I've not missed something obvious.

Setup (please complete the following information if applicable):
Not tested in code yet, just reviewing options from looking at the documentation.
com.azure:azure-storage-file-datalake:12.4.1

Information Checklist
Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report

  • Query Added
  • Setup information Added
@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Apr 13, 2021
@joshfree joshfree added Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files) labels Apr 16, 2021
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Apr 16, 2021
@joshfree
Member

@gapra-msft could you please take a look

@gapra-msft
Member

Hi @al-v-in

Thank you for your question. I will take a look at the behavior of listPaths and let you know.

@gapra-msft
Member

Hi @al-v-in

The listPaths API should handle pagination by continuation token automatically if you iterate over the PagedIterable it returns. Are you only seeing 5000 paths?
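
Editor's note: the mechanism described in this comment can be sketched without the SDK. The example below is a stand-alone simulation, not Azure SDK code: `FakeListService`, `Page`, and `listAll` are hypothetical names invented for illustration, and the "service" is an in-memory stub that caps each page at 5000 items and hands back a continuation token. It only shows why a caller who simply iterates sees every item, even though no single page holds more than 5000.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Stand-alone simulation of continuation-token paging; it never calls Azure.
class PagedListingSketch {

    // Per-page cap quoted in the thread.
    static final int MAX_PAGE_SIZE = 5000;

    // One page of results plus the token for the next page (null on the last page).
    static final class Page {
        final List<String> items;
        final String continuationToken;

        Page(List<String> items, String continuationToken) {
            this.items = items;
            this.continuationToken = continuationToken;
        }
    }

    // Hypothetical stand-in for the service: holds `total` paths, serves them page by page.
    static final class FakeListService {
        private final int total;
        int requests = 0; // how many page requests were made

        FakeListService(int total) {
            this.total = total;
        }

        // A null token means "start at the beginning"; otherwise it encodes the next offset.
        Page list(String token) {
            requests++;
            int start = (token == null) ? 0 : Integer.parseInt(token);
            int end = Math.min(start + MAX_PAGE_SIZE, total);
            List<String> items = new ArrayList<>();
            for (int i = start; i < end; i++) {
                items.add("dir/file-" + i);
            }
            return new Page(items, end < total ? String.valueOf(end) : null);
        }
    }

    // Iterable that follows continuation tokens behind the scenes, fetching pages lazily.
    static Iterable<String> listAll(FakeListService service) {
        return () -> new Iterator<String>() {
            private Iterator<String> current = null;
            private String nextToken = null;
            private boolean started = false;

            @Override
            public boolean hasNext() {
                while (current == null || !current.hasNext()) {
                    if (started && nextToken == null) {
                        return false; // last page consumed, nothing left to fetch
                    }
                    Page page = service.list(nextToken);
                    started = true;
                    current = page.items.iterator();
                    nextToken = page.continuationToken;
                }
                return true;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        FakeListService service = new FakeListService(12_001);
        int count = 0;
        for (String path : listAll(service)) {
            count++;
        }
        // prints "12001 paths in 3 page requests"
        System.out.println(count + " paths in " + service.requests + " page requests");
    }
}
```

With the real SDK, the equivalent of `listAll` is the `PagedIterable` that `listPaths` returns, as stated in the comment above; the caller's loop looks the same and never touches a token.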

@al-v-in
Author

al-v-in commented Apr 16, 2021

Thank you, @gapra-msft! I thought I must have been missing something. As I mentioned, I haven't tested this in code; I was trying to figure out which method was best for interacting with the data lake given our requirements.

I'm not sure if you would agree that it would be appropriate in this case, but perhaps it would clarify the matter if somewhere in the ListPathsOptions Class documentation it said something along the lines of "...max result of 5000 items per page"?

@gapra-msft
Member

Yeah! I can make this a docs bug and we will pick it up when we have time!

@gapra-msft gapra-msft added bug This issue requires a change to an existing behavior in the product in order to be resolved. Docs and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that bug This issue requires a change to an existing behavior in the product in order to be resolved. labels Apr 19, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023