[BUG] Cognitive Search Indexer Performance with Azure.Search.Documents #18060
Comments
Thank you for your feedback. Tagging and routing to the team member best able to assist. |
I have a little more info on this. I can get it to occur when adding incremental files also, not just on an indexer reset. Using this code whenever a file is added to a blob storage account:
I'm also attaching an image of the performance runs. Reading from the bottom, the following was sent to be indexed.
You can see that #2 and #3 took far too long to complete: 12 minutes vs. 5 seconds. All of the image files are about the same size. It is interesting to note that the number of docs succeeded is being reported as 2x the number of docs placed in blob storage for indexing. I confirmed this by examining the number of times the function was actually called. #3 has 4 docs succeeding: this was two documents sent to be indexed at once, and the indexer was called twice for that. All of the other files were placed to be indexed one at a time. The warnings are either about truncation of long documents or about skills that could not run, generally text-based skills on images. Also, I added a specific Cognitive Services account instead of using the free runs, as the double counting was affecting the amount I could index. However, this did not change the performance at all. |
@Mohit-Chakraborty can you take a look at this? Also /cc @brjohnstmsft in case this sounds familiar. |
@AlexGhiondea @Mohit-Chakraborty For indexer-related issues, please engage @bleroy |
@snapfisher Are you able to reproduce this with the latest library as well? /cc @bleroy |
I don't know, but I can try and take a look next week sometime. |
Hi, This doesn't look like it's caused by the SDK, but looks to be more about inconsistencies in indexer running times. Did you try comparing the exact same document when manually creating the indexer from the portal versus through the script on its own, vs. running it with the script as part of a batch? |
The exact same documents were tried in both scenarios.
I'm fairly certain it was not the free tier, but the one above it (as that is where I have it running now). I tried it in a couple of tiers, I think, and got consistent results. It was a while ago, so I don't recall those details.
The thing to understand is that it was not variable. Every time I ran the scenarios I got the same results. I could never get it to work correctly. Before we went to the new libraries, I had the same setup deployed, and did not have this issue.
On Sep 16, 2021 7:20 PM, Bertrand Le Roy wrote:
It would also be interesting to know what service tier you're running this on: the free tiers in particular, being multi-tenant, can show wide variations in indexing times.
|
Ah, interesting, thanks for the info. So if you run the same job with the older version, you're not seeing the issue? If you can, it would be interesting to compare running manually from the portal and/or using the REST API. |
The old job can't be run any more. It's now fully deprecated. I tried a few months back with old packages and I had a fatal mismatch. I did not save those executables.
BTW, now that you are on this: I'm OOF tomorrow, but next week I can give you the code, if it helps. It really just needs two Azure Functions and a storage account in addition to search.
|
Yes, that would be helpful. We need to remove variables, so comparing with other ways to run the same indexers will really help figure out if this is really a SDK issue or something else. Thanks. |
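For anyone who wants to try the REST comparison suggested above, here is a minimal Python sketch that takes the SDK out of the picture. It triggers an indexer run and reads the duration of the last run from the indexer status. The service name, indexer name, admin key, and API version are placeholders and assumptions, not values from this issue, and the helper names are hypothetical. It also assumes the status timestamps end in `Z`.

```python
import json
import time
import urllib.request
from datetime import datetime, timezone

# Assumption: a GA REST API version; substitute the one your service supports.
API_VERSION = "2020-06-30"


def indexer_url(service: str, indexer: str, action: str) -> str:
    """Build the REST URL for an indexer action ('run' or 'status')."""
    return (f"https://{service}.search.windows.net/indexers/{indexer}/{action}"
            f"?api-version={API_VERSION}")


def call(url: str, api_key: str, method: str = "GET") -> dict:
    """Send one request with the admin key; return parsed JSON ({} if no body)."""
    req = urllib.request.Request(url, method=method, headers={"api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else {}


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp like '2021-09-16T19:20:05.12Z', dropping fractional seconds."""
    whole = ts.rstrip("Z").split(".")[0]
    return datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def summarize_run(status: dict) -> dict:
    """Reduce an indexer status payload to the numbers discussed in this thread."""
    last = status["lastResult"]
    seconds = (parse_ts(last["endTime"]) - parse_ts(last["startTime"])).total_seconds()
    return {
        "status": last["status"],
        "seconds": seconds,
        "items_processed": last["itemsProcessed"],
        "items_failed": last["itemsFailed"],
    }


def time_one_run(service: str, indexer: str, api_key: str, poll_seconds: int = 5) -> dict:
    """Trigger a run, then poll the status until the last run is no longer in progress.

    Caveat: right after the POST, lastResult may still describe the previous run,
    so compare startTime values before trusting the result.
    """
    call(indexer_url(service, indexer, "run"), api_key, method="POST")
    while True:
        time.sleep(poll_seconds)
        status = call(indexer_url(service, indexer, "status"), api_key)
        last = status.get("lastResult")
        if last and last["status"] != "inProgress":
            return summarize_run(status)
```

Running `time_one_run` once after adding a single blob and once after a full reset gives two durations from the service's own status data, which can be compared with the durations seen when the same runs are started through the SDK.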
Not repro any more. |
Describe the bug
Using Azure.Search.Documents 11.2.0-beta.2, I can add 4 individual files to be indexed, and it takes about 10 seconds a file. If I recreate the index and indexer against a storage account with all 4 files present, it takes over 7 minutes to complete indexing. There are no errors given.
Expected behavior
I would expect the time to index to be roughly the same whether the files are added one at a time, or are present for the initial indexing. This is way out of the grey area. I tried with 10 files and it took 20 minutes.
Actual behavior (include Exception or Stack Trace)
No errors are produced. The indexer just runs for a very long time and then eventually succeeds. This is for a demo application, so indexing is controlled through two Azure Functions. The first accepts an HTTP command to start; it then deletes the current index, indexer, storage connection, and skillset and completely recreates everything. The second is bound to blob storage and causes the indexer to update when a new file is added.
To Reproduce
Steps to reproduce the behavior (include a code snippet, screenshot, or any additional information that might help us reproduce the issue)
Here is the problematic function, that does the full reindexing on command:
Environment:
Azure.Search.Documents 11.2.0-beta.2
dotnet --info
.NET SDK (reflecting any global.json):
Version: 5.0.102
Commit: 71365b4d42
Runtime Environment:
OS Name: Windows
OS Version: 10.0.19042
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\5.0.102\
Host (useful for support):
Version: 5.0.2
Commit: cb5f173b96
.NET SDKs installed:
1.1.13 [C:\Program Files\dotnet\sdk]
1.1.14 [C:\Program Files\dotnet\sdk]
2.1.617 [C:\Program Files\dotnet\sdk]
2.1.700 [C:\Program Files\dotnet\sdk]
2.1.701 [C:\Program Files\dotnet\sdk]
2.1.812 [C:\Program Files\dotnet\sdk]
2.2.300 [C:\Program Files\dotnet\sdk]
3.1.300 [C:\Program Files\dotnet\sdk]
3.1.405 [C:\Program Files\dotnet\sdk]
5.0.102 [C:\Program Files\dotnet\sdk]
.NET runtimes installed:
Microsoft.AspNetCore.All 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.1.24 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.1.24 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.1.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 5.0.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 1.0.15 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 1.0.16 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 1.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 1.1.13 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.1.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.1.24 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 3.1.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 5.0.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.WindowsDesktop.App 3.1.11 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 5.0.2 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
To install additional .NET runtimes or SDKs:
https://aka.ms/dotnet-download