
Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1 #382

Closed
konius opened this issue Apr 26, 2024 · 9 comments

Comments

@konius

konius commented Apr 26, 2024

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

11.3.1

Bug summary

The Examine index becomes corrupt. After that, the Examine dashboard can't be viewed or managed, and any content that reads from the index for display purposes comes back empty.

This happens on version 11.3.2 and also on 13.1.1. The only fix is to completely delete the Examine folder and restart the application.

This issue has already been discussed on Our Umbraco.

Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1 (resource: BufferedChecksumIndexInput(SimpleFSIndexInput(path="C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\MembersIndex\segments_vd")))
   at Lucene.Net.Index.SegmentInfos.Read(Directory directory, String segmentFileName)
   at Lucene.Net.Index.IndexFileDeleter..ctor(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, InfoStream infoStream, IndexWriter writer, Boolean initialIndexExists)
   at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
   at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at Examine.Lucene.Directories.DirectoryFactoryBase.Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Umbraco.Cms.Infrastructure.Examine.ConfigurationEnabledDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at Examine.Lucene.Directories.DirectoryFactoryBase.Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass1_0.<.ctor>b__0()
   at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
--- End of stack trace from previous location ---
   at System.Lazy`1.CreateValue()
   at Examine.Lucene.Providers.LuceneIndex.PerformIndexItemsInternal(IEnumerable`1 valueSets, CancellationToken cancellationToken)
   at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass49_0.<PerformIndexItems>b__0()
   at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass73_0.<QueueTask>b__0(Task x)


Specifics

For an unknown reason the index becomes corrupt and bricks the back office dashboard.

Application is hosted on Azure and config is applied as per this guide: https://docs.umbraco.com/umbraco-cms/v/10.latest-lts/fundamentals/setup/server-setup/azure-web-apps

Steps to reproduce

N/A

Expected result / actual result

I expect to at least be able to view the dashboard and rebuild the indexes if they become corrupt.

Original post on Umbraco-CMS.

@Shazwazza
Owner

Shazwazza commented Apr 26, 2024

Thanks for reporting. The answer to this will be 'it depends on a lot of things'.

I would strongly encourage you to fully understand the challenges of Lucene in Azure, I did a full talk on this at CodeGarden: https://youtu.be/qXKGVjTlEOk?si=uq7UQ9J5Ka4lTp-j

This and similar issues could occur depending on:

  • How you are doing deployments
  • If you are running via zip deployments
  • If you are doing slot swapping
  • If you are load balancing and have some misconfigurations (most common)

This PR will fix the slot swapping issue: umbraco/Umbraco-CMS#15571, which is part of Umbraco 13.2. There's a good long thread on a related issue here too, umbraco/Umbraco-CMS#15783, but I believe that thread is resolved by umbraco/Umbraco-CMS#15571.

As for the error, the top part of the error is what is important:

Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1 (resource: BufferedChecksumIndexInput(SimpleFSIndexInput(path="C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\MembersIndex\segments_vd")))
   at Lucene.Net.Index.SegmentInfos.Read(Directory directory, String segmentFileName)
   at Lucene.Net.Index.IndexFileDeleter..ctor(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, InfoStream infoStream, IndexWriter writer, Boolean initialIndexExists)
   at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
   at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory

I've described what the SyncedFileSystemDirectoryFactory is here: umbraco/Umbraco-CMS#15783 (comment)

... Most importantly: this setting can ONLY be applied to the primary node; it cannot be used on several nodes. If you are slot swapping the primary node, then you are in fact load balancing, which means this setting is being used by 2x nodes. As above: 'it depends on a lot of things'.

Since the SyncedFileSystemDirectoryFactory is part of this codebase (even though only Umbraco uses it), what could be done in Examine is to add better error handling and diagnostics that output what might be going on. Potentially this is an issue with the files in the main storage rather than the local temp storage, but without adding this information to the log we won't know, so that is something I can look into.

Many of these reasons are why ExamineX was created.

@binraider

binraider commented Jun 10, 2024

We get this on an Umbraco 13.1.0 website if we leave the site alone and don't do anything for a while.
We deploy via DevOps/CI to a Linux web app.
There are hardly any content changes on the site, just a few news items here and there.
The App Service plan is a P0v3.
Everything is fine for the first few weeks after a deployment, and then the Examine back office drops off and the front-end search stops working.
It's a single app and the settings are:
"MainDomLock": "FileSystemMainDomLock",
"LocalTempStorageLocation": "EnvironmentTemp",
"LuceneDirectoryFactory": "SyncedTempFileSystemDirectoryFactory"
with Azure Blob Storage.
We have quite a few sites with similar setups that don't exhibit this behaviour, so it's puzzling.
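For readers comparing setups, the three settings listed above would sit in appsettings.json roughly as follows. This is a sketch assuming the Umbraco 10+ section layout (Umbraco:CMS:Global, Umbraco:CMS:Hosting, Umbraco:CMS:Examine) used by the Azure Web Apps guide linked earlier; the Azure Blob Storage settings are left out because their keys aren't given in this thread. The comments are only there to flag assumptions (the .NET JSON configuration provider skips comments).

```json
{
  // Sketch only: assumes the Umbraco 10+ appsettings.json section layout.
  // Azure Blob Storage settings are omitted; they are not given in this thread.
  "Umbraco": {
    "CMS": {
      "Global": {
        "MainDomLock": "FileSystemMainDomLock"
      },
      "Hosting": {
        "LocalTempStorageLocation": "EnvironmentTemp"
      },
      "Examine": {
        "LuceneDirectoryFactory": "SyncedTempFileSystemDirectoryFactory"
      }
    }
  }
}
```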

Funnily enough, our usual deployment process is to:

  • start the preprod slot
  • deploy the code to it
  • smoke test
  • swap slots
  • turn off the preprod slot

But in this case (by accident) our infra guy has set it up to deploy direct, so that is not in play here.

@Shazwazza
Owner

@binraider It's hard to say why this would happen. Do you have any logs or anything that can help? This is really an Umbraco-specific thing, even though the code that Umbraco uses is in this repo. As above, the only thing I can do at this stage would be to add more logging and checks to see how in sync the main storage is versus the local file storage, but at the moment I don't have the time to do this. One suggestion would be to use TempFileSystemDirectoryFactory instead of SyncedTempFileSystemDirectoryFactory, pay the overhead of index rebuilding when App Service moves your site, and see if that resolves the problem. If it does, then it's an issue with syncing or with corrupt files in the main storage. See https://docs.umbraco.com/umbraco-cms/reference/configuration/examinesettings
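Trying that suggestion should only require changing the Examine directory factory setting. A minimal sketch, assuming the Umbraco:CMS:Examine section described in the ExamineSettings docs linked above, with every other setting left as it was:

```json
{
  // Sketch: only LuceneDirectoryFactory changes; keep the other settings as they are.
  "Umbraco": {
    "CMS": {
      "Examine": {
        "LuceneDirectoryFactory": "TempFileSystemDirectoryFactory"
      }
    }
  }
}
```

With the non-synced factory the index lives only in local temp storage, so expect a full index rebuild whenever App Service moves the site. That is the overhead mentioned above, traded for taking the main-storage sync out of the picture.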

@binraider

The Umbraco logs mirror the OP's errors largely:

Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1

I will change the factory to see if it makes a difference.

@Shazwazza
Owner

Umbraco has its own TempFileSystemDirectoryFactory, which I think should be used if you are using the non-'Synced' directory factory: https://github.com/umbraco/Umbraco-CMS/blob/contrib/src/Umbraco.Examine.Lucene/UmbracoTempEnvFileSystemDirectoryFactory.cs

@paulsterling

Just adding a note that we see this same issue with our Umbraco Cloud (13.4.0) sites using the default configuration. Not surprising as Cloud uses Azure App Services.

@Shazwazza
Owner

Shazwazza commented Jun 25, 2024

Regarding slot swapping: this was created/fixed in Umbraco here, https://github.com/umbraco/Umbraco-CMS/blob/contrib/src/Umbraco.Examine.Lucene/UmbracoTempEnvFileSystemDirectoryFactory.cs, with Umbraco PR umbraco/Umbraco-CMS#15571.

It takes the site ID into account. Perhaps part of this needs to be ported to the built-in SyncedTempFileSystemDirectoryFactory. I'm not sure off the top of my head, but since SyncedTempFileSystemDirectoryFactory presumably doesn't know anything about Umbraco's own UmbracoTempEnvFileSystemDirectoryFactory to sync locally to, it might be part of the solution.

... Ah, but I see, this is already taken into account, since Umbraco creates the SyncedTempFileSystemDirectoryFactory itself here: https://github.com/umbraco/Umbraco-CMS/pull/15571/files#diff-5bf01722a10a69e85bb95802d6e0e7e4baf9b8e6a4abd1caf184f8cdc73c0124R29-R43

@Shazwazza
Owner

Examine 3.3.0 has been published, release notes are here https://github.com/Shazwazza/Examine/releases/tag/v3.3.0

@davidpipkin

@Shazwazza even after upgrading Examine to 3.3.0 I am getting this error. I'm on Umbraco 13.4.1 running in Azure. It has only happened once so far, and I cleared the Examine files. Should I have cleared them after updating Examine? Could that have been the issue?


5 participants