Custom Examine indexer for indexing any Umbraco media nodes. Under the hood it uses Apache Tika to extract content and metadata from Umbraco media files. Tika can handle a wide range of file formats; see the Apache Tika documentation for the supported list. The package also supports VPP (virtual path providers), so media files stored elsewhere, for example in Azure, will be indexed as well.
This package is supported on Umbraco 7.6.1+.
ExamineFileIndexer is available from Our Umbraco, NuGet, or as a manual download directly from GitHub.
You can find a downloadable package, along with a discussion forum for this package, on the Our Umbraco site.
To install from NuGet, run the following command in the Package Manager Console in Visual Studio.
PM> Install-Package Cogworks.ExamineFileIndexer
After installation, your ExamineIndex.config and ExamineSettings.config files will be updated. The following entries will be added.
<IndexSet SetName="MediaIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/MediaIndexSet">
<IndexAttributeFields>
<add Name="id" />
<add Name="nodeName" />
<add Name="updateDate" />
<add Name="writerName" />
<add Name="path" />
<add Name="nodeTypeAlias" />
<add Name="parentID" />
</IndexAttributeFields>
<IncludeNodeTypes>
<add Name="File" />
</IncludeNodeTypes>
</IndexSet>
Under ExamineIndexProviders/providers:
<add name="MediaIndexer" type="Cogworks.ExamineFileIndexer.UmbracoMediaFileIndexer, Cogworks.ExamineFileIndexer"
extensions=".pdf,.docx"
umbracoFileProperty="umbracoFile" />
Under ExamineSearchProviders/providers:
<add name="MediaSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" indexSet="MediaIndexSet"
analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
By default the following file types will be indexed: pdf, docx. To add other file types to index you need to update ExamineSettings.config:
<add name="MediaIndexer" type="Cogworks.ExamineFileIndexer.UmbracoMediaFileIndexer, Cogworks.ExamineFileIndexer"
extensions=".pdf,.docx"
umbracoFileProperty="umbracoFile" />
Update the extensions attribute and add any other file types. They need to be separated by commas (,).
You can also add image file types, e.g. .jpg. Please note that indexing images will only add EXIF metadata.
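As an illustration, the entry below extends the default list with a few more formats. The extra extensions (.xlsx, .pptx, .jpg) are assumptions chosen for this example, not package defaults; any .jpg files would contribute EXIF metadata only, as noted above.
<!-- Illustrative only: .xlsx, .pptx and .jpg are example additions, not package defaults -->
<add name="MediaIndexer" type="Cogworks.ExamineFileIndexer.UmbracoMediaFileIndexer, Cogworks.ExamineFileIndexer"
extensions=".pdf,.docx,.xlsx,.pptx,.jpg"
umbracoFileProperty="umbracoFile" />
Files that were uploaded before the change will most likely need the index to be rebuilt before they show up in search results.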
To raise a new bug, create an issue on the GitHub repository. To fix a bug or add new features, fork the repository and send a pull request with your changes. Feel free to add ideas to the repository's issues list if you would like to discuss anything related to the package.
This project is maintained by Cogworks and contributors. If you have any questions about the project please contact us through the forum on Our Umbraco, on Twitter, or by raising an issue on GitHub.
Copyright © 2017 The Cogworks Ltd, and other contributors
Licensed under the MIT License.