Indexer configuration

Vladimir Kotal edited this page May 30, 2024 · 49 revisions

Indexer tunables

This is a list of the most common configuration options, with their defaults, that are not available as indexer switches. These options are particularly handy when working with a read-only configuration: https://github.com/oracle/opengrok/wiki/Read-only-configuration

| Tunable | Type | Meaning |
| --- | --- | --- |
| annotationCacheEnabled | boolean | generate annotation cache during reindex. Note that this significantly increases indexing time. Disabled by default. Can be overridden on a per-project level. |
| fetchHistoryWhenNotInCache | boolean | when set to false, avoid generating history for individual files when indexing. This prevents excess commands if the history log for the top-level directory of a project cannot be retrieved for some reason (e.g. the command to get the history log fails), at the cost of not having history available for the given project (if projects are enabled). This is specific to SCMs that can generate history for directories (e.g. Mercurial, Git, ...). |
| ctags | String | path to the Universal ctags binary. If the path is not specified in the configuration, the system path will be searched, and if no Universal ctags binary is found there, the system property org.opengrok.indexer.analysis.Ctags will be checked and used if non-empty. |
| ctagsTimeout | long | timeout in seconds for processing of a file by the ctags program; if the definitions are not returned within this timeout, the ctags process is terminated. Default is 10 seconds. |
| disabledRepositories | list of disabled repository types | disable given repository types. This can also be set with the --disableRepository Indexer option. When setting this up in configuration, use the add method for strings. |
| indexerCommandTimeout | int | timeout (in seconds) of external commands executed during indexing (since version 1.4.4) |
| indexingParallelism | int | number of threads to create in the second phase of indexing, when the actual index is created. Can also be set via the -T indexer option. Directly influences the number of spawned ctags processes (one for each indexer thread). |
| historyParallelism | int | number of threads to create in the first phase of indexing, when the history cache is generated (i.e. if the -H indexer option is used). These threads usually spawn an SCM command such as git log. |
| historyFileParallelism | int | number of threads to create when storing history of individual files. These are created next to the general history handling threads, so there can be a maximum of historyParallelism + historyFileParallelism threads in total during the first phase of indexing, when the history cache is created (assumes the indexer is run with -H). |
| indexerAuthenticationToken | string | API token to use when performing API calls to the web application (e.g. when uploading the new configuration at the end of indexing when running with the -U command line option). Alternatively one can use the --token command line option. Complements the authenticationTokens configuration option for the web app side. |
| mergeCommitsEnabled | boolean | if set to true, merge commits are added to history. This may lead to significantly higher demand for JVM memory during indexing. This is sometimes a problem with repositories that have large/rich history (e.g. the linux-mainline Git repository with merge commits requires more than 16 GiB of heap). Can be set on a per-project level. Default is false. |
| handleHistoryOfRenamedFiles | boolean | if set to true, the full history of renamed files will be processed. This may significantly increase indexing time and storage required. Can be set on a per-project level (where the tunable has a different name: handleRenamedFiles). Default is false. |
| scanningDepth | int | maximum depth of directory traversal when scanning for repositories. The --depth indexer option modifies this property at runtime only. Default is 2. E.g. if the /mercurial/ directory (relative to source root) is a repository and /mercurial/usr/closed/ is its sub-repository, the latter will be discovered only if the depth is set to 2 or greater. |
| historyChunkCount | int | maximum count of changesets to process in one go when creating history cache. This setting overrides per-repository values. So far only Mercurial and Git can generate the history cache in chunks, their values being 128k and 64k, respectively. Increasing this value may lead to better indexing times at the cost of higher heap size requirements, and vice versa. |
| historyCachePerPartesEnabled | boolean | when set to false, per partes history cache creation is disabled, i.e. history cache is created in one go. This is handy for those who have repositories with sizeable history (in terms of number of changesets, changed files, or both; think hundreds of thousands of changesets) and can throw lots of memory at the JVM (think tens of gigabytes) for the sake of speeding up the indexing. This tunable is especially useful when reindexing from scratch. Default is true. |
| xrefTimeout | long | maximum timeout in seconds for generating xref for a single file. If the xref is not generated within this time, the work is canceled. The default value is 30. |
| connectTimeout | int | connect timeout in seconds for API calls done by the indexer. Default is 3 seconds. |
| apiTimeout | int | API timeout in seconds for asynchronous API calls done by the indexer. This is the overall time spent waiting for the task associated with the API call to complete. Default is 300 seconds (5 minutes). The individual API calls to check the state, which happen every second, are governed by the connectTimeout, so in reality the overall time can be connectTimeout * apiTimeout in the worst case (900 seconds with the defaults). |
| generateHtml | boolean | Economy mode, i.e. whether to generate "xref" (cross-reference) files on disk. Default is true. |
| historyBasedReindex | boolean | If enabled, the indexer will use the SCM to gather the list of files to reindex. This should generally be faster than traversing the directory structure. The initial indexing will use directory traversal even if this is enabled. Requires projects to be enabled. Works for Mercurial and Git. If a project contains any other repository types, the indexer will fall back to the directory traversal method. History and history cache have to be enabled for this to work. Default is true. |

Use the types from the table e.g. as follows:

<!-- Sample for setCtagsTimeout. Default is 10 -->
  <void property="ctagsTimeout">
   <long>11</long>
  </void>
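
The same pattern extends to the other types in the table. The following is a minimal sketch of a complete configuration file, assuming the usual java.beans.XMLDecoder wrapper around an org.opengrok.indexer.configuration.Configuration object. All values are illustrative assumptions, not recommendations: the ctags path is hypothetical, and the exact string expected for a repository type (shown here as RCSRepository) is an assumption that should be checked against your OpenGrok version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: wrapper elements and all values are assumptions. -->
<java version="11" class="java.beans.XMLDecoder">
 <object class="org.opengrok.indexer.configuration.Configuration">
  <!-- boolean tunable: create the history cache in one go (default is true) -->
  <void property="historyCachePerPartesEnabled">
   <boolean>false</boolean>
  </void>
  <!-- int tunable: number of indexer threads (example value) -->
  <void property="indexingParallelism">
   <int>8</int>
  </void>
  <!-- String tunable: hypothetical path to the Universal ctags binary -->
  <void property="ctags">
   <string>/usr/local/bin/ctags</string>
  </void>
  <!-- list tunable: entries are appended with the add method;
       the repository type name below is an assumed example -->
  <void property="disabledRepositories">
   <void method="add">
    <string>RCSRepository</string>
   </void>
  </void>
 </object>
</java>
```

Scalar tunables are set by nesting a single typed value element inside the property element, whereas the list tunable nests one add call per entry.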

Other Indexer configuration

Java system properties

The indexer has to be able to run the Source Code Management commands. If they are located in non-standard directories, the corresponding Java system properties have to be set to their paths.

bk: -Dorg.opengrok.indexer.history.BitKeeper
hg: -Dorg.opengrok.indexer.history.Mercurial
cvs: -Dorg.opengrok.indexer.history.cvs
svn: -Dorg.opengrok.indexer.history.Subversion
sccs: -Dorg.opengrok.indexer.history.SCCS
cleartool: -Dorg.opengrok.indexer.history.ClearCase
git: -Dorg.opengrok.indexer.history.git
p4: -Dorg.opengrok.indexer.history.Perforce
mtn: -Dorg.opengrok.indexer.history.Monotone
blame: -Dorg.opengrok.indexer.history.RCS
bzr: -Dorg.opengrok.indexer.history.Bazaar

Note: these are Java options, not indexer options, so they have to be used like so:

java -Dorg.opengrok.indexer.history.Mercurial=/my/very/alternative/path/to/hg \
    -jar opengrok.jar -H -S -P -s /opengrok/src -d /opengrok/data ...

Custom ctags configuration

To make ctags recognize additional symbols, definitions, etc., it is possible to specify a configuration file with extra configuration options for ctags.

This can be done by using the -o Indexer option.

Sample configuration file for the Solaris code base:

--regex-asm=/^[ \t]*(ENTRY_NP|ENTRY|RTENTRY)+\(([a-zA-Z0-9_]+)\)/\2/f,function/
--regex-asm=/^[ \t]*ENTRY2\(([a-zA-Z0-9_]+),[ ]*([a-zA-Z0-9_]+)\)/\1/f,function/
--regex-asm=/^[ \t]*ENTRY2\(([a-zA-Z0-9_]+),[ ]*([a-zA-Z0-9_]+)\)/\2/f,function/
--regex-asm=/^[ \t]*ENTRY_NP2\(([a-zA-Z0-9_]+),[ ]*([a-zA-Z0-9_]+)\)/\1/f,function/
--regex-asm=/^[ \t]*ENTRY_NP2\(([a-zA-Z0-9_]+),[ ]*([a-zA-Z0-9_]+)\)/\2/f,function/

Introduce your own mapping from an extension to an analyzer

Use the -A Indexer option, e.g. to have files with the .cs suffix processed as plain text:

-A .cs:org.opengrok.indexer.analysis.plain.PlainAnalyzerFactory

This will map the .cs extension to the analyzer created by the PlainAnalyzerFactory. You should even be able to override OpenGrok's analyzers using this option.

OpenGrok also allows abbreviating the class name down to its prefix. E.g. the following are all equivalent:

-A .e:org.opengrok.indexer.analysis.c.CAnalyzerFactory
-A .e:CAnalyzerFactory
-A .e:CAnalyzer
-A .e:C

To clear the mapping:

-A .e:-

so that the plain-text heuristic is active as a fallback for .e files. Or you could explicitly map the PlainAnalyzerFactory:

-A .e:Plain

(N.B. the class name is case-sensitive.)