Limiting history indexing to select repositories only #4668
Replies: 4 comments 1 reply
When creating the history cache for Git repositories (or any repository type that supports retrieving history per directory), the history for the top-level directory of the given repository is retrieved first and then inverted in memory to arrive at per-file history. This is done for all repositories of the given project. It happens during the first phase of indexing and, if successful, avoids the costly per-file history retrieval in the second phase.
Which version do you run? Since 1.12.21, a failure to generate the history cache during the first phase of indexing for such repositories causes indexing of the respective project to fail (before entering the second phase), unless overridden.
As for controlling the history cache per project/repository, I will have to check.
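The inversion described above can be pictured with a small Python sketch. It is an illustration only, not OpenGrok's code (OpenGrok itself is written in Java); the function name `invert_history` and the sample changesets are made up for the example.

```python
from collections import defaultdict


def invert_history(changesets):
    """Turn repository-level history into a per-file mapping.

    `changesets` is a list of (revision, files) pairs, newest first,
    as a single directory-level history query might return them.
    """
    per_file = defaultdict(list)
    for revision, files in changesets:
        for path in files:
            # Each file collects the revisions that touched it,
            # preserving the newest-first order of the input.
            per_file[path].append(revision)
    return dict(per_file)


# Hypothetical history of one repository, newest changeset first.
repo_history = [
    ("c3", ["src/main.c"]),
    ("c2", ["src/main.c", "README"]),
    ("c1", ["README"]),
]

file_history = invert_history(repo_history)
```

The point of the design is that the repository is queried once, and every file's history falls out of that single pass, instead of one history query per file.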
This is 1.7.30. Could the failure to generate the index for the top-level directory be related to submodules? The GIT tree is structured so that there's a top-level local GIT repository that has no real files in it, and the actual GIT trees are plugged into various subdirectories of that repository as submodules. I do see that all these submodules have been discovered and recorded by OpenGrok as separate GIT repositories in configuration.xml. And there's this error at the beginning:
As for the per project override, the following read-only configuration overrides the default enabled history:
<?xml version="1.0" encoding="UTF-8"?>
<java version="11.0.4" class="java.beans.XMLDecoder">
  <object class="org.opengrok.indexer.configuration.Configuration" id="Configuration0">
    <void property="projects">
      <void method="put">
        <string>foo</string>
        <object class="org.opengrok.indexer.configuration.Project">
          <void property="handleRenamedFiles">
            <boolean>false</boolean>
          </void>
          <void property="historyEnabled">
            <boolean>false</boolean>
          </void>
          <void property="name">
            <string>foo</string>
          </void>
          <void property="path">
            <string>/foo</string>
          </void>
          <void property="mergeCommitsEnabled">
            <boolean>false</boolean>
          </void>
        </object>
      </void>
    </void>
  </object>
</java>
When the indexer is run e.g. like this:
the history cache creation will be skipped for the project. This works for the initial indexing; there might be some caveats when reusing the written configuration as the read-only configuration for subsequent indexing.
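Before a long indexer run it can help to confirm which projects a read-only configuration actually switches history off for. The following Python sketch does that for a trimmed copy of the configuration above. It assumes the `java.beans.XMLDecoder` layout shown in this reply; the helper name `projects_with_history_disabled` is hypothetical and not part of OpenGrok.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the read-only configuration shown above.
CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<java version="11.0.4" class="java.beans.XMLDecoder">
  <object class="org.opengrok.indexer.configuration.Configuration" id="Configuration0">
    <void property="projects">
      <void method="put">
        <string>foo</string>
        <object class="org.opengrok.indexer.configuration.Project">
          <void property="historyEnabled">
            <boolean>false</boolean>
          </void>
          <void property="name">
            <string>foo</string>
          </void>
        </object>
      </void>
    </void>
  </object>
</java>"""

PROJECT_CLASS = "org.opengrok.indexer.configuration.Project"


def projects_with_history_disabled(xml_text):
    """Return the keys of projects whose historyEnabled property is false."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    disabled = []
    for put in root.iter("void"):
        # Only <void method="put"> elements add entries to the projects map.
        if put.get("method") != "put":
            continue
        key = put.find("string")
        project = put.find("object")
        if key is None or project is None:
            continue
        if project.get("class") != PROJECT_CLASS:
            continue
        for prop in project.findall("void"):
            if (prop.get("property") == "historyEnabled"
                    and prop.findtext("boolean") == "false"):
                disabled.append(key.text)
    return disabled


disabled_projects = projects_with_history_disabled(CONFIG)
```

For the configuration above, `disabled_projects` contains only `foo`; a project that does not set `historyEnabled` at all is not reported, since it keeps the default.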
Hi,
We've recently added a few GIT directories to the already checked out and indexed tree (that used to have only CVS before), and since then producing an index of history of changes went downhill. What used to take an hour is now running for days.
It seems that for every source file OpenGrok is using JGIT to get the history of changes, and that's taking 3-5s per file. I can see in lsof/strace how Java is reading huge GIT pack files all the time, and I can see in jstack how it's decompressing them and then walking the revision tree (a typical stack is below).
Not passing -H to the indexer helps things tremendously, but we lose the index of CVS history as well, and that used to work just fine before.
Is there a way to prevent OpenGrok from indexing history of all GIT repositories or of select repositories (specified by path) while still preserving index of CVS repositories in one big combined tree?
We've tried a few things, but short of not passing -H, nothing seemed to work. E.g. we've tried using --repository command line option to explicitly list only CVS repositories, or setting historyEnabled to false for GIT repositories in configuration.xml, but OpenGrok still tries indexing history for files from the GIT section of the checked out tree.