change remote history tunable semantics to fuzzy logic #748

vladak · 2014-02-13T12:41:48Z

While thinking about #747 it occurred to me that the linear history generation for repositories such as CVS could be sped up. Currently, the main history index phase is a no-op for such repositories because they do not support generating history for directories. History is instead generated in the Lucene index phase, once per file via addFile() (as described in #747), in linear fashion: file after file.

This could be changed so that the history for each file in the repository is generated in parallel in the history index phase (turning it from a no-op into a proper history index). The queries done in populateDocument() in the Lucene index phase would then read the history for a given file from the cache (be it file-based or JDBC).

vladak · 2014-02-13T12:46:03Z

This should help a lot for setups that generate history for CVS repositories such as NetBSD/OpenBSD, especially from behind a proxy. It often happens that most of the projects are already indexed while the indexer is still going linearly through the files in a *BSD repository, and the whole indexing run has to wait for it.

vladak · 2014-02-20T13:14:04Z

The entry for this is in Repository.java:createCache():

    // If we don't have a directory parser, we can't create the cache
    // this way. Just give up and return.
    if (!hasHistoryForDirectories()) {
        Logger.getLogger(getClass().getName()).log(
            Level.INFO,
            "Skipping creation of history cache for {0}, since retrieval " +
            "of history for directories is not implemented for this " +
            "repository type.", getDirectoryName());
        return;
    }

IndexDatabase.java:indexDown() could be reused for the recursive directory traversal, submitting getHistory() jobs to a thread pool along the way so that each file has its history generated.
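A minimal sketch of that idea, under stated assumptions: it does not reuse indexDown() but walks the tree with java.nio, and ParallelHistorySketch, generate() and the getHistory function parameter are hypothetical names standing in for the real per-file history call. The "cache" here is just an in-memory map.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of OpenGrok: walks a source tree and
// generates per-file history in parallel instead of file after file.
final class ParallelHistorySketch {

    /**
     * @param root       top directory of the repository checkout
     * @param getHistory stand-in for a per-file history call such as "cvs log";
     *                   assumed to return a non-null value
     * @param threads    size of the worker pool
     * @return map from file to its generated history (an in-memory "cache")
     */
    static Map<Path, String> generate(Path root,
                                      Function<Path, String> getHistory,
                                      int threads)
            throws IOException, InterruptedException {
        Map<Path, String> cache = new ConcurrentHashMap<>();

        // Collect all regular files below root (recursive traversal).
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Path file : files) {
                // One job per file; failures inside a job are not reported here.
                pool.submit(() -> { cache.put(file, getHistory.apply(file)); });
            }
        } finally {
            pool.shutdown(); // stop accepting jobs, let queued ones finish
        }
        pool.awaitTermination(1, TimeUnit.HOURS);
        return cache;
    }
}
```

A real implementation would presumably write into the history cache rather than a map and would need to surface per-file failures; the one-hour timeout is an arbitrary placeholder.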

However, for JDBC this would probably fail because it is not able to store the history of the files, as JDBCHistoryCache.java says:

    /**
     * Check whether this cache implementation can store history for the given
     * repository. Only repositories that support retrieval of history for the
     * whole directory at once are supported.
     */
    @Override
    public boolean supportsRepository(Repository repository) {
        return repository.hasHistoryForDirectories();
    }

and this is used in HistoryGuru.java:getHistory() like this:

    if (useCache() && historyCache.supportsRepository(repos)) {
        history = historyCache.get(file, repos, withFiles);
    } else {
        history = repos.getHistory(file);
    }

The repos.getHistory(file) call in the else branch just creates a new executor and runs cvs log for the file. The same thing happens in the UI when the History view is requested for the file (when JDBC is in use).

This means another (file-based) cache would have to be used for storing the history in the history index phase, with a fallback to it in the xref phase.
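The lookup order this implies could look roughly like the following. This is only a sketch: FallbackHistoryLookup and its fields are hypothetical and do not mirror the real HistoryGuru or cache classes, and plain maps stand in for both caches.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; the names below do not mirror OpenGrok's real classes.
final class FallbackHistoryLookup {
    private final boolean primarySupportsRepo;          // e.g. result of supportsRepository()
    private final Map<String, String> primaryCache;     // stand-in for the JDBC cache
    private final Map<String, String> fileCache;        // stand-in for the file-based cache
    private final Function<String, String> repository;  // stand-in for a direct "cvs log" call

    FallbackHistoryLookup(boolean primarySupportsRepo,
                          Map<String, String> primaryCache,
                          Map<String, String> fileCache,
                          Function<String, String> repository) {
        this.primarySupportsRepo = primarySupportsRepo;
        this.primaryCache = primaryCache;
        this.fileCache = fileCache;
        this.repository = repository;
    }

    String getHistory(String file) {
        // 1. Primary cache, only if it can store this repository type.
        if (primarySupportsRepo) {
            String primaryHit = primaryCache.get(file);
            if (primaryHit != null) {
                return primaryHit;
            }
        }
        // 2. Fallback: file-based cache filled in the history index phase.
        String fallbackHit = fileCache.get(file);
        if (fallbackHit != null) {
            return fallbackHit;
        }
        // 3. Last resort: ask the repository directly (the expensive path).
        return repository.apply(file);
    }
}
```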

vladak · 2014-02-20T13:39:29Z

One way to avoid the expensive index generation (at the expense of losing the ability to search history for the given repo) would be to add fuzzy logic to the OPENGROK_REMOTE_REPOS_OFF tunable in the OpenGrok shell script and modify HistoryGuru.java:getHistory() to perform the history lookup only if called from the UI.
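To illustrate what a three-state tunable might look like on the Java side, here is a small sketch. The enum name, its constants, the accepted strings and the calledFromUi flag are all assumptions made for illustration, not the names used in the actual change.

```java
import java.util.Locale;

// Hypothetical three-state replacement for an on/off tunable; the constant
// names and accepted strings are illustrative, not taken from OpenGrok.
enum RemoteHistoryMode {
    ON, OFF, UI_ONLY;

    /** Parses a tunable value; the accepted strings are assumptions. */
    static RemoteHistoryMode parse(String value) {
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "on":
                return ON;
            case "off":
                return OFF;
            case "uionly":
                return UI_ONLY;
            default:
                throw new IllegalArgumentException("unknown mode: " + value);
        }
    }

    /** Decides whether a remote history lookup may be performed. */
    boolean allowsLookup(boolean calledFromUi) {
        switch (this) {
            case ON:
                return true;          // indexer and UI may both fetch history
            case UI_ONLY:
                return calledFromUi;  // skip during indexing, allow in the UI
            default:
                return false;         // OFF: never fetch remote history
        }
    }
}
```

With something like this, getHistory() would check allowsLookup(...) before contacting a remote repository, so the indexer skips the expensive lookups while the History view in the UI still works.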

vladak added this to the 0.13 milestone Feb 13, 2014

vladak added the enhancement label Feb 13, 2014

vladak self-assigned this Feb 13, 2014

vladak added a commit to vladak/OpenGrok that referenced this issue Feb 21, 2014

change tunable of remote repositories to allow generating history only from UI fixes oracle#748

425278c

vladak mentioned this issue Feb 21, 2014

change tunable of remote repositories to allow generating history only f... #757

Merged

vladak closed this as completed in #757 Feb 26, 2014

vladak mentioned this issue Mar 11, 2014

provide a history indexer blacklist option (Bugzilla #15659) #446

Closed
