
change remote history tunable semantics to fuzzy logic #748

Closed
vladak opened this issue Feb 13, 2014 · 3 comments · Fixed by #757

Comments

@vladak
Member

vladak commented Feb 13, 2014

When thinking about #747 it occurred to me that the linear history generation for repositories such as CVS could be sped up. Currently, the main history index phase is a no-op (NOP) for such repositories, because they do not support generating history for whole directories. Instead, history is generated per file during the Lucene index phase via addFile() (as described in #747), linearly, one file after another.

This could be changed so that the history for each file in the repository is generated in parallel during the history index phase (turning that phase from a NOP into a proper history index). Then, in the Lucene index phase, the queries done in populateDocument() would read the history for a given file from the cache (be it file-based or JDBC).
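A minimal sketch of this two-phase flow, assuming the history of a single file can be fetched by a stand-alone function: phase one fills a thread-safe cache in parallel, phase two only reads from it. The class TwoPhaseHistory, its methods and the in-memory map are hypothetical stand-ins for the history index phase, populateDocument() and the real file-based or JDBC cache; they are not OpenGrok code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/**
 * Hypothetical sketch of the two-phase idea: parallel history generation in
 * the history index phase, cache-only reads in the Lucene index phase.
 */
public class TwoPhaseHistory {

    /** Phase 1: one history job per file on a thread pool; results go to a shared cache. */
    static Map<String, String> buildCache(List<String> files,
                                          Function<String, String> fetchHistory,
                                          int threads) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String file : files) {
            // Each job stands in for one per-file SCM call such as "cvs log <file>".
            pool.submit(() -> cache.put(file, fetchHistory.apply(file)));
        }
        pool.shutdown();
        try {
            // Wait (here up to an hour) for the history jobs before the Lucene phase reads the cache.
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cache;
    }

    /** Phase 2: the per-file lookup (the role of populateDocument()) only reads the cache. */
    static String lookup(Map<String, String> cache, String file) {
        // null means the job failed or never ran; a caller could fall back to a direct fetch.
        return cache.get(file);
    }

    public static void main(String[] args) {
        List<String> files = List.of("src/a.c", "src/b.c", "src/c.c");
        Map<String, String> cache = buildCache(files, f -> "history of " + f, 4);
        for (String f : files) {
            System.out.println(f + ": " + lookup(cache, f));
        }
    }
}
```

The point of the split is that the slow, network-bound SCM calls overlap with each other, while the Lucene phase stays sequential and only pays for cache reads.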

vladak added this to the 0.13 milestone Feb 13, 2014
vladak self-assigned this Feb 13, 2014
@vladak
Member Author

vladak commented Feb 13, 2014

This should help a lot in setups that generate history for CVS repositories such as NetBSD/OpenBSD, especially from behind a proxy. It often happens that most of the projects are already indexed, and then the whole indexing run has to wait while the indexer goes linearly through the files of a *BSD repository.

@vladak
Member Author

vladak commented Feb 20, 2014

The relevant entry is in Repository.java:createCache():

```java
// If we don't have a directory parser, we can't create the cache
// this way. Just give up and return.
if (!hasHistoryForDirectories()) {
    Logger.getLogger(getClass().getName()).log(
        Level.INFO,
        "Skipping creation of history cache for {0}, since retrieval " +
        "of history for directories is not implemented for this " +
        "repository type.", getDirectoryName());
    return;
}
```

IndexDatabase.java:indexDown() could be reused for the recursive directory traversal, submitting getHistory() jobs to a thread pool along the way so that history is generated for each file.
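The traversal could look roughly like this. The sketch uses java.nio.file.Files.walk in place of indexDown(), skips the CVS/ metadata directories that every CVS checkout contains, and submits one job per regular file to a fixed-size pool. HistoryWalker and the getHistory function parameter are hypothetical; a real job would run something like cvs log for the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: walk a checkout recursively and generate history for
 * every file on a thread pool. Names are illustrative only.
 */
public class HistoryWalker {

    /** True if some directory component of the (relative) path is a CVS metadata dir. */
    static boolean inCvsMetadata(Path relative) {
        for (int i = 0; i < relative.getNameCount() - 1; i++) {
            if (relative.getName(i).toString().equals("CVS")) {
                return true;
            }
        }
        return false;
    }

    static Map<Path, String> generateHistory(Path root, Function<Path, String> getHistory,
                                             int threads) throws IOException {
        Map<Path, String> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> !inCvsMetadata(root.relativize(p)))
                 // One history job per file; jobs run while the walk continues.
                 .forEach(p -> pool.submit(() -> results.put(p, getHistory.apply(p))));
        } finally {
            pool.shutdown();
        }
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("repo");
        Files.createDirectories(root.resolve("sys/CVS"));
        Files.writeString(root.resolve("Makefile"), "all:\n");
        Files.writeString(root.resolve("sys/kern.c"), "int x;\n");
        Files.writeString(root.resolve("sys/CVS/Entries"), "");
        Map<Path, String> history = generateHistory(root, p -> "log of " + p.getFileName(), 4);
        history.forEach((p, log) -> System.out.println(root.relativize(p) + " -> " + log));
    }
}
```

The pool size bounds how many SCM processes run at once, which matters when the repository sits behind a proxy or a rate-limited server.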

However, this would probably fail with the JDBC cache, because it cannot store the history of individual files for such repositories, as JDBCHistoryCache.java says:

```java
/**
 * Check whether this cache implementation can store history for the given
 * repository. Only repositories that support retrieval of history for the
 * whole directory at once are supported.
 */
@Override
public boolean supportsRepository(Repository repository) {
    return repository.hasHistoryForDirectories();
}
```

This check is used in HistoryGuru.java:getHistory() as follows:

```java
if (useCache() && historyCache.supportsRepository(repos)) {
    history = historyCache.get(file, repos, withFiles);
} else {
    history = repos.getHistory(file);
}
```

The repos.getHistory(file) call in the else branch simply creates a new executor and runs cvs log for the file. The same thing happens in the UI when the History view is requested for a file while JDBC is in use.

This means another (file-based) cache would have to be used to store the history during the history index phase, with a fall-back to it in the xref phase.
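A sketch of that fall-back chain, with plain maps standing in for JDBCHistoryCache and for a file-based cache, and a function standing in for repos.getHistory(file). The HistoryLookup class and its Repo type are hypothetical; only the hasHistoryForDirectories() condition comes from the code quoted above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of the history lookup with a file-based fall-back.
 * Maps stand in for the real caches; names are illustrative only.
 */
public class HistoryLookup {

    /** Minimal stand-in for a Repository. */
    static final class Repo {
        final String name;
        final boolean hasHistoryForDirectories;

        Repo(String name, boolean hasHistoryForDirectories) {
            this.name = name;
            this.hasHistoryForDirectories = hasHistoryForDirectories;
        }
    }

    private final Map<String, String> jdbcCache;    // stands in for JDBCHistoryCache
    private final Map<String, String> fileCache;    // stands in for a file-based cache
    private final Function<String, String> direct;  // stands in for repos.getHistory(file)

    HistoryLookup(Map<String, String> jdbcCache, Map<String, String> fileCache,
                  Function<String, String> direct) {
        this.jdbcCache = jdbcCache;
        this.fileCache = fileCache;
        this.direct = direct;
    }

    String getHistory(String file, Repo repo) {
        // Same condition as JDBCHistoryCache.supportsRepository(): directory-capable
        // repositories use the JDBC cache, all others the file-based one.
        Map<String, String> cache = repo.hasHistoryForDirectories ? jdbcCache : fileCache;
        String history = cache.get(file);
        if (history != null) {
            return history;
        }
        // Cache miss: fetch directly, e.g. by running "cvs log" for this one file.
        return direct.apply(file);
    }

    public static void main(String[] args) {
        Map<String, String> jdbc = new HashMap<>();
        Map<String, String> files = new HashMap<>();
        files.put("src/main.c", "cached CVS log");
        HistoryLookup lookup = new HistoryLookup(jdbc, files, f -> "fresh log for " + f);
        Repo cvs = new Repo("netbsd-src", false);
        System.out.println(lookup.getHistory("src/main.c", cvs)); // cached CVS log
        System.out.println(lookup.getHistory("src/util.c", cvs)); // fresh log for src/util.c
    }
}
```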

@vladak
Member Author

vladak commented Feb 20, 2014

One way to avoid the expensive index generation (at the cost of losing the ability to search history for the given repository) would be to add fuzzy logic to the OPENGROK_REMOTE_REPOS_OFF tunable in the OpenGrok shell script, and to modify HistoryGuru.java:getHistory() to perform the history lookup only when called from the UI.
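One possible shape for such a multi-valued tunable, assuming the script passes a mode string through to the indexer and that getHistory() can tell whether it was called by the indexer or by the UI. The mode names (ON, OFF, UIONLY), the Caller enum and RemoteHistoryPolicy are hypothetical and not the values actually accepted by the OpenGrok script.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: a tri-state remote-history tunable instead of a plain
 * on/off switch. Names and values are illustrative only.
 */
public class RemoteHistoryPolicy {

    /** Possible tunable values (assumed, not taken from OpenGrok). */
    enum Mode {
        ON,      // fetch history for remote repositories during indexing and in the UI
        OFF,     // never fetch history for remote repositories
        UIONLY;  // skip during indexing; fetch on demand when the UI asks

        /** Parse the tunable's value; an unset or empty value is assumed to mean ON. */
        static Mode parse(String value) {
            if (value == null || value.trim().isEmpty()) {
                return ON;
            }
            return Mode.valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }

    /** Who is asking for history. */
    enum Caller { INDEXER, UI }

    /** Decide whether getHistory() should perform the (possibly slow) lookup. */
    static boolean shouldFetchHistory(Mode mode, boolean remoteRepo, Caller caller) {
        if (!remoteRepo) {
            return true; // local repositories are not affected by the tunable
        }
        switch (mode) {
            case ON:
                return true;
            case OFF:
                return false;
            case UIONLY:
                return caller == Caller.UI;
            default:
                throw new IllegalStateException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Mode mode = Mode.parse("uionly");
        System.out.println(shouldFetchHistory(mode, true, Caller.INDEXER)); // false
        System.out.println(shouldFetchHistory(mode, true, Caller.UI));      // true
    }
}
```

With UIONLY, the indexer never runs the per-file SCM command for a remote repository, so history search for it is lost, but the History view still works on demand.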
