IgnoredNames does not work properly with project centric processing workflow #2602

Open
tulinkry opened this issue Dec 28, 2018 · 14 comments
@tulinkry
Contributor

I reported this in the Slack channel: in 1.1-rc41 (I haven't tested other versions) the .git files were hidden in xref browsing, but in 1.1 they are not. I don't know whether this is a problem in my setup or whether some logic in the indexer is broken. Can someone else verify?

1.1-rc41: [screenshot 2018-12-28 at 09 00 43]
1.1: [screenshot 2018-12-28 at 09 00 48]

@vladak
Member

vladak commented Dec 29, 2018

That looks as if the ignored list for Git no longer works as a whole: next to .git there is also .gitignore, and GitRepository.java has this in its constructor:

        ignoredDirs.add(".git");
        ignoredFiles.add(".gitignore");

Does this happen for other SCMs?
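
For readers unfamiliar with the ignore lists, here is a minimal sketch of the idea: an SCM registers directory and file names, and anything matching them is hidden. The class `IgnoredNamesSketch` and its methods are hypothetical and only match exact names; this is an assumption made for illustration, not the actual OpenGrok implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class IgnoredNamesSketch {
    // Hypothetical stand-ins for the ignoredDirs / ignoredFiles collections.
    private final Set<String> ignoredDirs = new HashSet<>();
    private final Set<String> ignoredFiles = new HashSet<>();

    void addDir(String name) {
        ignoredDirs.add(name);
    }

    void addFile(String name) {
        ignoredFiles.add(name);
    }

    /** Returns true if the last path component is on the matching ignore list. */
    boolean isHidden(Path path, boolean isDirectory) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return false; // e.g. a root path has no file name component
        }
        String name = fileName.toString();
        return isDirectory ? ignoredDirs.contains(name) : ignoredFiles.contains(name);
    }

    public static void main(String[] args) {
        IgnoredNamesSketch names = new IgnoredNamesSketch();
        // Mirrors what the Git constructor quoted above registers.
        names.addDir(".git");
        names.addFile(".gitignore");

        System.out.println(names.isHidden(Paths.get("project/.git"), true));
        System.out.println(names.isHidden(Paths.get("project/.gitignore"), false));
        System.out.println(names.isHidden(Paths.get("project/src"), true));
    }
}
```

If such a list ends up empty in the configuration the webapp uses, both .git and .gitignore would show up, which matches the symptom in the screenshots.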

@vladak added the bug label Dec 29, 2018
@vladak
Member

vladak commented Dec 29, 2018

Will track this as a bug for now.

@Shooter3k

FWIW, this bug does not happen for me with 1.1.

@tulinkry
Contributor Author

tulinkry commented Jan 5, 2019

So it is a bad setup I think. Let me experiment.

@tulinkry
Contributor Author

tulinkry commented Jan 6, 2019

In my case it disappears when I try to upload groups with the following command. Before the upload the configuration has the "ignoredNames" section; it disappears once this is uploaded.

#
# The custom settings are now generated in the READ_ONLY_XML - let us apply it
#
opengrok-projadm \
        --base /var/opengrok \
        --jar ${JAR} \
        --roconfig ${READ_ONLY_XML} \
        --configmerge `which opengrok-config-merge` \
        --uri http://localhost:8080 \
        --refresh \
        --upload

In particular, this command does not add the IgnoredNames property to the final configuration:

2019-01-06 13:32:41,240    DEBUG opengrok_tools | Command ['/Users/ktulinger/OpenGrok/opengrok-tools/env/bin/opengrok-config-merge', '-l', '10', '-a', 'distribution/target/opengrok/lib/opengrok.jar', '/var/opengrok/etc/groups.xml', '/var/folders/w4/fr4pd7zn0x1f9lwc8hsmhbwh0000gp/T/tmpocneq8p0'] took 1 seconds

This is the webapp configuration:

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.8.0_144" class="java.beans.XMLDecoder">
 <object class="org.opengrok.indexer.configuration.Configuration" id="Configuration0">
  <void property="cmds">
   <object class="java.util.Collections" method="unmodifiableMap">
    <object class="java.util.HashMap">
     <void method="put">
      <string>org.opengrok.indexer.history.SubversionRepository</string>
      <string>/usr/bin/svn</string>
     </void>
     <void method="put">
      <string>org.opengrok.indexer.history.GitRepository</string>
      <string>/usr/bin/git</string>
     </void>
    </object>
   </object>
  </void>
  <void property="ctags">
   <string>/usr/local/bin/ctags</string>
  </void>
  <void property="dataRoot">
   <string>/private/var/opengrok/data</string>
  </void>
  <void id="IgnoredNames0" property="ignoredNames">
   <void id="IgnoredDirs0" property="ignoredDirs">
    <void property="items">
     <void method="add">
      <string>.bk</string>
     </void>
     <void method="add">
      <string>.hg</string>
     </void>
     <void method="add">
      <string>.bzr</string>
     </void>
     <void method="add">
      <string>.git</string>
     </void>
     <void method="add">
      <string>.svn</string>
     </void>
     <void method="add">
      <string>SCCS</string>
     </void>
     <void method="add">
      <string>.razor</string>
     </void>
     <void method="add">
      <string>RCS</string>
     </void>
     <void method="add">
      <string>CVS</string>
     </void>
     <void method="add">
      <string>CVSROOT</string>
     </void>
     <void method="add">
      <string>.repo</string>
     </void>
    </void>
   </void>
   <void id="IgnoredFiles0" property="ignoredFiles">
    <void property="items">
     <void method="add">
      <string>.hgtags</string>
     </void>
     <void method="add">
      <string>.hgignore</string>
     </void>
     <void method="add">
      <string>.gitignore</string>
     </void>
     <void method="add">
      <string>.p4config</string>
     </void>
     <void method="add">
      <string>.cvsignore</string>
     </void>
    </void>
   </void>
  </void>
  <void property="projectsEnabled">
   <boolean>true</boolean>
  </void>
  <void property="sourceRoot">
   <string>/private/var/opengrok/src</string>
  </void>
 </object>
</java>

The groups configuration contains just a single group, as you would guess from the next snippet.

Result:

<?xml version="1.0" encoding="UTF-8"?>
<java version="9" class="java.beans.XMLDecoder">
 <object class="org.opengrok.indexer.configuration.Configuration" id="Configuration0">
  <void property="cmds">
   <object class="java.util.Collections" method="unmodifiableMap">
    <object class="java.util.HashMap">
     <void method="put">
      <string>org.opengrok.indexer.history.SubversionRepository</string>
      <string>/usr/bin/svn</string>
     </void>
     <void method="put">
      <string>org.opengrok.indexer.history.GitRepository</string>
      <string>/usr/bin/git</string>
     </void>
    </object>
   </object>
  </void>
  <void property="ctags">
   <string>/usr/local/bin/ctags</string>
  </void>
  <void property="dataRoot">
   <string>/private/var/opengrok/data</string>
  </void>
  <void property="groups">
   <void method="add">
    <object class="org.opengrok.indexer.configuration.Group">
     <void property="name">
      <string>group-1</string>
     </void>
     <void property="pattern">
      <string>group-1.*</string>
     </void>
    </object>
   </void>
  </void>
  <void property="projectsEnabled">
   <boolean>true</boolean>
  </void>
  <void property="sourceRoot">
   <string>/private/var/opengrok/src</string>
  </void>
 </object>
</java>

@tulinkry
Contributor Author

tulinkry commented Jan 6, 2019

Isolated a test case:

    @Test
    public void test() throws Exception {
        Configuration cfgBase = new Configuration();
        cfgBase.addGroup(new Group("group-1", "group-1-*"));

        Configuration cfgNew = new Configuration();
        final RuntimeEnvironment env = RuntimeEnvironment.getInstance();
        env.setConfiguration(cfgNew);
        RepositoryFactory.initializeIgnoredNames(env);

        System.out.println(cfgBase.getXMLRepresentationAsString());
        System.out.println(cfgNew.getXMLRepresentationAsString());

        merge(cfgBase, cfgNew);

        System.out.println(cfgNew.getXMLRepresentationAsString());
        Assert.assertTrue("Should contain .git ignored dir", cfgNew.getIgnoredNames().getIgnoredDirs().getItems().contains(".git"));
    }

It looks like the ignored names are skipped because groups.xml contains the default ignored names.

@vladak
Member

vladak commented Jan 7, 2019

I think this is a problem of the merge itself, perhaps the same as what is described in #2147.

@tulinkry changed the title from "IgnoredNames does not work properly in 1.1" to "IgnoredNames does not work properly with parallel reindex" Jan 7, 2019
@tulinkry
Contributor Author

tulinkry commented Jan 7, 2019

The workaround is to change the flow:

#
# Download the current webapp configuration to BASE_XML
#
opengrok-projadm \
	--base /var/opengrok \
	--java ${JAVA_HOME}/bin/java \
	--jar ./lib/opengrok.jar \
	--uri http://localhost:8080/source \
	--refresh

#
# The custom settings are now generated in the READ_ONLY_XML - let us apply it
#
TEMPFILE=`mktemp`
echo "Merging the configuration with read-only configuration"
run_configmerge ${BASE_XML} ${READ_ONLY_XML} > ${TEMPFILE}
mv -f ${TEMPFILE} ${BASE_XML}
echo "Applying the changes to webapp"
curl -X PUT --header "Content-Type: application/xml" --data "@${BASE_XML}" http://localhost:8080/source/api/v1/configuration

@Shooter3k

I'm not sure whether this is related, but I've had tons of head-scratching issues with the configuration file and finally landed on calling the indexer with the -W and -R parameters, so that it reads in the old configuration options and then writes the new ones when it's done. This has resolved all of my issues with the configuration file.

Here is the full index command I use on 200+ GB of files/code.

Note: this is a WIP version, as we're working on installing things in a more appropriate manner.

/opt/rh/rh-python35/root/usr/bin/opengrok-indexer -C \
    -J=-Djava.util.logging.config.file=/network/drive/opengrok/grok/repo1/unixlogging.properties \
    -a /network/drive/opengrok/opengrok-1.1/lib/opengrok.jar -- \
    -s /network/drive/opengrok/grok/repo1/source/ \
    -d /network/drive/opengrok/grok/repo1/data \
    -P \
    -p /default1 \
    -p /default2 \
    -c /network/drive/opengrok/ctags/ctags \
    -H \
    -S \
    -G \
    --leadingWildCards on \
    -W /network/drive/opengrok/grok/repo1/etc/configuration_unix.xml \
    -R /network/drive/opengrok/grok/repo1/etc/configuration_unix.xml \
    -U http://server/source

@tulinkry
Contributor Author

tulinkry commented Jan 7, 2019

Thank you. I can confirm that when running the full indexer for all projects (as in your example), the problem disappears.

However, I wanted to set up per-project indexing (using the opengrok-reindex-project Python script, which eventually runs the indexer only for the specified directory), and that is what led to these issues, because in this case the configuration is never written to a file at the end of indexing.

@tulinkry
Contributor Author

tulinkry commented Jan 9, 2019

In this exact case the problem is that IgnoredNames overrides neither equals nor hashCode.

@tulinkry
Contributor Author

tulinkry commented Jan 9, 2019

Which boils down to this: when using merge, all properties of the configuration should implement equals; otherwise the results can be surprising from our perspective.
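
To illustrate why a missing equals matters, here is a minimal sketch. It assumes a simplified merge rule in which a base value overrides the new configuration only when it differs from the default. The classes `NamesWithoutEquals` and `NamesWithEquals` and the method `mergeProperty` are hypothetical and are not the actual OpenGrok merge code.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

public class MergeEqualsSketch {

    /** Hypothetical property type that does NOT override equals/hashCode. */
    static class NamesWithoutEquals {
        final Set<String> items = new LinkedHashSet<>();
    }

    /** Same data, but compared by value. */
    static class NamesWithEquals {
        final Set<String> items = new LinkedHashSet<>();

        @Override
        public boolean equals(Object o) {
            return o instanceof NamesWithEquals
                    && items.equals(((NamesWithEquals) o).items);
        }

        @Override
        public int hashCode() {
            return items.hashCode();
        }
    }

    /**
     * Assumed merge rule: keep the new value unless the base value
     * differs from the default, in which case the base value wins.
     */
    static <T> T mergeProperty(T baseValue, T newValue, T defaultValue) {
        return Objects.equals(baseValue, defaultValue) ? newValue : baseValue;
    }

    public static void main(String[] args) {
        // Base and default are both untouched; only the new configuration has ".git".
        NamesWithoutEquals base1 = new NamesWithoutEquals();
        NamesWithoutEquals dflt1 = new NamesWithoutEquals();
        NamesWithoutEquals new1 = new NamesWithoutEquals();
        new1.items.add(".git");
        // Identity comparison makes the base look "changed", so ".git" is lost.
        System.out.println(mergeProperty(base1, new1, dflt1).items.contains(".git")); // false

        NamesWithEquals base2 = new NamesWithEquals();
        NamesWithEquals dflt2 = new NamesWithEquals();
        NamesWithEquals new2 = new NamesWithEquals();
        new2.items.add(".git");
        // Value comparison sees base == default, so the new value survives.
        System.out.println(mergeProperty(base2, new2, dflt2).items.contains(".git")); // true
    }
}
```

Under this assumed rule, any property type that falls back to Object.equals is always treated as non-default, so an untouched value in the base can silently replace a customized value in the new configuration.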

@vladak changed the title from "IgnoredNames does not work properly with parallel reindex" to "IgnoredNames does not work properly with project centric processing workflow" Jan 17, 2019
@shusterboris

Is the issue still open? In my case, I removed a couple of directories from IgnoredDirs and ran a reindex with opengrok-resync, but the directories I removed from the ignore list are still missing from the index. Should I use another method of indexing? The volume of code is quite big, so it's a long process...

@vladak
Member

vladak commented Mar 16, 2023

> Is the issue still open? In my case, I removed a couple of directories from IgnoredDirs and ran a reindex with opengrok-resync, but the directories I removed from the ignore list are still missing from the index. Should I use another method of indexing? The volume of code is quite big, so it's a long process...

Unless these directories are changed, they will not appear in an already existing index. The reindex process is incremental.
