Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s1kd-brexcheck ERROR: Invalid object path in BREX #5

Open
MihailCosmin opened this issue Jun 24, 2020 · 28 comments
Open

s1kd-brexcheck ERROR: Invalid object path in BREX #5

MihailCosmin opened this issue Jun 24, 2020 · 28 comments
Assignees

Comments

@MihailCosmin
Copy link

MihailCosmin commented Jun 24, 2020

Hi,

I have 2 huge BREX files, more than 8000 lines of code combined.
And the s1kd-brexcheck fails with ERROR: Invalid object path in BREX
Step by step I have commented the rules that were responsible for these errors. And found that there were only 4 rules that were causing such error.

Is there any option to run s1kd-brexcheck and ignore such errors?
More exactly, just skip the BREX rules that are "broken" (although they are not really broken, maybe xmlXPathEvalExpression is not 100% correct).

The rules that do not work:

<structureObjectRule id="RULE-153"> -->
	<objectPath allowedObjectFlag="1">/dmodule/pdf:style</objectPath>
	<objectUse>Custom PDF styling rules. Not S1000D compliant.</objectUse>
</structureObjectRule>
<structureObjectRule id="RULE-321">
      <objectPath allowedObjectFlag="0">(//dmStatus/applic/displayText/simplePara[lower-case(.)[contains(.,'FFF')]])</objectPath>
</structureObjectRule>
<structureObjectRule id="RULE-555">
   <objectPath allowedObjectFlag="1">//dmCode[attribute::infoCode != "003" and matches(@assyCode, '.{2}')]</objectPath>
</structureObjectRule>
<structureObjectRule id="RULE-677">
    <objectPath allowedObjectFlag="1">/dmodule/identAndStatusSection/pmStatus/reasonForUpdate[not(@id= /dmodule/content//@reasonForUpdateRefIds/tokenize(normalize-space(), '\t'))]</objectPath>
</structureObjectRule>
@kibook
Copy link
Owner

kibook commented Jun 24, 2020

Part of the issue is that libxml2 only supports XPath 1.0, so these functions from XPath 2.0 won't work:

  • lower-case
  • matches
  • tokenize

I believe the issue with the first example is the use of a namespace. I think I need to register all namespaces declared in ancestor nodes to the xmlXPathContext used to resolve the XPath expression. Does your BREX DM have a declaration like this somewhere?

xmlns:pdf="http://someurl.com"

As long as it's on the <structureObjectRule> or any ancestor node (like the root <dmodule> node), that should be valid XPath 1.0.

There's no option currently to skip invalid rules, but I will definitely change it so that invalid rules are skipped (with a WARNING) instead of causing the whole check to be aborted. Thanks for the suggestion, @MihailCosmin !

@kibook kibook self-assigned this Jun 24, 2020
@kibook
Copy link
Owner

kibook commented Jun 24, 2020

Rules with "invalid" XPath will be skipped as of 37583f2

Now I just need to fix namespaces so that XPath using them aren't also considered invalid.

@kibook
Copy link
Owner

kibook commented Jun 24, 2020

44ffdf5 should fix the use of XPath prefixes, as long as they have been properly declared in the BREX XML.

I mentioned declaring them on the <structureObjectRule> element before, but you can also declare them on the <objectPath> itself, or any ancestor node:

<structureObjectRule id="RULE-153">
    <objectPath xmlns:pdf="http://example.com" allowedObjectFlag="1">/dmodule/pdf:style</objectPath>
    <objectUse>Custom PDF styling rules. Not S1000D compliant.</objectUse>
</structureObjectRule>

Thanks again for pointing these out, the XPath namespace prefixes in particular is something I hadn't really considered.

@kibook kibook removed their assignment Jun 25, 2020
@kibook
Copy link
Owner

kibook commented Jun 25, 2020

Since S1000D (Issue 4.0+) does recommend XPath 2.0 in the specification, I'm looking in to extending libxml2's XPath support.

I'm not sure it can be made fully compliant to XPath 2.0 without modifying libxml2 itself, since XPath 2.0 isn't completely backwards compatible with XPath 1.0 (https://www.w3.org/TR/xpath20/#id-backwards-compatibility), but it's easy to add new functions, like matches. That may be "good enough".

I created a new project for "libxpath2": https://github.com/kibook/libxpath2. The idea is that once I get this somewhat stable, s1kd-brexcheck can use it to support XPath 2.0-like functions at the very least.

@MihailCosmin
Copy link
Author

Hi, That would be great. Because now I see there were many more BREX rules that used Xpath 2.0. The reason they were not picked up before, was because I was checking only one DM. As soon as I started checking all DMs, more BREX rules were found to be wrong.

I see that Saxon offers support for Xpath 2.0

Saxon-HE (Home Edition) is an open source product available under the Mozilla Public License version 2.0. It provides implementations of XSLT (3.0), XQuery (3.1), and XPath (2.0 and 3.1) at the basic level of conformance defined by W3C, plus from Saxon 10 the optional features higher-order functions and dynamic evaluation. It is available for Java and .NET.

They have also some files for C: https://www.saxonica.com/download/c.xml

Not sure if you can use these, as I am a complete zero when talking about C.

@kibook
Copy link
Owner

kibook commented Jun 26, 2020

Do you know if you use XPath 2.0-specific features besides functions, such as the if or for keywords? I'm not sure if those can be implemented as easily as adding extra functions.

Would you be able to provide a list of the XPath 2.0 functions used in your BREX? Eventually, I would like to implement all XPath 2.0 functions, but that's going to take some time. If you can give me a list of the ones you use, I could focus on those first.

I'll also look at Saxon to see if it would be possible to have an option to use its XPath engine to validate rules instead of libxml2 if the user chooses, in order to get complete XPath 2.0 support.

@kibook kibook self-assigned this Jun 26, 2020
@MihailCosmin
Copy link
Author

Do you know if you use XPath 2.0-specific features besides functions, such as the if or for keywords?

No cases where "if" or "for" are used.
In fact all other rules that were wrong used only the three functions from below.

lower-case
matches
tokenize

But full support for Xpath 2.0 could still be needed.

@kibook
Copy link
Owner

kibook commented Jun 29, 2020

Thanks for the list! However, after looking more into XPath 2.0, I don't think just adding XPath 2.0-like functions will be enough. For example, the way the tokenize function is used in your example is specific to XPath 2.0 syntax.

I'll probably focus more on getting the Saxon XPath engine working within s1kd-brexcheck.

@kibook
Copy link
Owner

kibook commented Jul 16, 2020

I have added an experimental option to use Saxon to evaluate the objectPath instead of libxml2. In my tests, it worked successfully on all the example XPath 2.0-based rules you listed.

At least for now, it is a compile-time option, so that Saxon is not a requirement to build and run s1kd-brexcheck. To enable it:

  1. Download and install Saxon/C from http://www.saxonica.com/saxon-c/index.xml

  2. Build with:

    $ make BREXCHECK_XPATH_ENGINE=saxon
    

I've only tested it on Linux so far, so the Makefile might still need some adjustments to work out-of-the-box on Windows with MinGW/Cygwin.

I have included a slightly modified version of the Saxon/C API along with the s1kd-brexcheck source that addresses a bug in the latest released version (1.2.1): https://saxonica.plan.io/issues/4513. The XPathProcessor::declareNamespace function for declaring namespaces for use in XPath expressions (like pdf:style) was broken.

There are still some major issues left to work out, like what appear to be big memory leaks within the Saxon/C library itself, though I could also just be misunderstanding how to properly free everything. Either way, it doesn't seem suitable to use on large sets of DMs in one call right now, but a separate call for each DM works okay, though it is significantly slower.

@MihailCosmin
Copy link
Author

Do I need to add an argument to the brexcheck command to do the check using saxon?
I'm trying on Linux and I'm getting these errors:

xmlXPathCompOpEval: function **lower-case** not found
XPath error : Unregistered function
s1kd-brexcheck: WARNING: Ignoring invalid object path in ...

xmlXPathCompOpEval: function **matches** not found
XPath error : Unregistered function
s1kd-brexcheck: WARNING: Ignoring invalid object path in ...

XPath error : Invalid expression
s1kd-brexcheck: WARNING: Ignoring invalid object path in ...

@kibook
Copy link
Owner

kibook commented Jul 17, 2020

Did you build with:

$ make BREXCHECK_XPATH_ENGINE=saxon

From your errors, it is still using libxml2's XPath engine. You can check which engine it uses with:

$ s1kd-brexcheck --version

There is a new XPath engine: line.

I'm still working out what is needed for a binary package that would have Saxon enabled, so you will have to build from source to test.

@MihailCosmin
Copy link
Author

Yup, that did it.

Now it seems to work. I also made an error in a DM according to a BREX rule that used matches, and it was detected.

Just for info. I tried to build on Windows and I got the below errors:

$ make BREXCHECK_XPATH_ENGINE=saxon
g++ -O3 -c -Isaxon/Saxon.C.API -o saxon/saxon.o `pkg-config --cflags libxml-2.0` saxon/saxon.cpp
cc -O3 -c -o saxon/SaxonCGlue.o saxon/Saxon.C.API/SaxonCGlue.c
cc -O3 -c -o saxon/SaxonCXPath.o saxon/Saxon.C.API/SaxonCXPath.c
g++ -O3 -c -o saxon/SaxonProcessor.o saxon/Saxon.C.API/SaxonProcessor.cpp
saxon/Saxon.C.API/SaxonProcessor.cpp: In member function ‘void SaxonProcessor::setResourcesDirectory(const char*)’:
saxon/Saxon.C.API/SaxonProcessor.cpp:516:2: error: ‘strncat_s’ was not declared in this scope; did you mean ‘strncat’?
  516 |  strncat_s(_getResourceDirectory(), destSize,dir, strlen(dir));
      |  ^~~~~~~~~
      |  strncat
make: *** [Makefile:55: saxon/SaxonProcessor.o] Error 1

@kibook
Copy link
Owner

kibook commented Jul 17, 2020

Thanks for the info! I got the same error too when I gave it a quick try. The basic issue seems to be that Cygwin/MinGW are counted as Windows in the Saxon/C code's checks, but their environments are set up more like Linux. strncat_s is a Microsoft extension that is obviously not included in Cygwin or MinGW's standard C libraries.

Saxon/C also hardcodes the location of the shared library (libsaxonhec.dll) and Java runtime (rt) it uses, which will be an issue moving the binaries between environments (Cygwin/MinGW to normal Windows).

@kibook
Copy link
Owner

kibook commented Jul 17, 2020

I updated the Makefile in a236fa1 to treat Cygwin and MinGW as Linux when compiling the Saxon/C API, except for SaxonCGlue.c, which contains the hardcoded path to the shared library. I was able to build it successfully in my MinGW environment after that.

@MihailCosmin
Copy link
Author

I'm still getting the same error as before on cygwin.

@kibook
Copy link
Owner

kibook commented Jul 20, 2020

In the Makefile output, do you see -D__linux__ in the gcc and g++ commands (except for the one compiling SaxonCGlue.c)? That is what should cause it to use the "linux" portions of the Saxon/C code, like plain strncat instead of strncat_s.

Although I was able to build s1kd-brexcheck, I haven't been able to run it yet despite tweaking the hardcoded shared library path in various ways. I am going to have to keep experimenting.

@MihailCosmin
Copy link
Author

No,

This is the output, same as before:

$ make BREXCHECK_XPATH_ENGINE=saxon
g++ -O3 -c -Isaxon/Saxon.C.API -o saxon/saxon.o `pkg-config --cflags libxml-2.0` saxon/saxon.cpp
cc -O3 -c -o saxon/SaxonCGlue.o saxon/Saxon.C.API/SaxonCGlue.c
cc -O3 -c -o saxon/SaxonCXPath.o saxon/Saxon.C.API/SaxonCXPath.c
g++ -O3 -c -o saxon/SaxonProcessor.o saxon/Saxon.C.API/SaxonProcessor.cpp
saxon/Saxon.C.API/SaxonProcessor.cpp: In member function ‘void SaxonProcessor::setResourcesDirectory(const char*)’:
saxon/Saxon.C.API/SaxonProcessor.cpp:516:2: error: ‘strncat_s’ was not declared in this scope; did you mean ‘strncat’?
  516 |  strncat_s(_getResourceDirectory(), destSize,dir, strlen(dir));
      |  ^~~~~~~~~
      |  strncat
make: *** [Makefile:69: saxon/SaxonProcessor.o] Error 1

@kibook
Copy link
Owner

kibook commented Jul 20, 2020

Try exporting the OSTYPE environment variable to make sure it is visible to make:

$ export OSTYPE
$ make BREXCHECK_XPATH_ENGINE=saxon

If that works, you can add export OSTYPE to ~/.profile so that it is always exported.

@MihailCosmin
Copy link
Author

That possibly worked.
The .exe was created. But I cannot use it in cmd.

On one system I just get an error message as a pop-up : "The application was unable to start correctly (0xc000007b)."
And one another system I get a message from cmd:

Unable to load C:\Program Files\Saxonica\SaxonHEC1.2.1\libsaxonhec.dll
Error: : Invalid or incomplete multibyte or wide character

But libsaxonhec.dll is exactly at that location.

If needed, this was the output from make:

$ make BREXCHECK_XPATH_ENGINE=saxon
g++ -D__linux__ -O3 -c -o saxon/SaxonProcessor.o saxon/Saxon.C.API/SaxonProcessor.cpp

g++ -D__linux__ -O3 -c -o saxon/SchemaValidator.o saxon/Saxon.C.API/SchemaValidator.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmAtomicValue.o saxon/Saxon.C.API/XdmAtomicValue.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmItem.o saxon/Saxon.C.API/XdmItem.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmNode.o saxon/Saxon.C.API/XdmNode.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmValue.o saxon/Saxon.C.API/XdmValue.cpp
g++ -D__linux__ -O3 -c -o saxon/XPathProcessor.o saxon/Saxon.C.API/XPathProcessor.cpp
g++ -D__linux__ -O3 -c -o saxon/XQueryProcessor.o saxon/Saxon.C.API/XQueryProcessor.cpp
g++ -D__linux__ -O3 -c -o saxon/Xslt30Processor.o saxon/Saxon.C.API/Xslt30Processor.cpp
g++ -D__linux__ -O3 -c -o saxon/XsltProcessor.o saxon/Saxon.C.API/XsltProcessor.cpp
>brex.h; for f in brex/DMC-AE-A-04-10-0301-00A-022A-D_003-00.XML brex/DMC-S1000D-A-04-10-0301-00A-022A-D_004-00_EN-US.XML brex/DMC-S1000D-E-04-10-0301-00A-022A-D_009-00_EN-US.XML brex/DMC-S1000D-F-04-10-0301-00A-022A-D_001-00_EN-US.XML brex/DMC-S1000D-G-04-10-0301-00A-022A-D_001-00_EN-US.XML stats.xsl; do xxd -i "$f" >>brex.h; done
cc -Wall -Werror -pedantic-errors -I ../common `pkg-config --cflags libxml-2.0 libxslt libexslt` -O3 -DUSE_SAXON -o s1kd-brexcheck s1kd-brexcheck.c ../common/s1kd_tools.c saxon/saxon.o saxon/SaxonCGlue.o saxon/SaxonCXPath.o saxon/SaxonProcessor.o saxon/SchemaValidator.o saxon/XdmAtomicValue.o saxon/XdmItem.o saxon/XdmNode.o saxon/XdmValue.o saxon/XPathProcessor.o saxon/XQueryProcessor.o saxon/Xslt30Processor.o saxon/XsltProcessor.o `pkg-config --libs libxml-2.0 libxslt libexslt` -lstdc++ -ldl

@kibook
Copy link
Owner

kibook commented Jul 20, 2020

I get the same errors. I'm going to keep looking in to them.

@kibook
Copy link
Owner

kibook commented Jul 29, 2020

I tried a different approach by modifying the Saxon/C API to link with the libsaxonhec library at compile-time, instead of dynamically loading it at run-time from a hard-coded path (3b80c66). I am now able to successfully run it on Windows.

The pre-release 3.2.5 contains an extra s1kd-brexcheck_4.4.4_win64-saxon.zip package, which is a standalone distribution of s1kd-brexcheck with Saxon enabled.

@kibook
Copy link
Owner

kibook commented Jul 29, 2020

I forgot to include the JET runtime files in the above package. Use s1kd-brexcheck_4.4.5_win64-saxon.zip instead.

@kibook
Copy link
Owner

kibook commented Sep 4, 2020

Some updates:

I'm still investigating the memory leak issues with Saxon/C. I reported what I found on their development tracker, and they are looking into it as well: https://saxonica.plan.io/boards/4/topics/7982.

I found another open source XPath 2.0 implementation, XQilla. It seems to be faster than Saxon and with a smaller footprint, plus no memory leaks. Unfortunately, I haven't had much luck finding or building Windows binaries for it, so I've only gotten it working on Linux thus far.

BREXCHECK_XPATH_ENGINE can now have the following values:

  • LIBXML (default)
  • SAXON
  • XQILLA

@kibook kibook closed this as completed Sep 4, 2020
@kibook kibook reopened this Sep 4, 2020
@kibook
Copy link
Owner

kibook commented Sep 4, 2020

I finally did manage to build XQilla on Windows with Cygwin, and I created a standalone Windows package for testing: s1kd-brexcheck_4.8.0_win64-xqilla.zip.

@MihailCosmin
Copy link
Author

I finally did manage to build XQilla on Windows with Cygwin, and I created a standalone Windows package for testing: s1kd-brexcheck_4.8.0_win64-xqilla.zip.

Hi, the new s1kd-brexcheck.exe doesn't give any error messages, so it appears to work...
But it also doesn't detect any mistakes in the datamodules.

For example I made some clear mistakes regarding the securityClassification, systemDiffCode and modelIdentCode. And none are detected with the new version of s1kd-brexcheck.exe.
With the old version they are detected.

@kibook
Copy link
Owner

kibook commented Sep 7, 2020

Good catch! This was an issue with how Xerces-C implements attribute nodes. DOMNode::getParentNode() doesn't return the element they are on, DOMAttr::getOwnerElement() must be used instead. This should be fixed in 022c40a, s1kd-brexcheck_4.8.2_win64-xqilla.zip.

@MihailCosmin
Copy link
Author

Yes, I confirm now it works and it detects the errors.

@kibook
Copy link
Owner

kibook commented Sep 11, 2020

Great! Thank you for testing it.

530216e is an overhaul on the XPath 2.0 functionality, so that it's not an either-or choice, but is based on what version of XPath was referenced by the corresponding issue of S1000D for each BREX. If the BREX is issue 3.0 and lower, it will use XPath 1.0 (with libxml), and if it is issue 4.0 and up, it will use XPath 2.0 (with either Saxon or XQilla). You can override this with --xpath-version to force it to use either version regardless of the issue of the BREX.

I changed the name of the Makefile variable to XPATH2_ENGINE, with options NONE, SAXON or XQILLA to reflect that you aren't replacing libxml anymore. The default will still be NONE while I keep testing/improving the interfaces for the other options.

Here are the latest Windows builds:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants