getentityname.py fails on DB4 files without entity declaration #556

Closed
fsundermeyer opened this issue May 28, 2020 · 11 comments

@fsundermeyer
Member

fsundermeyer commented May 28, 2020

cd /local/git/daps/test/documents/xml
> ../../../libexec/getentityname.py book.xml                                                                             
entity-decl.ent
> ../../../libexec/getentityname.py not_in_set.xml                                                                      
Traceback (most recent call last):
  File "../../../libexec/getentityname.py", line 342, in <module>
    sys.exit(main())
  File "../../../libexec/getentityname.py", line 327, in main
    ents = getentities(args)
  File "../../../libexec/getentityname.py", line 215, in getentities
    content = remove_xml_comments(match['IntSubset'])
  File "../../../libexec/getentityname.py", line 150, in remove_xml_comments
    if '<!--' not in content:
TypeError: argument of type 'NoneType' is not iterable
@tomschr
Collaborator

tomschr commented May 28, 2020

@fsundermeyer Thanks for the report. Although the error report seems to be a bit incomplete (copy-and-type error?) I could decipher it successfully. 😉

I've looked at the path and checked all the files. Only one file raised an exception: not_in_set.xml.

As a first bug fix, I've created a PR at tomschr/getentities#4 from branch bugfix/3-not_in_set.xml. With this fix, the above file now works as expected (no exception is raised).

Once you confirm that this works, I can copy the script to the daps repository.
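
For readers following along, here is a minimal sketch of the kind of guard that avoids the TypeError in the traceback above. It is not the code from the PR. The `DOCTYPE` pattern is a simplified, hypothetical stand-in for the script's real regular expression; only the names `remove_xml_comments` and the `IntSubset` group are taken from the traceback. The idea is that the internal subset is optional, so its match group can be `None` and has to be handled before testing for `<!--`.

```python
import re

# Hypothetical, simplified DOCTYPE pattern (an assumption, not the script's own).
# The internal subset "[...]" is optional, so the IntSubset group may be None.
DOCTYPE = re.compile(
    r"<!DOCTYPE\s+(?P<Name>[\w:.-]+)"
    r"(?:\s+PUBLIC\s+\"[^\"]*\"\s+\"[^\"]*\"|\s+SYSTEM\s+\"[^\"]*\")?"
    r"\s*(?:\[(?P<IntSubset>.*?)\])?\s*>",
    re.DOTALL,
)


def remove_xml_comments(content):
    """Strip XML comments; treat a missing internal subset (None) as empty."""
    if not content:
        return ""
    if "<!--" not in content:
        return content
    return re.sub(r"<!--.*?-->", "", content, flags=re.DOTALL)


# A DocBook 4 header without any entity declaration
header = (
    '<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" '
    '"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">'
)
match = DOCTYPE.search(header)
subset = remove_xml_comments(match["IntSubset"])  # no TypeError, just ""
```

With a header that does carry an internal subset, the same call returns the subset text with its comments removed, so the entity lookup can continue unchanged.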

@ghost

ghost commented May 29, 2020

Frank -- is this issue a duplicate of tomschr/getentities#3? If not, I think the description needs to be fixed.

Also, Toms, why is getentities in a different repo now?

@tomschr
Collaborator

tomschr commented May 29, 2020

@sknorr Actually, I created the mentioned bug in my repository yesterday. It was just a reminder, a reference, and I've included some information for me. Furthermore, it links to this issue.

Apart from that, this repo has existed since last year, after a discussion with Frank. I just wanted to have a separate place for test cases and some data files. A separate repo allows me to focus entirely on the script, testing it in a Python CI environment etc. Having all the cruft in the daps repository would "pollute" daps, give me too much noise and distraction, and be a bit too much for a simple Python script.

As this script doesn't change much, I consider copying this stable script to the daps repo a good compromise. Maybe there is some Git magic (subrepos/subtrees/...?), but I haven't looked into these.

@fsundermeyer
Member Author

fsundermeyer commented Mar 22, 2021

Sorry for the initial "short" report--was a cut & paste error. Amended it now.
Tested your fix from tomschr/getentities#4. It fixes the failure initially reported, but now fails for other XML files, e.g.

and

Traceback (most recent call last):
  File "./getentityname.py", line 354, in <module>
    sys.exit(main())
  File "./getentityname.py", line 339, in main
    ents = getentities(args)
  File "./getentityname.py", line 219, in getentities
    log.debug("Match groups: %s", match.groupdict())

None of these files causes problems with the old version of getentityname.py.

@tomschr
Collaborator

tomschr commented Mar 26, 2021

@fsundermeyer I've updated the branch and fixed it in tomschr/getentities#4. It was actually quite simple. I've tested it with the above files and it works for me. Please try again.

The last file (schemas.xml) is not a DocBook file, but it works too.

Thanks for the report! 👍

@fsundermeyer
Member Author

Ah, the sweetness of "programming languages" that do not care about being backwards compatible...

> python3 --version
Python 3.8.8
> /local/git/daps/libexec/getentityname.py --version
2.1.0
> /local/git/daps/libexec/getentityname.py doc/xml/daps_user_intro.xml # DB4
entity-decl.ent /local/git/daps/doc/xml/phrases-decl.ent
> /local/git/daps/libexec/getentityname.py ../doc-sle/xml/MAIN.SLEDS.xml
Traceback (most recent call last):
  File "/local/git/daps/libexec/getentityname.py", line 433, in <module>
    sys.exit(main())
  File "/local/git/daps/libexec/getentityname.py", line 420, in main
    ents = getentities(args)
  File "/local/git/daps/libexec/getentityname.py", line 316, in getentities
    resultdict[entity] = os.path.join(os.path.dirname(absolute), entity)
  File "/usr/lib64/python3.8/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib64/python3.8/genericpath.py", line 152, in _check_arg_types
    raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

Two errors:

  • does not work with DocBook 5
  • with DocBook 4 one entity file is listed without path, the other one with an absolute path.
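
Both points come down to how the entity's system identifier is turned into a path. The following is only a sketch under assumptions, not the script's actual code: `resolve_entity_paths` and its parameters are hypothetical names. It resolves every identifier against the directory of the XML file so that all results are absolute, and it skips identifiers that are `None`, which is the value that made `os.path.join` fail in the traceback above.

```python
import os


def resolve_entity_paths(xmlfile, system_ids):
    """Return absolute paths for the entity files referenced from xmlfile.

    system_ids is assumed to be the list of system identifiers found in the
    DOCTYPE; entries may be None when nothing was matched.
    """
    base = os.path.dirname(os.path.abspath(xmlfile))
    paths = []
    for sysid in system_ids:
        if sysid is None:
            # Nothing to resolve; os.path.join() would raise TypeError here
            continue
        # os.path.join() keeps sysid as-is when it is already absolute
        paths.append(os.path.normpath(os.path.join(base, sysid)))
    return paths


print(" ".join(resolve_entity_paths(
    "/local/git/daps/doc/xml/daps_user_intro.xml",
    ["entity-decl.ent", None, "phrases-decl.ent"],
)))
```

Resolving everything against one base directory is what makes the output uniform: a relative identifier and an absolute one both end up as absolute paths.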

@tomschr
Collaborator

tomschr commented Apr 8, 2021

Two errors: [...]

Okay, I've revamped the code and tried it with the DAPS User Guide:

$ getentityname.py --version
2.2.0

# DocBook 4
$ getentityname.py ~/repos/GH/opensuse/daps/doc/xml/daps_user_intro.xml 
/home/toms/repos/GH/opensuse/daps/doc/xml/entity-decl.ent /home/toms/repos/GH/opensuse/daps/doc/xml/phrases-decl.ent

# DocBook 5
$ getentityname.py ~/repos/GH/SUSE/doc-sle/xml/MAIN.SLEDS.xml 
/home/toms/repos/GH/SUSE/doc-sle/xml/entity-decl.ent /home/toms/repos/GH/SUSE/doc-sle/xml/network-decl.ent

@fsundermeyer Could you try it again, please? 🥺
https://github.com/tomschr/getentities/blob/bugfix/3-not_in_set.xml/bin/getentityname.py

@fsundermeyer
Member Author

Great - works for me now! Thank you!
Minor issue:
If you specify a nonexistent filename, there is an exception that is not caught:

> /local/git/daps/libexec/getentityname.py jdghfksjhf
Traceback (most recent call last):
  File "/local/git/daps/libexec/getentityname.py", line 478, in <module>
    sys.exit(main())
  File "/local/git/daps/libexec/getentityname.py", line 465, in main
    ents = getentities(args)
  File "/local/git/daps/libexec/getentityname.py", line 328, in getentities
    xmlsyntaxcheck(xmlfile)
  File "/local/git/daps/libexec/getentityname.py", line 161, in xmlsyntaxcheck
    parser.parse(xmlfile)
  File "/usr/lib64/python3.8/xml/sax/expatreader.py", line 105, in parse
    source = saxutils.prepare_input_source(source)
  File "/usr/lib64/python3.8/xml/sax/saxutils.py", line 365, in prepare_input_source
    f = urllib.request.urlopen(source.getSystemId())
  File "/usr/lib64/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.8/urllib/request.py", line 509, in open
    req = Request(fullurl, data)
  File "/usr/lib64/python3.8/urllib/request.py", line 328, in __init__
    self.full_url = url
  File "/usr/lib64/python3.8/urllib/request.py", line 354, in full_url
    self._parse()
  File "/usr/lib64/python3.8/urllib/request.py", line 383, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: '/local/git/daps/jdghfksjhf'

@tomschr
Collaborator

tomschr commented Apr 12, 2021

Great - works for me now! Thank you!

Ok, that's good. 👍 Thanks for the test.

Minor issue:
If you specify a nonexistent filename, there is an exception that is not caught:
[...]

Okay, that shouldn't happen. I'll fix that too.

@tomschr
Collaborator

tomschr commented Apr 12, 2021

@fsundermeyer Ok, I fixed this issue:

$ ./bin/getentityname.py --version
2.2.1
$ ./bin/getentityname.py jdghfksjhf
[CRITICAL] main: File 'jdghfksjhf' not found

Every XML file passed on the command line is now checked for existence. If a file does not exist, the above critical error is reported.
Just pull/update the repo or download it from https://github.com/tomschr/getentities/blob/bugfix/3-not_in_set.xml/bin/getentityname.py
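
A rough sketch of such a check, for illustration only and not the code in the repository: `check_files_exist` and the logger name are hypothetical. In Python 3.8, the SAX helper from the traceback falls back to `urllib` when the given name is not an existing file, which is where the ValueError came from, so checking up front avoids that path.

```python
import logging
import os

logging.basicConfig(format="[%(levelname)s] %(funcName)s: %(message)s")
log = logging.getLogger("getentityname-sketch")  # hypothetical logger name


def check_files_exist(paths):
    """Log a critical error for every missing file; return True if all exist."""
    missing = [path for path in paths if not os.path.isfile(path)]
    for path in missing:
        log.critical("File %r not found", path)
    return not missing


ok = check_files_exist(["jdghfksjhf"])
```

A caller could return a non-zero exit code from `main()` when the function returns `False`, before any file is handed to the XML parser.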

@fsundermeyer
Member Author

Thank you very much! Added to DAPS with c66c206
