Ignore DTDs #3

Closed
superdude264 opened this issue Jan 28, 2021 · 2 comments

Comments

@superdude264

While trying to get Pub2TEI working on the grobid gold standard data from PMC, I ran into the DTD issues mentioned in the README. After some research, I found that DTD loading can be disabled with the following switch:

--parserFeature?uri=http%3A//apache.org/xml/features/nonvalidating/load-external-dtd:false

References:

I've attached a file from the grobid PMC gold standard data I was having trouble with. The new switch allows the conversion to proceed.

sample.zip


The sample command in the README could be updated to:

java -jar Samples/saxon9he.jar \
	--parserFeature?uri=http%3A//apache.org/xml/features/nonvalidating/load-external-dtd:false \
	-a:off \
	-dtd:off \
	-expand:off \
	-o:out.tei.xml \
	-s:Samples/TestPubInput/BMJ/bmj_sample.xml \
	-t \
	-xsl:Stylesheets/Publishers.xsl
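
For anyone puzzled by the shape of the switch: it appears to be the Xerces feature URI with the colon after `http` percent-encoded as `%3A`, followed by `:false` as the value. The sketch below only rebuilds that string from the plain URI so the pieces are visible. The variable names are made up for illustration, and the likely reason for the encoding (Saxon using `:` to separate an option name from its value) is an assumption, not something confirmed in this thread.

```shell
# Plain Xerces feature URI (it appears in encoded form in the switch above).
feature_uri='http://apache.org/xml/features/nonvalidating/load-external-dtd'

# Percent-encode the colon of the URI scheme; this URI contains only one colon.
encoded_uri=$(printf '%s' "$feature_uri" | sed 's/:/%3A/')

# Assemble the switch: option name, encoded URI, then ":false" as the value.
dtd_switch="--parserFeature?uri=${encoded_uri}:false"

printf '%s\n' "$dtd_switch"
```

Since `?` is a glob character in most shells, it may be safer to pass the switch quoted (for example as `"$dtd_switch"`); zsh in particular rejects an unquoted pattern that matches no file.
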
kermitt2 added a commit that referenced this issue May 4, 2023
@kermitt2
Owner

kermitt2 commented May 4, 2023

Thank you very much, @superdude264, for the info! And sorry for answering only now (I overlooked your issue; although I use Pub2TEI quite frequently, it does not require many updates).

This is very useful, I've updated the readme.

@superdude264
Author

No problem, and thank you for the shout-out. Docs look good to me.
