Traceback (most recent call last):
  File "mwliststuff.py", line 5, in <module>
    m = mwparserfromhell.parse(text)
  File "/XXXX/venv/lib/python3.5/site-packages/mwparserfromhell/utils.py", line 58, in parse_anything
    return Parser().parse(value, context, skip_style_tags)
  File "/XXXX/venv/lib/python3.5/site-packages/mwparserfromhell/parser/__init__.py", line 93, in parse
    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)
mwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with non-empty token stack.
Other potentially relevant details:
Python 3.5.1
Mac OS X 10.11.4
mwparserfromhell==0.4.3
I finally figured this one out, and it was incredibly obscure. The wikicode is a bit of a mess, and has something that looks like an unpaired HTML tag (<Th) that the parser treats as such. The underlying parsing bug has been fixed, but we still get semi-garbage output (though it does match the input correctly and does not throw an exception). I think this is a garbage-in, garbage-out (GIGO) case until the parser develops more accurate tag parsing abilities.
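For anyone who wants to check the two properties mentioned above (no exception, and output that matches the input) on their own pages, here is a minimal sketch. The helper name try_parse and its return shape are made up for illustration and are not part of mwparserfromhell; the only library call it relies on is mwparserfromhell.parse, plus the assumption that str() of the parsed result gives back the wikitext.

```python
from typing import Callable, Optional, Tuple


def try_parse(text: str,
              parse: Optional[Callable[[str], object]] = None) -> Tuple[bool, str]:
    """Parse wikitext and report whether the result round-trips.

    Hypothetical helper: returns (ok, detail), where ok is True only if
    parsing did not raise and str(parsed) equals the original text.
    """
    if parse is None:
        # Imported lazily so the helper can also be exercised with a stub parser.
        import mwparserfromhell
        parse = mwparserfromhell.parse
    try:
        parsed = parse(text)
    except Exception as exc:
        # Caught broadly on purpose; in the report above the exception
        # was mwparserfromhell.parser.ParserError.
        return False, "parser raised {}: {}".format(type(exc).__name__, exc)
    if str(parsed) != text:
        return False, "output does not match input"
    return True, "round-trip ok"
```

Calling try_parse(text) with the page's wikitext uses the real parser. A passing result only says the parser did not raise and reproduced the input; it says nothing about whether the node tree is sensible, which is the GIGO part.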
When attempting to parse the MediaWiki text of https://en.wikipedia.org/w/index.php?title=List_of_minerals_(synonyms)&oldid=704783982 using mwparserfromhell.parse, I receive the stack trace shown above.