ParserError on https://en.wikipedia.org/w/index.php?title=List_of_minerals_(synonyms)&oldid=704783982 #147

fabianhoward · 2016-04-12T09:28:11Z

When I attempt to parse the MediaWiki text of https://en.wikipedia.org/w/index.php?title=List_of_minerals_(synonyms)&oldid=704783982 using mwparserfromhell.parse, I get the following stack trace:

Traceback (most recent call last):
  File "mwliststuff.py", line 5, in <module>
    m = mwparserfromhell.parse(text)
  File "/XXXX/venv/lib/python3.5/site-packages/mwparserfromhell/utils.py", line 58, in parse_anything
    return Parser().parse(value, context, skip_style_tags)
  File "/XXXX/venv/lib/python3.5/site-packages/mwparserfromhell/parser/__init__.py", line 93, in parse
    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)
mwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with non-empty token stack.

Other potentially relevant details:

Python 3.5.1
Mac OS X 10.11.4
mwparserfromhell==0.4.3


earwig · 2017-06-23T08:13:52Z

I finally figured this one out, and it was incredibly obscure. The wikicode is a bit of a mess and contains something that looks like an unpaired HTML tag (<Th), which the parser treats as one. The underlying parsing bug has been fixed, but we still get semi-garbage output (though it does match the input correctly and no exception is thrown). I think this is a GIGO (garbage in, garbage out) case until the parser develops more accurate tag parsing abilities.
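A stray fragment like `<Th` is easy to miss in a long list page. As a rough pre-check, one could count tag-like openers against closers in the raw wikitext before parsing. This is a minimal sketch under stated assumptions: `unpaired_tag_names` is a hypothetical helper, not part of mwparserfromhell, and it does not reproduce how the tokenizer actually decides what counts as a tag.

```python
import re
from collections import Counter

# Hypothetical pre-check, NOT mwparserfromhell's tokenizer logic.
# An "opener" is "<" followed by a letter, then anything up to the next "<" or ">".
OPEN_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*)")
# A "closer" is "</name>" with optional whitespace before ">".
CLOSE_RE = re.compile(r"</([A-Za-z][A-Za-z0-9]*)\s*>")
# Assumed small set of void elements that never need a closing tag.
VOID_TAGS = {"br", "hr", "wbr"}


def unpaired_tag_names(wikitext):
    """Return sorted tag names (lowercased) that are opened more often than closed."""
    opened = Counter()
    for match in OPEN_RE.finditer(wikitext):
        name, rest = match.group(1).lower(), match.group(2)
        # Skip void elements and self-closing tags such as <ref name="x" />.
        if name in VOID_TAGS or rest.rstrip().endswith("/"):
            continue
        opened[name] += 1
    closed = Counter(m.group(1).lower() for m in CLOSE_RE.finditer(wikitext))
    # Counter returns 0 for missing keys, so never-closed names are reported too.
    return sorted(name for name in opened if opened[name] > closed[name])


# Example: a "<Th" inside ordinary text is counted as an opener with no closer.
print(unpaired_tag_names("Mineral (Ce,<Th) and <ref>x</ref> <br/>"))
```

It only flags candidates for manual inspection: it ignores nesting order and treats `<Th` the same as `<th>`, much as the parser did with this page.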

earwig · 2017-06-23T08:14:35Z

Fixed in cd4f90e.

earwig closed this as completed Jun 23, 2017

earwig added the result: fixed label Jun 23, 2017

earwig added this to the version 0.5 milestone Jun 23, 2017

earwig self-assigned this Jun 23, 2017

This was referenced Jun 23, 2017:

C tokenizer exited with non-empty token stack. #158 (Closed)

C tokenizer exited with non-empty token stack #164 (Closed)
