C tokenizer exited with non-empty token stack #164

Closed
halfak opened this issue Sep 23, 2016 · 2 comments

@halfak

halfak commented Sep 23, 2016

Error while processing 2013 AIBA World Boxing Championships – Light welterweight @ 726997736:

ParserError: This is a bug and should be reported. Info: C tokenizer exited with non-empty token stack.
Traceback (most recent call last):
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring/dependencies/functions.py", line 244, in _solve
    value = dependent(*args)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring/dependencies/dependent.py", line 52, in __call__
    return self.process(*args, **kwargs)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring/features/wikitext/datasources/parsed.py", line 210, in _process_wikicode
    return mwparserfromhell.parse(text)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/utils.py", line 58, in parse_anything
    return Parser().parse(value, context, skip_style_tags)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/parser/__init__.py", line 93, in parse
    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)
mwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with non-empty token stack.
$ python
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mwparserfromhell
>>> import mwapi
>>> mwparserfromhell.__version__
'0.4.3'
>>> text = mwapi.Session("https://en.wikipedia.org").get(action='query', prop='revisions', revids=726997736, rvprop=['content'], formatversion=2)['query']['pages'][0]['revisions'][0]['content']
Sending requests with default User-Agent.  Set 'user_agent' on mwapi.Session to quiet this message.
>>> wikicode = mwparserfromhell.parse(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/utils.py", line 58, in parse_anything
    return Parser().parse(value, context, skip_style_tags)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/parser/__init__.py", line 93, in parse
    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)
mwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with non-empty token stack.
@lahwaacz
Contributor

Most likely a duplicate of #40; the same workaround (skip_style_tags=True) applies.
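
For anyone hitting this before a fix lands, a minimal sketch of the workaround as a retry wrapper is below. The helper name `parse_with_fallback` and its parameters are hypothetical, not part of mwparserfromhell. It only assumes the parse function accepts a `skip_style_tags` keyword, which the traceback above shows `mwparserfromhell.parse` does.

```python
def parse_with_fallback(text, parse, parser_error=Exception):
    """Parse normally; if the parser raises, retry with style tags skipped.

    `parse` is any callable accepting (text, skip_style_tags=...), and
    `parser_error` is the exception type to catch. Both are illustrative
    parameters so the helper stays independent of any installed library.
    """
    try:
        return parse(text)
    except parser_error:
        # Workaround from the comment above: do not parse '' and '''
        # as style tags on the second attempt.
        return parse(text, skip_style_tags=True)
```

With mwparserfromhell installed, this would presumably be called as `parse_with_fallback(text, mwparserfromhell.parse, mwparserfromhell.parser.ParserError)`. Note that with `skip_style_tags=True`, bold and italic markup is left as plain text in the resulting wikicode rather than parsed into tag nodes.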

@earwig
Owner

earwig commented Jun 23, 2017

Fixed in cd4f90e, same as #147 and #158. Slipped by me the first time around (probably because it has the same title as #158 :P)
