
C tokenizer exited with BAD_ROUTE. #165

Closed
halfak opened this issue Sep 23, 2016 · 1 comment

halfak commented Sep 23, 2016

Error while processing 2006 French Open – Girls' Singles @ 685719491:

ParserError: This is a bug and should be reported. Info: C tokenizer exited with BAD_ROUTE.
Traceback (most recent call last):
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring/dependencies/functions.py", line 244, in _solve
    value = dependent(*args)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring/dependencies/dependent.py", line 52, in __call__
    return self.process(*args, **kwargs)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring/features/wikitext/datasources/parsed.py", line 210, in _process_wikicode
    return mwparserfromhell.parse(text)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/utils.py", line 58, in parse_anything
    return Parser().parse(value, context, skip_style_tags)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/parser/__init__.py", line 93, in parse
    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)
mwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with BAD_ROUTE.
$ python
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mwparserfromhell
>>> mwparserfromhell.__version__
'0.4.3'
>>> import mwapi
>>> text = mwapi.Session("https://en.wikipedia.org").get(action='query', prop='revisions', revids=685719491, rvprop=['content'], formatversion=2)['query']['pages'][0]['revisions'][0]['content']
Sending requests with default User-Agent.  Set 'user_agent' on mwapi.Session to quiet this message.
>>> wikicode = mwparserfromhell.parse(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/utils.py", line 58, in parse_anything
    return Parser().parse(value, context, skip_style_tags)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/mwparserfromhell/parser/__init__.py", line 93, in parse
    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)
mwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with BAD_ROUTE.
@lahwaacz
Contributor

Most likely a duplicate of #40 or #65; the skip_style_tags=True workaround works here as well.
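
For anyone hitting this in a batch job, the workaround could be wrapped roughly like this. It is only a sketch: parse_with_fallback is a made-up helper, not part of mwparserfromhell, and it assumes the parse callable accepts a skip_style_tags keyword the way mwparserfromhell.parse does in the traceback above. Note that the fallback leaves '' and ''' markup as plain text instead of turning it into style tags, so the resulting tree differs from a normal parse.

```python
def parse_with_fallback(text, parse, error_type=Exception):
    """Parse wikitext, retrying with skip_style_tags=True if the first try fails.

    `parse` is expected to behave like mwparserfromhell.parse, i.e. to accept
    a skip_style_tags keyword argument. This helper is hypothetical and is
    not part of mwparserfromhell itself.
    """
    try:
        # Normal parse: ''italic'' and '''bold''' markup become style tags.
        return parse(text)
    except error_type:
        # Workaround from this thread: leave '' and ''' as plain text.
        return parse(text, skip_style_tags=True)


# Intended usage (requires mwparserfromhell to be installed):
#   import mwparserfromhell
#   from mwparserfromhell.parser import ParserError
#   wikicode = parse_with_fallback(text, mwparserfromhell.parse, ParserError)
```

Passing the parser and the exception type in as arguments keeps the helper independent of any particular mwparserfromhell version; catching only ParserError rather than the default Exception avoids hiding unrelated failures.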
