Parsing fails with BadRoute exception #65

Closed
marciof opened this issue Mar 24, 2014 · 5 comments

marciof commented Mar 24, 2014

For these 3 Wikipedia articles: https://gist.github.com/marciof/a456f9c2df404db8b1f8

Traceback (most recent call last):
  (...)
  File "/home/marciof/py2.6/lib/python2.6/site-packages/mwparserfromhell/parser/__init__.py", line 62, in parse
    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)
  File "/home/marciof/py2.6/lib/python2.6/site-packages/mwparserfromhell/parser/tokenizer.py", line 1149, in tokenize
    return self._parse(context)
  File "/home/marciof/py2.6/lib/python2.6/site-packages/mwparserfromhell/parser/tokenizer.py", line 1125, in _parse
    return self._handle_tag_close_close()
  File "/home/marciof/py2.6/lib/python2.6/site-packages/mwparserfromhell/parser/tokenizer.py", line 724, in _handle_tag_close_close
    self._fail_route()
  File "/home/marciof/py2.6/lib/python2.6/site-packages/mwparserfromhell/parser/tokenizer.py", line 141, in _fail_route
    raise BadRoute(context)
mwparserfromhell.parser.tokenizer.BadRoute
@earwig earwig added this to the version 0.4 milestone Mar 24, 2014
@earwig earwig self-assigned this Mar 24, 2014
Owner

earwig commented Mar 24, 2014

Looking into it...

Owner

earwig commented Mar 24, 2014

Ugh, okay; this wikicode is a complete mess. The issue seems to be a result of a few factors coming together that appear to be mentioned in #40 and #42 already. It ultimately boils down to hitting the recursion limit and screwing up with handling the wikicode past that point. #42 will make the parser a lot faster, which would allow me to raise the recursion limits, so I'm hoping this sort of issue can be avoided through that.

Owner

earwig commented May 25, 2014

We really need to come up with another system for handling HTML tags. They're very problematic.

@earwig earwig modified the milestones: version 1.0, version 0.4 May 23, 2015
Owner

earwig commented May 23, 2015

Hopefully this should be fixed by the final solution to #40, but I'm leaving it open to verify that it is indeed fixed.

@lahwaacz
Contributor

To avoid duplicate reports, it might be a good idea to temporarily change the error message from `This is a bug and should be reported. Info: C tokenizer exited with BAD_ROUTE.` to something like

This is a known bug (#40 or #65). As a workaround, try to pass `skip_style_tags=True` in the meantime.
Info: C tokenizer exited with BAD_ROUTE.

As a bonus, you might check if `skip_style_tags` is already true.
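
A rough sketch of both halves of that suggestion, in case it helps. `bad_route_message` and `parse_with_fallback` are hypothetical helpers, not part of mwparserfromhell; the only library feature assumed is that the parse function accepts a `skip_style_tags` keyword, as the traceback above suggests. The fallback catches a broad `Exception` by default because the exact exception type that reaches the caller may vary between versions.

```python
# Message pieces copied from the proposal above.
KNOWN_BUG_NOTE = "This is a known bug (#40 or #65)."
WORKAROUND_NOTE = (
    "As a workaround, try to pass `skip_style_tags=True` in the meantime."
)
INFO_LINE = "Info: C tokenizer exited with BAD_ROUTE."


def bad_route_message(skip_style_tags):
    """Build the proposed error text (hypothetical helper, not library code)."""
    parts = [KNOWN_BUG_NOTE]
    if not skip_style_tags:
        # Only suggest the workaround when the caller has not already used it.
        parts.append(WORKAROUND_NOTE)
    return " ".join(parts) + "\n" + INFO_LINE


def parse_with_fallback(text, parse, errors=(Exception,)):
    """Call parse(text); if it fails, retry once with skip_style_tags=True.

    `parse` is any callable that accepts a `skip_style_tags` keyword,
    for example `mwparserfromhell.parse` (assumed, not imported here).
    """
    try:
        return parse(text)
    except errors:
        return parse(text, skip_style_tags=True)
```

With the library installed, the caller side would look like `parse_with_fallback(text, mwparserfromhell.parse)`. Note that with `skip_style_tags=True` the `''` and `'''` markup is left as plain text instead of being parsed into tags, so the fallback result is not identical to a normal parse.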
