Text is not parsed as expected #88

Closed
5j9 opened this issue Dec 12, 2014 · 2 comments

5j9 commented Dec 12, 2014

Shouldn't parsing the following code generate some tags and templates?

>>> import mwparserfromhell as mwp
>>> # from https://en.wikipedia.org/w/index.php?title=Template:Geobox2_list&action=edit
>>> p = mwp.parse("""<includeonly>{{#if: {{{3|}}}{{{4|}}} | title="{{#if:{{{4|}}} | {{{4|}}} | {{{3|}}} }}"}}
{{#if: {{{sub|}}} | {{!}} | ! }} style="{{{style|white-space: nowrap;}}}" {{!}} {{#if: {{{sub|}}} | &nbsp;-&nbsp; }}{{#if: {{{2|}}} | {{{2|}}} | {{{1|}}} }}
{{!}}<span style="white-space: nowrap">{{ {{#if:{{{5|}}}|Geobox2 link|Geobox 0}}|{{{5|}}}}}{{#if: {{{6|}}} |</span> <span style="white-space: nowrap">''{{{6|}}}''}}{{#if: {{{7|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{7|}}}|Geobox2 link|Geobox 0}}|{{{7|}}}}}{{#if: {{{8|}}} |</span> <span style="white-space: nowrap">''{{{8|}}}''}}{{#if: {{{9|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{9|}}}|Geobox2 link|Geobox 0}}|{{{9|}}}}}{{#if: {{{10|}}}|</span> <span style="white-space: nowrap">''{{{10|}}}''}}{{#if: {{{11|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{11|}}}|Geobox2 link|Geobox 0}}|{{{11|}}}}}{{#if: {{{12|}}}|</span> <span style="white-space: nowrap">''{{{12|}}}''}}{{#if: {{{13|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{13|}}}|Geobox2 link|Geobox 0}}|{{{13|}}}}}{{#if: {{{14|}}}|</span> <span style="white-space: nowrap">''{{{14|}}}''}}{{#if: {{{15|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{15|}}}|Geobox2 link|Geobox 0}}|{{{15|}}}}}{{#if: {{{16|}}}|</span> <span style="white-space: nowrap">''{{{16|}}}''}}{{#if: {{{17|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{17|}}}|Geobox2 link|Geobox 0}}|{{{17|}}}}}{{#if: {{{18|}}}|</span> <span style="white-space: nowrap">''{{{18|}}}''}}{{#if: {{{19|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{19|}}}|Geobox2 link|Geobox 0}}|{{{19|}}}}}{{#if: {{{20|}}}|</span> <span style="white-space: nowrap">''{{{20|}}}''}}{{#if: {{{21|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{21|}}}|Geobox2 link|Geobox 0}}|{{{21|}}}}}{{#if: {{{22|}}}|</span> <span style="white-space: nowrap">''{{{22|}}}''}}{{#if: {{{23|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{23|}}}|Geobox2 link|Geobox 0}}|{{{23|}}}}}{{#if: {{{24|}}}|</span> <span style="white-space: nowrap">''{{{24|}}}''}}{{#if: {{{25|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{25|}}}|Geobox2 link|Geobox 0}}|{{{25|}}}}}{{#if: {{{26|}}}|</span> <span style="white-space: nowrap">''{{{26|}}}''}}{{#if: {{{27|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{27|}}}|Geobox2 link|Geobox 0}}|{{{27|}}}}}{{#if: {{{28|}}}|</span> <span style="white-space: nowrap">''{{{28|}}}''}}{{#if: {{{29|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{29|}}}|Geobox2 link|Geobox 0}}|{{{29|}}}}}{{#if: {{{30|}}}|</span> <span style="white-space: nowrap">''{{{30|}}}''}}{{#if: {{{31|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{31|}}}|Geobox2 link|Geobox 0}}|{{{31|}}}}}{{#if: {{{32|}}}|</span> <span style="white-space: nowrap">''{{{32|}}}''}}{{#if: {{{33|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{33|}}}|Geobox2 link|Geobox 0}}|{{{33|}}}}}{{#if: {{{34|}}}|</span> <span style="white-space: nowrap">''{{{34|}}}''}}{{#if: {{{35|}}}
|,</span> <span style="white-space: nowrap">{{ {{#if:{{{35|}}}|Geobox2 link|Geobox 0}}|{{{35|}}}}}{{#if: {{{36|}}}|</span> <span style="white-space: nowrap">''{{{36|}}}''}} }} }} }} }} }} }} }} }} }} }} }} }} }} }} }}</span>
|- class="mergedrow" </includeonly><noinclude>{{pp-template|small=yes}}[[Category:Geobox2 include|list]]</noinclude>
""")
>>> p.filter_templates()
[]
>>> p.filter_tags()
[]
>>> p.filter_wikilinks()
[]

Specifically, I was expecting p.filter_templates() to return ['{{pp-template|small=yes}}'].
(I'm using Python 3.4.2 on win32)

@lmorillas

This should work:

>>> p.filter_templates(p.RECURSE_OTHERS)

But it raises an error on Python 2.7:

... /python2.7/site-packages/mwparserfromhell/string_mixin.pyc in __getattr__(self, attr)
    110 
    111     def __getattr__(self, attr):
--> 112         return getattr(self.__unicode__(), attr)
    113 
    114     if py3k:

AttributeError: 'unicode' object has no attribute 'RECURSE_OTHERS'
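The traceback shows where the error comes from. In the installed version, an attribute that the Wikicode object does not define is forwarded by StringMixIn.__getattr__ to the object's string value, so the failure is reported against 'unicode' rather than against Wikicode. Below is a minimal sketch of that delegation pattern. It assumes a hypothetical class name (StringLikeSketch) and uses Python 3's str in place of unicode. It is not the library's actual code.

```python
class StringLikeSketch:
    """Hypothetical string wrapper that forwards unknown attributes to its text."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails; it then
        # delegates to the str value, mirroring `getattr(self.__unicode__(), attr)`
        # in the traceback above.
        return getattr(str(self), attr)


code = StringLikeSketch("{{pp-template|small=yes}}")
print(code.upper())  # str methods work through delegation

try:
    code.RECURSE_OTHERS  # defined neither on the wrapper nor on str
except AttributeError as exc:
    print(exc)  # the message names 'str', not the wrapper class
```

Because the lookup silently falls through, a name that only exists in a newer release surfaces as a string error instead of an obvious version mismatch.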

earwig self-assigned this Dec 12, 2014
earwig (owner) commented Dec 12, 2014

@lmorillas It raises an error because you aren't using the latest version of the code. RECURSE_OTHERS will be added in 0.4, which isn't released yet, so you'll need the version on the develop branch to use it. Either way, that isn't the problem here, because filter_templates() should recurse fully by default.
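This example shows what recursing "fully by default" means for this page. The {{pp-template|small=yes}} call sits inside a <noinclude> tag, so a filter only finds it if it descends into tag contents. The sketch below walks a small node tree. The Template/Tag classes and the `recursive` flag are hypothetical, chosen for illustration. They are not mwparserfromhell's actual node classes or filter signature.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Template:
    """Hypothetical template node: a name plus child nodes for its parameters."""
    name: str
    params: List["Node"] = field(default_factory=list)


@dataclass
class Tag:
    """Hypothetical tag node, e.g. <noinclude>, holding child nodes."""
    tag: str
    contents: List["Node"] = field(default_factory=list)


Node = Union[str, Template, Tag]


def filter_templates(nodes, recursive=True):
    """Collect Template nodes, descending into children when recursive is True."""
    found = []
    for node in nodes:
        if isinstance(node, Template):
            found.append(node)
            children = node.params
        elif isinstance(node, Tag):
            children = node.contents
        else:
            children = []  # plain text has no children
        if recursive:
            found.extend(filter_templates(children, recursive))
    return found


# Roughly the end of the page: text followed by a <noinclude> wrapping the template.
tree = ["|- class=\"mergedrow\" ", Tag("noinclude", [Template("pp-template", ["small=yes"])])]
print(filter_templates(tree))                   # finds pp-template inside the tag
print(filter_templates(tree, recursive=False))  # top level only: nothing found
```

With recursion on, the template is found even though it is nested in a tag. So an empty result from the real filter points to a parsing problem rather than to the filter's defaults.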

@IRDB Unfortunately, this issue is another side-effect of #40 and #42. <span> tags that cross the boundaries of {{#if}}s are very confusing to the parser: it spends so long trying to resolve them that it exceeds its self-imposed parsing time limit and then skips the template and other wikicode at the end of the page. Once I fix #42, the template you want will be visible, but the page won't be "correctly" parsed in full until I fix the monster that is #40.
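A toy scanner can illustrate this failure mode. When the parser hits its limit, the wikicode after that point is never turned into nodes. This is a hypothetical sketch, not mwparserfromhell's tokenizer. The names `scan_templates` and `max_cycles` are illustrative. The toy also does not model the backtracking that actually uses up the budget on this page. It only shows why nodes near the end of a page are the first to disappear when a cycle limit is reached.

```python
def scan_templates(text, max_cycles):
    """Toy scanner: return top-level {{...}} spans found before the budget runs out.

    Hypothetical illustration of a cycle-limited parser. Once the budget is
    spent, scanning stops and later templates are never reported.
    """
    found = []
    cycles = 0
    depth = 0
    start = None
    i = 0
    while i < len(text):
        cycles += 1
        if cycles > max_cycles:
            break  # budget exhausted: the rest of the text is left unparsed
        if text.startswith("{{", i):
            if depth == 0:
                start = i  # remember where a top-level template begins
            depth += 1
            i += 2
        elif text.startswith("}}", i) and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                found.append(text[start:i])  # a complete top-level template
        else:
            i += 1
    return found


page = "{{a}} lots of text {{b}}"
print(scan_templates(page, 1000))  # ample budget
print(scan_templates(page, 10))    # budget runs out before {{b}}
```

With an ample budget both templates are found. With a budget of 10 cycles, the scan stops partway through the plain text and only {{a}} is reported, much as the trailing {{pp-template}} was lost here.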

earwig closed this as completed Dec 12, 2014
earwig removed their assignment Dec 30, 2016
earwig added a commit that referenced this issue Jun 23, 2017
Also removed the max cycles stop-gap, allowing much more complex pages
to be parsed quickly without losing nodes at the end

Also fixes #65, fixes #102, fixes #165, fixes #183
Also fixes #81 (Rafael Nadal parsing bug)
Also fixes #53, fixes #58, fixes #88, fixes #152 (duplicate issues)