
Mismatched italics syntax causes template filter to fail sometimes #202

Closed
mzmcbride opened this issue Aug 30, 2018 · 1 comment

@mzmcbride

When a page contains mismatched italics syntax, the `.filter_templates()` method sometimes fails to find templates that are present, because the page is not parsed properly.

Sample script:

```python
#! /usr/bin/env python

import mwparserfromhell


def parse_text(case_text):
    # Parse the wikitext, print how many templates were found, then their names.
    templates = mwparserfromhell.parse(case_text).filter_templates()
    print(len(templates))
    for template in templates:
        print(template.name.strip())


# Case 1: mismatched italics in the template, bold italics later on the page.
case_text = """\
{{Infobox SCOTUS case
  |FullName=''[[et vir]]'
}}

'''''Hello there'''''
"""

parse_text(case_text)

# Case 2: the same page with balanced italics.
case_text = """\
{{Infobox SCOTUS case
  |FullName=''[[et vir]]''
}}

'''''Hello there'''''
"""

parse_text(case_text)

# Case 3: mismatched italics, no bold italics afterwards.
case_text = """\
{{Infobox SCOTUS case
  |FullName=''[[et vir]]'
}}
"""

parse_text(case_text)
```

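Note that `|FullName=''[[et vir]]'` is mismatched (two opening apostrophes, one closing) in the first and third cases. Only the first case, which also has bold italics (`'''''Hello there'''''`) later on the page, loses the template.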

Current buggy output:

```text
0
1
Infobox SCOTUS case
1
Infobox SCOTUS case
```

Expected output:

```text
1
Infobox SCOTUS case
1
Infobox SCOTUS case
1
Infobox SCOTUS case
```

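
The mismatch in the first and third cases can be spotted mechanically. Below is a rough sketch of a hypothetical helper (`apostrophe_toggles` and `unbalanced_lines` are illustrative names, not part of mwparserfromhell) that counts apostrophe runs on each line, loosely following MediaWiki's rules (2 = italic, 3 = bold, 4 = one literal apostrophe plus bold, 5 or more = bold italic), and flags lines where italics or bold are toggled an odd number of times. It is a heuristic for finding candidate pages, not a reimplementation of either parser.

```python
import re


def apostrophe_toggles(line):
    """Count italic and bold toggles on one line.

    Rough heuristic loosely following MediaWiki's apostrophe rules; it is
    not what mwparserfromhell or MediaWiki actually run.
    """
    italic = bold = 0
    for run in re.findall(r"'{2,}", line):  # single apostrophes are literal text
        n = len(run)
        if n == 2:
            italic += 1
        elif n in (3, 4):  # 4 = one literal apostrophe followed by bold
            bold += 1
        else:  # 5+ = bold italic (any extra apostrophes are literal)
            italic += 1
            bold += 1
    return italic, bold


def unbalanced_lines(text):
    """Return (line_number, line) pairs where italics or bold toggle an odd number of times."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        italic, bold = apostrophe_toggles(line)
        if italic % 2 or bold % 2:
            flagged.append((number, line))
    return flagged


mismatched = "{{Infobox SCOTUS case\n  |FullName=''[[et vir]]'\n}}\n\n'''''Hello there'''''\n"
balanced = "{{Infobox SCOTUS case\n  |FullName=''[[et vir]]''\n}}\n\n'''''Hello there'''''\n"

print(unbalanced_lines(mismatched))  # [(2, "  |FullName=''[[et vir]]'")]
print(unbalanced_lines(balanced))    # []
```

A per-line count is used because MediaWiki resolves apostrophe markup line by line; a page flagged here is only a candidate for the parsing problem described above, not a guaranteed failure.
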
@lahwaacz
Contributor

Duplicate of #40. Use mwparserfromhell.parse() with skip_style_tags=True as a workaround.
