Error on pages with tables #148

Closed
vladiscripts opened this issue Apr 17, 2016 · 3 comments

@vladiscripts

On pages that contain tables, the parser does not find templates and writes a stray '|' character into the tables.
E.g. page: https://ru.wikipedia.org/wiki/%D0%A8%D0%BF%D0%B8%D1%86%D0%B1%D0%B5%D1%80%D0%B3%D0%B5%D0%BD When the parser is run on the text of this page with the code

import mwparserfromhell

# Read the saved wikitext of the page.
with open('page.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Parse it and print every template the parser recognizes.
code = mwparserfromhell.parse(text)
for template in code.filter_templates():
    print(template)

then only a few recognized templates are printed; for example, all bibliographic templates are skipped.

In addition, this code inserts an extra '|' into the table markup, like:

{| style="float:right" class="standard" → {|| style="float:right" class="standard"
{|class="wikitable" style="text-align:right" → {||class="wikitable" style="text-align:right"

when the parsed text is saved back to the file:

# Write the parsed wikicode back out as text.
with open('page.txt', 'w', encoding='utf-8') as f:
    f.write(str(code))
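
To see exactly which lines change on such a round trip, the original text can be compared with the saved text. The following is only a minimal sketch using the standard library's difflib; the helper name `changed_lines` is hypothetical and not part of mwparserfromhell. It takes the two texts as plain strings, so in practice the second argument would be something like `str(code)`.

```python
import difflib


def changed_lines(original, round_tripped):
    """Return (before, after) pairs for lines that were altered.

    Only 'replace' regions are reported; pure insertions or deletions
    are ignored, and uneven replace regions are paired up to the
    shorter side.
    """
    before = original.splitlines()
    after = round_tripped.splitlines()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.extend(zip(before[i1:i2], after[j1:j2]))
    return pairs


if __name__ == "__main__":
    # Strings taken from the example above, not real parser output.
    src = '{| style="float:right" class="standard"\n|-\n| cell\n|}'
    out = '{|| style="float:right" class="standard"\n|-\n| cell\n|}'
    for old, new in changed_lines(src, out):
        print(old, "→", new)
```

An empty result would mean the round trip left every line unchanged.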
@earwig
Owner

earwig commented Apr 17, 2016

This was fixed in 61b6b98. Try using the develop branch.

@vladiscripts
Author

It still doesn't work. The '|' is no longer added, but the templates are still not found.

@earwig
Owner

earwig commented Apr 18, 2016

Looks like #40 due to constructs like:

|<center>14 443

No fix yet, unfortunately.
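
Until then, it may help to locate such constructs in a page before parsing it. The snippet below is a rough heuristic sketch, not something mwparserfromhell provides: the function name `unclosed_tags_in_cells`, the regex, and the list of void tags are all assumptions made for illustration. It only looks at lines that start with '|' or '!', so it can also flag template parameter lines, and it will miss tags that are closed on a later line.

```python
import re

# Opening HTML-style tag such as <center> or <small class="x">.
# Closing tags (</center>) and self-closing tags (<br />) do not match.
OPEN_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>/]*>")

# Tags that never take a closing tag (assumed list, extend as needed).
VOID_TAGS = {"br", "hr", "wbr"}


def unclosed_tags_in_cells(wikitext):
    """Return (line_number, tag_name, line) for each cell line that
    opens a tag without closing it on the same line."""
    hits = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        stripped = line.strip()
        # Cell lines start with "|" or "!"; skip "|}" and "|-" markup.
        if not stripped.startswith(("|", "!")):
            continue
        if stripped.startswith(("|}", "|-")):
            continue
        lowered = stripped.lower()
        for match in OPEN_TAG.finditer(stripped):
            name = match.group(1).lower()
            if name in VOID_TAGS:
                continue
            if "</" + name not in lowered:
                hits.append((number, name, stripped))
    return hits


if __name__ == "__main__":
    sample = '{| class="wikitable"\n|-\n|<center>14 443\n|}'
    for number, name, line in unclosed_tags_in_cells(sample):
        print(number, name, line)
```

Closing the reported tags by hand (or removing them) in a copy of the text is one possible workaround to try, though whether that is enough for a given page is not guaranteed.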
