does this support Chinese word doc/docx file #6

Open
xwydq opened this issue Feb 3, 2017 · 1 comment

xwydq commented Feb 3, 2017

My docx file contains Chinese text such as:
四、我们确认,我们完全同意招标文件制定的投标规则,并承诺按照这些规则履行我们的所有义务,包括一旦投标文件被贵方接受,将履行社会资本合作方的义务

On my Mac, I ran doc_ripper and got the result below:

➜  ~ irb
irb(main):001:0> require 'doc_ripper'
=> true
irb(main):002:0> DocRipper::rip('/Users/datSource/test/docx1.docx')
=> "ç\u009B® å½\u0095 TOC \\o \"1-4\" \\h \\z \\u ä¸\u0080ã\u0080\u0081æ\u008A\u0095èµ\u0084ç\u0094³è¯·ä¹¦ PAGEREF _Toc448258241 \\h 2äº\u008Cã\u0080\u0081æ\u008E\u0088æ\u009D\u0083å§\u0094æ\u0089\u0098书 PAGEREF _Toc448258242 \\h 5ä¸\u0089ã\u0080\u0081å¼\u0080æ \u0087ä¸\u0080è§\u0088表 PAGEREF _Toc448258243 \\h 6å\u009B\u009Bã\u0080\u0081è¯\u0084å\u0088\u0086ç´

How can I get the correct plain text?

Thanks!

pzaich (Owner) commented Feb 15, 2017

Hi @xwydq, can you provide a sample document that I can use as a reference? Most likely this comes from assumptions the library makes when enforcing an encoding on the extracted text.
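
For illustration, the output above has the shape you get when UTF-8 bytes are read as ISO-8859-1 and then transcoded to UTF-8: `ç\u009B®` corresponds to the bytes `E7 9B AE`, which is `目` in UTF-8. The sketch below reproduces that effect and reverses it. It is only an assumption about the cause, not a description of what doc_ripper does internally, and `repair_latin1_mojibake` is a hypothetical helper that is not part of doc_ripper.

```ruby
# Hypothetical helper (not part of doc_ripper): undo the case where UTF-8
# bytes were mislabelled as ISO-8859-1 and then transcoded to UTF-8.
def repair_latin1_mojibake(text)
  # Map each character back to its single Latin-1 byte, then relabel
  # the resulting bytes as UTF-8.
  repaired = text.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
  # If the bytes are not valid UTF-8, the assumption was wrong: keep the input.
  repaired.valid_encoding? ? repaired : text
rescue EncodingError
  # Raised when the text holds characters outside Latin-1 (for example
  # Chinese characters that were already decoded correctly).
  text
end

# Reproduce the garbling on a short sample, then reverse it.
original = "目录"
garbled  = original.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts garbled.inspect                          # => "ç\u009B®å½\u0095"
puts repair_latin1_mojibake(garbled).inspect  # => "目录"
```

The helper only reverses a string that is garbled throughout. The output reported above also contains some intact characters (such as `书`), so a whole-string reversal would fall back to returning the input unchanged there. A real fix would need to decode the text correctly at extraction time.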

pzaich mentioned this issue in a merged pull request on Nov 29, 2017