Make doc2vec imdb ipynb tutorial run in python 2 and 3 #1220
Please merge develop into your branch to resolve the conflicts.
It seems that I selected the wrong base branch. I have changed it to the develop branch and the conflicts are resolved. The commit (1aa3f33) was a mistake on my part.
@@ -92,8 +116,7 @@
" txt_files = glob.glob('/'.join([dirname, fol, '*.txt']))\n",
"\n",
" for txt in txt_files:\n",
" with open(txt, 'r', encoding='utf-8') as t:\n",
" control_chars = [chr(0x85)]\n",
" with codecs.open(txt, 'r', encoding='utf-8') as t:\n",
Use smart_open instead: drop codecs, open files in binary mode and convert content to unicode explicitly.
Ok, I will drop the codecs and move to smart_open.
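A minimal sketch of the pattern the reviewer describes: open the file in binary mode and decode the bytes explicitly, which behaves the same on Python 2 and 3. The helper name `read_text` and the sample file are hypothetical. The sketch uses the built-in `open` so it runs without extra dependencies; the assumption is that smart_open's file-opening function would be called the same way with mode `'rb'`.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a file as raw bytes and decode it explicitly (hypothetical helper)."""
    # Binary mode returns bytes on both Python 2 and Python 3,
    # so the decoding step below is the only place text conversion happens.
    with open(path, "rb") as fin:
        raw = fin.read()
    return raw.decode(encoding)


# Illustrative data only: write a small UTF-8 file, then read it back.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "wb") as fout:
    fout.write(u"caf\u00e9 review\n".encode("utf-8"))

text = read_text(sample_path)
```

Decoding in one explicit step avoids relying on the `encoding` keyword of `open`, which the Python 2 built-in does not accept.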
@@ -104,21 +127,28 @@
" temp += \"\\n\"\n",
"\n",
" temp_norm = normalize_text(temp)\n",
" with open('/'.join([dirname, output]), 'w', encoding='utf-8') as n:\n",
" with codecs.open('/'.join([dirname, output]), 'w', encoding='utf-8') as n:\n",
Not portable -- please use os.path.join.
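As a rough illustration of the portability point, paths can be built with `os.path.join` rather than `'/'.join(...)`, so the platform's separator is used. The helper name `list_txt_files` and the temporary directory layout are hypothetical; `dirname` and `fol` mirror the variable names in the diff.

```python
import glob
import os
import tempfile


def list_txt_files(dirname, fol):
    """Return the sorted *.txt paths under dirname/fol (hypothetical helper)."""
    # os.path.join picks the correct separator for the current platform.
    return sorted(glob.glob(os.path.join(dirname, fol, "*.txt")))


# Illustrative layout only: one folder holding two text files and one other file.
base_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(base_dir, "pos"))
for name in ("a.txt", "b.txt", "notes.md"):
    with open(os.path.join(base_dir, "pos", name), "wb") as fout:
        fout.write(b"placeholder\n")

found = [os.path.basename(p) for p in list_txt_files(base_dir, "pos")]
```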
" n.write(temp_norm)\n",
"\n",
" alldata += temp_norm\n",
"\n",
" with open('/'.join([dirname, 'alldata-id.txt']), 'w', encoding='utf-8') as f:\n",
" with codecs.open('/'.join([dirname, 'alldata-id.txt']), 'w', encoding='utf-8') as f:\n",
Drop codecs, use binary mode.
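For the write side, one possible shape is to encode the text explicitly and write the resulting bytes in binary mode. This is only a sketch: `write_text` is a hypothetical helper, the sample string is made up, and it assumes the text being written is already a unicode string (as it would be after an explicit decode on read).

```python
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    """Encode unicode text explicitly and write it as bytes (hypothetical helper)."""
    # Mode 'wb' expects bytes on both Python 2 and Python 3.
    with open(path, "wb") as fout:
        fout.write(text.encode(encoding))


# Illustrative data only: the file name mirrors the one in the diff.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "alldata-id.txt")
write_text(out_path, u"_*0 a na\u00efve review\n")

with open(out_path, "rb") as fin:
    written = fin.read()
```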
Fix Python 2 and Python 3 compatibility for the doc2vec-IMDB.ipynb notebook. #1139.