Fix unicode print error by melynx · Pull Request #96 · google/seq2seq

melynx · 2017-03-22T04:36:18Z

Convert unicode string to byte string before printing.

dennybritz · 2017-03-22T16:30:54Z

Hm, thanks for the PR, but I'm not sure if this is 100% correct. For me, your change results in the following being printed when I run pipeline_test, using Python 3.

b'\xe6\xb3\xa3'

It worked without your change. I'm not sure what the right approach is to make it work on all platforms. Python unicode is a mystery to me.
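For context, the b'\xe6\xb3\xa3' output is what Python 3 shows when a bytes object is printed: encoding a str yields bytes, and print then displays the bytes repr rather than the character. A minimal sketch, assuming the value being printed was the single character U+6CE3 (inferred from the bytes shown above, not confirmed from the test source):

```python
# Assumption: the printed value was the single character U+6CE3,
# inferred from the bytes in the output above.
text = u"\u6ce3"

encoded = text.encode("utf-8")   # a bytes object in Python 3
shown = repr(encoded)            # the form Python 3's print() displays for bytes

print(shown)
print(encoded.decode("utf-8") == text)  # decoding recovers the original text
```

Under Python 2 the same encode() call returns a plain str, which is why the change can behave differently between the two interpreters.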

dennybritz · 2017-03-22T16:56:31Z

Can you try setting: export PYTHONIOENCODING=UTF-8 as an environment variable? Does that solve your issue?
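One way to check what that variable changes is to start a child interpreter with it set and ask for its stdout encoding. This is only a sketch; the helper name child_stdout_encoding is made up for illustration and is not part of seq2seq:

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Run a child interpreter and report its sys.stdout.encoding.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    env.update(extra_env)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").strip()


forced = child_stdout_encoding({"PYTHONIOENCODING": "UTF-8"})
print(forced)
```

The exact spelling of the reported name (for example UTF-8 or utf-8) can vary between Python versions.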

melynx · 2017-03-22T17:07:39Z

Ah... that's because encode() converts it to a byte string.
Yes. But isn't it better if the code works even if the environment variable is not set?

My original fix was actually to set the default encoding of sys, but I'm not sure whether that is a good fix, which is why I changed it to io.open() for PR #93.

import sys
reload(sys)
sys.setdefaultencoding('utf8')

This seems very hackish and might break stuff which assumes ascii encoding.
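For comparison, the io.open() approach mentioned for PR #93 states the encoding where the file is opened instead of changing an interpreter-wide default. A minimal sketch of that idea, not the code from that PR; the file name is hypothetical:

```python
import io
import os
import tempfile

# Hypothetical file standing in for a vocabulary or data file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# io.open takes an explicit encoding on both Python 2 and Python 3.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"\u6ce3\n")

with io.open(path, encoding="utf-8") as f:
    line = f.readline().rstrip(u"\n")

print(len(line))  # counts characters of decoded text, not UTF-8 bytes
```

This covers reading and writing files only; printing to a terminal still depends on the stdout encoding.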

dennybritz · 2017-03-22T17:13:08Z

Yes. But isn't it better if the code works even if the environment variable is not set?

Definitely, but your change does not work in my environment ;) It should print a unicode string, but it doesn't. I am still not sure how to make it work in all environments.

Yeah, the original fix isn't great, also see http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script

melynx · 2017-03-22T17:29:43Z

Yup, I agree. I'm thinking of writing a separate Unicode printing function, but I'm not sure if it's worth the effort to change all the prints to the custom version. XD
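Such a function could look roughly like the sketch below. The name print_unicode and its parameters are hypothetical, and this is one possible approach rather than anything adopted in the PR: it encodes the text itself and writes the bytes to the underlying byte stream, so the result does not depend on the stream's own encoding.

```python
import io
import sys


def print_unicode(text, stream=None, encoding="utf-8"):
    """Write text plus a newline as encoded bytes (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    data = (text + u"\n").encode(encoding)
    stream.flush()  # keep ordering with anything already written as text
    # Python 3 text streams expose their byte stream as .buffer;
    # Python 2's sys.stdout has no .buffer and accepts bytes directly.
    target = getattr(stream, "buffer", stream)
    target.write(data)
    target.flush()


# Usage: a text stream whose own encoding is ASCII still receives UTF-8 bytes.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
print_unicode(u"\u6ce3", stream=ascii_stream)
print(ascii_stream.buffer.getvalue() == b"\xe6\xb3\xa3\n")
```

The trade-off raised above still applies: every existing print call would have to be switched to the helper.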

dennybritz · 2017-03-22T17:32:42Z

Yeah. There must be an easier/correct way to do this, I just don't know what it is...

pltrdy · 2017-03-27T15:15:58Z

I'm not sure how off-topic it is, but did you consider using unicode_literals (from __future__)?

dennybritz · 2017-03-28T18:57:17Z

I think unicode_literals is imported pretty much everywhere, but I don't think that's related since it only applies to literals defined in the code.
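A small sketch of that distinction: the import changes the type of string literals written in the module, while bytes that arrive at runtime stay bytes until they are decoded. The sample character is an arbitrary choice for illustration:

```python
from __future__ import unicode_literals  # affects Python 2; a no-op on Python 3

import io

literal = "\u6ce3"                        # a literal in source: text
raw = io.BytesIO(b"\xe6\xb3\xa3").read()  # data arriving at runtime: bytes

text_type = type("")  # unicode on Python 2 (with the import), str on Python 3
print(isinstance(literal, text_type))
print(isinstance(raw, text_type))
print(raw.decode("utf-8") == literal)     # runtime data needs an explicit decode
```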

darrengarvey · 2017-06-11T17:24:36Z

You might want to try using tf.compat.as_text(), which deals with this Python versioning ugliness.
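As a rough illustration of what that helper is documented to do (decode bytes as UTF-8 by default, return text unchanged), here is a pure-Python stand-in. The name as_text_sketch is hypothetical and this is not TensorFlow's implementation; in real code one would call tf.compat.as_text directly:

```python
def as_text_sketch(bytes_or_text, encoding="utf-8"):
    """Pure-Python stand-in (hypothetical) for the behavior of tf.compat.as_text."""
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    if isinstance(bytes_or_text, text_type):
        return bytes_or_text                   # already text: pass through
    if isinstance(bytes_or_text, bytes):
        return bytes_or_text.decode(encoding)  # bytes: decode to text
    raise TypeError("Expected binary or unicode string, got %r" % (bytes_or_text,))


from_bytes = as_text_sketch(b"\xe6\xb3\xa3")
from_text = as_text_sketch(u"\u6ce3")
print(from_bytes == from_text)
```

Normalizing to text this way before printing avoids the bytes repr on Python 3, though the terminal encoding still has to be able to represent the characters.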

sheerun · 2017-07-09T13:36:13Z

I had issue with encoding when using ./bin/tools/generate_vocab.py

The solution seems to be to use python3 ./bin/tools/generate_vocab.py instead. Probably the same goes for the whole seq2seq training.

Fix unicode print error

de71479

Convert unicode string to byte string before printing.

dennybritz mentioned this pull request Mar 23, 2017

find UnicodeEncodeError when run command:DATA_TYPE=reverse ./bin/data/toy.sh #87

Closed

okuchaiev mentioned this pull request Mar 29, 2017

UnicodeEncodeError when doing En-De prediction #128

Closed

This was referenced Apr 17, 2017

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd9 in position 98: unexpected end of data #170

Closed

Handling special characters for character seq2seq model #153

Open

SwordYork mentioned this pull request Apr 19, 2017

update broken links, wrap stdout for python 2 and warning for character-level models #176

Closed

melynx wants to merge 1 commit into google:master from melynx:master
