
Python 2 assumes different source code encodings

2019-10-03 04:08:12 Author: Internet

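I noticed that, without a source code encoding declaration, the Python 2 interpreter assumes the source code is encoded in ASCII when it is run as a script or read from standard input: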

$ python test.py  # where test.py holds the line: print u'é'
  File "test.py", line 1
SyntaxError: Non-ASCII character '\xc3' in file test.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

$ echo "print u'é'" | python
  File "/dev/fd/63", line 1
SyntaxError: Non-ASCII character '\xc3' in file /dev/fd/63 on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

With the -m module and -c command flags, it assumes the source code is encoded in ISO-8859-1 instead:

$ python -m test  # where test.py holds the line: print u'é'
Ã©

$ python -c "print u'é'"
Ã©
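
The Ã© output is what you would expect if the two UTF-8 bytes of é were each decoded as one Latin-1 (ISO-8859-1) character. Here is a minimal sketch of that mix-up, written for Python 3 so it runs on a current interpreter. The helper name misread_utf8_as_latin1 is made up for illustration and is not part of Python.

```python
def misread_utf8_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes as Latin-1 (hypothetical helper)."""
    return text.encode("utf-8").decode("latin-1")


original = "\u00e9"                      # é
utf8_bytes = original.encode("utf-8")    # the two bytes 0xC3 0xA9
garbled = misread_utf8_as_latin1(original)

# Each UTF-8 byte becomes one Latin-1 character: 0xC3 -> Ã, 0xA9 -> ©.
assert utf8_bytes == b"\xc3\xa9"
assert garbled == "\u00c3\u00a9"         # Ã©
```

The first byte, 0xC3, is also the byte reported as '\xc3' in the SyntaxError messages above.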

Where is this documented?

Compare this with Python 3, which always assumes the source code is encoded in UTF-8 and therefore prints é in all four cases.

Note: I tested this with CPython 2.7.14 on both macOS 10.13 and Ubuntu Linux 17.10, with the console encoding set to UTF-8.

Answer:

The -c and -m switches ultimately (*) run code supplied through the exec statement or the compile() function, both of which take Latin-1 source code:

The first expression should evaluate to either a Unicode string, a Latin-1 encoded string, an open file object, a code object, or a tuple.

This is not documented; it is an implementation detail that may or may not be considered a bug.

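I don't think this is worth fixing, and since Latin-1 is a superset of ASCII, little is lost. How code from -c and -m is handled has been cleaned up in Python 3 and is much more consistent there: code passed in with -c is decoded using the current locale, and modules loaded with the -m switch default to UTF-8, as usual.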
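
The effect of the assumed source encoding can be reproduced on Python 3 by passing raw bytes to compile(). This is only a sketch: it uses a PEP 263 declaration to imitate the Latin-1 assumption and does not reproduce the Python 2 code path. The helper run_source and the variable name s are made up for illustration.

```python
def run_source(source_bytes):
    """Compile and execute raw source bytes; return the value bound to `s`."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace["s"]


# The string literal in this source holds the raw UTF-8 bytes of é (0xC3 0xA9).
body = b"s = '\xc3\xa9'\n"

# No declaration: Python 3 decodes bytes source as UTF-8.
as_utf8 = run_source(body)

# With a PEP 263 declaration the same bytes are decoded as Latin-1 instead.
as_latin1 = run_source(b"# -*- coding: latin-1 -*-\n" + body)

assert as_utf8 == "\u00e9"            # é
assert as_latin1 == "\u00c3\u00a9"    # Ã©
```

The same kind of declaration is what the SyntaxError message asks for in the script case, where a `# -*- coding: utf-8 -*-` line at the top of test.py tells Python 2 how to decode the file.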

(*) If you want to know the exact implementation used, start at the Py_Main() function in Modules/main.c, which handles both -c and -m as:

if (command) {
    sts = PyRun_SimpleStringFlags(command, &cf) != 0;
    free(command);
} else if (module) {
    sts = RunModule(module, 1);
    free(module);
}

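- The -c switch is executed via the PyRun_SimpleStringFlags() function, which in turn calls PyRun_StringFlags(). When you use exec, a bytestring object is also passed to PyRun_StringFlags(), and the source code is then assumed to contain Latin-1 encoded bytes.
- The -m switch uses the RunModule() function to pass the module name to the private function _run_module_as_main() in the runpy module. That function uses pkgutil.get_loader() to load the module metadata, then fetches the module code object with the loader.get_code() function on the PEP 302 loader. If no cached bytecode is available, the code object is produced by the compile() function with the mode set to exec.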
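
The loader steps used by -m can be sketched roughly as follows. This is a Python 3 approximation, not the Python 2 implementation: it substitutes importlib.util.find_spec() for pkgutil.get_loader(), and both the helper get_module_code and the choice of the stdlib module colorsys are assumptions made for illustration.

```python
import importlib.util


def get_module_code(module_name):
    """Locate a module and return its code object without importing it."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.loader is None:
        raise ImportError("cannot locate module %r" % module_name)
    # PEP 302 style loaders expose get_code(); it compiles the source (or
    # reuses cached bytecode) and returns the module-level code object.
    return spec.loader.get_code(module_name)


code = get_module_code("colorsys")      # any importable pure-Python module
namespace = {"__name__": "__main__"}
exec(code, namespace)                   # roughly what -m does with the result
```

Because the code object comes from compile() whenever no cached bytecode exists, any source encoding that compile() assumes carries over to modules run this way.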

Tags: iso-8859-1, python, character-encoding, ascii, python-internals
Source: https://codeday.me/bug/20191003/1846738.html