
The Python subprocess module

2022-05-12 13:01:43 Author: Internet

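The subprocess module was first introduced in Python 2.4. It spawns child processes and interacts with their input, output, and error streams through pipes.

Because it is part of the standard library and is implemented in Python, we can find it directly in the Python installation directory (<Python installation directory>\Lib\subprocess.py).

If there is other code you want to read, you can look for it under the corresponding path as well. Reading the source directly is far more reliable than hunting for information all over the place.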

subprocess is meant to replace several older modules and functions:

os.system
os.spawn*
os.popen*
popen2.*
commands.*

How the subprocess module differs from the old os.popen2 interface

(child_stdin, child_stdout) = os.popen2("cmd", mode, bufsize)
==>
p = Popen("cmd", shell=True, bufsize=bufsize,
          stdin=PIPE, stdout=PIPE, close_fds=True)
(child_stdin, child_stdout) = (p.stdin, p.stdout)
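As a rough illustration of the replacement pattern above, the sketch below sends one line to a child and reads the reply through the two pipe ends. It assumes Python 3; `sys.executable` with a tiny inline script serves as a stand-in child, and the names `CHILD` and `roundtrip` are made up for this example.

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child: reads all of stdin and echoes it back upper-cased.
CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"

def roundtrip(text):
    """Write text to the child's stdin and return what it prints."""
    p = Popen([sys.executable, "-c", CHILD], stdin=PIPE, stdout=PIPE)
    child_stdin, child_stdout = p.stdin, p.stdout
    child_stdin.write(text.encode("ascii"))
    child_stdin.close()          # signal EOF so the child can finish reading
    data = child_stdout.read()   # read until the child closes its stdout
    child_stdout.close()
    p.wait()                     # reap the child
    return data.decode("ascii")

print(roundtrip("hello\n"))
```

Passing the arguments as a list avoids `shell=True` here, since no shell features are needed.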

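More parameters mean more cases are covered: you can do more, take a deeper part in the execution, and interact with the process in more complex ways.

The core class of the subprocess module is Popen.

Its constructor is as follows: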

subprocess.Popen(args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, 
                 preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, 
                 startupinfo=None, creationflags=0)

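The args parameter can be a string or a sequence (such as a list or tuple) and specifies the program to run together with its arguments. If it is a sequence, the first element is normally the path to the executable. The executable parameter can also be used to specify the path to the executable explicitly.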
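To make the two forms of args concrete, here is a small sketch assuming Python 3. The child is an inline script run through `sys.executable`, and the command string passed to `shlex.split` (`prog --name 'two words' -v`) is invented purely for illustration.

```python
import shlex
import sys
from subprocess import Popen, PIPE

# A sequence: the first element is the program, the rest are its arguments.
args_list = [sys.executable, "-c", "import sys; print(sys.argv[1])", "two words"]

# shlex.split turns a shell-like string into such a sequence (POSIX quoting rules).
args_from_string = shlex.split("prog --name 'two words' -v")

p = Popen(args_list, stdout=PIPE)
out, _ = p.communicate()
echoed = out.decode().strip()   # the child saw "two words" as a single argument
print(echoed)
print(args_from_string)
```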

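The bufsize parameter sets the buffer size. A bufsize of 0 means unbuffered, 1 means line-buffered, and any other positive value is the approximate buffer size. A negative value means the system default, which usually means fully buffered. The default buffer size depends on the runtime implementation of each system.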

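To make programs run as efficiently as possible, stream objects usually provide a buffer to reduce the number of calls to the system I/O interface.
There are three buffering modes:
(1) Fully buffered. Actual I/O is performed when the input or output buffer is full. It is also performed in other situations, such as a forced flush or process exit.
For reads, actual I/O is performed when the number of bytes read equals the buffer size, when the end of the file is reached, or on a forced flush; the contents of the file on external storage are read into the buffer.
For writes, actual I/O is performed when the buffer is full or on a forced flush; the buffer contents are written to the file on external storage. Disk file operations are usually fully buffered.
(2) Line-buffered. Actual I/O is performed when the input or output buffer encounters a newline. Otherwise it behaves like full buffering.
(3) Unbuffered. There is no buffer; data is read into memory or written to the external file or device immediately. Standard error (stderr) is unbuffered, which ensures that error messages reach the user promptly so the problem can be tracked down.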

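Put simply:
Line-buffered: user data goes into the buffer, and the buffer is printed to the screen when a newline '\n' is encountered;
Fully buffered: user data goes into the buffer, and the buffer is printed to the screen when it is full;
Unbuffered: user data is printed to the screen directly.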
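The three modes can be observed with ordinary files. The sketch below is only an illustration under stated assumptions: it uses Python 3's built-in `open()` with its `buffering` argument rather than `Popen`'s `bufsize`, and the helper `size_after_write` is a made-up name. It writes a small payload and checks how many bytes have reached the file before any flush or close.

```python
import os
import tempfile

def size_after_write(mode, buffering, payload):
    """Write payload without flushing and report how many bytes reached the file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode, buffering=buffering) as f:
            f.write(payload)
            return os.path.getsize(path)   # measured before close() flushes
    finally:
        os.remove(path)

unbuffered = size_after_write("wb", 0, b"abc")          # raw write, no buffer
line_no_newline = size_after_write("w", 1, "abc")       # line-buffered, no '\n' yet
line_with_newline = size_after_write("w", 1, "abc\n")   # '\n' triggers a flush
fully_buffered = size_after_write("w", -1, "abc\n")     # small write stays in the buffer
print(unbuffered, line_no_newline, line_with_newline, fully_buffered)
```

Only the unbuffered write and the line-buffered write that ends in a newline should show a non-zero size at the moment of measurement.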

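The stdin, stdout, and stderr parameters are the program's standard input, standard output, and standard error handles. Each can be PIPE, a file descriptor, or a file object, or it can be None, which means the handle is inherited from the parent process.

The preexec_fn parameter is a callable that is invoked in the child process just before the child is executed (Unix only).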

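The close_fds parameter has restrictions on Windows: in Python 2.7 it cannot be set to True while stdin, stdout, or stderr is also being redirected. It is skipped here.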

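If shell is set to True, the command is executed through the shell.

The cwd parameter is the working directory of the child process.

The env parameter is a dictionary that specifies the child's environment variables. If env is None, the child inherits its environment variables from the parent process.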
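A minimal sketch of cwd and env together, assuming Python 3. The environment variable name `DEMO_VAR` and the temporary working directory are invented for this example; the child is an inline script that reports what it sees.

```python
import os
import sys
import tempfile
from subprocess import Popen, PIPE

# Child prints its working directory and one environment variable.
CHILD = "import os; print(os.getcwd()); print(os.environ.get('DEMO_VAR'))"

workdir = tempfile.mkdtemp()
child_env = dict(os.environ)            # start from the parent's environment
child_env["DEMO_VAR"] = "from-parent"   # DEMO_VAR is a made-up name for this demo

p = Popen([sys.executable, "-c", CHILD], cwd=workdir, env=child_env, stdout=PIPE)
out, _ = p.communicate()
child_cwd, child_var = out.decode().splitlines()
print(child_cwd, child_var)
os.rmdir(workdir)                       # clean up the empty temporary directory
```

Copying `os.environ` first is a deliberate choice: passing a dictionary with only one key would replace the whole environment, which can stop the child from starting on some systems.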

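If universal_newlines is set to True, the line endings used on different systems ('\n', '\r\n', '\r') are all converted to '\n'. This applies to the pipe file objects (such as stdout and stderr) as the parent reads them.

The startupinfo and creationflags parameters are passed to CreateProcess (Windows only) to specify settings for the child process, such as the process priority or, if the child has a window, the window's appearance.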

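subprocess.PIPE

When creating a Popen object, subprocess.PIPE can be passed as the stdin, stdout, or stderr argument. It indicates that a pipe to that standard stream of the child should be opened.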

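subprocess.STDOUT

When creating a Popen object, this value can be passed as the stderr argument. It indicates that the child's standard error should be sent to the same handle as its standard output. (The module has no subprocess.STDIN or subprocess.STDERR constants.)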
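A short sketch of merging the error stream into standard output, assuming Python 3 and an inline child script that writes one line to each stream:

```python
import sys
from subprocess import Popen, PIPE, STDOUT

# Child writes to stdout, flushes, then writes to stderr.
CHILD = "import sys; sys.stdout.write('out\\n'); sys.stdout.flush(); sys.stderr.write('err\\n')"

p = Popen([sys.executable, "-c", CHILD], stdout=PIPE, stderr=STDOUT)
merged, err = p.communicate()
lines = merged.decode().splitlines()   # both streams arrive through the same pipe
print(lines, err)                      # err is None because stderr was not a separate PIPE
```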

Related methods:

1. poll()  # Checks whether the command has finished. If it has, sets and returns the returncode attribute; if not, returns None.

>>> res = subprocess.Popen("sleep 10;echo 'hello'",shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 
>>> print(res.poll()) 
None 
>>> print(res.poll()) 
None 
>>> print(res.poll()) 
0
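The same idea as a script, assuming Python 3: a loop calls poll() at intervals so the parent can do other work while the child runs. The half-second sleeping child is a stand-in for a real command.

```python
import sys
import time
from subprocess import Popen

# Stand-in for a slow command: sleeps briefly, then exits with status 0.
p = Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])

first = p.poll()            # very likely None: the child is still sleeping
while p.poll() is None:     # poll() returns None while the child is running
    time.sleep(0.1)         # do other work here instead of blocking
final = p.returncode        # set by the poll() call that saw the exit
print(first, final)
```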

2. wait()  # Waits for the command to finish and returns its exit status.

>>> obj = subprocess.Popen("sleep 10;echo 'hello'",shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 
>>> obj.wait()  # blocks here until the child exits

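3. communicate(input=None)  # Interacts with the process: writes data to stdin, reads from stdout and stderr until end-of-file, waits for the command to finish, and returns a (stdout, stderr) tuple.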

On wait(), the official documentation warns:
Warning This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe
such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
In other words, when stdout or stderr is set to PIPE, wait() may deadlock, so communicate() is recommended instead.

For communicate(), the documentation adds:
Note that if you want to send data to the process's stdin, you need to create the Popen object with stdin=PIPE.
Similarly, to get anything other than None in the result tuple, you need to give stdout=PIPE and/or stderr=PIPE too.
Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
That is, communicate() reads the data into memory and keeps it there, so it should not be used when the data is very large or unbounded.
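A small sketch of both notes, assuming Python 3 and an inline child script: with all three streams set to PIPE, communicate() sends the input and returns both outputs; without any PIPE, the result tuple contains only None.

```python
import sys
from subprocess import Popen, PIPE

# Child reports how many characters it read, then writes a marker to stderr.
CHILD = "import sys; data = sys.stdin.read(); print(len(data)); sys.stderr.write('done')"

p = Popen([sys.executable, "-c", CHILD], stdin=PIPE, stdout=PIPE, stderr=PIPE)
out, err = p.communicate(input=b"12345")   # writes stdin, closes it, reads both pipes
print(out, err, p.returncode)

# Without PIPE there is nothing to capture, so both entries are None.
no_pipes = Popen([sys.executable, "-c", "pass"]).communicate()
print(no_pipes)
```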

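So here is the question:
When you use Python's subprocess.Popen to pipe data between commands and the data source is very large (for example, gigabytes of text or an endless network stream),
the official documentation advises against wait(), and communicate() may exhaust memory. What should we do?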

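From: https://zhuanlan.zhihu.com/p/430904623

Looking at the communicate() method answers this: communicate() does end up calling wait() to wait for the process to exit. The difference is that communicate() actively reads the output via stdout.read() as it arrives, except that it reads it into memory.

So here is our solution: skip the pipe and write the output straight to a temporary file.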

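from subprocess import Popen, STDOUT  # STDOUT is needed for the stderr redirect
from tempfile import TemporaryFile
with TemporaryFile(mode='w+') as f:  # use a temporary file to hold the output
    # replace 'result.txt' with the command to run
    with Popen('result.txt', shell=True, stdout=f, stderr=STDOUT) as proc:
        status = proc.wait()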
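The snippet above stops at wait(). As a sketch of the remaining step, assuming Python 3, the example below rewinds the temporary file and iterates over it line by line, so the output is never held in memory all at once. The child that prints 100000 lines is a stand-in for a real command with large output.

```python
import sys
from subprocess import Popen, STDOUT
from tempfile import TemporaryFile

# Stand-in for a command with a lot of output: 100000 numbered lines.
CHILD = "for i in range(100000): print(i)"

with TemporaryFile(mode="w+") as f:
    with Popen([sys.executable, "-c", CHILD], stdout=f, stderr=STDOUT) as proc:
        status = proc.wait()         # no pipe involved, so output cannot block the child
    f.seek(0)                        # the child advanced the shared offset; rewind first
    line_count = sum(1 for _ in f)   # iterate lazily instead of loading everything
print(status, line_count)
```

The trade-off is disk space instead of memory, and the output only becomes available after the child has written it to the file.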

4. terminate()  # Stops the process. On Windows this calls the Windows API TerminateProcess() to end the child; on POSIX it sends SIGTERM.

>>> import subprocess
>>> res = subprocess.Popen("sleep 20;echo 'hello'",shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
>>> res.terminate()  # stop the process
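A script version of the same call, assuming Python 3, with a wait() afterwards so the exit status is collected. The 60-second sleeping child is a stand-in for a long-running command.

```python
import sys
from subprocess import Popen

p = Popen([sys.executable, "-c", "import time; time.sleep(60)"])
p.terminate()     # SIGTERM on POSIX, TerminateProcess() on Windows
code = p.wait()   # collect the exit status so the child does not linger as a zombie
print(code)       # non-zero; on POSIX a negative value is the signal number
```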

5. pid  # The PID of the child process. With shell=True, this is the PID of the spawned shell.

>>> import subprocess
>>> res = subprocess.Popen("sleep 5;echo 'hello'",shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
>>> res.pid  # PID of the Linux shell process
2778

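6. Popen.returncode  # The child's return code. It is None if the process has not finished yet.

7. Popen.send_signal(signal)  # Sends a signal to the child. On POSIX this calls os.kill(self.pid, signal); whether the process ends depends on the signal.

8. Popen.kill()  # Kills the child. On POSIX this is simply send_signal(signal.SIGKILL); on Windows kill() is an alias for terminate().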
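terminate() and kill() are often combined: ask politely first, then force the issue. The sketch below assumes Python 3.3 or later (for `wait(timeout=...)` and `subprocess.TimeoutExpired`); the helper name `stop` and the child that ignores SIGTERM are invented for this example.

```python
import subprocess
import sys

def stop(proc, grace=2.0):
    """Ask proc to exit; force-kill it if it is still alive after `grace` seconds."""
    proc.terminate()
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()

# Child that ignores SIGTERM, so on POSIX the kill() fallback is needed.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)
p = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
p.stdout.readline()        # wait until the child has installed its handler
code = stop(p, grace=0.5)
p.stdout.close()
print(code)
```

On Windows terminate() already ends the process unconditionally, so the fallback branch is not expected to run there.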

Convenience functions in subprocess

call(*popenargs, **kwargs), check_call(*popenargs, **kwargs), check_output(*popenargs, **kwargs)

All three ultimately call subprocess.Popen(); they differ in how they handle the result of the call.

def call(*popenargs, **kwargs):
    return Popen(*popenargs, **kwargs).wait()

def check_output(*popenargs, **kwargs):
    if 'stdout' in kwargs:
        raise ValueError('stdout argument not allowed, it will be overridden.')
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
    output, unused_err = process.communicate()
    retcode = process.poll()
    if retcode:
        cmd = kwargs.get("args")
        if cmd is None:
            cmd = popenargs[0]
        raise CalledProcessError(retcode, cmd, output=output)
    return output
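A brief usage sketch of the difference, assuming Python 3 and inline child scripts: call() hands back the exit status, while check_output() returns the captured output and raises CalledProcessError on a non-zero status.

```python
import subprocess
import sys

# Success: check_output returns whatever the child wrote to stdout.
ok = subprocess.check_output([sys.executable, "-c", "print('fine')"])

# Failure: a non-zero exit raises CalledProcessError, which carries the output so far.
try:
    subprocess.check_output(
        [sys.executable, "-c", "import sys; print('partial'); sys.exit(3)"])
    failure = None
except subprocess.CalledProcessError as exc:
    failure = (exc.returncode, exc.output)

# call() just returns the exit status and never raises for a non-zero value.
rc = subprocess.call([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok, failure, rc)
```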

The same care shows up in security. The source code says:

Unlike some other popen functions, this implementation will never call
/bin/sh implicitly.  This means that all characters, including shell
metacharacters, can safely be passed to child processes.

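In other words, subprocess never invokes /bin/sh on its own (only when you ask for it with shell=True), so every character, including shell metacharacters, is passed to the child process as-is without being interpreted by a shell.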
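A quick check of that claim, assuming Python 3: an argument full of shell metacharacters is passed in a list without shell=True and arrives at the child unchanged. The string in `tricky` is an arbitrary example.

```python
import sys
from subprocess import Popen, PIPE

# Characters that a shell would interpret; none of them is executed here.
tricky = "a; echo injected && $(whoami) | *"

p = Popen([sys.executable, "-c", "import sys; print(sys.argv[1])", tricky],
          stdout=PIPE)
out, _ = p.communicate()
received = out.decode().rstrip("\r\n")
print(received == tricky)
```

With shell=True and the same text pasted into a command string, the shell would interpret those characters, which is why untrusted input should not be combined with shell=True.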

To read the child's output line by line as soon as the child flushes its stdout buffer:

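#!/usr/bin/env python3
from subprocess import Popen, PIPE
# each argument is its own list element; "python test.py" as one string would be looked up as a program name
with Popen(["python", "test.py", "args"], stdout=PIPE, bufsize=1, universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')  # line already ends with '\n'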

My Python version

Python 2.7.2 (default, Jun 12 2011, 14:24:46) [MSC v.1500 64 bit (AMD64)] on win32

Source: https://www.cnblogs.com/lesten/p/Python.html