python – Pandas msgpack vs pickle

Author: Internet

msgpack in Pandas is supposed to be a replacement for pickle.

Pandas docs on msgpack:

This is a lightweight portable binary format, similar to binary JSON,
that is highly space efficient, and provides good performance both on
the writing (serialization), and reading (deserialization).

However, I find that its performance does not seem to stack up against pickle.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10000, 100))

>>> %timeit df.to_pickle('test.p')
10 loops, best of 3: 22.4 ms per loop

>>> # to_msgpack / read_msgpack were deprecated in pandas 0.25 and removed in 1.0
>>> %timeit df.to_msgpack('test.msg')
10 loops, best of 3: 36.4 ms per loop

>>> %timeit pd.read_pickle('test.p')
100 loops, best of 3: 10.5 ms per loop

>>> %timeit pd.read_msgpack('test.msg')
10 loops, best of 3: 24.6 ms per loop
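
For readers who want to produce "best of 3" numbers like the ones above outside IPython, here is a minimal sketch using only the standard library. The nested list `table` is an assumed, smaller stand-in for the DataFrame, and `best_ms_per_loop` is a hypothetical helper name, so the absolute timings will not match the figures quoted in the question.

```python
import pickle
import random
import timeit

random.seed(0)
# Assumption: a 1000 x 100 nested list of floats stands in for the DataFrame.
table = [[random.gauss(0.0, 1.0) for _ in range(100)] for _ in range(1_000)]


def best_ms_per_loop(func, number=10, repeat=3):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1e3


payload = pickle.dumps(table, protocol=pickle.HIGHEST_PROTOCOL)

write_ms = best_ms_per_loop(
    lambda: pickle.dumps(table, protocol=pickle.HIGHEST_PROTOCOL)
)
read_ms = best_ms_per_loop(lambda: pickle.loads(payload))

print(f"dumps: {write_ms:.2f} ms per loop")
print(f"loads: {read_ms:.2f} ms per loop")
```

The same helper can wrap any other serializer's dump and load calls, which keeps the comparison on equal footing.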

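Question: apart from the potential security issues with pickle, what are the benefits of msgpack over pickle? Is pickle still the preferred way to serialize data, or are there better alternatives now?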
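
The security issue mentioned in the question comes from the fact that a pickle stream can tell the loader to call an arbitrary callable. A small sketch of the mechanism, using a harmless expression; the class name `Innocent` is made up for illustration:

```python
import pickle


class Innocent:
    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # replays it at load time. Here the call is a harmless eval.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Innocent())

# Loading does not rebuild an Innocent instance; it runs the recorded call.
result = pickle.loads(payload)
print(result)  # 42
```

This is why pickle data should only be loaded from sources you trust.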

Answer:

Pickle is better for the following:

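- Numerical data, or anything that uses the buffer protocol (such as numpy arrays), though only if you use a reasonably recent pickle protocol
- Python-specific objects such as classes, functions, etc. (although here you should take a look at cloudpickle)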
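
The point about recent protocols can be seen with the standard library alone. In this sketch, `array.array` is an assumed stand-in for a numpy array: older protocols serialize it element by element, while newer ones store the raw bytes in one piece.

```python
import pickle
import random
from array import array

random.seed(0)
# Assumption: array.array of doubles stands in for a numpy float64 array.
values = array("d", (random.random() for _ in range(10_000)))

sizes = {}
for protocol in (0, 2, pickle.HIGHEST_PROTOCOL):
    payload = pickle.dumps(values, protocol=protocol)
    # Every protocol must round-trip to an equal array.
    assert pickle.loads(payload) == values
    sizes[protocol] = len(payload)
    print(f"protocol {protocol}: {len(payload)} bytes")
```

The raw data is 80,000 bytes (10,000 doubles), so the closer a payload is to that size, the less per-element overhead the protocol adds.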

MsgPack is better for the following:

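- Cross-language interoperation. It is an alternative to JSON with some improvements
- Performance on text data and Python objects. Here it is a decent factor faster than Pickle under any setting.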
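
To make the "alternative to JSON" point concrete, here is a hand-rolled encoder for a small subset of the MessagePack format. This is not the `msgpack` library and `pack_subset` is a hypothetical name; it only covers nil, booleans, integers 0 to 127, float64, short strings, and small lists and dicts, which is enough to compare the wire size with compact JSON.

```python
import json
import struct


def pack_subset(obj):
    """Encode a tiny subset of MessagePack (illustration only)."""
    if obj is None:
        return b"\xc0"                      # nil
    if obj is True:
        return b"\xc3"                      # true
    if obj is False:
        return b"\xc2"                      # false
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                 # positive fixint
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)   # float 64, big-endian
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr
    if isinstance(obj, list) and len(obj) <= 15:
        items = b"".join(pack_subset(item) for item in obj)
        return bytes([0x90 | len(obj)]) + items    # fixarray
    if isinstance(obj, dict) and len(obj) <= 15:
        out = bytes([0x80 | len(obj)])             # fixmap
        for key, value in obj.items():
            out += pack_subset(key) + pack_subset(value)
        return out
    raise TypeError(f"outside the illustrated subset: {obj!r}")


doc = {"compact": True, "schema": 0}
packed = pack_subset(doc)
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")

print(len(packed), packed.hex())
print(len(as_json), as_json.decode("utf-8"))
```

Nothing in the encoded bytes is Python-specific, which is what allows other languages to decode the same payload. For real use, a maintained MessagePack implementation is the better choice.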

As @Jeff noted above, this blogpost may be of interest.

Tags: python, pandas, msgpack
Source: https://codeday.me/bug/20191004/1853747.html