
Pandas MemoryError on a server with more memory

2019-11-20 10:56:27 Author: Internet

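I have a method that processes large pandas DataFrames, and it behaves differently on two different systems. When I try to load and work with a particular source CSV, I get a MemoryError on a Windows Server machine with 16GB RAM, but not on my local machine, which has only 12GB RAM.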

def load_table(self, name, source_folder="", columns=None):
    """Load a table from memory or csv by name.

    loads a table from memory or csv. if loaded from csv saves the result
    table to the temporary list. An explicit call to save_table is
    necessary if the results want to survive clearing temporary storage
    @param string name the name of the table to load
    @param string source_folder the folder to look for the csv if the table
        is not already in memory
    @return DataFrame returns a DataFrame representing the table if found.
    @raises IOError if table cannot be loaded
    """
    #using copy in these first two to avoid modification of existing data
    #without an explicit save_table
    if name in self.tables:
        result = self.tables[name].copy()
    elif name in self.temp_tables:
        result = self.temp_tables[name].copy()
    elif os.path.isfile(name+".csv"):
        data_frame = pd.read_csv(name+".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(name+".xlsx"):
        data_frame = pd.read_excel(name+".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder+name+".csv"):
        data_frame = pd.read_csv(source_folder+name+".csv", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    elif os.path.isfile(source_folder+name+".xlsx"):
        data_frame = pd.read_excel(source_folder+name+".xlsx", encoding="utf-8")
        self.save_temp(data_frame, name)
        result = data_frame
    else:
        # the docstring promises an IOError when no source is found
        raise IOError("table could not be loaded: " + name)
    return result

And save_temp looks like this:

def save_temp(self, data_frame, name):
    """Save a table to the temporary storage.

    @param DataFrame data_frame the data frame we are storing
    @param string name the key to index this value
    @raises ValueError if the data frame is empty
    """
    if data_frame.empty:
        raise ValueError("The data frame passed was empty", name, data_frame)
    # stores a deep copy, so the frame is held in memory twice at this point
    self.temp_tables[name] = data_frame.copy()

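The MemoryError sometimes occurs in read_csv. I tried loading the file manually in the interactive interpreter, which worked, and then saved it into the tables dictionary referenced here. Calling load_table after that fails on the copy.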

Taking the manually loaded data frame and calling .copy() on it also produces a MemoryError, with no message text, on the server box but not locally.
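Since load_table keeps the frame returned by read_csv while save_temp stores a deep copy of it, the data is held at least twice at that moment. A rough way to see how much memory that implies is to measure the frame. This is only a sketch: the helper name frame_megabytes and the sample data are made up for illustration, and it assumes a pandas version whose memory_usage accepts the deep argument (the 0.16.0 mentioned in this question may not).

```python
import numpy as np
import pandas as pd


def frame_megabytes(data_frame):
    """Approximate in-memory size of a DataFrame in MiB.

    deep=True also counts the Python string objects held by object columns.
    """
    return data_frame.memory_usage(index=True, deep=True).sum() / (1024.0 ** 2)


# Hypothetical stand-in for the real table loaded from the CSV.
sample = pd.DataFrame({
    "id": np.arange(1000, dtype="int64"),
    "label": ["row"] * 1000,
})

size_mb = frame_megabytes(sample)
print("frame: %.3f MiB, frame plus one deep copy: about %.3f MiB"
      % (size_mb, 2 * size_mb))
```

The figure is a lower bound on what the process needs, because read_csv and copy() also allocate temporary buffers while they run.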

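The server machine is running Windows Server 2012 R2, while my local machine runs Windows 7.

Both are 64-bit machines.

The server is 2.20 GHz with 2 processors, while my local machine is 3.4 GHz.
Server: 16GB RAM
Local: 12GB RAM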

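Changing .copy() to .copy(False) lets the code run on the server machine, but that does not answer the question of why a MemoryError occurs on the machine with more memory in the first place.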
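The reason .copy(False) avoids the error is that the first positional argument of DataFrame.copy is deep, so the call creates a shallow copy that reuses the existing data instead of allocating a second buffer. The following sketch illustrates the difference. The helper shares_buffer and the sample frame are hypothetical, and it assumes a reasonably recent pandas and NumPy (to_numpy and np.shares_memory do not exist in the versions named in this question).

```python
import numpy as np
import pandas as pd


def shares_buffer(left, right, column):
    """Return True if `column` of both frames is backed by the same memory."""
    return np.shares_memory(left[column].to_numpy(), right[column].to_numpy())


df = pd.DataFrame({"x": np.arange(1000000, dtype="int64")})

deep = df.copy()               # deep=True is the default: a second buffer is allocated
shallow = df.copy(deep=False)  # new DataFrame object over the same buffer

deep_shared = shares_buffer(df, deep, "x")
shallow_shared = shares_buffer(df, shallow, "x")
print("deep copy shares memory with the original:", deep_shared)
print("shallow copy shares memory with the original:", shallow_shared)
```

The trade-off is that a shallow copy no longer isolates the stored table from the caller. Depending on the pandas version, changes made through the returned frame may show up in the stored one, which is what the copies in load_table were meant to prevent.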

Edited to add:
Both are using
pandas: 0.16.0
numpy: 1.9.2
The server is apparently using 32-bit Python, while my local machine uses 64-bit.
Both are Python 2.7.8.

Answer:

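So your problem is that, although both machines have the same pandas version and a 64-bit operating system, the server runs 32-bit Python. A 32-bit process on Windows is limited to about 2 GB of address space by default, so it cannot use most of the server's 16GB no matter how much RAM is installed.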
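To confirm which build of Python a machine is actually running, the pointer size of the interpreter can be checked from the standard library. This is a minimal sketch; the function name interpreter_bits is chosen here for illustration.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running Python build (32 or 64)."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


bits = interpreter_bits()
print("Python %s is a %d-bit build" % (sys.version.split()[0], bits))
if bits == 32:
    print("This process cannot use most of the RAM on a 64-bit machine.")
```

Running this on both machines should print 32 on the server and 64 on the local machine if the diagnosis above is right. An equivalent check is `sys.maxsize > 2**32`, which is True only on a 64-bit build.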

Tags: pandas, memory, deep-copy, windows-server-2012-r2, python
Source: https://codeday.me/bug/20191120/2043608.html