Fastest way to get a NetCDF variable's min/max using Python?

2019-07-08 21:58:18 Author: Internet

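Since switching from scipy.io.netcdf to the netCDF4 Python module, my usual method of extracting the minimum/maximum data values of a variable from a NetCDF file has become an order of magnitude slower.

I am working with relatively large ocean model output files (from ROMS) that have multiple depth levels over a given map region (Hawaii). When these files were in NetCDF-3, I used scipy.io.netcdf.

Now that the files are in NetCDF-4 ("classic") format, I can no longer use scipy.io.netcdf and have switched to the netCDF4 Python module. The slowness is a concern, however, and I wonder whether there is a more efficient way to extract a variable's data range (its minimum and maximum data values).

Here is my NetCDF-3 method using scipy: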

import scipy.io.netcdf
netcdf = scipy.io.netcdf.netcdf_file(file)
var = netcdf.variables['sea_water_potential_temperature']
min = var.data.min()
max = var.data.max()

Here is my NetCDF-4 method using netCDF4:

import netCDF4
netcdf = netCDF4.Dataset(file)
var = netcdf.variables['sea_water_potential_temperature']
# netCDF4 variables have no .data attribute; slicing with [:] reads the values
var_array = var[:].flatten()
min = var_array.min()
max = var_array.max()

Notably, I have to flatten the data array first in netCDF4, and that operation evidently slows things down.

Is there a better/faster way?

Solution:
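One possible direction, sketched here under assumptions rather than measured on the files in question: NumPy's min() and max() work on multi-dimensional arrays directly, so no flattening is needed, and the variable can be read one slab at a time along its first axis so the whole array never sits in memory. The names `variable_range`, `range_from_file`, and the `block` parameter are hypothetical helpers introduced for illustration; the function accepts anything with a `.shape` and slice indexing, such as a netCDF4 variable or a NumPy array.

```python
import numpy as np


def variable_range(var, block=1):
    """Return (min, max) of an array-like, reading `block` slabs at a time.

    `var` only needs a .shape and slice indexing along its first axis.
    Masked (missing) values are ignored.
    """
    lo, hi = np.inf, -np.inf
    for start in range(0, var.shape[0], block):
        # Wrap as a masked array so plain and masked slabs behave the same
        slab = np.ma.asarray(var[start:start + block])
        if slab.count() == 0:  # every value in this slab is masked
            continue
        lo = min(lo, float(slab.min()))
        hi = max(hi, float(slab.max()))
    return lo, hi


def range_from_file(path, name, block=1):
    """Hypothetical wrapper: open a NetCDF file and scan one variable."""
    import netCDF4  # imported here so variable_range works without it

    with netCDF4.Dataset(path) as nc:
        return variable_range(nc.variables[name], block=block)
```

Whether this is actually faster than reading everything at once depends on the file's chunking and compression, so it is worth timing both on a real file.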

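Following hpaulj's suggestion, here is a function that uses subprocess to call the NCO command ncwa. It performs very poorly with an OPeNDAP address, and I have no files on hand to test it locally.

You can see whether it works for you and what the speed difference is.

This assumes you have the NCO library installed.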
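A quick way to confirm that assumption before calling anything is to look for the ncwa executable on the PATH. This is a minimal sketch; the helper name `nco_tool_available` is hypothetical and not part of NCO or of the answer.

```python
import shutil


def nco_tool_available(tool="ncwa"):
    """Return True if the given command-line tool is found on the PATH."""
    return shutil.which(tool) is not None
```

If it returns False, NCO needs to be installed (or added to the PATH) before a subprocess call to ncwa can succeed.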

def ncwa(path, fnames, var, op_type, times=None, lons=None, lats=None):
    '''Perform arithmetic operations on netCDF file or OPeNDAP data

    Args
    ----
    path: str
        prefix
    fnames: str or iterable
        Names of file(s) to perform operation on
    var: str
        Name of the variable whose result is read from the output file
    op_type: str
        ncwa arithmetic operation to perform. Available operations are:
        avg,mabs,mebs,mibs,min,max,ttl,sqravg,avgsqr,sqrt,rms,rmssdn
    times: tuple
        Minimum and maximum timestamps within which to perform the operation
    lons: tuple
        Minimum and maximum longitudes within which to perform the operation
    lats: tuple
        Minimum and maximum latitudes within which to perform the operation

    Returns
    -------
    result: float
        Result of the operation on the selected data

    Note
    ----
    Adapted from the OPeNDAP examples in the NCO documentation:
    http://nco.sourceforge.net/nco.html#OPeNDAP
    '''
    import os
    import netCDF4
    import numpy
    import subprocess

    output = 'tmp_output.nc'

    # Concatenate subprocess command
    cmd = ['ncwa']
    cmd.extend(['-y', '{}'.format(op_type)])
    if times:
        cmd.extend(['-d', 'time,{},{}'.format(times[0], times[1])])
    if lons:
        cmd.extend(['-d', 'lon,{},{}'.format(lons[0], lons[1])])
    if lats:
        cmd.extend(['-d', 'lat,{},{}'.format(lats[0], lats[1])])
    cmd.extend(['-p', path])
    cmd.extend(numpy.atleast_1d(fnames).tolist())
    cmd.append(output)

    # Run cmd and check for errors
    subprocess.run(cmd, stdout=subprocess.PIPE, check=True)

    # Load, read, close data and delete temp .nc file
    data = netCDF4.Dataset(output)
    result = float(data[var][:])
    data.close()
    os.remove(output)

    return result

path = 'https://ecowatch.ncddc.noaa.gov/thredds/dodsC/hycom/hycom_reg6_agg/'
fname = 'HYCOM_Region_6_Aggregation_best.ncd'

times = (0.0, 48.0)
lons = (201.5, 205.5)
lats = (18.5, 22.5)

smax = ncwa(path, fname, 'salinity', 'max', times, lons, lats)
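To judge the speed difference for yourself, a small timing wrapper is enough. This is only a sketch; `timed` is a hypothetical helper, and the commented call assumes the ncwa function and arguments from the example above.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Example (not run here):
# smax, seconds = timed(ncwa, path, fname, 'salinity', 'max', times, lons, lats)
```

The same wrapper can be applied to a pure netCDF4 read of the variable, so the two approaches are compared on the same file.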

Tags: python, numpy, scipy, netcdf
Source: https://codeday.me/bug/20190708/1406147.html