Hive table: error when dynamically partitioning by day

Author: Internet

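I wanted to load all 365 days of data from the newlogs table in the ODS layer into the logs table in the DWD layer, partitioned by day, but the job failed. The details are as follows.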

Before running the SQL, I enabled dynamic partitioning and set these parameters:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=3000;
set hive.exec.max.dynamic.partitions=6000;
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=3072;
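
To judge whether the two partition limits above are large enough, it can help to count how many partitions the load will actually create. A minimal sketch, assuming the partition value is derived from the time column (epoch milliseconds) with the same expression used in this post's insert statement; the alias partition_cnt is introduced only for illustration:

```sql
-- Count the distinct day values that would each become one partition.
-- Assumes time holds epoch milliseconds, as in the insert statement of this post.
select count(distinct from_unixtime(cast(time/1000 as bigint),'yyyyMMdd')) as partition_cnt
from ods_myshops.ods_newlogs;
```

For 365 days of data the result should be around 365, well below both limits set above.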

This is the HQL statement:

insert overwrite table dwd_myshops.dwd_logs partition(date)
select userid,event,time,goodid,title,price,shopid,mark,
from_unixtime(cast(time/1000 as bigint),'yyyyMMdd') date
from ods_myshops.ods_newlogs;

The error output was:

MapReduce Total cumulative CPU time: 17 seconds 220 msec
Ended Job = job_1616718205783_0010 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1616718205783_0010_m_000000 (and more) from job job_1616718205783_0010

Task with the most failures(4):
-----
Task ID:
  task_1616718205783_0010_m_000000

URL:
  http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1616718205783_0010&tipid=task_1616718205783_0010_m_000000
-----
Diagnostic Messages for this Task:
Error: Java heap space

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 17.22 sec   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 17 seconds 220 msec
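
The diagnostic is a JVM heap error in a map task. In Hadoop, mapreduce.map.memory.mb sets the YARN container size, while the JVM heap of the task is set through mapreduce.map.java.opts, and depending on the Hadoop version the heap may not follow the container size automatically. An untested alternative, which is not the fix used in this post, would be to raise the heap explicitly. The value 1638m (roughly 80% of the container) is an assumed rule of thumb, not a value from the original setup:

```sql
-- Container size for map tasks (same value as set earlier in this post)
set mapreduce.map.memory.mb=2048;
-- JVM heap for map tasks; 1638m is an assumed value, about 80% of the container
set mapreduce.map.java.opts=-Xmx1638m;
```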

Later I changed the dynamic partition parameters:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=3000;
set hive.optimize.sort.dynamic.partition=true;

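hive.optimize.sort.dynamic.partition=true
With this parameter enabled, rows are sorted by the dynamic partition column before they are written, so each reducer keeps only one record writer open at a time and each partition ends up with a single file. This reduces memory pressure and resolves the OOM seen with dynamic partitioning,
but it significantly slows down how fast a reducer processes and writes each partition.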
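
A related approach that is often used for the same problem is to add distribute by on the partition value, so that all rows of one day go to the same reducer and each reducer only writes the partitions hashed to it. This is a sketch only, not the fix used in this post; the alias dt and the subquery name t are introduced for illustration:

```sql
-- Dynamic partition values are matched by position (the last select column),
-- so the alias dt does not have to match the partition column name.
insert overwrite table dwd_myshops.dwd_logs partition(date)
select * from (
  select userid,event,time,goodid,title,price,shopid,mark,
         from_unixtime(cast(time/1000 as bigint),'yyyyMMdd') as dt
  from ods_myshops.ods_newlogs
) t
distribute by dt;
```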

After this change I ran the HQL statement again, and the partitioning by day succeeded.
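
One way to confirm the result is to list the partitions of the target table. A minimal sketch using the table name from this post:

```sql
-- List the day partitions created by the dynamic-partition insert
show partitions dwd_myshops.dwd_logs;
```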

Source: https://blog.csdn.net/weixin_48482704/article/details/115295477