合并时间段问题2

背景

hive中经常会有求连续数字或者连续的时间这种问题,其实处理的方法一致

upload successful

upload successful

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
SELECT a.name
,b.starttime
,b.endtime
,sum(case when a.date>=b.starttime and a.date<=b.endtime then a.salary else 0 end) as total_salary
FROM dw.tmp_interview_data a
LEFT JOIN
( SELECT name
,diff
,qujian[0] endtime
,case when diff=0 then qujian[0]
when diff=1 then qujian[1]
when diff=2 then qujian[2]
when diff=3 then qujian[3] end as starttime --此处如有其它时间差,需要进行枚举
FROM --取出连续时间范围
( SELECT name
,datediff(max(date),min(date)) diff
,COLLECT_set(date) qujian
FROM --求出连续时间之差
( SELECT name
,date
,salary
,date_sub(date,rank) as date2
FROM --日期与排名之间的差值
( SELECT name
,date
,salary
,row_number () over (partition by name order by date) rank
FROM dw.tmp_interview_data --按name分组后对date进行排序
) a
) a
GROUP BY name,date2
) a
) b on a.name=b.name
GROUP BY a.name,b.starttime,b.endtime

解法二

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SELECT name
,first1
,last1
,salary * (datediff(last1, first1)+1) as salary
FROM
( SELECT name
,num
,salary
,min(date) as first1
,max(date) as last1
FROM
( SELECT a.name
,a.date
,date_sub(date, rn - 1) num
,salary
FROM
( SELECT name
,date
,salary
,row_number() over(PARTITION BY name ORDER BY date) rn
FROM dw_htlbizdb.tmp_qh_liu_interview_data
GROUP BY name,date,salary
) a
) b group by name,num,salary
) a

其实两种方法比较下来,都先先用row_number()函数通过name分组,并对日期进行排序,然后利用日期与所得排名之间的差值来做进一步的处理。连续数字问题思路同上~,该方法是有问题的,计算薪水是按照连续工资的第一天乘以天数的,如果每天工资不一样呢,但是该题目的主要参看价值是合并时间段。
参考:
https://blog.csdn.net/weixin_37536446/article/details/82143417