原 GreenPlum中的union可能比union all要快很多
Tags: 原创GreenPlumSQL优化union all性能优化
现象
项目上反馈说,系统很卡慢,我登陆gpcc上查看,发现有很多SQL跑了2天还没跑完,其SQL语句基本都是同一条:
SQL语句格式,是一条由UNION ALL组成的SQL,内容较复杂:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 | SELECT DISTINCT a.ENCDEFBTER_ID || '_MZ' AS xh, a.HDFPITAL_SOID AS yljgdm, a.ENCDEFBTER_ID AS jzlsh, a.PERSON_ID AS patid, a.ENCDEFBTER_TYPE_NO AS jzlb, a.ENCDEFBTER_ID || '_MZ' AS yjlxh, a.ENCDEFBTER_ID || '_MZ' AS zyjlxh, b.FULL_NAME AS hzxm, COALESCE(b.OMRN, '-') AS mjzh, 'NA' AS zyh, CASE WHEN b.GENDER_NAME = '男' THEN '1' WHEN b.GENDER_NAME = '女' THEN '2' ELSE COALESCE(b.GENDER_NO, 'NA') END AS xbdm, COALESCE(b.GENDER_NAME, 'NA') AS xbmc, COALESCE(b.IDENTITY_NO, '-') AS sfzhm, COALESCE(LEFT(b.IDCARD_NO, 18), '-') AS jkkh, 'NA' AS aboxxdm, 'NA' AS aboxxmc, 'NA' AS rhxxdm, 'NA' AS rhxxmc, COALESCE(sxs.ABC_PLAIN_TEXT, 'NA') AS sxs, COALESCE(grs.ABC_PLAIN_TEXT, 'NA') AS grs, COALESCE(yjs.ABC_PLAIN_TEXT, 'NA') AS yjs, COALESCE(jbs.ABC_PLAIN_TEXT, 'NA') AS jbs, COALESCE(yfjzs.ABC_PLAIN_TEXT, 'NA') AS yfjzs, COALESCE(gms.ABC_PLAIN_TEXT, 'NA') AS gms, COALESCE(hys.ABC_PLAIN_TEXT, 'NA') AS hys, COALESCE(sss.ABC_PLAIN_TEXT, 'NA') AS sss, COALESCE(crbs.ABC_PLAIN_TEXT, 'NA') AS crbs, COALESCE(jzs.ABC_PLAIN_TEXT, 'NA') AS jzs, a.SYS_SOID AS sys_id, a.CREATED_AT AS gdsj, NOW() AS sjscsj, '0' AS pcxh FROM OUTP_ENCDEFBTER a INNER JOIN outp_record b ON a.ENCDEFBTER_ID = b.ENCDEFBTER_ID LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE ABC_NAME = '输血史' ) sxs ON sxs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND sxs.rk = 1 LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE ABC_NAME = '个人史' ) grs ON grs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND grs.rk = 1 LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE seq_no = '6' ) yjs ON yjs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND yjs.rk = 1 LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE seq_no = '5' ) jbs ON jbs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND jbs.rk = 1 LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE ABC_NAME = '预防接种史' ) yfjzs ON yfjzs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND yfjzs.rk = 1 LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE seq_no = '4' ) gms ON gms.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND gms.rk = 1 LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE ABC_NAME = '婚育史' ) hys ON hys.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND hys.rk = 1 LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE ABC_NAME = '手术史' ) sss ON sss.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND sss.rk = 1 LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE ABC_NAME = '传染病史' ) crbs ON crbs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND crbs.rk = 1 LEFT JOIN ( SELECT OUTP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY OUTP_ABC_ID DESC) AS rk FROM OUTP_ABC WHERE seq_no = '3' ) jzs ON jzs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND jzs.rk = 1 WHERE a.CREATED_AT BETWEEN '2024-08-12 11:06:49' AND '2099-12-31' UNION ALL SELECT DISTINCT a.ENCDEFBTER_ID || '_ZY' AS xh, a.HDFPITAL_SOID AS yljgdm, a.ENCDEFBTER_ID AS jzlsh, a.PERSON_ID AS patid, a.ENCDEFBTER_TYPE_NO AS jzlb, a.ENCDEFBTER_ID || '_ZY' AS yjlxh, a.ENCDEFBTER_ID || '_ZY' AS zyjlxh, b.FULL_NAME AS hzxm, 'NA' AS mjzh, COALESCE(b.IMRN, '-') AS zyh, CASE WHEN b.GENDER_NAME = '男' THEN '1' WHEN b.GENDER_NAME = '女' THEN '2' ELSE COALESCE(b.GENDER_NO, 'NA') END AS xbdm, COALESCE(b.GENDER_NAME, 'NA') AS xbmc, COALESCE(b.IDENTITY_NO, '-') AS sfzhm, COALESCE(LEFT(b.IDCARD_NO, 18), '-') AS jkkh, 'NA' AS aboxxdm, 'NA' AS aboxxmc, 'NA' AS rhxxdm, 'NA' AS rhxxmc, COALESCE(sxs.ABC_PLAIN_TEXT, 'NA') AS sxs, COALESCE(grs.ABC_PLAIN_TEXT, 'NA') AS grs, COALESCE(yjs.ABC_PLAIN_TEXT, 'NA') AS yjs, COALESCE(jbs.ABC_PLAIN_TEXT, 'NA') AS jbs, COALESCE(yfjzs.ABC_PLAIN_TEXT, 'NA') AS yfjzs, COALESCE(gms.ABC_PLAIN_TEXT, 'NA') AS gms, COALESCE(hys.ABC_PLAIN_TEXT, 'NA') AS hys, COALESCE(sss.ABC_PLAIN_TEXT, 'NA') AS sss, COALESCE(crbs.ABC_PLAIN_TEXT, 'NA') AS crbs, COALESCE(jzs.ABC_PLAIN_TEXT, 'NA') AS jzs, a.SYS_SOID AS sys_id, a.MODIFIED_AT AS gdsj, NOW() AS sjscsj, '0' AS pcxh FROM INP_ENCDEFBTER a INNER JOIN inp_record b ON a.ENCDEFBTER_ID = b.ENCDEFBTER_ID LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE ABC_NAME = '输血史' ) sxs ON sxs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND sxs.rk = 1 LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE ABC_NAME = '个人史' ) grs ON grs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND grs.rk = 1 LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE seq_no = '6' ) yjs ON yjs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND yjs.rk = 1 LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE seq_no = '5' ) jbs ON jbs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND jbs.rk = 1 LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE ABC_NAME = '预防接种史' ) yfjzs ON yfjzs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND yfjzs.rk = 1 LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE seq_no = '4' ) gms ON gms.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND gms.rk = 1 LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE ABC_NAME = '婚育史' ) hys ON hys.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND hys.rk = 1 LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE ABC_NAME = '手术史' ) sss ON sss.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND sss.rk = 1 LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE ABC_NAME = '传染病史' ) crbs ON crbs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND crbs.rk = 1 LEFT JOIN ( SELECT INP_ABC_ID, ABC_PLAIN_TEXT, ENCDEFBTER_ID, row_number() OVER (PARTITION BY ENCDEFBTER_ID ORDER BY INP_ABC_ID DESC) AS rk FROM INP_ABC WHERE seq_no = '3' ) jzs ON jzs.ENCDEFBTER_ID = a.ENCDEFBTER_ID AND jzs.rk = 1 WHERE a.MODIFIED_AT BETWEEN '2024-08-12 11:06:49' AND '2099-12-31' |
分析
初步分析,发现好多表没有索引,先创建了部分索引后继续查看执行计划,发现基本都走了索引,但是依然很慢。
优化手段
SQL调优1:将union all修改为union
这种情况碰到好几次案例了。
当然,在这里,修改为union后,结果集和之前的是一样的
分析了一下执行计划,修改为union后,执行计划里多了2次Redistribute Motion
重分布的过程。
通常情况下,UNION ALL 比 UNION 快,因为 UNION ALL 不需要执行额外的去重操作。然而,在 Greenplum 中,情况可能会相反,即 UNION 可能比 UNION ALL 快,原因如下:
- 分布式架构:
• Greenplum 是一个大规模并行处理 (MPP) 数据库,它通过将数据分布在多个段实例 (segment instances) 上来提高查询性能。
• 当使用 UNION 时,每个段实例都会独立地执行去重操作,这通常是在本地完成的,减少了网络传输的开销。
- 去重操作:
• UNION 的去重操作通常在段实例级别完成,这意味着每个段实例都会去除其局部结果集中的重复行。
• 由于去重是在段实例本地完成的,因此当合并最终结果集时,需要传输的数据量减少,从而降低了网络传输的成本。