原 GreenPlum在何时会发生自动故障切换、节点漂移及如何排查(OOM)
Tags: 原创故障处理GreenPlumOOMsegment实例漂移自动故障转移节点切换节点漂移
简介
若发生了OOM,则在gpcc的告警通知里会有“[告警]Out of memory errors”,例如:
情况1(大部分情况):发生了OOM
发生自动切换的一个示例是发生了OOM,在master的日志文件中会有如下的内容:
“FTS: cannot establish libpq connection (content=0, dbid=11): could not fork new process for connection: Cannot allocate memory”或“FATAL: Out of memory. Failed on request of size 144 bytes. (context 'GPORCAmemory pool') ”或“ATAL: the database system is in recovery mode”,
若没有swap内存配置,会发生OOM,特别严重时会导致segment自动故障切换。
1 2 3 4 5 6 7 8 | FTS: cannot establish libpq connection (content=0, dbid=11): could not fork new process for connection: Cannot allocate memory The previous session was reset because its gang was disconnected (session id = 6072). The new session id = 109485 FATAL: Out of memory. Failed on request of size 144 bytes. (context 'GPORCAmemory pool') FATAL: the database system is in recovery mode gang was lost due to cluster reconfiguration(cdbgang_async.c:97) rejecting TCP connection to master using internalconnection protocol Any temporary tables for this session have been dropped because the gang was disconnected (session id = 85341) failed to acquire resources on one or more segments |
模拟OOM错误:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | docker rm -f gpdbtest docker run -itd --name gpdbtest -h gpdb6 \ -m 2GB --memory-swap=2GB \ -p 28180:28080 \ --privileged=true lhrbest/greenplum:6.23.1 /usr/sbin/init docker exec -it gpdbtest bash su - gpadmin gpstart -am gpconfig -c gp_vmem_protect_limit -v 1024 gpstop -M fast -ar gpcc start http://192.168.88.162:28180 docker stats gpdbtest psql -c 'drop database sbtest2;' psql -c 'create database sbtest2;' sysbench /usr/share/sysbench/oltp_common.lua --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-port=5432 \ --pgsql-user=gpadmin --pgsql-password=lhr --pgsql-db=sbtest2 \ --time=300 --table-size=2000000 --tables=10 --threads=10 \ --events=999999999 prepare select * from gp_segment_configuration ; |
情况2:最大进程数超限导致系统资源不足
最大进程数超限,此时,日志报错:
1 2 3 4 5 6 7 8 9 | could not fork new process for connection: Resource temporarily unavailable could not fork new process for connection: Resource temporarily unavailable (seg0 119.10.25.26:6000) FATAL: InitMotionLayerIPC: failed to create thread (ic_udpifc.c:1488) DETAIL: pthread_create() failed with err 11 (seg11 19.10.25.26:7003) |
该报错,多半是因为内核参数没有做正确修改,修复如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | ll /lib64/security/pam_limits.so echo "session required /lib64/security/pam_limits.so" >> /etc/pam.d/login cat >> /etc/security/limits.conf <<"EOF" * soft nofile 655350 * hard nofile 655350 * soft nproc 655350 * hard nproc 655350 gpadmin soft priority -20 EOF sed -i 's/4096/655350/' /etc/security/limits.d/20-nproc.conf cat /etc/security/limits.d/20-nproc.conf cat >> /etc/sysctl.conf <<"EOF" fs.file-max=9000000 fs.inotify.max_user_instances = 1000000 fs.inotify.max_user_watches = 1000000 kernel.pid_max=4194304 EOF sysctl -p |
重启主机生效。
模拟“DETAIL: pthread_create() failed with err 11”错误:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | docker rm -f gpdbtest docker run -itd --name gpdbtest -h gpdb6 \ -p 28180:28080 \ --privileged=true lhrbest/greenplum:6.23.1 /usr/sbin/init docker exec -it gpdbtest bash cat >> /etc/security/limits.conf <<"EOF" * soft nofile 800 * hard nofile 800 * soft nproc 800 * hard nproc 800 EOF sed -i 's/1024/800/' /etc/security/limits.d/20-nproc.conf cat /etc/security/limits.d/20-nproc.conf gpstart -m gpconfig -c max_connections -v 2000 -m 1000 gpstop -M fast -ar psql -c 'drop database sbtest2;' psql -c 'create database sbtest2;' sysbench /usr/share/sysbench/oltp_common.lua --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-port=5432 \ --pgsql-user=gpadmin --pgsql-password=lhr --pgsql-db=sbtest2 \ --time=300 --table-size=10 --tables=200 --threads=200 \ --events=999999999 prepare |
情况3:kill掉实例
当手工kill掉某个PG实例后,也会自动发生切换。
情况4:master的主机名不能解析
所有的segment节点不能解析master的主机,效果类似于segment宕机,所以,这个时候会发生切换。
情况5:负载过大或CPU压力或主机压力较大或发生大量的swap置换
当某个节点的负载过大或CPU压力或主机压力较大时也会发生故障启动切换。
之前碰到过1条SQL语句,每分钟插入8万条数据导致主备切换(按秒统计时,每秒1万条)。
情况6:某个segment的主机网络不通
某个主机网络临时不通的话,或出现丢包严重,忘了抖动等,也会发生切换。 和情况4差不多。
此时,报错信息一般为:“Connection reset by peer” ,或“error code = 110 (Connection timed out)”,出现丢包现象,若频繁出现,则可以使用长ping来观察网络情况。
参考:https://dbaup.com/shiyong-mtr-fenxiwangluoyanchijidiubaowenti.html#zong_jie
排查SQL
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 | -- 找到发生切换的时间点 select * from gp_configuration_history ORDER BY time desc limit 100; -- 查询时间视图 select 'gp_log_master_ext' tb,min(logtime),max(logtime) from gpmetrics.gp_log_master_ext union all select 'gpcc_pg_log_history' tb,min(logtime),max(logtime) from gpmetrics.gpcc_pg_log_history; -- 查询切换时的详细日志(有时候查不出来,需要去查询gp_log_master_ext或直接查询日志文件) SELECT d.logtime, d.loguser, d.logdatabase, d.loghost, d.logsessiontime, d.logseverity, d.logmessage, d.logdebug, d.logdetail, logcontext from gpmetrics.gpcc_pg_log_history d where d.logseverity not in ( 'LOG' ) and d.logstate not in ( '58P01' ) -- and d.logtime >= now()-interval '7 day' and d.logtime >= '2024-04-10 09:30' and d.logtime <= '2024-04-10 09:40' and ( d.logmessage like 'Out of memory%' or d.logmessage like 'failed to acquire resources on one or more segments%' or d.logmessage like 'FTS detected connection lost during dispatch to%' or d.logmessage like 'ERROR: FTS double fault detected%' or d.logmessage like 'Canceling query because of high VMEM usage%' or d.logmessage like '%server closed the connection unexpectedly%' or d.logmessage like '%broadcast failed to one or more segments%' or d.logmessage like '%gang was lost due to cluster reconfiguration%' or d.logmessage like '%because the gang was disconnected%' or d.logmessage like '%interconnect error: connection closed prematurely%' or d.logdetail like '%Cannot allocate memory%' or d.logdetail like '%FATAL: out of memory%' or d.logdetail like '%System memory limit reached, failed to allocate %' or d.logdetail like 'FTS detected one or more segments are down' or d.logdetail like '%Vmem limit reached, failed to allocate%' or d.logdetail like 'FATAL: InitMotionLayerIPC: failed to create thread%' or d.logdetail like 'could not fork new process for connection%' or d.logdetail like '%Resource temporarily unavailable%' or d.logdetail like '%Connection reset by peer%' or d.logdetail like '%Connection refused%' ) order by d.logtime limit 500; SELECT d.logtime, d.loguser, d.logdatabase, d.loghost, d.logsessiontime, d.logseverity, d.logmessage, d.logdebug, d.logdetail, logcontext FROM gpmetrics.gp_log_master_ext d where d.logseverity not in ('LOG') and d.logstate not in ('58P01') -- and d.logtime >= now()-interval '7 day' and d.logtime >= '2024-04-15 15:30' and d.logtime <= '2024-04-15 15:40' and ( d.logmessage like 'Out of memory%' or d.logmessage like 'failed to acquire resources on one or more segments%' or d.logmessage like 'FTS detected connection lost during dispatch to%' or d.logmessage like 'ERROR: FTS double fault detected%' or d.logmessage like 'Canceling query because of high VMEM usage%' or d.logmessage like '%server closed the connection unexpectedly%' or d.logmessage like '%broadcast failed to one or more segments%' or d.logmessage like '%gang was lost due to cluster reconfiguration%' or d.logmessage like '%because the gang was disconnected%' or d.logmessage like '%interconnect error: connection closed prematurely%' or d.logdetail like '%Cannot allocate memory%' or d.logdetail like '%FATAL: out of memory%' or d.logdetail like '%System memory limit reached, failed to allocate %' or d.logdetail like 'FTS detected one or more segments are down' or d.logdetail like '%Vmem limit reached, failed to allocate%' or d.logdetail like 'FATAL: InitMotionLayerIPC: failed to create thread%' or d.logdetail like 'could not fork new process for connection%' or d.logdetail like '%Resource temporarily unavailable%' or d.logdetail like '%Connection reset by peer%' or d.logdetail like '%Connection refused%' ) order by d.logtime ; -- 查看日志 ps -ef|grep post| grep bin cd /opt/greenplum/data/master/gpseg-1/pg_log -- 每隔15秒收集1次所有主机的系统信息 SELECT ctime::date || ' ' || CASE WHEN extract(hour from ctime) < 10 THEN '0' ELSE '' END ||extract(hour from ctime) ||':'|| CASE WHEN extract(MINUTE from ctime) < 10 THEN '0' ELSE '' END || extract(MINUTE from ctime) AS time1 , hostname , ROUND(AVG(cpu_sys::numeric),2) AS cpu_sys , ROUND(avg(cpu_user::numeric),2) AS cpu_user , ROUND(avg((cpu_sys+cpu_user)::numeric),2) AS cpu_use , ROUND(avg((cpu_iowait)::numeric),2) AS cpu_iowait , ROUND(avg(gsh.cpu_idle::numeric),2) AS cpu_idle , ROUND(avg(mem_total::numeric/1024/1024/1024)) AS mem_total_GB , ROUND(avg(mem_used::numeric/1024/1024/1024),2) AS mem_used_GB -- , ROUND(avg(mem_actual_used::numeric/1024/1024/1024),2) AS mem_actual_GB , ROUND(avg((mem_buffers+mem_cached)::numeric/1024/1024/1024),2) AS mem_buffer_cache_GB -- , ROUND(avg(((gsh.mem_total-mem_used)::numeric)/1024/1024/1024),2) AS mem_available_GB , ROUND(avg(gsh.swap_total::numeric/1024/1024/1024)) as swap_total , ROUND(avg(gsh.swap_used::numeric/1024/1024/1024),2) as swap_used , ROUND(avg(gsh.swap_page_in::numeric),2) as swap_page_in , ROUND(avg(gsh.swap_page_out::numeric),2) as swap_page_out , ROUND(avg(load0::numeric),2) AS load0 , ROUND(avg(load1::numeric),2) AS load1 , ROUND(avg(load2::numeric),2) AS load2 , ROUND((avg(disk_rb_rate::numeric) / 1024 / 1024),2) AS disk_R_MBs , ROUND((avg(disk_wb_rate::numeric) / 1024 / 1024),2) AS disk_W_MBs , ROUND((avg(net_rb_rate::numeric) / 1024 / 1024),2) AS net_I_MBs , ROUND((avg(net_wb_rate::numeric) / 1024 / 1024),2) AS net_O_MBs FROM gpmetrics.gpcc_system_history gsh WHERE hostname LIKE 'mdw%' AND ctime >= '2024-09-14 16:30:00' AND ctime < '2024-09-14 19:00:00' GROUP BY 1,2 ORDER BY 1 ; -- 系统内存和swap信息 SELECT DISTINCT gsh.hostname, ROUND(gsh.mem_total::numeric/1024/1024/1024) AS mem_total_G, ROUND(gsh.swap_total::numeric/1024/1024/1024) AS swap_total_G FROM gpmetrics.gpcc_system_history gsh where ctime >= NOW() - INTERVAL '1 minute' and gsh.mem_total > 0 order by 1 limit 10; -- 出问题时间段的SQL语句 SELECT substr(gqh.ctime::VARCHAR,0,17) ctime, substr(gqh.query_text,0,50000) query_text, count(*) cnt, TRUNC(sum(gqh.disk_read_bytes)/1024/1024/1024,2) disk_read_GB, TRUNC(sum(disk_write_bytes)/1024/1024/1024,2) disk_write_GB from gpmetrics.gpcc_queries_history gqh WHERE ctime >= '2024-10-27 20:30' AND ctime <= '2024-10-27 22:30' and lower(query_text) like '%result_lis_xh_ls%' group by substr(gqh.ctime::VARCHAR,0,17),substr(gqh.query_text,0,50000) ORDER BY 4 desc , 5 desc ; |