Greenplum: gpstate reports WARNING "Total number of postmaster.pid files missing"
Tags: Original, GreenPlum, gpstate, WARNING
Symptom
[gpadmin@mdw1 ~]$ gpstate
20230128:12:55:44:145555 gpstate:mdw1:gpadmin-[INFO]:-Starting gpstate with args:
20230128:12:55:44:145555 gpstate:mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.13.0 build commit:4f1adf8e247a9685c19ea02bcaddfdc200937ecd Open Source'
20230128:12:55:44:145555 gpstate:mdw1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.13.0 build commit:4f1adf8e247a9685c19ea02bcaddfdc200937ecd Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 18 2020 22:31:16'
20230128:12:55:44:145555 gpstate:mdw1:gpadmin-[INFO]:-Obtaining Segment details from master...
20230128:12:55:44:145555 gpstate:mdw1:gpadmin-[INFO]:-Gathering data from segments....
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:-Greenplum instance status summary
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Master instance = Active
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Master standby = No master standby configured
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total segment instance count from metadata = 32
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Primary Segment Status
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total primary segments = 16
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total primary segment valid (at master) = 16
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total primary segment failures (at master) = 0
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-Total number of postmaster.pid files missing = 4 <<<<<<<<
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files found = 12
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-Total number of postmaster.pid PIDs missing = 4 <<<<<<<<
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 12
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-Total number of /tmp lock files missing = 4 <<<<<<<<
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files found = 12
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-Total number postmaster processes missing = 4 <<<<<<<<
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes found = 12
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Mirror Segment Status
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segments = 16
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segment valid (at master) = 16
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segment failures (at master) = 0
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-Total number of postmaster.pid files missing = 4 <<<<<<<<
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files found = 12
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-Total number of postmaster.pid PIDs missing = 4 <<<<<<<<
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 12
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-Total number of /tmp lock files missing = 4 <<<<<<<<
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files found = 12
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-Total number postmaster processes missing = 4 <<<<<<<<
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes found = 12
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number mirror segments acting as primary segments = 0
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 16
20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
[gpadmin@mdw1 ~]$

test_db=# select * from gp_segment_configuration order by 2,1;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |              datadir
------+---------+------+----------------+------+--------+------+----------+---------+-----------------------------------
   34 |      -1 | p    | p              | s    | u      | 5432 | mdw1     | mdw1    | /data/gpadmindata/gpmaster/gpseg-1
    2 |       0 | p    | p              | s    | u      | 6000 | sdw1     | sdw1    | /data1/gpadmindata/gpdatap1/gpseg0
   18 |       0 | m    | m              | s    | u      | 7000 | sdw2     | sdw2    | /data2/gpadmindata/gpdatam1/gpseg0
    3 |       1 | p    | p              | s    | u      | 6001 | sdw1     | sdw1    | /data1/gpadmindata/gpdatap2/gpseg1
   19 |       1 | m    | m              | s    | u      | 7001 | sdw2     | sdw2    | /data2/gpadmindata/gpdatam2/gpseg1
    4 |       2 | p    | p              | s    | u      | 6002 | sdw1     | sdw1    | /data1/gpadmindata/gpdatap3/gpseg2
   20 |       2 | m    | m              | s    | u      | 7002 | sdw2     | sdw2    | /data2/gpadmindata/gpdatam3/gpseg2
    5 |       3 | p    | p              | s    | u      | 6003 | sdw1     | sdw1    | /data1/gpadmindata/gpdatap4/gpseg3
   21 |       3 | m    | m              | s    | u      | 7003 | sdw2     | sdw2    | /data2/gpadmindata/gpdatam4/gpseg3
    6 |       4 | p    | p              | s    | u      | 6000 | sdw2     | sdw2    | /data1/gpadmindata/gpdatap1/gpseg4
   22 |       4 | m    | m              | s    | u      | 7000 | sdw3     | sdw3    | /data2/gpadmindata/gpdatam1/gpseg4
    7 |       5 | p    | p              | s    | u      | 6001 | sdw2     | sdw2    | /data1/gpadmindata/gpdatap2/gpseg5
   23 |       5 | m    | m              | s    | u      | 7001 | sdw3     | sdw3    | /data2/gpadmindata/gpdatam2/gpseg5
    8 |       6 | p    | p              | s    | u      | 6002 | sdw2     | sdw2    | /data1/gpadmindata/gpdatap3/gpseg6
   24 |       6 | m    | m              | s    | u      | 7002 | sdw3     | sdw3    | /data2/gpadmindata/gpdatam3/gpseg6
    9 |       7 | p    | p              | s    | u      | 6003 | sdw2     | sdw2    | /data1/gpadmindata/gpdatap4/gpseg7
   25 |       7 | m    | m              | s    | u      | 7003 | sdw3     | sdw3    | /data2/gpadmindata/gpdatam4/gpseg7
   10 |       8 | p    | p              | s    | u      | 6000 | sdw3     | sdw3    | /data1/gpadmindata/gpdatap1/gpseg8
   26 |       8 | m    | m              | s    | u      | 7000 | sdw4     | sdw4    | /data2/gpadmindata/gpdatam1/gpseg8
   11 |       9 | p    | p              | s    | u      | 6001 | sdw3     | sdw3    | /data1/gpadmindata/gpdatap2/gpseg9
   27 |       9 | m    | m              | s    | u      | 7001 | sdw4     | sdw4    | /data2/gpadmindata/gpdatam2/gpseg9
   12 |      10 | p    | p              | s    | u      | 6002 | sdw3     | sdw3    | /data1/gpadmindata/gpdatap3/gpseg10
   28 |      10 | m    | m              | s    | u      | 7002 | sdw4     | sdw4    | /data2/gpadmindata/gpdatam3/gpseg10
   13 |      11 | p    | p              | s    | u      | 6003 | sdw3     | sdw3    | /data1/gpadmindata/gpdatap4/gpseg11
   29 |      11 | m    | m              | s    | u      | 7003 | sdw4     | sdw4    | /data2/gpadmindata/gpdatam4/gpseg11
   14 |      12 | p    | p              | s    | u      | 6000 | sdw4     | sdw4    | /data1/gpadmindata/gpdatap1/gpseg12
   30 |      12 | m    | m              | s    | u      | 7000 | sdw1     | sdw1    | /data2/gpadmindata/gpdatam1/gpseg12
   15 |      13 | p    | p              | s    | u      | 6001 | sdw4     | sdw4    | /data1/gpadmindata/gpdatap2/gpseg13
   31 |      13 | m    | m              | s    | u      | 7001 | sdw1     | sdw1    | /data2/gpadmindata/gpdatam2/gpseg13
   16 |      14 | p    | p              | s    | u      | 6002 | sdw4     | sdw4    | /data1/gpadmindata/gpdatap3/gpseg14
   32 |      14 | m    | m              | s    | u      | 7002 | sdw1     | sdw1    | /data2/gpadmindata/gpdatam3/gpseg14
   17 |      15 | p    | p              | s    | u      | 6003 | sdw4     | sdw4    | /data1/gpadmindata/gpdatap4/gpseg15
   33 |      15 | m    | m              | s    | u      | 7003 | sdw1     | sdw1    | /data2/gpadmindata/gpdatam4/gpseg15
(33 rows)

test_db=# select * from gp_segment_configuration order by hostname,port,role;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |              datadir
------+---------+------+----------------+------+--------+------+----------+---------+-----------------------------------
   34 |      -1 | p    | p              | s    | u      | 5432 | mdw1     | mdw1    | /data/gpadmindata/gpmaster/gpseg-1
    2 |       0 | p    | p              | s    | u      | 6000 | sdw1     | sdw1    | /data1/gpadmindata/gpdatap1/gpseg0
    3 |       1 | p    | p              | s    | u      | 6001 | sdw1     | sdw1    | /data1/gpadmindata/gpdatap2/gpseg1
    4 |       2 | p    | p              | s    | u      | 6002 | sdw1     | sdw1    | /data1/gpadmindata/gpdatap3/gpseg2
    5 |       3 | p    | p              | s    | u      | 6003 | sdw1     | sdw1    | /data1/gpadmindata/gpdatap4/gpseg3
   30 |      12 | m    | m              | s    | u      | 7000 | sdw1     | sdw1    | /data2/gpadmindata/gpdatam1/gpseg12
   31 |      13 | m    | m              | s    | u      | 7001 | sdw1     | sdw1    | /data2/gpadmindata/gpdatam2/gpseg13
   32 |      14 | m    | m              | s    | u      | 7002 | sdw1     | sdw1    | /data2/gpadmindata/gpdatam3/gpseg14
   33 |      15 | m    | m              | s    | u      | 7003 | sdw1     | sdw1    | /data2/gpadmindata/gpdatam4/gpseg15
    6 |       4 | p    | p              | s    | u      | 6000 | sdw2     | sdw2    | /data1/gpadmindata/gpdatap1/gpseg4
    7 |       5 | p    | p              | s    | u      | 6001 | sdw2     | sdw2    | /data1/gpadmindata/gpdatap2/gpseg5
    8 |       6 | p    | p              | s    | u      | 6002 | sdw2     | sdw2    | /data1/gpadmindata/gpdatap3/gpseg6
    9 |       7 | p    | p              | s    | u      | 6003 | sdw2     | sdw2    | /data1/gpadmindata/gpdatap4/gpseg7
   18 |       0 | m    | m              | s    | u      | 7000 | sdw2     | sdw2    | /data2/gpadmindata/gpdatam1/gpseg0
   19 |       1 | m    | m              | s    | u      | 7001 | sdw2     | sdw2    | /data2/gpadmindata/gpdatam2/gpseg1
   20 |       2 | m    | m              | s    | u      | 7002 | sdw2     | sdw2    | /data2/gpadmindata/gpdatam3/gpseg2
   21 |       3 | m    | m              | s    | u      | 7003 | sdw2     | sdw2    | /data2/gpadmindata/gpdatam4/gpseg3
   10 |       8 | p    | p              | s    | u      | 6000 | sdw3     | sdw3    | /data1/gpadmindata/gpdatap1/gpseg8
   11 |       9 | p    | p              | s    | u      | 6001 | sdw3     | sdw3    | /data1/gpadmindata/gpdatap2/gpseg9
   12 |      10 | p    | p              | s    | u      | 6002 | sdw3     | sdw3    | /data1/gpadmindata/gpdatap3/gpseg10
   13 |      11 | p    | p              | s    | u      | 6003 | sdw3     | sdw3    | /data1/gpadmindata/gpdatap4/gpseg11
   22 |       4 | m    | m              | s    | u      | 7000 | sdw3     | sdw3    | /data2/gpadmindata/gpdatam1/gpseg4
   23 |       5 | m    | m              | s    | u      | 7001 | sdw3     | sdw3    | /data2/gpadmindata/gpdatam2/gpseg5
   24 |       6 | m    | m              | s    | u      | 7002 | sdw3     | sdw3    | /data2/gpadmindata/gpdatam3/gpseg6
   25 |       7 | m    | m              | s    | u      | 7003 | sdw3     | sdw3    | /data2/gpadmindata/gpdatam4/gpseg7
   14 |      12 | p    | p              | s    | u      | 6000 | sdw4     | sdw4    | /data1/gpadmindata/gpdatap1/gpseg12
   15 |      13 | p    | p              | s    | u      | 6001 | sdw4     | sdw4    | /data1/gpadmindata/gpdatap2/gpseg13
   16 |      14 | p    | p              | s    | u      | 6002 | sdw4     | sdw4    | /data1/gpadmindata/gpdatap3/gpseg14
   17 |      15 | p    | p              | s    | u      | 6003 | sdw4     | sdw4    | /data1/gpadmindata/gpdatap4/gpseg15
   26 |       8 | m    | m              | s    | u      | 7000 | sdw4     | sdw4    | /data2/gpadmindata/gpdatam1/gpseg8
   27 |       9 | m    | m              | s    | u      | 7001 | sdw4     | sdw4    | /data2/gpadmindata/gpdatam2/gpseg9
   28 |      10 | m    | m              | s    | u      | 7002 | sdw4     | sdw4    | /data2/gpadmindata/gpdatam3/gpseg10
   29 |      11 | m    | m              | s    | u      | 7003 | sdw4     | sdw4    | /data2/gpadmindata/gpdatam4/gpseg11
(33 rows)
Analysis
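The GreenPlum environment: one master host and four segment hosts. Each segment host runs 4 Primary Segment Instances and 4 Mirror Segment Instances, 8 instances in total, so the cluster has 16 Primary instances and 16 Mirror instances. The gpstate output reports 4 postmaster.pid files and 4 lock files as missing, for the primaries and again for the mirrors.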
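Those counters can also be pulled out of the log mechanically instead of being read by eye. The following is a minimal sketch, assuming the gpstate line format shown in this post (Greenplum 6.13.0); `missing_counts` is a hypothetical helper name, not part of Greenplum, and the regular expression is only matched against the lines seen here.

```python
import re

# Matches gpstate summary lines such as
#   ...-[WARNING]:-Total number of postmaster.pid files missing = 4 <<<<<<<<
#   ...-[INFO]:- Total number postmaster processes missing = 0
# The line format is an assumption taken from the output in this post.
MISSING_RE = re.compile(
    r"\[(?:INFO|WARNING)\]:-\s*Total number (?:of )?(.+?) missing\s*=\s*(\d+)"
)


def missing_counts(gpstate_output):
    """Sum each 'missing' counter over the primary and mirror sections."""
    totals = {}
    for line in gpstate_output.splitlines():
        match = MISSING_RE.search(line)
        if match:
            item, count = match.group(1), int(match.group(2))
            totals[item] = totals.get(item, 0) + count
    return totals


if __name__ == "__main__":
    # Two lines copied from the gpstate output above.
    sample = (
        "20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-"
        "Total number of postmaster.pid files missing = 4 <<<<<<<<\n"
        "20230128:12:55:45:145555 gpstate:mdw1:gpadmin-[WARNING]:-"
        "Total number of /tmp lock files missing = 4 <<<<<<<<\n"
    )
    for item, count in missing_counts(sample).items():
        print(f"{item}: {count} missing")
```

A non-zero total only says that gpstate reported something missing; it does not identify which host or instance is affected, which is why the per-host check is still needed.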
Log in to each of the four segment hosts and check whether the following files all exist:
[root@sdw1 ~]# find /data2 -name postmaster.pid
/data2/gpadmindata/gpdatam3/gpseg14/postmaster.pid
/data2/gpadmindata/gpdatam1/gpseg12/postmaster.pid
/data2/gpadmindata/gpdatam4/gpseg15/postmaster.pid
/data2/gpadmindata/gpdatam2/gpseg13/postmaster.pid
[root@sdw1 ~]# find /data1 -name postmaster.pid
/data1/gpadmindata/gpdatap4/gpseg3/postmaster.pid
/data1/gpadmindata/gpdatap3/gpseg2/postmaster.pid
/data1/gpadmindata/gpdatap2/gpseg1/postmaster.pid
/data1/gpadmindata/gpdatap1/gpseg0/postmaster.pid
[root@sdw1 ~]# ll /tmp/ -a | grep lock
-rw------- 1 gpadmin gpadmin 61 Jan 28 12:39 .s.PGSQL.6000.lock
-rw------- 1 gpadmin gpadmin 61 Jan 28 12:39 .s.PGSQL.6001.lock
-rw------- 1 gpadmin gpadmin 61 Jan 28 12:39 .s.PGSQL.6002.lock
-rw------- 1 gpadmin gpadmin 61 Jan 28 12:39 .s.PGSQL.6003.lock
-rw------- 1 gpadmin gpadmin 62 Jan 28 12:47 .s.PGSQL.7000.lock
-rw------- 1 gpadmin gpadmin 62 Jan 28 12:47 .s.PGSQL.7001.lock
-rw------- 1 gpadmin gpadmin 62 Jan 28 12:47 .s.PGSQL.7002.lock
-rw------- 1 gpadmin gpadmin 62 Jan 28 12:47 .s.PGSQL.7003.lock
If they all exist on every segment host, the warning can be ignored.
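The manual check can be scripted so that it is repeatable on each host. This is a minimal sketch, assuming you pass in the (datadir, port) pairs that gp_segment_configuration lists for the host the script runs on; `check_segment_files` and its `tmp_dir` parameter are illustrative names, not a Greenplum utility.

```python
import os


def check_segment_files(instances, tmp_dir="/tmp"):
    """Return the expected files that are absent on the local host.

    instances: iterable of (datadir, port) pairs for the segment
    instances on this host, e.g. copied from gp_segment_configuration.
    For each instance two files are expected:
      <datadir>/postmaster.pid and <tmp_dir>/.s.PGSQL.<port>.lock
    """
    missing = []
    for datadir, port in instances:
        pid_file = os.path.join(datadir, "postmaster.pid")
        lock_file = os.path.join(tmp_dir, f".s.PGSQL.{port}.lock")
        for path in (pid_file, lock_file):
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Instances on sdw1, taken from the gp_segment_configuration output above.
    sdw1_instances = [
        ("/data1/gpadmindata/gpdatap1/gpseg0", 6000),
        ("/data1/gpadmindata/gpdatap2/gpseg1", 6001),
        ("/data1/gpadmindata/gpdatap3/gpseg2", 6002),
        ("/data1/gpadmindata/gpdatap4/gpseg3", 6003),
        ("/data2/gpadmindata/gpdatam1/gpseg12", 7000),
        ("/data2/gpadmindata/gpdatam2/gpseg13", 7001),
        ("/data2/gpadmindata/gpdatam3/gpseg14", 7002),
        ("/data2/gpadmindata/gpdatam4/gpseg15", 7003),
    ]
    absent = check_segment_files(sdw1_instances)
    print("all files present" if not absent else "\n".join(absent))
```

The lock files are readable only by gpadmin, but checking that they exist needs no read permission on the file itself, so the script can run as either gpadmin or root.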
Result
[gpadmin@mdw1 ~]$ gpstate
20230128:13:20:12:183441 gpstate:mdw1:gpadmin-[INFO]:-Starting gpstate with args:
20230128:13:20:12:183441 gpstate:mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.13.0 build commit:4f1adf8e247a9685c19ea02bcaddfdc200937ecd Open Source'
20230128:13:20:12:183441 gpstate:mdw1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.13.0 build commit:4f1adf8e247a9685c19ea02bcaddfdc200937ecd Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 18 2020 22:31:16'
20230128:13:20:12:183441 gpstate:mdw1:gpadmin-[INFO]:-Obtaining Segment details from master...
20230128:13:20:12:183441 gpstate:mdw1:gpadmin-[INFO]:-Gathering data from segments...
...
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:-Greenplum instance status summary
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Master instance = Active
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Master standby = No master standby configured
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total segment instance count from metadata = 32
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Primary Segment Status
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total primary segments = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total primary segment valid (at master) = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total primary segment failures (at master) = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files found = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files found = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes found = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Mirror Segment Status
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segments = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segment valid (at master) = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total mirror segment failures (at master) = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid files found = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number of /tmp lock files found = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number postmaster processes found = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number mirror segments acting as primary segments = 0
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 16
20230128:13:20:16:183441 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
[gpadmin@mdw1 ~]$