GreenPlum Fails to Start Because of a Misconfigured GPCC metrics_collector Parameter
Tags: Original, Troubleshooting, GreenPlum, gpcc, metrics_collector, shared_preload_libraries
Symptoms
```
[gpadmin@mdw1 ~]$ gpstart -a
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Starting gpstart with args: -a
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Gathering information and validating the environment...
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.19.1 build commit:0e314744a460630073b46cea7b7cf20a81e3da63 Open Source'
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Starting Master instance in admin mode
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data/gpdb/master/gpseg-1/ -l /data/gpdb/master/gpseg-1//pg_log/startup.log -w -t 600 -o " -p 5432 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start.... stopped waiting
', stderr='pg_ctl: could not start server
Examine the log output.
'
[gpadmin@mdw1 ~]$ tailf /data/gpdb/master/gpseg-1//pg_log/startup.log
2023-01-16 12:58:59.464993 CST,,,p8992,th834783360,,,,0,,,seg-1,,,,,"LOG","00000","registering background worker ""sweeper process""",,,,,,,,"RegisterBackgroundWorker","bgworker.c",774,
2023-01-16 12:58:59.465304 CST,,,p8992,th834783360,,,,0,,,seg-1,,,,,"FATAL","58P01","could not access file ""metrics_collector"": No such file or directory",,,,,,,,"internal_load_library","dfmgr.c",202,1 0xbef3fc postgres errstart (elog.c:557)
2 0xbf456d postgres <symbol not found> (dfmgr.c:199)
3 0xbf4f54 postgres load_file (dfmgr.c:156)
4 0xc083a4 postgres process_shared_preload_libraries (miscinit.c:1378)
5 0xa0d6e3 postgres PostmasterMain (postmaster.c:1151)
6 0x6b0871 postgres main (main.c:205)
7 0x7f522e7ed3d5 libc.so.6 __libc_start_main + 0xf5
8 0x6bc58c postgres <symbol not found> + 0x6bc58c
```
Analysis
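The FATAL entry in the startup log (SQLSTATE 58P01, could not access file "metrics_collector": No such file or directory) shows that the problem is metrics_collector. The stack trace shows where it fails: process_shared_preload_libraries calls load_file while the postmaster is starting, so the server cannot load one of the libraries listed in the shared_preload_libraries parameter of postgresql.conf. metrics_collector is the value of that parameter that enables GPCC metrics monitoring.
The error was most likely caused by a faulty GPCC installation (the metrics_collector library was never installed), followed by a database restart with the parameter still set.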
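To see which value the postmaster will try to load, the active setting can be read straight from postgresql.conf. The bash function below is a minimal sketch; the name get_preload_libs is hypothetical. It assumes the simple name = value form and keeps only the last uncommented assignment (PostgreSQL applies the last occurrence it reads); include directives and postgresql.auto.conf are not followed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the effective shared_preload_libraries value
# from one postgresql.conf. Only the LAST uncommented assignment is kept,
# because that is the one PostgreSQL applies. Assumes "name = value" form;
# include directives and postgresql.auto.conf are NOT followed.
get_preload_libs() {
    local conf="$1"
    # grep: uncommented assignment lines (optional leading blanks).
    # sed:  drop the name and '=', drop a trailing '#' comment (a '#' inside
    #       the value would be cut too), trim trailing blanks, then strip
    #       the surrounding single quotes.
    grep -E '^[[:space:]]*shared_preload_libraries[[:space:]]*=' "$conf" \
        | tail -n 1 \
        | sed -E \
            -e 's/^[^=]*=[[:space:]]*//' \
            -e 's/#.*$//' \
            -e 's/[[:space:]]+$//' \
            -e "s/^'(.*)'\$/\1/"
}
```

For example, get_preload_libs /data/gpdb/master/gpseg-1/postgresql.conf on the failing master should print a value containing metrics_collector.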
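If GPCC (the MetricsCollector package) was installed successfully, its library files are present in the locations below, as on the reference host lhrgp40 (GreenPlum 6.19.3). If they are missing, do not restart GreenPlum while shared_preload_libraries still lists metrics_collector, or startup will fail as shown above: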
```
[root@lhrgp40 /]# find /usr/local -name metrics_collector*
/usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector--1.0.sql
/usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector.control
/usr/local/greenplum-db-6.19.3/lib/postgresql/metrics_collector.so
[root@lhrgp40 /]#

[gpadmin@lhrgp40 ~]$ ll $GPHOME/share/postgresql/extension/gp_wlm*
-rw-r--r-- 1 gpadmin gpadmin 856 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/gp_wlm--0.1.sql
-rw-r--r-- 1 gpadmin gpadmin 232 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/gp_wlm.control
[gpadmin@lhrgp40 ~]$ ll $GPHOME/share/postgresql/extension/metrics_collector*
-rw-r--r-- 1 gpadmin gpadmin 846 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector--1.0.sql
-rw-r--r-- 1 gpadmin gpadmin 233 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector.control
[gpadmin@lhrgp40 ~]$ ll $GPHOME/lib/postgresql/metrics_collector.so
-rwxr-xr-x 1 gpadmin gpadmin 3357064 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/lib/postgresql/metrics_collector.so
[gpadmin@lhrgp40 ~]$
[gpadmin@lhrgp40 ~]$ gppkg -q --all
20230116:14:58:39:020317 gppkg:lhrgp40:gpadmin-[INFO]:-Starting gppkg with args: -q --all
MetricsCollector-6.8.3_gp_6.19.3
```
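Before a restart, the ls checks above can be scripted. The sketch below uses a hypothetical function name, check_preload_files, and assumes each entry in the parameter is a bare library name without a path or .so suffix; PostgreSQL resolves such names against $libdir, which is $GPHOME/lib/postgresql in the listing above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that every library in a comma-separated
# shared_preload_libraries value has a matching .so under
# <gphome>/lib/postgresql. Prints each missing path; returns 1 if any
# file is missing, 0 otherwise (an empty list counts as OK).
# Assumes bare names (no path, no .so suffix).
check_preload_files() {
    local gphome="$1" list="$2" lib missing=0
    local IFS=','            # split the list on commas only
    for lib in $list; do
        # Trim leading and trailing whitespace from each entry.
        lib="${lib#"${lib%%[![:space:]]*}"}"
        lib="${lib%"${lib##*[![:space:]]}"}"
        [ -z "$lib" ] && continue
        if [ ! -f "$gphome/lib/postgresql/$lib.so" ]; then
            echo "missing: $gphome/lib/postgresql/$lib.so"
            missing=1
        fi
    done
    return "$missing"
}
```

On lhrgp40, check_preload_files "$GPHOME" metrics_collector should print nothing and return 0; on a host like mdw1 it would be expected to report the missing metrics_collector.so. Run it on every host in the cluster, since each instance loads the library from its own host.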
Solution
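Steps 1, 2 and 4 below make the same change to each instance's postgresql.conf. A minimal bash sketch of that edit follows; the function name clear_preload_libs is hypothetical. It saves the original file as postgresql.conf.bak and rewrites every uncommented shared_preload_libraries assignment to an empty value, leaving commented-out lines alone.

```bash
#!/usr/bin/env bash
# Hypothetical helper: clear shared_preload_libraries in one postgresql.conf.
# The original file is kept as <conf>.bak (sed -i.bak works with GNU and BSD sed).
clear_preload_libs() {
    local conf="$1"
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    # Rewrite "shared_preload_libraries = <anything>" to "... = ''".
    # Lines starting with '#' do not match and are left untouched;
    # a trailing comment on a matching line is dropped.
    sed -E -i.bak \
        "s/^([[:space:]]*shared_preload_libraries[[:space:]]*=).*/\1 ''/" \
        "$conf"
}
```

On the master from the log above this would be clear_preload_libs /data/gpdb/master/gpseg-1/postgresql.conf; for primary and mirror instances, run it on each host against each data directory's postgresql.conf (their paths are not shown in this post). Like the steps, it clears the whole value: if other libraries are listed alongside metrics_collector, remove only that entry instead.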
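1. First fix the master instance: clear the value of shared_preload_libraries in its postgresql.conf.
2. Then fix the primary segment instances the same way: clear shared_preload_libraries in each one's postgresql.conf.
3. Start GreenPlum as soon as possible with gpstart -a.
4. Then fix the mirror instances: clear shared_preload_libraries in each mirror's postgresql.conf.
5. Finally, start the mirror instances separately, as follows: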