Greenplum installation error: could not create semaphores: No space left on device
Tags: Original, Troubleshooting, GreenPlum, No space left on device, semaphores
Symptom
In this setup, each segment host is configured with 4 primary segments and 4 mirror segments:
```bash
# Master node configuration
cat > /home/gpadmin/conf/initgp_config <<"EOF"
declare -a DATA_DIRECTORY=(/opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data/primary)
declare -a MIRROR_DATA_DIRECTORY=(/opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data/mirror)
ARRAY_NAME="lhrgp"
SEG_PREFIX=gpseg
PORT_BASE=6000
MIRROR_PORT_BASE=7000
MASTER_PORT=5432
MASTER_HOSTNAME=mdw1
MASTER_DIRECTORY=/opt/greenplum/data/master
DATABASE_NAME=lhrgpdb
MACHINE_LIST_FILE=/home/gpadmin/conf/seg_hosts
EOF

su - gpadmin
gpinitsystem -c /home/gpadmin/conf/initgp_config -e=lhr -s mdw2 -P 5432 -S /opt/greenplum/data/master/gpseg-1 -m 500 -b 256MB
```
The initialization failed with the following errors:
```
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[INFO]:------------------------------------------------
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[INFO]:-Parallel process exit status
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[INFO]:------------------------------------------------
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[INFO]:-Total processes marked as completed = 2
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[INFO]:-Total processes marked as killed = 0
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[WARN]:-Total processes marked as failed = 6 <<<<<
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[INFO]:------------------------------------------------
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[FATAL]:-Errors generated from parallel processes
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[INFO]:-Dumped contents of status file to the log file
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[FATAL]:-Failures detected, see log file /home/gpadmin/gpAdminLogs/gpinitsystem_20230213.log for more detail Script Exiting!
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[WARN]:-Script has left Greenplum Database in an incomplete state
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[WARN]:-Run command bash /home/gpadmin/gpAdminLogs/backout_gpinitsystem_gpadmin_20230213_161405 on master to remove these changes
[gpadmin@mdw1 ~]$ sz /home/gpadmin/gpAdminLogs/gpinitsystem_20230213.log
```
Check the log:
```
[gpadmin@mdw1 ~]$ more /home/gpadmin/gpAdminLogs/gpinitsystem_20230213.log | grep FATAL
2023-02-13 16:14:47.382268 CST,,,p22449,th-1426556800,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create semaphores: No space left on device","Failed system call was semget(5432129, 17, 03600).","This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter. The PostgreSQL documentation contains more information about configuring your system for PostgreSQL.",,,,,,"InternalIpcSemaphoreCreate","pg_sema.c",126,
2023-02-13 16:14:47.236263 CST,,,p23024,th-853837696,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create semaphores: No space left on device","Failed system call was semget(5432129, 17, 03600).","This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
2023-02-13 08:14:46.989440 GMT,,,p22426,th144357504,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create semaphores: No space left on device","Failed system call was semget(5432129, 17, 03600).","This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
2023-02-13 16:14:47.432215 CST,,,p22450,th-426620800,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create semaphores: No space left on device","Failed system call was semget(5432129, 17, 03600).","This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
2023-02-13 08:14:46.859990 GMT,,,p22988,th148498560,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create semaphores: No space left on device","Failed system call was semget(5432129, 17, 03600).","This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
2023-02-13 16:14:47.323877 CST,,,p23047,th15743104,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create semaphores: No space left on device","Failed system call was semget(5432129, 17, 03600).","This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
20230213:16:14:46:026467 gpcreateseg.sh:mdw1:gpadmin-[FATAL][1]:-Failed to start segment instance database sdw1 /opt/greenplum/data/primary/gpseg1
20230213:16:14:47:026760 gpcreateseg.sh:mdw1:gpadmin-[FATAL][7]:-Failed to start segment instance database sdw2 /opt/greenplum/data/primary/gpseg7
20230213:16:14:47:026483 gpcreateseg.sh:mdw1:gpadmin-[FATAL][2]:-Failed to start segment instance database sdw1 /opt/greenplum/data/primary/gpseg2
initializing pg_authid ... 20230213:16:14:47:026597 gpcreateseg.sh:mdw1:gpadmin-[FATAL][5]:-Failed to start segment instance database sdw2 /opt/greenplum/data/primary/gpseg5
20230213:16:14:47:026509 gpcreateseg.sh:mdw1:gpadmin-[FATAL][3]:-Failed to start segment instance database sdw1 /opt/greenplum/data/primary/gpseg3
initializing dependencies ... 20230213:16:14:47:026677 gpcreateseg.sh:mdw1:gpadmin-[FATAL][6]:-Failed to start segment instance database sdw2 /opt/greenplum/data/primary/gpseg6
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[FATAL]:-Errors generated from parallel processes
20230213:16:15:01:015508 gpinitsystem:mdw1:gpadmin-[FATAL]:-Failures detected, see log file /home/gpadmin/gpAdminLogs/gpinitsystem_20230213.log for more detail Script Exiting!
[gpadmin@mdw1 ~]$ more /home/gpadmin/gpAdminLogs/gpinitsystem_20230213.log | grep FAIAL
[gpadmin@mdw1 ~]$ more /home/gpadmin/gpAdminLogs/gpinitsystem_20230213.log | grep FAILED
FAILED:sdw1~sdw1~6001~/opt/greenplum/data/primary/gpseg1~3~1
FAILED:sdw2~sdw2~6003~/opt/greenplum/data/primary/gpseg7~9~7
FAILED:sdw1~sdw1~6002~/opt/greenplum/data/primary/gpseg2~4~2
FAILED:sdw2~sdw2~6001~/opt/greenplum/data/primary/gpseg5~7~5
FAILED:sdw1~sdw1~6003~/opt/greenplum/data/primary/gpseg3~5~3
FAILED:sdw2~sdw2~6002~/opt/greenplum/data/primary/gpseg6~8~6
```
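To see at a glance how the failures are spread across hosts, the FAILED records can be tallied per host. The sketch below is only an illustration: the function name `summarize_failed` is made up here, and it assumes the first `~`-separated field of each record is the host name, which is inferred from the output above rather than taken from the gpinitsystem documentation.

```shell
# Tally FAILED records from a gpinitsystem log, grouped by host.
# Assumption: each record starts with "FAILED:<host>~..." as in the log above.
summarize_failed() {
    # Reads log text on stdin and prints "<host> <count>" per host, sorted by host.
    grep '^FAILED:' \
        | awk -F'~' '{ sub(/^FAILED:/, "", $1); count[$1]++ }
                     END { for (h in count) print h, count[h] }' \
        | sort
}

# Usage (log path taken from the session above):
# summarize_failed < /home/gpadmin/gpAdminLogs/gpinitsystem_20230213.log
```

An even split across the segment hosts, as in this case, suggests a host-level limit rather than a problem with one particular data directory.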
Analysis
The root cause is that the kernel parameter kernel.sem is set too low.
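Before changing anything, it helps to look at the current limits on each segment host. On Linux, kernel.sem holds four numbers in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI. The helper below is a minimal sketch (the name `show_kernel_sem` is hypothetical) that labels those four fields. It reads `/proc/sys/kernel/sem` by default and accepts a string such as `"250 32000 100 128"` for a dry run.

```shell
# Print the four kernel.sem fields with their conventional names.
# With no argument this reads /proc/sys/kernel/sem (Linux only).
show_kernel_sem() {
    sem_values="${1:-$(cat /proc/sys/kernel/sem)}"
    # Field order in kernel.sem: SEMMSL SEMMNS SEMOPM SEMMNI
    set -- $sem_values
    echo "SEMMSL (max semaphores per set) = $1"
    echo "SEMMNS (max semaphores system-wide) = $2"
    echo "SEMOPM (max operations per semop call) = $3"
    echo "SEMMNI (max semaphore sets) = $4"
}

# Usage on a segment host:
# show_kernel_sem
```

`ipcs -ls` reports the same limits, and `ipcs -s` lists the semaphore sets currently allocated, which shows how close the host already is to SEMMNI.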
In PostgreSQL, when max_connections is set too high, startup fails with:
FATAL: could not create semaphores: No space left on device
The kernel limits are too small for the semaphores being created. The official documentation explains it as follows:
HINT: This error does not mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
The PostgreSQL documentation contains more information about configuring your system for PostgreSQL.
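The fix is to lower max_connections. If the workload does not allow that, raise the kernel parameters instead. The kernel parameters tied to max_connections, as named in the hint above, are SEMMNI and SEMMNS, both set through kernel.sem.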
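A rough estimate of the demand shows why several instances on one host can exhaust the limits. The sketch below assumes one semaphore set of 17 semaphores per 16 process slots, which matches the `semget(5432129, 17, 03600)` call in the log. The function names and the figure of 1500 slots per instance are hypothetical, and the extra slots PostgreSQL and Greenplum reserve for background processes are not modeled.

```shell
# Estimate SysV semaphore demand for PostgreSQL-style instances on one host.
# Assumption: one set of 17 semaphores per 16 process slots (rounded up).
sem_sets_needed() {
    # $1 = process slots per instance (roughly max_connections plus workers)
    echo $(( ($1 + 15) / 16 ))
}

estimate_host_semaphores() {
    # $1 = process slots per instance, $2 = instances on the host
    sets=$(( $(sem_sets_needed "$1") * $2 ))
    echo "sets=$sets semaphores=$(( sets * 17 ))"
}

# Hypothetical example: 8 instances (4 primary + 4 mirror), 1500 slots each
estimate_host_semaphores 1500 8   # prints: sets=752 semaphores=12784
```

With these illustrative numbers the host would need 752 semaphore sets, so a SEMMNI in the low hundreds would be exceeded once only a few instances had started, even though SEMMNS still had headroom.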
The following describes how to modify the kernel parameters.