原 GreenPlum严重故障导致不能启库的解决办法(pg_resetxlog重置日志)
现象
GP的某个实例报错如下,导致整个GP不能启动:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | 2024-01-28 08:25:30.644871 CST,,,p11917,th-1607219072,,,,0,,,seg7,,,,,"PANIC","58030","could not fsync file ""base/2291985/2579713"" (is_ao: 0): Input/output error",,,,,,,0,,"md.c",1404,"Stack trace: 1 0xbfdc67 postgres errstart (elog.c:557) 2 0xa8d711 postgres mdsync (md.c:1401) 3 0xa5c848 postgres CheckPointBuffers (bufmgr.c:2008) 4 0x74a756 postgres CreateCheckPoint (xlog.c:9559) 5 0x74e194 postgres StartupXLOG (xlog.c:7917) 6 0xa1c623 postgres StartupProcessMain (startup.c:248) 7 0x78a66f postgres AuxiliaryProcessMain (bootstrap.c:453) 8 0xa185dc postgres <symbol not found> (postmaster.c:5774) 9 0xa1be22 postgres PostmasterMain (postmaster.c:1510) 10 0x6b54a1 postgres main (main.c:205) 11 0x7f189cf273d5 libc.so.6 __libc_start_main + 0xf5 12 0x6c134c postgres <symbol not found> + 0x6c134c " 2024-01-28 08:25:19.809509 CST,"gpadmin","postgres",p15579,th-1607219072,"[local]",,2024-01-28 08:25:19 CST,0,,,seg7,,,,,"FATAL","57P03","the database system is starting up","last replayed record at DF/37260048",,,,,,0,,"postmaster.c",2571, 2024-01-28 08:25:52.894238 CST,,,p11903,th-1607219072,,,,0,,,seg7,,,,,"LOG","00000","startup process (PID 11917) was terminated by signal 6: Aborted",,,,,,,0,,"postmaster.c",4021, 2024-01-28 08:25:52.894343 CST,,,p11903,th-1607219072,,,,0,,,seg7,,,,,"LOG","00000","aborting startup due to startup process failure",,,,,,,0,,"postmaster.c",3261, |
分析
通过 select oid,datname from pg_database;
查询2291985可知,该库是一个使用gpbackup正在导入的库,但是因为某些原因,导致该实例挂掉了。。。
通过多种手段也不能解决。。。
解决1:重置事务日志(慎重,可能会丢失数据)
最后可以考虑清空事务日志的方法来解决。
由于一些错误导致不能启动GP数据库,则可以考虑直接删除如下内容:
1 | rm -rf /opt/greenplum/data/primary/gpseg1/pg_xlog/* |
然后启动库,会报错could not locate a valid checkpoint record,此时做如下操作:
1 | pg_resetxlog -f /opt/greenplum/data/primary/gpseg1/ |
最后再重启库。
示例:
1 2 3 4 | [gpadmin@gpdb6261 ~]$ rm -rf /opt/greenplum/data/primary/gpseg1/pg_xlog/* [gpadmin@gpdb6261 ~]$ rm -rf /opt/greenplum/data/primary/gpseg0/pg_xlog/* [gpadmin@gpdb6261 ~]$ mkdir -p /opt/greenplum/data/primary/gpseg0/pg_xlog/archive_status [gpadmin@gpdb6261 ~]$ mkdir -p /opt/greenplum/data/primary/gpseg1/pg_xlog/archive_status |
重置事务日志:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | [gpadmin@gpdb6261 ~]$ pg_resetxlog -f /opt/greenplum/data/primary/gpseg1/ WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss and render system irrecoverable. Do you wish to proceed? [yes/no] yes pg_resetxlog: could not open directory "pg_xlog/archive_status": No such file or directory [gpadmin@gpdb6261 ~]$ mkdir -p /opt/greenplum/data/primary/gpseg1/pg_xlog/archive_status [gpadmin@gpdb6261 ~]$ mkdir -p /opt/greenplum/data/primary/gpseg0/pg_xlog/archive_status [gpadmin@gpdb6261 ~]$ pg_resetxlog -f /opt/greenplum/data/primary/gpseg1/ WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss and render system irrecoverable. Do you wish to proceed? [yes/no] yes Transaction log reset [gpadmin@gpdb6261 ~]$ pg_resetxlog -f /opt/greenplum/data/primary/gpseg0/ WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss and render system irrecoverable. Do you wish to proceed? [yes/no] ys Abort pg_resetxlog! [gpadmin@gpdb6261 ~]$ pg_resetxlog -f /opt/greenplum/data/primary/gpseg0/ WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss and render system irrecoverable. Do you wish to proceed? [yes/no] yes Transaction log reset |
解决2:在master only模式下删库
由于GP不能启动,所以可以在master only模式下启库,删除后,重建数据库,发现报错:
1 2 3 4 5 6 7 | [gpadmin@mdw ~]$ createdb "DB2" createdb: database creation failed: ERROR: database "DB2" already exists (seg7 10.81.31.79:7001 pid=21609) [gpadmin@mdw ~]$ [gpadmin@mdw ~]$ dropdb "DB2" dropdb: database removal failed: ERROR: database "DB2" does not exist [gpadmin@mdw ~]$ |