原 视图DBA_GOLDENGATE_SUPPORT_MODE引起的集成模式OGG不同步问题排查过程
Tags: Oracle原创故障处理OGGbug11.2.0.4故障排查不同步慢
现象
使用OGG 21.3可以远程同步Oracle 11.2.0.4的数据库,这个我之前已经测试过,参考:https://www.dbaup.com/shiyongogg-21-3yuanchengshishihuxiangtongbuoracle-11-2-0-4shuangzhu.html
然后,客户这边有个需求,需要同步Oracle 11.2.0.4的rac到单机环境,我也是按照这个过程配置的,但是配置完成后,extract进程不能抽取数据,很是奇怪,也不报错,只是“Lag at Chkpt”延迟一直在增大,执行stats exta
报错“No active extraction maps.”,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 | GGSCI (ogg) 1> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING JAGENT STOPPED PMSRVR STOPPED EXTRACT RUNNING EXTA 16:22:31 00:00:06 EXTRACT RUNNING EXTB 16:09:25 00:00:05 REPLICAT RUNNING REPA 00:00:00 00:00:02 REPLICAT RUNNING REPB 00:00:00 00:00:04 GGSCI (ogg) 86> stats exta Sending STATS request to Extract group EXTA ... No active extraction maps. DDL replication statistics (for all trails): *** Total statistics since extract started *** Operations 0.00 Mapped operations 0.00 Unmapped operations 0.00 Other operations 0.00 Excluded operations 0.00 GGSCI (ogg) 48> info exta Extract EXTA Last Started 2022-11-03 09:15 Status RUNNING Checkpoint Lag 16:39:15 (updated 00:00:00 ago) Process ID 15884 Log Read Checkpoint Oracle Integrated Redo Logs 2022-11-02 16:37:03 SCN 7.3280852740 (33345623812) GGSCI (ogg) 49> info exta,showch Extract EXTA Last Started 2022-11-03 09:15 Status RUNNING Checkpoint Lag 16:39:15 (updated 00:00:08 ago) Process ID 15884 Log Read Checkpoint Oracle Integrated Redo Logs 2022-11-02 16:37:03 SCN 7.3280852740 (33345623812) Current Checkpoint Detail: Read Checkpoint #1 Oracle Integrated Redo Log Startup Checkpoint (starting position in the data source): Timestamp: 2022-11-02 16:36:25.000000 SCN: 0.0 (0) Recovery Checkpoint (position of oldest unprocessed transaction in the data source): Timestamp: 2022-11-02 16:37:03.000000 SCN: 7.3280852739 (33345623811) Current Checkpoint (position of last record read in the data source): Timestamp: 2022-11-02 16:37:03.000000 SCN: 7.3280852740 (33345623812) Write Checkpoint #1 GGS Log Trail Current Checkpoint (current write position): Sequence #: 2 RBA: 1478 Timestamp: 2022-11-03 09:16:18.549673 Extract Trail: ./dirdat/ea Seqno Length: 9 Flip Seqno Length: No Trail Type: EXTTRAIL Header: Version = 2 Record Source = A Type = 13 # Input Checkpoints = 1 # Output Checkpoints = 1 Configuration: Data Source = 3 Transaction Integrity = 1 Task Type = 0 Status: Start Time = 2022-11-03 09:15:18 Last Update Time = 2022-11-03 09:16:18 Stop Status = A Last Result = 520 GGSCI (ogg) 51> send exta showtrans Sending SHOWTRANS request to Extract group EXTA ... Extract is currently in recovery mode (reading transactions from trail file). Please try again in a few minutes. GGSCI (ogg) 66> send extract exta status Sending STATUS request to Extract group EXTA ... EXTRACT EXTA (PID 16108) Current status: In recovery[1]: Processing data Current read position: Redo thread #: 1 Sequence #: 257 RBA: 922376 Timestamp: 2022-11-02 16:37:03.000000 SCN: 7.3280852740 (33345623812) Current write position: Sequence #: 3 RBA: 1478 Timestamp: 2022-11-03 09:23:39.027185 Extract Trail: ./dirdat/ea |
另外,自己新建的环境都没有问题,可以实时同步,就客户的环境不能同步!!!
分析过程
1、用以下这些命令查询分析并没找到错误的原因
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | kill exta START EXTRACT exta BRRESET START EXTRACT exta BRINTERVAL 20M info exta,showch info exta,detail send exta showtrans send extract exta status GGSCI (ogg) 3> send exta showtrans Sending SHOWTRANS request to Extract group exta ... ------------------------------------------------------------ XID: 0.5.29.7834 Items: 0 Extract: ext8 Redo Thread: 1 Start Time: 2022-11-03:10:08:09 SCN: 7.3281664446 (33346435518) Redo Seq: 1707 Redo RBA: 20690864 Status: Running |
这里其实有个很奇怪的现象,就是send exta showtrans
查询出来的是数据库未提交的事务,但是gv$transaction
视图却查不出来结果,找不到该事务XID:
1 | SELECT * FROM gv$transaction; |
2、数据库的告警日志有如下输出:
1 2 3 | setting IGNORE_UNSUPPORTED_TABLE for table (*) GoldenGate Capture:OGG$CAP_EXT1 setting _FILTER_PARTIAL_ROLLBACK: Setting XOUT_CLIENT_EXISTS to Y for Capture: OGG$CAP_EXT1 |
怀疑是否是这个问题,于是配置:
1 2 | exec DBMS_CAPTURE_ADM.SET_PARAMETER('OGG$CAP_EXTA','IGNORE_UNSUPPORTED_TABLE','-'); exec DBMS_CAPTURE_ADM.SET_PARAMETER('OGG$CAP_EXT1','IGNORE_UNSUPPORTED_TABLE','-'); |
仍然不行。参考:https://docs.oracle.com/database/121/ARPLS/d_cap_a.htm#ARPLS306
3、考虑到之前用的docker环境可以,于是对比一下2个库的差异,发现字符集和redo大小不一样。我之前用的环境是AL32UTF8字符集,而客户环境是ZHS16GBK环境,
1 2 3 4 | export NLS_LANG=AMERICAN_AMERICA.ZHS16GBK export NLS_LANG=AMERICAN_AMERICA.AL32UTF8 setenv(NLS_LANG="AMERICAN_AMERICA.ZHS16GBK") |
仍然不行。
一般来说,字符集不一样,只会导致数据传输过来后是乱码,不会导致进程hang住。
另外,由于好的环境用的是50M的redo日志,而客户环境是300M的redo日志,于是修改redo日志为50M
1 2 3 4 5 6 | SELECT * FROM v$log; alter database add logfile size 52428800; alter database drop logfile group 13; alter system checkpoint; alter system switch logfile; |
还是不行。
4、怀疑是否和rac有关,另外,客户的环境也打了最新的PSU,为了验证这个问题,所以特意新建了2个库,redo为500m,字符集为ZHS16GBK,但是测试都没问题:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | dbca -silent -createDatabase -templateName General_Purpose.dbc -responseFile NO_VALUE \ -gdbname gbk -sid gbk \ -sysPassword oracle -systemPassword oracle \ -datafileDestination '/u01/app/oracle/oradata' \ -recoveryAreaDestination '/u01/app/oracle/flash_recovery_area' \ -redoLogFileSize 500 \ -storageType FS \ -characterset ZHS16GBK -nationalCharacterSet AL16UTF16 \ -sampleSchema true \ -memoryPercentage 10 \ -databaseType OLTP \ -emConfiguration NONE dbca -silent -createDatabase -templateName General_Purpose.dbc -responseFile NO_VALUE \ -gdbname gbk -sid gbk \ -sysPassword oracle -systemPassword oracle \ -datafileDestination '+DATA' -recoveryAreaDestination 'DATA/' \ -storageType ASM -asmsnmpPassword oracle -diskGroupName 'DATA' \ -characterset ZHS16GBK -nationalCharacterSet AL16UTF16 \ -redoLogFileSize 300 \ -sampleSchema true \ -memoryPercentage 10 \ -databaseType OLTP \ -emConfiguration NONE \ -nodeinfo rac1,rac2 |
那说明是客户数据库本身的问题。难道是客户的表太多导致的吗,客户端表才2万张表左右,其实不多!!!!
5、怀疑是否有特殊的触发器导致。经过查询,没有特殊的触发器!!!
6、使用OGG巡检报告
参考:使用OGG 21.3远程实时互相同步Oracle 11.2.0.4(双主) – 小麦苗DBA宝典 (dbaup.com)
https://www.dbaup.com/ogg-for-oraclejiankangxunjianguanfangjiaoben.html
这里其实有个特殊现象,就是跑脚本的时候,总是卡住,巡检结果根本出不来,于是分析脚本,找到卡住的位置如下:
1 2 3 4 5 6 | prompt prompt ++ TABLES SUPPORT BY GOLDENGATE Integrated Capture ++ prompt Lists tables that can not be supported by OGG (NONE) prompt Lists table that are supported via OGG FETCH (ID KEY) select * from DBA_GOLDENGATE_SUPPORT_MODE where support_mode in ('ID KEY', 'NONE') order by owner,object_name; |
于是拿到数据库中查询视图DBA_GOLDENGATE_SUPPORT_MODE,发现根本不能出结果。
DBA_GOLDENGATE_SUPPORT_MODE
displays information about the level of Oracle GoldenGate capture process support for the tables in the database. 捕获进程对数据库中表的支持级别的信息
Column | Datatype | NULL | Description |
---|---|---|---|
OWNER | VARCHAR2(128) | Table owner | |
OBJECT_NAME | VARCHAR2(128) | Table name | |
SUPPORT_MODE | VARCHAR2(6) | Capture process support level for the table:FULL - A capture process can capture changes made to all of the columns in the tableID KEY - A capture process can capture changes made to the key columns and any other columns in the table supported by the capture process, except for LOB , LONG , LONG RAW , and XMLType columns.INTERNAL - A capture process cannot capture changes made to any columns in the table because the table is secondary to a user-created table and is updated implicitly when changes are made to the user-created table. Such tables include mapping tables for index-organized tables, storage tables for nested tables, materialized view logs, secondary objects associated with domain indexes, and temporary tables.NONE - A capture process cannot capture changes made to any columns in the table because the table is not supported for replication. |
参考:https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/DBA_GOLDENGATE_SUPPORT_MODE.html
找到这个视图的定义,发现视图dba_goldengate_support_mode基于视图DBA_XSTREAM_OUT_SUPPORT_MODE: