原 Oracle的DG环境中由于配置了登陆触发器导致日志不能同步报错ORA-16191的实验
Tags: Oracle原创DGORA-16191触发器ADG实验验证日志不同步
前言
之前碰到DG环境中,redo不能同步的大部分原因是密码文件的问题,解决办法也很简单,就是生成一份密码文件,然后拷贝到其它节点即可。
最近碰到一种情况是由于配置了登陆触发器导致redo不能同步。
本文就这种情况,我们来做实验验证。
快速部署DG环境
使用docker快速部署:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | nohup docker pull registry.cn-hangzhou.aliyuncs.com/lhrbest/dg_phy_11.2.0.4:1.0 & nohup docker pull registry.cn-hangzhou.aliyuncs.com/lhrbest/dg_pri_11.2.0.4:1.0 & docker tag registry.cn-hangzhou.aliyuncs.com/lhrbest/dg_phy_11.2.0.4:1.0 lhrbest/dg_phy_11.2.0.4:1.0 docker tag registry.cn-hangzhou.aliyuncs.com/lhrbest/dg_pri_11.2.0.4:1.0 lhrbest/dg_pri_11.2.0.4:1.0 -- 创建DG的网络 docker network create --subnet=192.168.68.0/16 mhalhr docker network inspect mhalhr -- 分别初始化主库和备库 docker run -itd --name LHR11G -h LHR11G \ -p 1528:1521 -p 1128:1158 -p 228:22 -p 13398:3389 \ --network mhalhr --ip 192.168.68.68 \ --privileged=true \ lhrbest/dg_pri_11.2.0.4:1.0 init docker run -itd --name LHR11GDG -h LHR11GDG \ -p 1529:1521 -p 1129:1158 -p 229:22 -p 13399:3389 \ --network mhalhr --ip 192.168.68.69 \ --privileged=true \ lhrbest/dg_phy_11.2.0.4:1.0 init -- 添加网卡 docker network connect bridge LHR11G docker network connect bridge LHR11GDG -- 进入容器 docker exec -it LHR11G bash docker exec -it LHR11GDG bash -- 分别启动主库、备库和监听 su - oracle lsnrctl start sas startup |
检查同步情况
主库:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | SYS@LHR11G> COL NAME FOR A80 SET LINESIZE 9999 PAGESIZE 9999 SYS@LHR11G> SELECT A.THREAD#, A.NAME, A.SEQUENCE#, A.APPLIED, A.FIRST_TIME 2 FROM V$ARCHIVED_LOG A, (SELECT NB.THREAD#,NB.RESETLOGS_ID,(MAX(NB.SEQUENCE#) - 3) MAX_SEQUENCE# 4 FROM V$ARCHIVED_LOG NB 5 WHERE NB.APPLIED = 'YES' 6 and RESETLOGS_ID=(select max(nbb.RESETLOGS_ID) from V$ARCHIVED_LOG nbb) 7 GROUP BY NB.THREAD#,NB.RESETLOGS_ID) B 8 WHERE B.THREAD# = A.THREAD# 9 AND B.RESETLOGS_ID = A.RESETLOGS_ID 10 AND A.SEQUENCE# >= MAX_SEQUENCE# 11 and a.RESETLOGS_ID=(select max(nb.RESETLOGS_ID) from V$ARCHIVED_LOG nb) 12 AND A.STANDBY_DEST = (SELECT CASE WHEN NB.DATABASE_ROLE LIKE '%STANDBY%' THEN 'NO' ELSE 'YES' END FROM V$DATABASE NB) 13 ORDER BY A.THREAD#, A.SEQUENCE#; THREAD# NAME SEQUENCE# APPLIED FIRST_TIME ---------- -------------------------------------------------------------------------------- ---------- ------------------ ------------------- 1 lhr11gdg 125 YES 2023-09-20 06:27:33 1 lhr11gdg 126 YES 2023-09-20 12:04:33 1 lhr11gdg 127 YES 2023-09-20 19:50:15 1 lhr11gdg 128 YES 2024-08-19 14:20:18 1 lhr11gdg 129 NO 2024-08-19 14:20:24 |
备库:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | SYS@LHR11GDG> -- 查询应用日志序列号 COL NAME FOR A80 SET LINESIZE 9999 PAGESIZE 9999 SELECT A.THREAD#, A.NAME, A.SEQUENCE#, A.APPLIED, A.FIRST_TIME FROM V$ARCHIVED_LOG A, 3 (SELECT NB.THREAD#,NB.RESETLOGS_ID,(MAX(NB.SEQUENCE#) - 3) MAX_SEQUENCE# 4 FROM V$ARCHIVED_LOG NB WHERE NB.APPLIED = 'YES' 6 and RESETLOGS_ID=(select max(nbb.RESETLOGS_ID) from V$ARCHIVED_LOG nbb) 7 GROUP BY NB.THREAD#,NB.RESETLOGS_ID) B 8 WHERE B.THREAD# = A.THREAD# 9 AND B.RESETLOGS_ID = A.RESETLOGS_ID 10 AND A.SEQUENCE# >= MAX_SEQUENCE# and a.RESETLOGS_ID=(select max(nb.RESETLOGS_ID) from V$ARCHIVED_LOG nb) 12 AND A.STANDBY_DEST = (SELECT CASE WHEN NB.DATABASE_ROLE LIKE '%STANDBY%' THEN 'NO' ELSE 'YES' END FROM V$DATABASE NB) 13 ORDER BY A.THREAD#, A.SEQUENCE#; THREAD# NAME SEQUENCE# APPLIED FIRST_TIME ---------- -------------------------------------------------------------------------------- ---------- ------------------ ------------------- 1 /u01/app/oracle/flash_recovery_area/LHR11GDG/archivelog/2023_09_20/o1_mf_1_125_l 125 YES 2023-09-20 06:27:33 jnvtkc2_.arc 1 /u01/app/oracle/flash_recovery_area/LHR11GDG/archivelog/2023_09_20/o1_mf_1_126_l 126 YES 2023-09-20 12:04:33 joq3qgy_.arc 1 /u01/app/oracle/flash_recovery_area/LHR11GDG/archivelog/2024_08_19/o1_mf_1_127_m 127 YES 2023-09-20 19:50:15 d5s2ymk_.arc 1 /u01/app/oracle/flash_recovery_area/LHR11GDG/archivelog/2024_08_19/o1_mf_1_128_m 128 YES 2024-08-19 14:20:18 d5s2yfb_.arc 1 /u01/app/oracle/flash_recovery_area/LHR11GDG/archivelog/2024_08_19/o1_mf_1_129_m 129 IN-MEMORY 2024-08-19 14:20:24 d5s31df_.arc |
可看到,是同步的。
主库建表测试:
1 | create table t1(id int); |
备库查询:
1 | select * from t1; |
确认是实时同步的。
主库创建登陆触发器
这里只是简单给个示例,具体业务场景可能更复杂,例如记录客户端的详细信息等内容:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | CREATE SEQUENCE logon_audit_log_seq START WITH 1 INCREMENT BY 1 NOCACHE; drop table logon_audit_log; CREATE TABLE logon_audit_log ( id NUMBER PRIMARY KEY, username VARCHAR2(30) NOT NULL, logon_time date NOT NULL, terminal VARCHAR2(100), osuser VARCHAR2(30), session_id VARCHAR2(30), host VARCHAR2(100), ip_address VARCHAR2(45), module VARCHAR2(48), actions VARCHAR2(32) ); CREATE OR REPLACE TRIGGER logon_trigger_name AFTER LOGON ON DATABASE BEGIN INSERT INTO logon_audit_log ( id,username, logon_time, terminal, osuser, session_id, host, ip_address, module, actions ) VALUES ( logon_audit_log_seq.nextval, USER, sysdate, SYS_CONTEXT('USERENV', 'TERMINAL'), SYS_CONTEXT('USERENV', 'OS_USER'), SYS_CONTEXT('USERENV', 'SESSIONID'), SYS_CONTEXT('USERENV', 'HOST'), SYS_CONTEXT('USERENV', 'IP_ADDRESS'), SYS_CONTEXT('USERENV', 'MODULE'), SYS_CONTEXT('USERENV', 'ACTION') ); END; |
创建完成后,在备库也可以查询到该触发器,此时,主备依然是实时同步的。
建表删表测试同步正常!!!
触发故障
主备库都做重启动作。
重启后,发现告警日志开始报错,主库:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | ****************************************************************** LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2 ****************************************************************** ****************************************************************** Error 1017 received logging on to the standby ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ FAL[server, ARC3]: Error 16191 creating remote archivelog file 'lhr11gdg' FAL[server, ARC3]: FAL archive failed, see trace file. ARCH: FAL archive failed. Archiver continuing ORACLE Instance LHR11G - Archival Error. Archiver continuing. Error 1017 received logging on to the standby ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ Error 16191 for archive log file 2 to 'lhr11gdg' Errors in file /u01/app/oracle/diag/rdbms/lhr11g/LHR11G/trace/LHR11G_nsa2_3230.trc: ORA-16191: Primary log shipping client not logged on standby Mon Aug 19 17:06:22 2024 Error 1017 received logging on to the standby ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ FAL[server, ARC3]: Error 16191 creating remote archivelog file 'lhr11gdg' FAL[server, ARC3]: FAL archive failed, see trace file. ARCH: FAL archive failed. Archiver continuing ORACLE Instance LHR11G - Archival Error. Archiver continuing. Mon Aug 19 17:07:11 2024 Error 1017 received logging on to the standby ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ PING[ARC1]: Heartbeat failed to connect to standby 'lhr11gdg'. Error is 16191. |
备库:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | Mon Aug 19 17:06:10 2024 SMON: enabling cache recovery Dictionary check beginning Dictionary check complete Database Characterset is AL32UTF8 No Resource Manager plan active replication_dependency_tracking turned off (no async multimaster replication found) Physical standby database opened for read only access. Errors in file /u01/app/oracle/diag/rdbms/lhr11gdg/LHR11GDG/trace/LHR11GDG_ora_3796.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-44311: service dg_taf_lhr not running ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86 ORA-06512: at "SYS.DBMS_SERVICE", line 466 ORA-06512: at "SYS.DBMS_SERVICE", line 400 ORA-06512: at "SYS.DG_TAF_PROC_LHR", line 8 ORA-06512: at line 2 Completed: ALTER DATABASE OPEN |
主库查询:
1 2 3 4 5 6 7 8 | SYS@LHR11G> select dest_id,error,status,log_sequence,applied_scn from v$archive_dest a where a.status<>'INACTIVE'; DEST_ID ERROR STATUS LOG_SEQUENCE APPLIED_SCN ---------- ----------------------------------------------------------------- ------------------ ------------ ----------- 1 VALID 151 0 2 ORA-16191: Primary log shipping client not logged on standby ERROR 152 4974779 SYS@LHR11G> |
备库查询:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | SYS@LHR11GDG> SET LINESIZE 9999 PAGESIZE 9999 SELECT A.THREAD#, A.NAME, A.SEQUENCE#, A.APPLIED, A.FIRST_TIME 2 FROM V$ARCHIVED_LOG A, 3 (SELECT NB.THREAD#,NB.RESETLOGS_ID,(MAX(NB.SEQUENCE#) - 3) MAX_SEQUENCE# FROM V$ARCHIVED_LOG NB WHERE NB.APPLIED = 'YES' 6 and RESETLOGS_ID=(select max(nbb.RESETLOGS_ID) from V$ARCHIVED_LOG nbb) 7 GROUP BY NB.THREAD#,NB.RESETLOGS_ID) B 8 WHERE B.THREAD# = A.THREAD# AND B.RESETLOGS_ID = A.RESETLOGS_ID AND A.SEQUENCE# >= MAX_SEQUENCE# 11 and a.RESETLOGS_ID=(select max(nb.RESETLOGS_ID) from V$ARCHIVED_LOG nb) 12 AND A.STANDBY_DEST = (SELECT CASE WHEN NB.DATABASE_ROLE LIKE '%STANDBY%' THEN 'NO' ELSE 'YES' END FROM V$DATABASE NB) 13 ORDER BY A.THREAD#, A.SEQUENCE#; THREAD# NAME SEQUENCE# APPLIED FIRST_TIME ---------- -------------------------------------------------------------------------------- ---------- ------------------ ------------------- 1 /u01/app/oracle/flash_recovery_area/LHR11GDG/archivelog/2024_08_19/o1_mf_1_147_m 147 YES 2024-08-19 16:34:26 d61odw4_.arc 1 /u01/app/oracle/flash_recovery_area/LHR11GDG/archivelog/2024_08_19/o1_mf_1_148_m 148 YES 2024-08-19 16:34:32 d61opg1_.arc 1 /u01/app/oracle/flash_recovery_area/LHR11GDG/archivelog/2024_08_19/o1_mf_1_149_m 149 YES 2024-08-19 16:47:41 d61os8s_.arc 1 /u01/app/oracle/flash_recovery_area/LHR11GDG/archivelog/2024_08_19/o1_mf_1_150_m 150 YES 2024-08-19 16:47:53 d62r272_.arc SYS@LHR11GDG> |
日志已经不同步了。
此时,做实时验证,发现不能同步了(主库建表,备库查询即可)。
排查故障
生成trace来检查。
主库检查
在主库都做如下操作:
1 2 3 4 5 6 7 | alter system set events '16191 trace name errorstack level 10'; alter system set log_archive_dest_state_2='defer' sid='*'; alter system archive log current; alter system set log_archive_dest_state_2='ENABLE' sid='*'; alter system archive log current; |
主库告警日志:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | Mon Aug 19 17:15:57 2024 ALTER SYSTEM SET log_archive_dest_state_2='defer' SCOPE=BOTH SID='*'; ALTER SYSTEM ARCHIVE LOG Mon Aug 19 17:15:57 2024 Thread 1 advanced to log sequence 153 (LGWR switch) Current log# 3 seq# 153 mem# 0: /u01/app/oracle/oradata/LHR11G/redo03.log Archived Log entry 330 added for thread 1 sequence 152 ID 0x782dc04f dest 1: ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH SID='*'; ALTER SYSTEM ARCHIVE LOG Thread 1 advanced to log sequence 154 (LGWR switch) Current log# 1 seq# 154 mem# 0: /u01/app/oracle/oradata/LHR11G/redo01.log Archived Log entry 331 added for thread 1 sequence 153 ID 0x782dc04f dest 1: Mon Aug 19 17:15:57 2024 ****************************************************************** LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2 LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2 ****************************************************************** Mon Aug 19 17:15:58 2024 Error 1017 received logging on to the standby ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ PING[ARC1]: Heartbeat failed to connect to standby 'lhr11gdg'. Error is 16191. Error 1017 received logging on to the standby ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ Errors in file /u01/app/oracle/diag/rdbms/lhr11g/LHR11G/trace/LHR11G_nsa2_3230.trc: ORA-16191: Primary log shipping client not logged on standby Error 16191 for archive log file 1 to 'lhr11gdg' Errors in file /u01/app/oracle/diag/rdbms/lhr11g/LHR11G/trace/LHR11G_nsa2_3230.trc: ORA-16191: Primary log shipping client not logged on standby Mon Aug 19 17:15:59 2024 Dumping diagnostic data in directory=[cdmp_20240819171559], requested by (instance=1, osid=3230 (NSA2)), summary=[abnormal process termination]. Mon Aug 19 17:15:59 2024 Error 1017 received logging on to the standby ------------------------------------------------------------ ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ Errors in file /u01/app/oracle/diag/rdbms/lhr11g/LHR11G/trace/LHR11G_arc3_3228.trc: ORA-16191: Primary log shipping client not logged on standby FAL[server, ARC3]: Error 16191 creating remote archivelog file 'lhr11gdg' FAL[server, ARC3]: FAL archive failed, see trace file. ARCH: FAL archive failed. Archiver continuing ORACLE Instance LHR11G - Archival Error. Archiver continuing. Dumping diagnostic data in directory=[cdmp_20240819171600], requested by (instance=1, osid=3228 (ARC3)), summary=[abnormal process termination]. |
查看trace文件内容:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | /u01/app/oracle/diag/rdbms/lhr11g/LHR11G/trace/LHR11G_nsa2_3230.trc *** 2024-08-19 17:06:13.077 1279 krsw.c NSA2: initializing for LGWR communication NSA2: connecting to KSR channel Success NSA2: subscribing to KSR channel Success *** 2024-08-19 17:06:13.078 1324 krsw.c NSA2: initialized successfully ASYNC=1 *** 2024-08-19 17:06:16.074 Destination is specified with ASYNC=61440 OCISessionBegin failed -1 .. Detailed OCI error val is 604 and errmsg is 'ORA-00604: error occurred at recursive SQL level 1 ORA-16000: database open for read-only access ORA-06512: at line 2 ' OCISessionBegin failed. Error -1 .. Detailed OCI error val is 1017 and errmsg is 'ORA-01017: invalid username/password; logon denied ' *** 2024-08-19 17:06:17.220 OCISessionBegin failed. Error -1 .. Detailed OCI error val is 1017 and errmsg is 'ORA-01017: invalid username/password; logon denied ' OCISessionBegin failed. Error -1 .. Detailed OCI error val is 1017 and errmsg is 'ORA-01017: invalid username/password; logon denied ' *** 2024-08-19 17:06:17.234 4329 krsh.c Error 1017 received logging on to the standby ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ *** 2024-08-19 17:06:17.234 939 krsu.c Error 16191 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'lhr11gdg' Error 16191 attaching to destination LOG_ARCHIVE_DEST_2 standby host 'lhr11gdg' *** 2024-08-19 17:06:17.234 4329 krsh.c Error 16191 for archive log file 2 to 'lhr11gdg' *** 2024-08-19 17:06:17.234 2917 krsi.c krsi_dst_fail: dest:2 err:16191 force:0 blast:1 ORA-16191: Primary log shipping client not logged on standby |
主要错误是 .. Detailed OCI error val is 604 and errmsg is 'ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access
ORA-06512: at line 2
备库检查
在备库都做如下操作: