Oracle scmn process uses a lot of CPU or memory
Symptoms
In a 19c RAC, memory usage on one node is very high. Output of top:
5937 oracle 20 0 163.3g 125.1g 125.0g S 3.6 49.7 263:09.55 ora_scmn_orcl1
5939 oracle 20 0 163.3g 124.9g 124.9g S 4.0 49.6 249:26.99 ora_scmn_orcl1
5935 oracle 20 0 163.3g 124.9g 124.9g S 3.3 49.6 250:00.64 ora_scmn_orcl1
5941 oracle 20 0 163.3g 124.9g 124.9g S 5.0 49.6 262:19.10 ora_scmn_orcl1
5966 oracle 20 0 163.0g 120.2g 120.2g S 1.7 47.8 86:58.62 ora_dbw0_orcl1
5971 oracle 20 0 161.0g 119.8g 119.8g S 2.6 47.6 86:49.51 ora_dbw2_orcl1
Configuration
cat >> /etc/sysctl.conf <<"EOF"
vm.swappiness = 10
vm.dirty_background_ratio = 3
vm.dirty_ratio = 80
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.min_free_kbytes = 6097152
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
kernel.sem = 1024 60000 1024 256
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576
fs.file-max = 6815744
kernel.panic_on_oops = 1
kernel.watchdog_thresh=30
EOF
/sbin/sysctl -p
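A quick sanity check on the shared-memory settings above: kernel.shmall is expressed in pages, while kernel.shmmax is expressed in bytes. The sketch below assumes a 4 KiB page size (confirm with getconf PAGE_SIZE); the helper name shmall_bytes is made up for illustration.

```shell
#!/bin/sh
# Convert a kernel.shmall value (in pages) to bytes.
# The page size defaults to 4096 bytes, which is an assumption;
# confirm the real value on the host with: getconf PAGE_SIZE
shmall_bytes() {
    pages=$1
    page_size=${2:-4096}
    echo $((pages * page_size))
}

# kernel.shmall from the sysctl.conf above.
shmall_bytes 1073741824   # prints 4398046511104, the same value as kernel.shmmax
```

With 4 KiB pages, the two limits in this configuration describe the same 4 TiB ceiling.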
High SYS CPU Usage ON LMS Thread (SCMN/CR00/RS01) During High Workload (Doc ID 2707048.1)
SYMPTOMS
On a 6-node 19.5 RAC on Red Hat Enterprise Linux 7, high SYS CPU usage by the lms/scmn process/thread is observed during high workload. For example, the issue can be reproduced with the following test:
Start parallel recovery (as it is a standby DB).
At the same time, execute a full database backup using rman on the standby.
The issue is not limited to standby databases; it can also happen on a primary or in any other RAC environment during high workload.
High sys CPU on LMS/SCMN can be observed in the output of the top command.
TIME PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17:17:52 634273 oracle 20 0 342.7g 289580 187964 S 0.9 0.0 253:57.55 ora_scmn+
17:19:52 634255 oracle 20 0 342.5g 289792 187972 S 1.9 0.0 248:18.62 ora_scmn+
17:20:52 634275 oracle 20 0 342.7g 289980 188148 S 2.9 0.0 251:42.28 ora_scmn+
17:23:52 634255 oracle 20 0 342.5g 289792 187972 S 170.8 0.0 248:57.52 ora_scmn+ <<<< Test begins.
17:24:52 634265 oracle 20 0 342.7g 289608 187988 S 178.5 0.0 253:42.23 ora_scmn+
17:24:52 634273 oracle 20 0 342.7g 289580 187964 S 165.4 0.0 257:09.85 ora_scmn+
17:25:52 634265 oracle 20 0 342.7g 289608 187988 S 153.3 0.0 255:20.33 ora_scmn+
17:25:52 634275 oracle 20 0 342.7g 289980 188148 S 151.4 0.0 255:37.04 ora_scmn+
17:26:22 634265 oracle 20 0 342.7g 289608 187988 S 172.9 0.0 256:11.65 ora_scmn+
17:27:52 634273 oracle 20 0 342.7g 289580 187964 S 169.8 0.0 261:26.01 ora_scmn+
17:28:52 634255 oracle 20 0 342.5g 289792 187972 S 152.8 0.0 257:19.47 ora_scmn+
17:29:22 634273 oracle 20 0 342.7g 289580 187964 S 172.6 0.0 263:52.61 ora_scmn+
17:32:22 634273 oracle 20 0 342.7g 289580 187964 S 162.0 0.0 268:38.56 ora_scmn+
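To pick the busy samples out of a longer capture like the one above, a small awk filter is enough. This is a sketch that assumes the column layout shown here, where a timestamp has been prepended and %CPU is therefore the 10th field; the function name busy_samples and the default threshold of 100 are illustrative choices.

```shell
#!/bin/sh
# Print time, PID and %CPU for every sample whose %CPU exceeds a threshold.
# Assumes the layout above: TIME PID USER PR NI VIRT RES SHR S %CPU ...
# The header line is skipped automatically because "%CPU" evaluates to 0.
busy_samples() {
    awk -v thr="${1:-100}" '$10 + 0 > thr { print $1, $2, $10 }'
}

# Three lines copied from the listing above.
busy_samples <<'EOF'
17:20:52 634275 oracle 20 0 342.7g 289980 188148 S 2.9 0.0 251:42.28 ora_scmn+
17:23:52 634255 oracle 20 0 342.5g 289792 187972 S 170.8 0.0 248:57.52 ora_scmn+
17:24:52 634265 oracle 20 0 342.7g 289608 187988 S 178.5 0.0 253:42.23 ora_scmn+
EOF
```

In top's default mode, a %CPU value above 100 for a single PID indicates that more than one thread of that process is running at the same time.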
Running strace on the LMS process shows that it repeatedly calls get_mempolicy/semtimedop, e.g.:
get_mempolicy(NULL, NULL, 0, NULL, 0) = 0
open("/proc/self/stat", O_RDONLY) = 159
read(159, "634255 (ora_scmn_stcrma) R 1 634"..., 4096) = 377
close(159) = 0
semtimedop(458755, [{34, -1, 0}], 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(458755, [{34, -1, 0}], 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
......
semtimedop(458755, [{34, -1, 0}], 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(458755, [{34, -1, 0}], 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
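To gauge how often the wait is failing, the EAGAIN results in a saved trace can be counted. This is a minimal sketch: the function name is hypothetical, and it assumes the trace was saved as text, for example from strace -f -p <pid> -e trace=semtimedop, where <pid> is a placeholder.

```shell
#!/bin/sh
# Count semtimedop calls that returned EAGAIN in strace output read from stdin.
# The pattern is not anchored to the line start, so lines prefixed with
# "[pid N]" (as printed by strace -f) are counted as well.
count_semtimedop_eagain() {
    grep -c 'semtimedop(.*= -1 EAGAIN' || true
}

# Sample lines copied from the trace above.
count_semtimedop_eagain <<'EOF'
get_mempolicy(NULL, NULL, 0, NULL, 0) = 0
semtimedop(458755, [{34, -1, 0}], 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(458755, [{34, -1, 0}], 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
EOF
```

The || true keeps the function's exit status at zero when nothing matches, since grep -c still prints 0 in that case.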
"perf" profiling shows the symbol tree below.
# Children      Self  Command          Shared Object  Symbol
# ........  ........  ...............  .............  ........................................
#
    13.27%     0.00%  ora_cr00_stcrma  oracle         [.] ksvrdp_int
            |
            ---ksvrdp_int
               kjblcrspslmain
               |
               --13.27%--kjblcrslmain
                  |
                  --13.26%--kslwaitctx
                     |
                     --13.26%--ksliwat
                        |
                        --13.26%--skgpwwait
                           |
                           --13.26%--semtimedop
                              |
                              --13.26%--system_call_fastpath
                                 |
                                 --13.26%--sys_semtimedop
                                    |
                                    --13.26%--SYSC_semtimedop
                                       |
                                       --13.24%--_raw_qspin_lock
                                          queued_spin_lock_slowpath
                                          |
                                          --13.15%--native_queued_spin_lock_slowpath
Note: SCMN showing high CPU on its own would be unusual, but that is how top / ps display it. In threaded output, the load will normally be on one of the other LMS threads. Please refer to: