Maclean Liu(刘相兵 发表于 2014-2-7 20:13:32

From 网友Exadata x2-2 (2节点rac+3 台存储服务器)问题

From 网友Exadata x2-2 (2节点rac+3 台存储服务器)问题


db+ grid 11.2.0.3
以下问题请帮忙查看。
1. 数据库一号节点日志文件大量报错信息如下:
Process J001 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/mcsdb/MCSDB1/trace/MCSDB1_cjq0_11148.trc:

2. informatica 应用程式中断并报错:
CMN_1022 Database driver error...
CMN_1022 [
Database driver error...
Function Name : Logon
ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 1
Additional information: 3080198
Additional information: 8

数据库参数设置:
1. SESSIONS :4536
2. processes: 3000
3. job_queue_processes : 1000
物理内存96G,数据库采用内存自动管理SGA+PGA=70G



cjq0 log :


*** 2014-01-12 23:27:36.929
Box name 0 - 192.168.10.5
OSS OS Pid - 7807
Reconnect: Attempts: 1 Last TS: 17555931960 Last Use TS: 1356971314026 ctime: 1356973102470 is_idle: 0 has_open_disks: Yes
Dumping SKGXP connection state: Band 0: port ID - 0xbc62588, connection - 0xbc53e70
Dumping SKGXP connection state: Band 1: port ID - 0xbc65038, connection - 0xbc54700
Dumping SKGXP connection state: Band 2: port ID - 0xbc62658, connection - 0xbc54f90
Dumping SKGXP connection state: Band 3: port ID - 0xbc626f8, connection - 0xbc55820
Dumping SKGXP connection state: Band 4: port ID - 0xbc5ac58, connection - 0xbc560b0
Dumping SKGXP connection state: Band 5: port ID - 0xbc62798, connection - 0xbc56940
Dumping SKGXP connection state: Band 6: port ID - 0xbc650d8, connection - 0xbc571d0
Dumping SKGXP connection state: Band 7: port ID - 0xbc62838, connection - 0xbc57a60
Dumping SKGXP connection state: Band 8: port ID - 0xbc5ab68, connection - 0xbc582f0
Reissuing requests for the box 0xbc53800
Reconnected to cell 0xbc53800 ...
Storage box 0xbc53800 Inc: 5 with the source id 4264511293
Box name 0 - 192.168.10.5
OSS OS Pid - 7807
Reconnect: Attempts: 1 Last TS: 17557763060 Last Use TS: 1356973102472 ctime: 1356973102472 is_idle: 0 has_open_disks: Yes
Closing scheduler window

*** 2014-01-13 02:00:00.012
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Setting Resource Manager plan SCHEDULER:DEFAULT_MAINTENANCE_PLAN via scheduler window

*** 2014-01-14 22:00:00.034
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Closing scheduler window

*** 2014-01-15 02:00:00.006
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Closing scheduler window

*** 2014-01-16 02:00:00.008
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Setting Resource Manager plan SCHEDULER:DEFAULT_MAINTENANCE_PLAN via scheduler window

*** 2014-01-16 22:00:00.176
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Closing scheduler window

*** 2014-01-17 02:00:00.007
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Setting Resource Manager plan SCHEDULER:DEFAULT_MAINTENANCE_PLAN via scheduler window

*** 2014-01-19 06:00:00.019
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER:DEFAULT_MAINTENANCE_PLAN via scheduler window

*** 2014-01-20 22:00:00.017
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Closing scheduler window

*** 2014-01-22 02:00:00.010
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Setting Resource Manager plan SCHEDULER:DEFAULT_MAINTENANCE_PLAN via scheduler window

*** 2014-01-23 22:00:00.062
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER:DEFAULT_MAINTENANCE_PLAN via scheduler window

*** 2014-01-25 06:00:00.008
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter

*** 2014-01-25 06:07:30.064
Process J000 is dead (pid=8751 req_ver=136843 cur_ver=136843 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:31.070
Process J000 is dead (pid=8759 req_ver=1999 cur_ver=1999 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:37.073
Process J000 is dead (pid=8776 req_ver=136844 cur_ver=136844 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:38.074
Process J000 is dead (pid=8787 req_ver=2000 cur_ver=2000 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:40.076
Process J000 is dead (pid=8912 req_ver=136845 cur_ver=136845 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:41.077
Process J000 is dead (pid=8924 req_ver=2001 cur_ver=2001 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:43.080
Process J000 is dead (pid=8929 req_ver=136846 cur_ver=136846 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:44.081
Process J000 is dead (pid=8932 req_ver=2002 cur_ver=2002 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:46.083
Process J000 is dead (pid=8942 req_ver=136847 cur_ver=136847 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:47.084
Process J000 is dead (pid=8944 req_ver=2003 cur_ver=2003 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:49.087
Process J000 is dead (pid=9034 req_ver=136848 cur_ver=136848 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:50.088
Process J000 is dead (pid=9037 req_ver=2004 cur_ver=2004 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:52.090
Process J000 is dead (pid=9050 req_ver=136849 cur_ver=136849 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:53.091
Process J000 is dead (pid=9057 req_ver=2005 cur_ver=2005 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:55.093
Process J000 is dead (pid=9100 req_ver=136850 cur_ver=136850 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:56.094
Process J000 is dead (pid=9135 req_ver=2006 cur_ver=2006 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:58.096
Process J000 is dead (pid=9142 req_ver=136851 cur_ver=136851 state=KSOSP_SPAWNED).

*** 2014-01-25 06:07:59.097
Process J000 is dead (pid=9220 req_ver=2007 cur_ver=2007 state=KSOSP_SPAWNED).

*** 2014-01-26 02:03:22.177
Process J000 is dead (pid=3822 req_ver=42114 cur_ver=42114 state=KSOSP_SPAWNED).

*** 2014-01-26 02:03:23.191
Process J000 is dead (pid=3833 req_ver=63276 cur_ver=63276 state=KSOSP_SPAWNED).

*** 2014-01-26 02:03:29.193
Process J000 is dead (pid=3927 req_ver=42115 cur_ver=42115 state=KSOSP_SPAWNED).

*** 2014-01-26 02:03:30.194
Process J000 is dead (pid=3931 req_ver=63277 cur_ver=63277 state=KSOSP_SPAWNED).

*** 2014-01-26 02:03:32.196
Process J000 is dead (pid=3933 req_ver=42116 cur_ver=42116 state=KSOSP_SPAWNED).

Maclean Liu(刘相兵 发表于 2014-2-7 20:23:51

ODM FINDING:

Bug 13538182 : THE MEMORY SIZE OF CJQ PROCESS HAS BEEN GROWTH CONTINUOUSLY.

Hdr: 13538182 11.2.0.2 RDBMS 11.2.0.2 SCHEDULER PRODID-5 PORTID-226 ORA-4030 13855490
Abstract: THE MEMORY SIZE OF CJQ PROCESS HAS BEEN GROWTH CONTINUOUSLY.  



需要当时的OSW和aWR  数据分析内存使用, 另虽然你可以用huge page 把SGA pin在内存中,但不保证能100%解决该问题

osw位于/opt/oracle.oswatcher/osw/

XuLiQiang 发表于 2014-2-8 09:10:59

AWR 报告 烦请老师帮忙看一下

XuLiQiang 发表于 2014-2-8 09:14:17

AWR 报告

Maclean Liu(刘相兵 发表于 2014-2-8 11:40:21

ulimit -a
看一下

考虑不用memory_target ,使用sga_target+huge page ,sga_target你目前实际只用了30多个G

XuLiQiang 发表于 2014-2-8 11:44:44

使用 ulimit -a 命名取值

XuLiQiang 发表于 2014-2-8 11:45:03

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 773848
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 773848
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Maclean Liu(刘相兵 发表于 2014-2-8 11:47:23

max locked memory       (kbytes, -l) 64

太小了

见这里

http://www.askmaclean.com/archives/warning-even-exadata-has-a-wrong-memlock-setting.html


PS:
需要的是 oracle用户的ulimit -a

不是root

XuLiQiang 发表于 2014-2-8 11:48:29

如果采用内存自动分配:系统内存96G  SGA 分 50G PGA分 30G 可以吗

XuLiQiang 发表于 2014-2-8 11:48:54

如果采用内存手动分配:系统内存96G  SGA 分 50G PGA分 30G 可以吗

XuLiQiang 发表于 2014-2-8 11:52:00

对不起老师,刚才是root

$  ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 773848
max locked memory       (kbytes, -l) 74295414
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 131072
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Maclean Liu(刘相兵 发表于 2014-2-8 11:56:20

你目前SGA仅仅用了30G, PGA实际使用仅3g多

可以考虑SGA用45g,pga pga_aggregate_target 给20g

XuLiQiang 发表于 2014-2-8 12:57:30

谢谢 刘老师的指导.
页: [1]
查看完整版本: From 网友Exadata x2-2 (2节点rac+3 台存储服务器)问题