Posted on 2014-10-8 09:03:16
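This post was last edited by yuhuacanhong on 2014-10-8 09:04
I have a 10.2.0.4 RAC on ASM; the operating system is AIX 6.1.
Symptom: node2 is healthy, but cssd will not start on node1.
Details follow.
1 Check the cluster status on node1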
lqydb1->crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
lqydb1->crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
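2 Check crsd.log and ocssd.log on node 1. Because cssd never came up, ocssd.log has no output at all,
while crsd.log keeps reporting the following errors over and over: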
2014-10-03 09:05:20.702: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2014-10-03 09:05:19.356: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2014-10-03 09:05:19.357: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2014-10-03 09:05:20.702: [ COMMCRS][261]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_lqydb2_crs))
2014-10-03 09:05:20.702: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2014-10-03 09:05:20.702: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2014-10-03 09:05:22.041: [ COMMCRS][263]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_lqydb2_crs))
2014-10-03 09:05:22.041: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
3 Check the OCR disks
crw-rw---- 1 oracle oinstall 28, 8 Oct 03 19:59 /dev/rhdisk10
crw-rw---- 1 oracle oinstall 28, 13 Oct 03 19:59 /dev/rhdisk11
crw-rw---- 1 oracle oinstall 28, 0 Oct 03 19:59 /dev/rhdisk12
crw-rw---- 1 oracle oinstall 28, 3 Oct 02 00:05 /dev/rhdisk13
crw-rw---- 1 oracle oinstall 28, 6 Oct 03 19:59 /dev/rhdisk14
crw-rw---- 1 oracle oinstall 28, 1 Oct 03 19:59 /dev/rhdisk15
crw-rw---- 1 root oinstall 28, 12 Oct 02 00:05 /dev/rhdisk2
crw-rw---- 1 oracle oinstall 28, 11 Oct 03 20:46 /dev/rhdisk3
crw-rw---- 1 oracle oinstall 28, 9 Oct 03 20:46 /dev/rhdisk4
crw-rw---- 1 oracle oinstall 28, 5 Oct 03 20:46 /dev/rhdisk5
crw-rw---- 1 root oinstall 28, 2 Oct 02 00:05 /dev/rhdisk6
crw-rw---- 1 oracle oinstall 28, 4 Oct 03 19:59 /dev/rhdisk7
crw-rw---- 1 oracle oinstall 28, 7 Oct 02 00:05 /dev/rhdisk8
crw-rw---- 1 oracle oinstall 28, 10 Oct 03 19:59 /dev/rhdisk
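The disk attributes look normal and match those on the healthy node 2.
4 Delete the crs* files under TMP and the files in .oracle
After a restart the same error persists: CSS cannot start.
crsd.log on node1 still keeps reporting errors. I checked the private network, the public network, and the disks, and all of them are fine.
5 Check the CRS processes on node 2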
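A comparison like this is easier to trust when it is done mechanically rather than by eye. The following is only a sketch, assuming the `ls -l` output for the raw devices has been saved to a text file on each node; the function name `disk_attrs` and the file names in the usage comment are hypothetical and not part of any Oracle or AIX tooling.

```shell
#!/bin/sh
# Reduce saved `ls -l` output for device files to "name mode owner:group",
# sorted by name, so that two nodes can be compared line by line.
# (disk_attrs is a hypothetical helper name.)
disk_attrs() {
    awk '{ print $NF, $1, $3 ":" $4 }' "$1" | sort
}

# Example usage (file names are assumptions):
#   ls -l /dev/rhdisk* > node1_disks.txt    # run on node 1
#   ls -l /dev/rhdisk* > node2_disks.txt    # run on node 2
#   disk_attrs node1_disks.txt > n1.attrs
#   disk_attrs node2_disks.txt > n2.attrs
#   diff n1.attrs n2.attrs && echo "attributes match"
```

Timestamps and major/minor numbers are deliberately left out, since only the mode, owner, and group are being compared here.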
ps -ef|grep crs/bin
root 123004 213746 0 20:23:40 pts/9 0:00 grep crs/bin
oracle 155776 233824 0 18:23:29 - 0:01 /oracle/product/10.2.0/crs/bin/evmd.bin
oracle 168074 94926 0 18:24:00 - 0:00 /oracle/product/10.2.0/crs/bin/oclsomon.bin
root 225486 90702 2 18:23:30 - 1:30 /oracle/product/10.2.0/crs/bin/crsd.bin reboot
oracle 102814 204908 0 18:24:02 - 0:00 /bin/sh -c ulimit -c unlimited; cd /oracle/product/10.2.0/crs/log/lqydb2/cssd; /oracle/product/10.2.0/crs/bin/ocssd || exit $?
root 139692 103212 0 18:24:00 - 0:00 /oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f
oracle 188846 102814 0 18:24:02 - 0:12 /oracle/product/10.2.0/crs/bin/ocssd.bin
oracle 94926 143852 0 18:24:00 - 0:00 /bin/sh -c cd /oracle/product/10.2.0/crs/log/lqydb2/cssd/oclsomon; ulimit -c unlimited; /oracle/product/10.2.0/crs/bin/oclsomon || exit $?
oracle 107512 155776 0 18:27:27 - 0:00 /oracle/product/10.2.0/crs/bin/evmlogger.bin -o /oracle/product/10.2.0/crs/evm/log/evmlogger.info -l /oracle/product/10.2.0/crs/evm/log/evmlogger.log
6 Check the CRS processes on node 1
# ps -ef|grep crs/bin
oracle 135406 70294 0 19:58:49 - 0:00 /oracle/product/10.2.0/crs/bin/evmlogger.bin -o /oracle/product/10.2.0/crs/evm/log/evmlogger.info -l /oracle/product/10.2.0/crs/evm/log/evmlogger.log
root 90430 147700 0 19:41:48 - 0:17 /oracle/product/10.2.0/crs/bin/crsd.bin reboot
oracle 143624 78498 0 19:52:10 - 0:00 /bin/sh -c cd /oracle/product/10.2.0/crs/log/lqydb1/cssd/oclsomon; ulimit -c unlimited; /oracle/product/10.2.0/crs/bin/oclsomon || exit $?
oracle 151990 90884 0 19:58:43 - 0:02 /oracle/product/10.2.0/crs/bin/ocssd.bin
oracle 70294 151648 0 19:57:04 - 0:00 /oracle/product/10.2.0/crs/bin/evmd.bin
oracle 98862 143624 0 19:52:10 - 0:01 /oracle/product/10.2.0/crs/bin/oclsomon.bin
root 152242 102710 0 20:26:32 pts/2 0:00 grep crs/bin
oracle 90884 163856 0 19:58:43 - 0:00 /bin/sh -c ulimit -c unlimited; cd /oracle/product/10.2.0/crs/log/lqydb1/cssd; /oracle/product/10.2.0/crs/bin/ocssd || exit $?
Comparing the two listings, node 1 is missing the oprocd process:
root 139692 103212 0 18:24:00 - 0:00 /oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f
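The same comparison can be scripted instead of read off by eye. This is a minimal sketch, assuming the `ps -ef|grep crs/bin` output of each node has been saved to a file; the function names `crs_daemons` and `missing_daemons` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print the distinct daemon binaries (…/crs/bin/<name>.bin) that appear
# in a saved `ps -ef` listing, one per line, sorted.
crs_daemons() {
    tr ' ' '\n' < "$1" | grep 'crs/bin/.*\.bin$' | sed 's|.*/||' | sort -u
}

# Print the daemons that run in the first listing (reference node)
# but are absent from the second listing (suspect node).
missing_daemons() {
    crs_daemons "$1" > /tmp/crs_ref.$$
    crs_daemons "$2" > /tmp/crs_cmp.$$
    comm -23 /tmp/crs_ref.$$ /tmp/crs_cmp.$$
    rm -f /tmp/crs_ref.$$ /tmp/crs_cmp.$$
}

# Example usage (file names are assumptions):
#   ps -ef | grep crs/bin > node2_ps.txt    # on the healthy node
#   ps -ef | grep crs/bin > node1_ps.txt    # on the failing node
#   missing_daemons node2_ps.txt node1_ps.txt
```

Only fields ending in `.bin` are counted, so the `sh -c` wrapper lines and the `grep crs/bin` line itself do not show up as daemons.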
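I then started this process manually as root on node 1:
#/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f &
Once the process was running in the background as root on node 1, CSS and the rest of the stack came up, and the instance and the other resources started as well.
crs.log on node 1 shows no errors.
My question: how is the cause of a problem like this usually diagnosed? I later found that after node 1 is rebooted, I still have to run the following by hand
#/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f &
before cssd will come up. What is likely to be preventing cssd from starting automatically? Also, from what I found online, oprocd is a monitoring process, so why does cssd come up as soon as oprocd is started? I would appreciate Liu's advice. Thank you very much!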