Oracle Database Data Recovery and Performance Optimization

1#
Posted on 2012-7-27 11:38:15 | Views: 10412 | Replies: 6
OS: AIX  6.1 (6100-05-01-1016) 64bit
ORACLE version: 11.2.0.3

1. Running root.sh on the first node produced the following error:
# /grid01/11.2.0/product/grid/root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /grid01/11.2.0/product/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /grid01/11.2.0/product/grid/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
User grid has the required capabilities to run CSSD in realtime mode
OLR initialization - successful
  root wallet
  root wallet cert
  root cert export
  peer wallet
  profile reader wallet
  pa wallet
  peer wallet keys
  pa wallet keys
  peer cert request
  pa cert request
  peer cert
  pa cert
  peer root cert TP
  profile reader root cert TP
  pa root cert TP
  peer pa cert TP
  pa peer cert TP
  profile reader pa cert TP
  profile reader peer cert TP
  peer user cert
  pa user cert
Adding Clusterware entries to inittab
ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2012-07-27 07:39:44.714
[client(4456640)]CRS-2101:The OLR was formatted using version 3.

        /grid01/11.2.0/product/grid/perl/bin/perl -I/grid01/11.2.0/product/grid/perl/lib -I/grid01/11.2.0/product/grid/crs/install /grid01/11.2.0/product/grid/crs/install/rootcrs.pl execution failed



2. Later, the cfgtoollogs trace (<GRID_HOME>/cfgtoollogs/crsconfig/rootcrs_<nodename>.log) shows the following errors:
2012-07-27 07:52:14: 'ohasd' is now registered
2012-07-27 07:52:14: Starting ohasd
2012-07-27 07:52:14: Checking the status of ohasd
2012-07-27 07:52:14: Executing cmd: /grid01/11.2.0/product/grid/bin/crsctl check has
2012-07-27 07:52:15: Checking the status of ohasd
2012-07-27 07:52:20: Executing cmd: /grid01/11.2.0/product/grid/bin/crsctl check has
2012-07-27 07:52:21: Checking the status of ohasd
2012-07-27 07:52:26: Executing cmd: /grid01/11.2.0/product/grid/bin/crsctl check has
2012-07-27 07:52:26: Checking the status of ohasd
2012-07-27 07:52:31: ohasd is not already running.. will start it now
2012-07-27 07:52:31: itab entries=cssd|evmd|crsd|ohasd
2012-07-27 07:52:31: Executing /usr/sbin/init q
2012-07-27 07:52:31: Executing cmd: /usr/sbin/init q
2012-07-27 07:52:36: Created backup /etc/inittab.no_crs
2012-07-27 07:52:36: Appending to /etc/inittab.tmp:
2012-07-27 07:52:36: h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1 </dev/null

2012-07-27 07:52:36: Done updating /etc/inittab.tmp
2012-07-27 07:52:36: Saved /etc/inittab.crs
2012-07-27 07:52:36: Installed new /etc/inittab
2012-07-27 07:52:36: Executing /usr/sbin/init q
2012-07-27 07:52:36: Executing cmd: /usr/sbin/init q
2012-07-27 07:52:36: Executing cmd: /grid01/11.2.0/product/grid/bin/crsctl start has
2012-07-27 07:54:37: Command output:
>  CRS-4124: Oracle High Availability Services startup failed.
>  CRS-4000: Command Start failed, or completed with errors.
>End Command output
2012-07-27 07:54:37: Executing /etc/ohasd install
2012-07-27 07:54:37: Executing cmd: /etc/ohasd install
2012-07-27 07:54:38: ohasd failed to start
2012-07-27 07:54:38: ohasd failed to start
2012-07-27 07:54:38: Alert log is /grid01/11.2.0/product/grid/log/node1/alertnode1.log
2012-07-27 07:54:38: Failed to start  service 'ohasd'
2012-07-27 07:54:38: Checking the status of ohasd




3. The detailed logs contain the following information:
/grid01/11.2.0/product/grid/log/node1/alertnode1.log
$ more /grid01/11.2.0/product/grid/log/node1/alertnode1.log
2012-07-27 07:39:44.714
[client(4456640)]CRS-2101:The OLR was formatted using version 3.
2012-07-27 08:02:43.841
[ohasd(4325408)]CRS-0715:Oracle High Availability Service has timed out waiting for init.ohasd to be started.
2012-07-27 10:10:23.750
[client(4194450)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2012-07-27 10:10:23.756
[client(4194450)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid01/11.2.0/product/grid/log/node1/client/crsctl_grid.log.



/grid01/11.2.0/product/grid/log/node1/client/crsctl_grid.log
$ more /grid01/11.2.0/product/grid/log/node1/client/crsctl_grid.log
Oracle Database 11g Clusterware Release 11.2.0.3.0 - Production Copyright 1996, 2011 Oracle. All rights reserved.
2012-07-27 10:10:18.614: [  OCRMSG][1]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2012-07-27 10:10:18.614: [  OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
2012-07-27 10:10:18.614: [  OCRMSG][1]prom_connect: error while waiting for connection complete [24]



My hunch is that the problem lies in the GPnP profile, but I could not find a resolved case on My Oracle Support (Metalink) or via Google.
Could anyone help take a look?
Thanks.
2#
Posted on 2012-7-27 11:51:48
Check the disk reserve attribute:
lsattr -E -l xxx | grep reserve_
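For context, a small sketch of how that check could be scripted. The sample `lsattr` output line is copied from a later post in this thread; on AIX you would pipe real `lsattr -El hdiskN` output in instead, and the `chdev` hint shown is the usual remedy (the disk name is just an example):

```shell
# Parse a sample lsattr output line (inlined here so the parsing is
# visible; on AIX, feed `lsattr -El hdisk2` output in instead).
sample='reserve_policy  no_reserve                     Reserve Policy           True'

# The second whitespace-separated field is the policy value.
policy=$(echo "$sample" | awk '/reserve_policy/ {print $2}')

if [ "$policy" = "no_reserve" ]; then
  echo "hdisk2: reserve_policy OK"
else
  # Typical fix on AIX (disk name is hypothetical):
  echo "hdisk2: run: chdev -l hdisk2 -a reserve_policy=no_reserve"
fi
```

Note that some storage stacks expose `reserve_lock` rather than `reserve_policy`, which is presumably why the post greps for `reserve_` generally.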


3#
Posted on 2012-7-27 12:32:49
That attribute is fine: reserve_policy has already been set to no_reserve on every disk, and the PVIDs have been cleared.
$ id grid
uid=2000(grid) gid=1601(oinstall) groups=1602(asmadmin),1603(dba),1604(asmdba),1605(asmoper)
$ id oracle
uid=2001(oracle) gid=1601(oinstall) groups=1603(dba),1604(asmdba)
$

$ hostname
node1
$ lsattr -El hdisk2 | grep reserve
reserve_policy  no_reserve                     Reserve Policy           True
$ ls -l /dev/rhdisk2
crw-rw----    1 grid     asmadmin     25,  0 Jul 27 06:23 /dev/rhdisk2
$

$ hostname
node2
$ lsattr -El hdisk2 | grep reserve
reserve_policy    no_reserve                    Reserve Policy           True
$
$ ls -l /dev/rhdisk2
crw-rw----    1 grid     asmadmin     25,  0 Jul 26 21:31 /dev/rhdisk2
Thanks.

[ This post was last edited by miloluo on 2012-7-27 12:35 ]


4#
Posted on 2012-7-27 13:06:29
As the grid user, run the pre-install verification to check the prerequisites:
$ cluvfy stage -pre crsinst -n <nodelist>

See which checks do not pass and make targeted adjustments.


5#
Posted on 2012-7-27 13:47:16
Start from "ohasd failed to start" and look for ohasd-related material.


6#
Posted on 2012-7-30 11:37:47

It's a bug!

AIX 6.1 TL7 + 11gR2 RAC bug:
ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2012-07-05 15:46:54.573
[client(8454270)]CRS-2101:The OLR was formatted using version 3.

Those who have seen this will know what the following line means:
/bin/dd if=/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
reference: http://www.itpub.net/thread-1593773-1-1.html
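The dd line is commonly reported as a workaround because init.ohasd blocks while opening the npohasd named pipe for write until some process opens the read end; reading the pipe once as root, in a second session while root.sh is still waiting, unblocks it. The FIFO mechanism itself can be demonstrated generically (the temp-file names below are illustrative, not Oracle's):

```shell
# A writer's open() on a FIFO blocks until a reader opens the other
# end; this is the behavior the npohasd workaround relies on.
fifo=$(mktemp -u)
mkfifo "$fifo"

# Background writer: blocks on open until a reader appears
# (analogous to init.ohasd waiting on npohasd).
( echo hello > "$fifo" ) &

# Reading one block unblocks the writer (analogous to the dd line).
data=$(dd if="$fifo" bs=1024 count=1 2>/dev/null)

wait
rm -f "$fifo"
echo "writer unblocked, read: $data"
```

On the affected AIX levels, the reported procedure is to run the dd command as root while root.sh sits at "Adding Clusterware entries to inittab"; see the referenced thread for details.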

[ This post was last edited by 北柏 on 2012-7-30 11:41 ]


7#
Posted on 2012-7-30 14:49:26
Thanks for the pointer, 北柏.

To share how I handled it:

1. When this problem appeared, I worked through the official note <Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]>. In the end it turned out that inittab was indeed invoking init.ohasd, but ohasd itself appears to have exited abnormally. For that case the troubleshooting guide says the SA needs to investigate further, which is beyond my level.

2. I removed the whole environment with deinstall, cleared the headers of the corresponding ASM disks, and reinstalled; it failed with exactly the same error.

3. I rebuilt the entire stack (OS, RAC, and so on) from scratch, and the installation completed normally.

So my feeling is that GI had not been uninstalled cleanly. After the GI installation finished, I used the find command to record which files had been changed during the installation. I have not had time to analyze the list yet, but I want to use it to work out which files need to be touched during an uninstall.

The information 北柏 provided is very valuable, but since the problem environment no longer exists I cannot verify it, and the error messages look slightly different.

BTW, Oracle does not seem to publish a complete manual-uninstall procedure for 11g, so after an environment goes wrong, deinstall may not clean everything up.
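The "record files changed during installation" idea above can be sketched with `find -newer` against a marker file. The paths below are a scratch-directory demo, not the real Grid Infrastructure trees:

```shell
# Demo of tracking files modified after a marker timestamp.
work=$(mktemp -d)

# 1. Drop a marker file before the "install" starts.
touch "$work/marker"
sleep 1

# 2. Simulate the installer modifying a file (hypothetical name).
echo x > "$work/init.ohasd"

# 3. Everything newer than the marker was touched by the "install".
find "$work" -newer "$work/marker" -type f ! -name changed.txt \
  > "$work/changed.txt"
cat "$work/changed.txt"
```

Against a real system one would take the marker before running the installer and then scan /etc, the oraInventory, and the Grid home instead of the scratch directory.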

