FirstNode configuration failed
OS: Oracle Linux (kernel 2.6.32-300.10.1.el5uek, x86_64). Database version: Oracle 11.2.0.4, using ASM storage.
While installing Grid Infrastructure, running root.sh on node 1 threw the following error:
# ./root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/product/11.2/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/product/11.2/grid/crs/install/crsconfig_params
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to inittab
CRS-2672: Attempting to start 'ora.mdnsd' on 'mydb-node1'
CRS-2676: Start of 'ora.mdnsd' on 'mydb-node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'mydb-node1'
CRS-2676: Start of 'ora.gpnpd' on 'mydb-node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'mydb-node1'
CRS-2672: Attempting to start 'ora.gipcd' on 'mydb-node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'mydb-node1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'mydb-node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'mydb-node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'mydb-node1'
CRS-2676: Start of 'ora.diskmon' on 'mydb-node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'mydb-node1' succeeded
ASM created and started successfully.
Disk Group OCR_VOTE created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk a51f26c020b34f79bf77127a2145d48e.
Successful addition of voting disk 5647f1f40e494ff0bf344debdcbd97b4.
Successful addition of voting disk 3c31f49e4a8d4fa1bffff3ce15e4b6f3.
Successfully replaced voting disk group with +OCR_VOTE.
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name          Disk group
--  -----    -----------------                ---------          ---------
 1. ONLINE   a51f26c020b34f79bf77127a2145d48e (ORCL:OCR_VOTE01)  [OCR_VOTE]
 2. ONLINE   5647f1f40e494ff0bf344debdcbd97b4 (ORCL:OCR_VOTE02)  [OCR_VOTE]
 3. ONLINE   3c31f49e4a8d4fa1bffff3ce15e4b6f3 (ORCL:OCR_VOTE03)  [OCR_VOTE]
Located 3 voting disk(s).
CRS-2672: Attempting to start 'ora.OCR_VOTE.dg' on 'mydb-node1'
CRS-2676: Start of 'ora.OCR_VOTE.dg' on 'mydb-node1' succeeded
FirstNode configuration failed at /u01/product/11.2/grid/crs/install/crsconfig_lib.pm line 9379.
/u01/product/11.2/grid/perl/bin/perl -I/u01/product/11.2/grid/perl/lib -I/u01/product/11.2/grid/crs/install /u01/product/11.2/grid/crs/install/rootcrs.pl execution failed
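Most of the stack did come up before the failure, so the state root.sh left behind can be inspected first. A minimal check using the standard 11.2 crsctl commands (paths as in this install):

# /u01/product/11.2/grid/bin/crsctl check crs
# /u01/product/11.2/grid/bin/crsctl stat res -t -init
# /u01/product/11.2/grid/bin/crsctl stat res -t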
My analysis went as follows:
(1) First, I made sure that runcluvfy.sh stage -pre crsinst -n mydb-node1,mydb-node2 -fixup -verbose completed successfully (an invocation sketch follows).
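For reference, the check is run as the grid installation owner from the unpacked installation media (the /stage path below is hypothetical):

# su - grid
$ cd /stage/grid_11204
$ ./runcluvfy.sh stage -pre crsinst -n mydb-node1,mydb-node2 -fixup -verbose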
(2) Following the error message, I opened /u01/product/11.2/grid/crs/install/crsconfig_lib.pm at line 9379; it is the die("FirstNode configuration failed") call marked in the snippet below (a grep one-liner for locating it follows the snippet).
if (isFirstNodeToStart())
{
   $success = add_Nodeapps($upgrade_option, \@nodevip, $DHCP_flag,
                           \@nodes_to_add, \@nodes_to_start);
   if ($success != TRUE) {
      writeCkpt($ckptName, CKPTFAIL);
      die("Failed to add Nodeapps");
   }
   $success = configFirstNode($DHCP_flag, \@nodes_to_start);
   if ($success != SUCCESS) {
      writeCkpt($ckptName, CKPTFAIL);
      die("FirstNode configuration failed");   # <-- line 9379
   }
} else {
   ......
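One quick way to land on that line without counting is to grep for the die message:

# grep -n "FirstNode configuration failed" /u01/product/11.2/grid/crs/install/crsconfig_lib.pm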
(3) The failing block calls the configFirstNode function, which I then traced in the same crsconfig_lib.pm:
sub configFirstNode
#---------------------------------------------------------------------
# Function: Configure first node
# Args    : DHCP_flag
#           nodes_to_start
# Returns : TRUE if success
#           FALSE if failed
#---------------------------------------------------------------------
{
   my $DHCP_flag          = shift;
   my $nodes_to_start_ref = shift;

   trace ("Configuring first node");
   trace ("DHCP_flag=$DHCP_flag");
   trace ("nodes_to_start=@$nodes_to_start_ref");

   my $success = SUCCESS;

   # set the network interface - Bug 9243302
   setNetworkInterface();

   if (($CFG->params('ASM_UPGRADE') =~ m/false/i) && (! isASMExists())) {
      trace("Prior version ASM does not exist , Invoking add asm");
      add_ASM(); # add ora.asm
      if ($CFG->ASM_STORAGE_USED) {
         createDiskgroupRes(); # add disk group resource, if necessary
      }
   }

   add_acfs_registry();

   if ($success &&
       add_GNS($CFG->params('GNS_ADDR_LIST'),
               $CFG->params('GNS_DOMAIN_LIST')) &&
       add_scan() &&
       add_scan_listener() &&
       add_J2EEContainer() == SUCCESS) {
      $success = SUCCESS;
   } else {
      $success = FAILED;
   }

   if ($success) {
      add_CVU();
   }

   if ($success &&
       start_Nodeapps($DHCP_flag, \@$nodes_to_start_ref) &&
       start_GNS() &&
       start_scan() &&
       start_scan_listener() &&
       start_J2EEContainer() == SUCCESS)   # <-- the call that fails in this case
   {
      $success = SUCCESS;
      if (($CFG->params('ASM_UPGRADE') =~ m/false/i) && (isASMExists())) {
         $success = start_acfs_registry(\@$nodes_to_start_ref);
      }
      if (s_is92ConfigExists()) {
         $success = enable_GSD();
      }
      else {
         remove_gsd_file();
      }
   } else {
      $success = FAILED;
   }

   if ($success) {
      start_CVU();
   }

   return $success;
}
Working through the log below, I concluded that the failure is at the call marked above: the final condition is not satisfied because start_J2EEContainer() != SUCCESS.
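The failing step can also be reproduced by hand, as the grid owner, using the same srvctl calls the script issues (they appear verbatim in the log below):

# su - grid
$ /u01/product/11.2/grid/bin/srvctl status oc4j
$ /u01/product/11.2/grid/bin/srvctl start oc4j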
(4) Next I analyzed /u01/product/11.2/grid/cfgtoollogs/crsconfig/rootcrs_mydb-node1.log:
2015-01-23 10:50:06: Invoking "/u01/product/11.2/grid/bin/srvctl add cvu"
2015-01-23 10:50:06: trace file=/u01/product/11.2/grid/cfgtoollogs/crsconfig/srvmcfg5.log
2015-01-23 10:50:06: Running as user grid: /u01/product/11.2/grid/bin/srvctl add cvu
2015-01-23 10:50:06: Invoking "/u01/product/11.2/grid/bin/srvctl add cvu" as user "grid"
2015-01-23 10:50:06: Executing /bin/su grid -c "/u01/product/11.2/grid/bin/srvctl add cvu"
2015-01-23 10:50:06: Executing cmd: /bin/su grid -c "/u01/product/11.2/grid/bin/srvctl add cvu"
2015-01-23 10:50:18: add cvu ... success
2015-01-23 10:50:18: starting nodeapps...
2015-01-23 10:50:18: DHCP_flag=0
2015-01-23 10:50:18: nodes_to_start=mydb-node1
2015-01-23 10:50:24: exit value of start nodeapps/vip is 0
2015-01-23 10:50:24: GNS is not to be configured - skipping
2015-01-23 10:50:24: Invoking "/u01/product/11.2/grid/bin/srvctl start scan"
2015-01-23 10:50:24: trace file=/u01/product/11.2/grid/cfgtoollogs/crsconfig/srvmcfg6.log
2015-01-23 10:50:24: Executing /u01/product/11.2/grid/bin/srvctl start scan
2015-01-23 10:50:24: Executing cmd: /u01/product/11.2/grid/bin/srvctl start scan
2015-01-23 10:50:28: start scan ... success
2015-01-23 10:50:28: Invoking "/u01/product/11.2/grid/bin/srvctl start scan_listener"
2015-01-23 10:50:28: trace file=/u01/product/11.2/grid/cfgtoollogs/crsconfig/srvmcfg7.log
2015-01-23 10:50:28: Running as user grid: /u01/product/11.2/grid/bin/srvctl start scan_listener
2015-01-23 10:50:28: Invoking "/u01/product/11.2/grid/bin/srvctl start scan_listener" as user "grid"
2015-01-23 10:50:28: Executing /bin/su grid -c "/u01/product/11.2/grid/bin/srvctl start scan_listener"
2015-01-23 10:50:28: Executing cmd: /bin/su grid -c "/u01/product/11.2/grid/bin/srvctl start scan_listener"
2015-01-23 10:50:30: start scan listener ... success
2015-01-23 10:50:31: Running as user grid: /u01/product/11.2/grid/bin/srvctl enable oc4j
2015-01-23 10:50:31: s_run_as_user2: Running /bin/su grid -c ' /u01/product/11.2/grid/bin/srvctl enable oc4j '
2015-01-23 10:50:32: Removing file /tmp/fileqlGEeu
2015-01-23 10:50:32: Successfully removed file: /tmp/fileqlGEeu
2015-01-23 10:50:32: /bin/su exited with rc=1
2015-01-23 10:50:32: J2EE (OC4J) Container Resource enable ... passed
2015-01-23 10:50:32: Running as user grid: /u01/product/11.2/grid/bin/srvctl start oc4j
2015-01-23 10:50:32: s_run_as_user2: Running /bin/su grid -c ' /u01/product/11.2/grid/bin/srvctl start oc4j '
2015-01-23 10:55:48: Removing file /tmp/file107tU5
2015-01-23 10:55:48: Successfully removed file: /tmp/file107tU5
2015-01-23 10:55:48: /bin/su exited with rc=1
2015-01-23 10:55:48: Error encountered in the command /u01/product/11.2/grid/bin/srvctl start oc4j
> OC4J could not be started
> PRCR-1079 : Failed to start resource ora.oc4j
> CRS-2674: Start of 'ora.oc4j' on 'mydb-node1' failed
> CRS-2632: There are no more servers to try to place resource 'ora.oc4j' on that would satisfy its placement policy
> End Command output
2015-01-23 10:55:48: J2EE (OC4J) Container Resource Start ... failed ...
2015-01-23 10:55:48: Running as user grid: /u01/product/11.2/grid/bin/cluutil -ckpt -oraclebase /u01/product/grid -writeckpt -name ROOTCRS_NODECONFIG -state FAIL
2015-01-23 10:55:48: s_run_as_user2: Running /bin/su grid -c ' /u01/product/11.2/grid/bin/cluutil -ckpt -oraclebase /u01/product/grid -writeckpt -name ROOTCRS_NODECONFIG -state FAIL '
2015-01-23 10:55:49: Removing file /tmp/file95mj2L
2015-01-23 10:55:49: Successfully removed file: /tmp/file95mj2L
2015-01-23 10:55:49: /bin/su successfully executed
2015-01-23 10:55:49: Succeeded in writing the checkpoint:'ROOTCRS_NODECONFIG' with status:FAIL
2015-01-23 10:55:49: CkptFile: /u01/product/grid/Clusterware/ckptGridHA_mydb-node1.xml
2015-01-23 10:55:49: Sync the checkpoint file '/u01/product/grid/Clusterware/ckptGridHA_mydb-node1.xml'
2015-01-23 10:55:49: Sync '/u01/product/grid/Clusterware/ckptGridHA_mydb-node1.xml' to the physical disk
2015-01-23 10:55:49: ###### Begin DIE Stack Trace ######
2015-01-23 10:55:49: Package File Line Calling
2015-01-23 10:55:49: --------------- -------------------- ---- ----------
2015-01-23 10:55:49: 1: main rootcrs.pl 393 crsconfig_lib::dietrap
2015-01-23 10:55:49: 2: crsconfig_lib crsconfig_lib.pm 9379 main::__ANON__
2015-01-23 10:55:49: 3: crsconfig_lib crsconfig_lib.pm 9167 crsconfig_lib::configNode
2015-01-23 10:55:49: 4: main rootcrs.pl 921 crsconfig_lib::perform_configNode
2015-01-23 10:55:49: ####### End DIE Stack Trace #######
2015-01-23 10:55:49: 'ROOTCRS_NODECONFIG' checkpoint has failed
2015-01-23 10:55:49: Running as user grid: /u01/product/11.2/grid/bin/cluutil -ckpt -oraclebase /u01/product/grid -writeckpt -name ROOTCRS_NODECONFIG -state FAIL
2015-01-23 10:55:49: s_run_as_user2: Running /bin/su grid -c ' /u01/product/11.2/grid/bin/cluutil -ckpt -oraclebase /u01/product/grid -writeckpt -name ROOTCRS_NODECONFIG -state FAIL '
2015-01-23 10:55:50: Removing file /tmp/filefZw8yw
2015-01-23 10:55:50: Successfully removed file: /tmp/filefZw8yw
2015-01-23 10:55:50: /bin/su successfully executed
2015-01-23 10:55:50: Succeeded in writing the checkpoint:'ROOTCRS_NODECONFIG' with status:FAIL
2015-01-23 10:55:50: CkptFile: /u01/product/grid/Clusterware/ckptGridHA_mydb-node1.xml
2015-01-23 10:55:50: Sync the checkpoint file '/u01/product/grid/Clusterware/ckptGridHA_mydb-node1.xml'
2015-01-23 10:55:50: Sync '/u01/product/grid/Clusterware/ckptGridHA_mydb-node1.xml' to the physical disk
(5) I then went through /u01/product/11.2/grid/log/mydb-node1/agent/crsd/scriptagent_grid/scriptagent_grid.log, where I saw the following:
2015-01-23 09:42:56.541: {1:40087:2} Start OC4J
2015-01-23 09:43:28.765: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:29.851: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:30.841: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:31.821: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:32.865: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:33.831: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:34.837: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:35.845: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:36.891: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:37.899: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:39.326: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:40.292: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:41.393: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:42.388: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:43.450: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:44.475: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:45.423: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
2015-01-23 09:43:46.444: {1:40087:2} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/
(... the entries in between are similar and omitted ...)
2015-01-23 10:55:42.224: {1:20336:397} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/dbwlm-root/dbwlm
2015-01-23 10:55:42.891: [ AGENT]{1:20336:397} {1:20336:397} Created alert : (:CRSAGF00113:) : Aborting the command: start for resource: ora.oc4j 1 1
2015-01-23 10:55:42.891: {1:20336:397} Killing action script: start
2015-01-23 10:55:42.891: [ AGFW]{1:20336:397} Command: start for resource: ora.oc4j 1 1 completed with status: TIMEDOUT
2015-01-23 10:55:42.892: [ AGFW]{1:20336:397} Agent sending reply for: RESOURCE_START ID 4098:1063
2015-01-23 10:55:42.930: {1:20336:397} Executing action script: /u01/product/11.2/grid/bin/oc4jctl[check]
2015-01-23 10:55:43.396: {1:20336:397} /u01/product/11.2/grid/bin/oc4jctl.pl: Could not fetch http://localhost:8888/dbwlm-root/dbwlm
2015-01-23 10:55:43.396: {1:20336:397} Check OC4J
2015-01-23 10:55:43.396: {1:20336:397} http://localhost:8888/: HTTP::Response=HASH(0xf00590)
2015-01-23 10:55:43.396: {1:20336:397} Retcode (Container): 200
2015-01-23 10:55:43.396: {1:20336:397} Retcode (DBWLM): 404
2015-01-23 10:55:43.396: {1:20336:397} Return code is 5
2015-01-23 10:55:43.397: [ AGFW]{1:20336:397} ora.oc4j 1 1 state changed from: STARTING to: FAILED
2015-01-23 10:55:43.397: [ AGFW]{1:20336:397} Agent sending last reply for: RESOURCE_START ID 4098:1063
2015-01-23 10:55:43.400: TM is changing desired thread # to 3. Current # is 2
2015-01-23 10:55:43.402: [ AGFW]{1:20336:397} Agent has no resources to be monitored, Shutting down ..
2015-01-23 10:55:43.402: [ AGFW]{1:20336:397} Agent sending message to PE: AGENT_SHUTDOWN_REQUEST ID 20486:33
2015-01-23 10:55:43.408: [ AGFW]{1:20336:397} Agent received the message: RESOURCE_CLEAN ID 4100:1273
2015-01-23 10:55:43.408: [ AGFW]{1:20336:397} Preparing CLEAN command for: ora.oc4j 1 1
2015-01-23 10:55:43.408: [ AGFW]{1:20336:397} ora.oc4j 1 1 state changed from: FAILED to: CLEANING
2015-01-23 10:55:43.409: {1:20336:397} Executing action script: /u01/product/11.2/grid/bin/oc4jctl[clean]
2015-01-23 10:55:43.415: [ AGFW]{1:20336:397} Agfw Proxy Server rejected agent's shutdown request.
2015-01-23 10:55:43.574: {1:20336:397} Clean OC4J
2015-01-23 10:55:47.457: {1:20336:397} Return code is 0
2015-01-23 10:55:47.457: [ AGFW]{1:20336:397} Command: clean for resource: ora.oc4j 1 1 completed with status: SUCCESS
2015-01-23 10:55:47.458: {1:20336:397} Executing action script: /u01/product/11.2/grid/bin/oc4jctl[check]
2015-01-23 10:55:47.459: [ AGFW]{1:20336:397} Agent sending reply for: RESOURCE_CLEAN ID 4100:1273
2015-01-23 10:55:47.459: TM is changing desired thread # to 4. Current # is 3
2015-01-23 10:55:47.562: {1:20336:397} Check OC4J
2015-01-23 10:55:47.915: {1:20336:397} Return code is 1
2015-01-23 10:55:47.916: [ AGFW]{1:20336:397} ora.oc4j 1 1 state changed from: CLEANING to: OFFLINE
2015-01-23 10:55:47.916: [ AGFW]{1:20336:397} Agent sending last reply for: RESOURCE_CLEAN ID 4100:1273
2015-01-23 10:55:47.918: [ AGFW]{1:20336:397} Agent has no resources to be monitored, Shutting down ..
2015-01-23 10:55:47.918: [ AGFW]{1:20336:397} Agent sending message to PE: AGENT_SHUTDOWN_REQUEST ID 20486:43
2015-01-23 10:55:47.921: [ AGFW]{1:20336:397} Agent is shutting down.
2015-01-23 10:55:47.921: [ USRTHRD]{1:20336:397} Script agent is exiting..
2015-01-23 10:55:47.921: [ AGFW]{1:20336:397} Agent is exiting with exit code: 1
My gut feeling was that this was a localhost problem, but after checking I could not find anything wrong.
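oc4jctl.pl is simply polling the OC4J container over HTTP, so the same probe can be made by hand; note that in the final check above the container root answered HTTP 200 while /dbwlm-root/dbwlm returned 404. A minimal sketch, assuming curl is available (wget would do equally well):

# curl -v http://localhost:8888/
# curl -v http://localhost:8888/dbwlm-root/dbwlm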
(6) My /etc/hosts settings (the file is identical on both nodes):
# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost localhost.localdomain localhost
#::1 localhost6.localdomain6 localhost6
192.168.1.111 mydb-node1
192.168.1.112 mydb-node2
192.168.1.211 mydb-node1-vip
192.168.1.212 mydb-node2-vip
192.168.73.111 mydb-node1-priv
192.168.73.112 mydb-node2-priv
192.168.1.222 mydb-no-cluster-scan
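Since the agent could not fetch http://localhost:8888/, basic localhost resolution is worth ruling out explicitly, for example:

# getent hosts localhost
# ping -c 2 localhost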
(7) I verified DNS resolution on both nodes; the output was identical on each, so it is shown once:
# nslookup mydb-node1
Server: 192.168.73.111
Address: 192.168.73.111#53
Name: mydb-node1
Address: 192.168.1.111
# nslookup mydb-node2
Server: 192.168.73.111
Address: 192.168.73.111#53
Name: mydb-node2
Address: 192.168.1.112
# nslookup mydb-node1-priv
Server: 192.168.73.111
Address: 192.168.73.111#53
Name: mydb-node1-priv
Address: 192.168.73.111
# nslookup mydb-node2-priv
Server: 192.168.73.111
Address: 192.168.73.111#53
Name: mydb-node2-priv
Address: 192.168.73.112
Reply (刘大): Try redeploying, and do not name the nodes in the XX-YY form; avoid the hyphen.
OP: Thanks, 刘大. Problem solved.
Reply: Solved how? What changes did you make? Please post them so everyone has a reference.
OP: As 刘大 advised, I removed the "-" from the hostnames, reran the configuration, and it succeeded.
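For anyone retracing this thread: after changing the hostnames (the new names are up to you), the usual 11.2 cleanup before re-running root.sh is the rootcrs deconfig. A rough sketch, run as root on the affected node (add -lastnode on the final node if the OCR/voting disk group must be wiped as well):

# cd /u01/product/11.2/grid/crs/install
# perl rootcrs.pl -deconfig -force
# cd /u01/product/11.2/grid
# ./root.sh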