Oracle Database Data Recovery and Performance Optimization

1#
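Posted 2012-4-15 20:24:10 | Views: 11381 | Replies: 11
The failure happened as follows:
       I used oifcfg delif -force to delete all of the network interface configuration. Plain oifcfg delif reports an error when deleting the last interconnect (heartbeat) interface, but with -force the deletion goes through.
       After I stopped CRS manually and started it again, HAIP could not start, so CRS could not start either.
       Is the private network configuration recorded anywhere besides the OCR and the gpnp profile?
       crsctl start crs -excl -nocrs also fails to bring the stack up.
       How should I proceed in this situation? Thanks.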
2#
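Posted 2012-4-15 21:02:48
1. I suggest restoring the OCR.

It is normally backed up automatically under $ORACLE_HOME/cdata, provided you are not using ASM to store the OCR.

2. If you are using ASM to store the OCR, I suggest diagnosing why crsctl start crs -excl -nocrs cannot start ASM.

That requires log information.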


3#
Posted 2012-4-15 21:07:49

Re: post 2#

The OCR is stored in ASM.
I have already tried restoring the OCR, but the stack still will not start.


4#
Posted 2012-4-15 21:08:10

Re: post 3#

The gipcd log shows:
2012-04-15 20:58:35.685: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:965] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for neither generic host, nor for 'eptest1' host
2012-04-15 20:58:35.685: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:959] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for generic host
2012-04-15 20:58:35.685: [ CLSINET][1109883200] no network information found, grv 5
2012-04-15 20:58:35.685: [GIPCDMON][1109883200] gipcdMonitorCheckInterfaces: failed to read private interface information ret 1
2012-04-15 20:58:40.676: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:965] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for neither generic host, nor for 'eptest1' host
2012-04-15 20:58:40.676: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:959] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for generic host
2012-04-15 20:58:40.676: [ CLSINET][1109883200] no network information found, grv 5
2012-04-15 20:58:40.676: [GIPCDMON][1109883200] gipcdMonitorCheckInterfaces: failed to read private interface information ret 1
2012-04-15 20:58:45.676: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:965] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for neither generic host, nor for 'eptest1' host
2012-04-15 20:58:45.676: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:959] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for generic host
2012-04-15 20:58:45.676: [ CLSINET][1109883200] no network information found, grv 5
2012-04-15 20:58:45.676: [GIPCDMON][1109883200] gipcdMonitorCheckInterfaces: failed to read private interface information ret 1
2012-04-15 20:58:50.677: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:965] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for neither generic host, nor for 'eptest1' host
2012-04-15 20:58:50.677: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:959] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for generic host
2012-04-15 20:58:50.677: [ CLSINET][1109883200] no network information found, grv 5
2012-04-15 20:58:50.677: [GIPCDMON][1109883200] gipcdMonitorCheckInterfaces: failed to read private interface information ret 1
2012-04-15 20:58:55.679: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:965] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for neither generic host, nor for 'eptest1' host
2012-04-15 20:58:55.679: [    GPNP][1109883200] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:959] Result: (5) CLSGPNP_NOT_FOUND. profile 0x1b36f580 Network-Profile section: no Network info specified for generic host
2012-04-15 20:58:55.679: [ CLSINET][1109883200] no network information found, grv 5
2012-04-15 20:58:55.679: [GIPCDMON][1109883200] gipcdMonitorCheckInterfaces: failed to read private interface information ret 1
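
Since the same group of messages repeats every five seconds, it can help to collapse the log by message before reading it. Below is a minimal Python sketch, assuming the line layout seen above (timestamp, bracketed component, bracketed thread id, message). The function name summarize and the shortened sample text are illustrative only and are not part of any Oracle tool.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time>: [ COMPONENT][thread-id] message"
LINE_RE = re.compile(r"^\S+ \S+: \[\s*([\w.]+)\]\[\d+\]\s*(.*)$")


def summarize(log_text):
    """Count occurrences of each (component, message) pair, ignoring timestamps."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Shortened sample taken from the gipcd log above (two polling cycles).
sample = """
2012-04-15 20:58:35.685: [ CLSINET][1109883200] no network information found, grv 5
2012-04-15 20:58:35.685: [GIPCDMON][1109883200] gipcdMonitorCheckInterfaces: failed to read private interface information ret 1
2012-04-15 20:58:40.676: [ CLSINET][1109883200] no network information found, grv 5
2012-04-15 20:58:40.676: [GIPCDMON][1109883200] gipcdMonitorCheckInterfaces: failed to read private interface information ret 1
"""

counts = summarize(sample)
for (component, message), n in counts.most_common():
    print(f"{n:3d}  [{component}] {message}")
```

Run against the full log, this reduces the output to a handful of distinct messages, which makes it easier to see that every cycle fails at the same point: no network information in the gpnp profile.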


5#
Posted 2012-4-15 21:08:32
The easiest solution is to redeploy GI/CRS.


6#
Posted 2012-4-15 21:10:50

Re: post 4#

The alert log:
2012-04-15 20:50:54.790
[/grid/product/11.2/bin/orarootagent.bin(14526)]CRS-5818:Aborted command 'start for resource: ora.cluster_interconnect.haip 1 1' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:90} in /grid/product/11.2/log/eptest1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2012-04-15 20:50:58.797
[ohasd(14360)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:90} in /grid/product/11.2/log/eptest1/ohasd/ohasd.log.


7#
Posted 2012-4-15 21:16:58

Re: post 6#

The orarootagent log:
2012-04-15 20:49:56.784: [    AGFW][1112275264] {0:0:90} Agent sending last reply for: RESOURCE_START[ora.ctssd 1 1] ID 4098:350
2012-04-15 20:49:57.115: [    GPNP][1114376512] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:965] Result: (5) CLSGPNP_NOT_FOUND. profile 0x2aaab4150000 Network-Profile section: no Network info specified for neither generic host, nor for 'eptest1' host
2012-04-15 20:49:57.115: [    GPNP][1114376512] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:959] Result: (5) CLSGPNP_NOT_FOUND. profile 0x2aaab4150000 Network-Profile section: no Network info specified for generic host
2012-04-15 20:49:57.115: [ CLSINET][1114376512] no network information found, grv 5
2012-04-15 20:49:57.115: [ USRTHRD][1114376512] {0:0:90} failed to retrieve interface info, ret 1
2012-04-15 20:49:58.658: [ora.diskmon][1100294464] {0:0:33} [check] DiskmonAgent::check {
2012-04-15 20:49:58.658: [ora.diskmon][1100294464] {0:0:33} [check] DiskmonAgent::check } - 0
2012-04-15 20:49:59.121: [    GPNP][1114376512] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:965] Result: (5) CLSGPNP_NOT_FOUND. profile 0x2aaab4150000 Network-Profile section: no Network info specified for neither generic host, nor for 'eptest1' host
2012-04-15 20:49:59.121: [    GPNP][1114376512] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:959] Result: (5) CLSGPNP_NOT_FOUND. profile 0x2aaab4150000 Network-Profile section: no Network info specified for generic host
2012-04-15 20:49:59.121: [ CLSINET][1114376512] no network information found, grv 5
2012-04-15 20:49:59.121: [ USRTHRD][1114376512] {0:0:90} failed to retrieve interface info, ret 1
2012-04-15 20:50:01.117: [    GPNP][1114376512] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:965] Result: (5) CLSGPNP_NOT_FOUND. profile 0x2aaab4150000 Network-Profile section: no Network info specified for neither generic host, nor for 'eptest1' host
2012-04-15 20:50:01.117: [    GPNP][1114376512] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:959] Result: (5) CLSGPNP_NOT_FOUND. profile 0x2aaab4150000 Network-Profile section: no Network info specified for generic host
2012-04-15 20:50:01.117: [ CLSINET][1114376512] no network information found, grv 5
2012-04-15 20:50:01.117: [ USRTHRD][1114376512] {0:0:90} failed to retrieve interface info, ret 1
2012-04-15 20:50:01.661: [ora.diskmon][1104496960] {0:0:33} [check] DiskmonAgent::check {
2012-04-15 20:50:01.662: [ora.diskmon][1104496960] {0:0:33} [check] DiskmonAgent::check } - 0
2012-04-15 20:50:03.122: [    GPNP][1114376512] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:965] Result: (5) CLSGPNP_NOT_FOUND. profile 0x2aaab4150000 Network-Profile section: no Network info specified for neither generic host, nor for 'eptest1' host
2012-04-15 20:50:03.122: [    GPNP][1114376512] clsgpnpx_prfGetHostNetInfo: [at clsgpnpx.c:959] Result: (5) CLSGPNP_NOT_FOUND. profile 0x2aaab4150000 Network-Profile section: no Network info specified for generic host
2012-04-15 20:50:03.122: [ CLSINET][1114376512] no network information found, grv 5
2012-04-15 20:50:03.122: [ USRTHRD][1114376512] {0:0:90} failed to retrieve interface info, ret 1


8#
Posted 2012-4-15 21:22:30

Re: post 7#

crsctl start crs -excl -nocrs
The ASM process does start, but HAIP does not.


9#
Posted 2012-4-15 21:24:38
"The ASM process does start"

In that case you can restore the OCR. Make sure the OCR backup you restore was taken before the private network was deleted.


10#
Posted 2012-4-15 21:25:51
The official guidance for correctly modifying the private network in 11.2:

How to Modify Private Network Interface in 11.2 Grid Infrastructure

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1.0 and later   [Release: 11.2 and later ]
Information in this document applies to any platform.
Goal
The purpose of this document is to demonstrate how to change the private network interface configuration stored in the OCR. This may be required if the name of the interface for the private network (cluster interconnect) needs to be changed at the OS level. For example, the private network is configured on a single network interface eth0, and you want to replace it with a bond interface bond0, with eth0 becoming part of bond0. It also includes the commands for adding or deleting a private network interface.
Solution
As of 11.2 Grid Infrastructure, the CRS daemon (crsd.bin) now has a dependency on the private network configuration stored in the gpnp profile and OCR.  If the private network is not available or its definition is incorrect, the CRSD process will not start and any subsequent changes to the OCR will be impossible. Therefore care needs to be taken when making modifications to the configuration of the private network. It is important to perform the changes in the correct order.


Note: If only the private network IP is going to be changed and the subnet and network interface remain the same (for example, changing the private IP from 192.168.0.1 to 192.168.0.10), simply shut down the GI stack, make the IP modification at OS level (/etc/hosts, network configuration, etc.) for the private network, then restart the GI stack to complete the task.

The following procedures apply when subnet or network interface name also requires change.


Please take a backup of profile.xml on all cluster nodes before proceeding, as grid user:
$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
$ cp -p profile.xml profile.xml.bk


To modify the private network (cluster_interconnect):

1. Ensure CRS is running on ALL cluster nodes in the cluster

2. As grid user, add new interface:

Find the interface which needs to be removed. For example:
$ oifcfg getif

eth1 100.17.10.0 global public
eth0 192.168.0.0 global cluster_interconnect
Here the eth0 interface will be replaced by bond0 interface.

Add new interface bond0:
$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect

For example:
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
This can be done with the -global option even if the interface is not available yet. It must not be done with the -node option while the interface is not available, because that will lead to node eviction.

If the interface is available on the server, subnet address can be identified by command:
$ oifcfg iflist

It lists the network interface and its subnet address. This command can be run even if CRS is not up and running. Please note, subnet address might not be in the format of x.y.z.0. For example, it can be:
$ oifcfg iflist
lan1 18.1.2.0
lan2 10.2.3.64        << this is the private network subnet address associated with private network IP: 10.2.3.86

If the scenario is just to add a 2nd private network, for example: new interface is eth3 with subnet address: 192.168.1.96, then issue:
$ oifcfg setif -global eth3/192.168.1.96:cluster_interconnect

Verify the change:
$ oifcfg getif


3. Shut down CRS on all nodes and disable CRS, as root user:
# crsctl stop crs
# crsctl disable crs

4. Make the network configuration change at OS level as required, ensure the new interface is available on all nodes after the change.
$ ifconfig -a
$ ping <private hostname>

5. Enable CRS and restart CRS on all nodes as root user:
# crsctl enable crs
# crsctl start crs

6. Remove the old interface:
$ oifcfg delif -global eth0

Note #1. This step is not required for the scenario of adding a 2nd interface.
         #2. If the new interface is added without removing the old interface (e.g. the old interface is still available when CRS restarts), then after step 6 CRS needs to be stopped and started again to ensure the old interface is no longer in use.
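
Step 2 points out that the subnet address passed to oifcfg setif is not always of the form x.y.z.0. As a rough illustration, the subnet address can be derived from an interface IP and its prefix length with Python's standard ipaddress module. The helper name subnet_address is made up for this sketch, and the /26 prefix is an assumption, since the note gives the IP 10.2.3.86 and the subnet 10.2.3.64 but not the netmask.

```python
import ipaddress


def subnet_address(ip, prefix_len):
    """Return the network (subnet) address for an interface IP and prefix length."""
    iface = ipaddress.ip_interface(f"{ip}/{prefix_len}")
    return str(iface.network.network_address)


# 10.2.3.86 is the private IP from the note; /26 is an assumed netmask
# (255.255.255.192) chosen only because it yields the subnet shown there.
print(subnet_address("10.2.3.86", 26))     # 10.2.3.64

# A conventional /24 network gives the familiar x.y.z.0 form.
print(subnet_address("192.168.0.10", 24))  # 192.168.0.0
```

On a real node, oifcfg iflist remains the authoritative source for the subnet address. The calculation is only a cross-check before running setif.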

Something to note:

1. If underlying network configuration has been changed, but oifcfg has not been run to make the same change,  then upon CRS restart the CRSD will not be able to start.

The crsd.log will show:
2010-01-30 09:22:47.234: [ default][2926461424] CRS Daemon Starting
..
2010-01-30 09:22:47.273: [ GPnP][2926461424]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=7153, tl=3, f=0
2010-01-30 09:22:47.282: [ OCRAPI][2926461424]clsu_get_private_ip_addresses: no ip addresses found.
2010-01-30 09:22:47.282: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1732], ret gipcretSuccess (0)
2010-01-30 09:22:47.283: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
[ OCRAPI][2926461424]a_init_clsss: failed to call clsu_get_private_ip_addr (7)
2010-01-30 09:22:47.285: [ OCRAPI][2926461424]a_init:13!: Clusterware init unsuccessful : [44]
2010-01-30 09:22:47.285: [ CRSOCR][2926461424] OCR context init failure. Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2010-01-30 09:22:47.285: [ CRSD][2926461424][PANIC] CRSD exiting: Could not init OCR, code: 44
2010-01-30 09:22:47.285: [ CRSD][2926461424] Done.
Above errors indicate a mismatch between OS setting (oifcfg iflist) and gpnp profile setting profile.xml.

Workaround: restore the OS network configuration back to the original status, start CRS. Then follow above steps to make the changes again.
Please consult with Oracle Support Service if after restoring OS network configuration, CRS still could not start.


2. If any one node in the cluster is down, the oifcfg command will fail with error:
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Workaround: start CRS on the node where it is not running. Ensure CRS is up on all cluster nodes.

3. If a user other than the Grid Infrastructure owner issues the above command, it will fail with the same error:
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Workaround: log in as the Grid Infrastructure owner to run the command.

4. From 11.2.0.2 onwards, an attempt to delete the last private interface (cluster_interconnect) without adding a new one first fails with the following error:

PRIF-31: Failed to delete the specified network interface because it is the last private interface

Workaround: Add new private interface first before deleting the old private interface.

5. If CRS is down on the node, the following error is expected:
$ oifcfg getif
PRIF-10: failed to initialize the cluster registry

Workaround: Start the CRS on the node
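
The check behind item 4 can be reproduced offline before touching the cluster. The Python sketch below parses saved oifcfg getif output and reports whether deleting a given interface would remove the last cluster_interconnect entry. It assumes the four-column layout shown in the sample output earlier in the note (name, subnet, scope, role). Interface, parse_getif and safe_to_delete are hypothetical helper names, and nothing here calls oifcfg itself.

```python
from typing import NamedTuple


class Interface(NamedTuple):
    name: str
    subnet: str
    scope: str
    role: str


def parse_getif(output):
    """Parse lines such as 'eth0 192.168.0.0 global cluster_interconnect'."""
    interfaces = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 4:  # skip blank or unexpected lines
            interfaces.append(Interface(*fields))
    return interfaces


def safe_to_delete(interfaces, name):
    """Return False if `name` is the only cluster_interconnect interface."""
    privates = [i.name for i in interfaces if i.role == "cluster_interconnect"]
    if name not in privates:
        return True
    return len(privates) > 1


# Sample text copied from the 'oifcfg getif' example in the note.
before = parse_getif(
    "eth1 100.17.10.0 global public\n"
    "eth0 192.168.0.0 global cluster_interconnect\n"
)
print(safe_to_delete(before, "eth0"))  # False: eth0 is the last private interface

# After 'oifcfg setif -global bond0/192.168.0.0:cluster_interconnect'
after = before + [Interface("bond0", "192.168.0.0", "global", "cluster_interconnect")]
print(safe_to_delete(after, "eth0"))   # True: bond0 remains
```

This mirrors the order required by the procedure: add the new private interface first, then delete the old one.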


11#
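Posted 2012-4-15 21:38:48

Re: post 10#

I am sure the OCR I restored was taken before the deletion. Normally, though, even if the OCR is lost, the HAIP service can still start with -nocrs. Here it cannot.
I do know the correct way to change the configuration. I only wanted to run a test. I asked you earlier about a 10g RAC where oifcfg getif returned no information at all, yet that RAC could still be shut down and started normally. Why is that?
I wanted to see whether the same would work in 11g. Because of HAIP, 11g seems to differ quite a lot from 10g.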


12#
Posted 2012-4-16 09:59:38

Re: post 11#

Finally got it working. What a struggle!
