limouse


RAC 11gR2: VIP MAC address is not updated promptly

1^#

Posted 2012-2-17 13:00:58 | Views: 7938 | Replies: 5

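Environment:
Two hosts + SAN storage + Cisco 7609 + Oracle 11.2.0.2

Symptoms:
When the public interface link on one host goes down, the VIP moves to the other host and database connections work normally.
After the cable is plugged back into the original public interface and communication resumes, the VIP moves back to its original node, but the VIP cannot be pinged and database connections are interrupted.
Investigation showed that the VIP's MAC address on the 7609 had not been updated. After clearing the ARP information for that interface, the VIP's MAC was updated and everything returned to normal.

What is the cause of this? Is it a configuration problem or an Oracle bug?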
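A stale entry like the one described above can also be looked for from a Linux host on the same subnet. The sketch below is only an illustration: it parses the host's own `/proc/net/arp` table (not the 7609's), and the VIP and MAC values in the usage section are hypothetical placeholders, not values from this thread.

```python
def parse_arp_table(text):
    """Map IP address -> MAC address from /proc/net/arp style text."""
    entries = {}
    # The first line is the column header: IP address, HW type, Flags, HW address, ...
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4:
            entries[fields[0]] = fields[3].lower()
    return entries


def is_stale(arp_text, vip, expected_mac):
    """True if the cache holds an entry for vip that differs from expected_mac."""
    cached = parse_arp_table(arp_text).get(vip)
    return cached is not None and cached != expected_mac.lower()


if __name__ == "__main__":
    # Hypothetical values: replace with the real VIP and the MAC of the
    # public interface on the node that currently hosts the VIP.
    VIP = "192.0.2.10"
    EXPECTED_MAC = "00:00:5e:00:53:01"
    with open("/proc/net/arp") as handle:
        print("stale entry:", is_stale(handle.read(), VIP, EXPECTED_MAC))
```

If the cached MAC still points at the node that no longer holds the VIP, the neighbor has not processed the gratuitous ARP that should follow a relocation.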


Maclean Liu (刘相兵)

2^#

Posted 2012-2-17 13:19:29

What operating system is this? Windows?

$CRS_HOME/log/<nodename>/crsd/*.log
$CRS_HOME/log/<nodename>/racg/*.log

Please upload the logs from these two directories.
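Gathering those logs for upload could be scripted roughly as follows. This is a minimal sketch: the function name and archive name are made up for illustration, and it assumes `<nodename>` is the short hostname of the node, which may not hold in every installation.

```python
import glob
import os
import socket
import tarfile


def collect_crs_logs(crs_home, nodename, out_path):
    """Bundle the crsd and racg *.log files of one node into a gzip tarball."""
    collected = []
    with tarfile.open(out_path, "w:gz") as archive:
        for subdir in ("crsd", "racg"):
            pattern = os.path.join(crs_home, "log", nodename, subdir, "*.log")
            for log_file in sorted(glob.glob(pattern)):
                # Store paths relative to CRS_HOME so the archive is portable.
                archive.add(log_file, arcname=os.path.relpath(log_file, crs_home))
                collected.append(log_file)
    return collected


if __name__ == "__main__":
    home = os.environ.get("CRS_HOME")
    if home:
        # Assumption: the log directory is named after the short hostname.
        node = socket.gethostname().split(".")[0]
        files = collect_crs_logs(home, node, "crs_logs.tar.gz")
        print(f"archived {len(files)} log files")
    else:
        print("CRS_HOME is not set")
```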


limouse

3^#

Posted 2012-2-17 13:43:04

Both hosts run Red Hat 5.4. Logs are attached; I ran a test at around 13:24.

Log.rar

361.77 KB, downloads: 1469

Maclean Liu (刘相兵)

4^#

Posted 2012-2-17 14:28:12

ODM Data:

Linux: ARP cache issues with Red Hat "balance-alb (mode 6)" bonding driver [ID 756259.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 11.1.0.7 - Release: 10.1 to 11.1
Linux x86
Linux x86-64
Oracle Clusterware when using bonding on Public network
Oracle Clusterware when using bonding on Private network
Symptoms
When using "balance-alb (mode 6)" bonding driver in an Oracle Clusterware setup, if the VIP fails over to another node, the ping requests for the relocated VIP fail unless the ARP cache is manually cleared.
This causes sessions to hang as the session retries to connect to the VIP but it is unsuccessful due to the stale ARP cache entries.
Cause
This hang is due to the failover mode configured for NIC bonding. Specifically, the "balance-alb (mode 6)" bonding driver requires the ARP cache to be cleared manually. This is not an Oracle Clusterware issue. It is recommended to use another mode if the OS issue is difficult to fix.
Additionally, on failover the "balance-alb" driver issues ARP requests every two seconds. This is how the "balance-alb" bonding driver was designed to work in failover situations; however, it may cause sessions to hang, rendering the Oracle Clusterware VIP failover useless.
Solution
This is not an Oracle Clusterware issue. Different NIC bonding modes may be incompatible with certain switches, interface cards and drivers. Customers should test VIP failover in their environment with different modes and choose the one that fits their requirements.
For more information on bonding modes, refer to bonding.txt, which is supplied by the Linux vendor.


Please confirm whether your environment uses the "balance-alb (mode 6)" bonding driver.
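One way to check this on a Linux node is to read the bonding driver's status files. The sketch below assumes the driver exposes each bond under `/proc/net/bonding/` and reports mode 6 as "adaptive load balancing" on its "Bonding Mode:" line; both are my understanding of the Linux bonding driver rather than something stated in the note above, so verify against bonding.txt for your kernel.

```python
import glob
import re

# Assumed label the bonding driver prints for balance-alb (mode 6).
ALB_MODE = "adaptive load balancing"


def bonding_mode(proc_text):
    """Return the 'Bonding Mode' value from /proc/net/bonding/<bond> text, or None."""
    match = re.search(r"^Bonding Mode:\s*(.+)$", proc_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def uses_balance_alb(proc_text):
    """True if the reported bonding mode looks like balance-alb."""
    mode = bonding_mode(proc_text)
    return mode is not None and mode.startswith(ALB_MODE)


if __name__ == "__main__":
    bonds = glob.glob("/proc/net/bonding/*")
    if not bonds:
        print("no bonding interfaces found")
    for path in bonds:
        with open(path) as handle:
            text = handle.read()
        print(path, "->", bonding_mode(text), "| balance-alb:", uses_balance_alb(text))
```

If no bond files exist at all, the public interface is not bonded and this note does not apply.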

Another similar case:

Continuous Ping Fails For a Short Time Following SCAN Relocation [ID 1061722.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1.0 to 11.2.0.1.0 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms
After SCAN VIP relocation from one node to another by any means (manual relocation or CRS or node shutdown), the relocated SCAN VIP is not pingable for a small amount of time.
By running any packet analyzer utility on the failed ping packets sent to the relocated SCAN VIP, it should show something like the following:
Frame 760 (60 bytes on wire, 60 bytes captured)
Ethernet II, Src: 00:19:bb:cf:7b:dc (00:19:bb:cf:7b:dc), Dst: ff:ff:ff:ff:ff:ff (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (reply/gratuitous ARP)
Hardware type: Ethernet (0x0001)
Protocol type: IP (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: reply (0x0002)
Sender MAC address: 00:19:bb:cf:7b:dc (00:19:bb:cf:7b:dc)
Sender IP address: 10.3.2.74 (10.3.2.74)
Target MAC address: ff:ff:ff:ff:ff:ff (ff:ff:ff:ff:ff:ff)
Target IP address: 10.3.2.74 (10.3.2.74)
The Target MAC address field is ff:ff:ff:ff:ff:ff and not 00:19:bb:cf:7b:dc as it should be.
Changes
11.2.0.1 Grid Infrastructure (a.k.a. CRS) version, with the public interface connected to layer 3 switches or routers that are RFC compliant, specifically those that strictly follow RFC2002.
Cause
After the VIP starts on a node, the vipagent sends a gratuitous ARP request; there was a problem in the packet sent as the gratuitous ARP.
RFC2002:
A Gratuitous ARP is an ARP packet sent by a node in order to
spontaneously cause other nodes to update an entry in their ARP
cache. A gratuitous ARP MAY use either an ARP Request or an ARP
Reply packet. In either case, the ARP Sender Protocol Address
and ARP Target Protocol Address are both set to the IP address
of the cache entry to be updated, and the ARP Sender Hardware
Address is set to the link-layer address to which this cache
entry should be updated. When using an ARP Reply packet, the
Target Hardware Address is also set to the link-layer address to
which this cache entry should be updated (this field is not used
in an ARP Request packet)
Solution
The fix for this code defect Bug:9109880 has been included in the GRID Infrastructure PSU2 patch of 11.2.0.1 version and in version 11.2.0.2.
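To make the RFC text concrete, here is a minimal sketch that builds a gratuitous ARP reply frame as the quoted passage describes and checks a frame for the defect shown in the capture (Target MAC left as ff:ff:ff:ff:ff:ff). The function names are illustrative only; this is not the vipagent code, and it only constructs and inspects bytes without sending anything.

```python
import socket
import struct

BROADCAST = b"\xff" * 6


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_gratuitous_arp_reply(mac, ip):
    """Ethernet + ARP reply with sender and target fields both set to (mac, ip)."""
    hw = mac_bytes(mac)
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source, EtherType ARP.
    ethernet = BROADCAST + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet, IPv4, hardware size 6, protocol size 4, opcode 2 (reply).
    arp = struct.pack("!HHBBH", 0x0001, 0x0800, 6, 4, 2)
    arp += hw + addr  # sender hardware / protocol address
    arp += hw + addr  # target hardware / protocol address
    return ethernet + arp


def is_wellformed_gratuitous_reply(frame):
    """Check the reply-form rule: target fields must repeat the sender fields."""
    arp = frame[14:42]
    opcode = struct.unpack("!H", arp[6:8])[0]
    sender_hw, sender_ip = arp[8:14], arp[14:18]
    target_hw, target_ip = arp[18:24], arp[24:28]
    return opcode == 2 and sender_ip == target_ip and sender_hw == target_hw


if __name__ == "__main__":
    good = build_gratuitous_arp_reply("00:19:bb:cf:7b:dc", "10.3.2.74")
    # Reproduce the captured defect: broadcast address in the Target MAC field.
    bad = good[:32] + BROADCAST + good[38:]
    print("good frame ok:", is_wellformed_gratuitous_reply(good))
    print("captured-style frame ok:", is_wellformed_gratuitous_reply(bad))
```

A device that applies this check strictly would ignore the captured-style frame, which matches the symptom of the ARP entry not being refreshed.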


However, Oracle states that the 1061722.1 issue above has been fixed in 11.2.0.2.

limouse

5^#

Posted 2012-2-17 16:11:20

Regarding case 1:
My public NIC is not bonded, so this issue does not apply here.

Case 2: I am running 11.2.0.2, so this issue does not apply either.

Frustrating.

Maclean Liu (刘相兵)

6^#

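Posted 2012-2-17 16:26:18

I suggest checking the ora.vip related logs; those have not been uploaded yet.

Also, in releases after 10.2.0.3 and in 11gR1, when the public network goes down the VIP by default no longer fails over automatically.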

Starting from 10.2.0.4 and 11.1, VIP does not fail-over back to the original node even after the public network problem is resolved. This behavior is the default behavior in 10.2.0.4 and 11.1 and is different from that of 10.2.0.3

http://www.oracledatabase12g.com ... original%20node.htm
