Posted on 2012-2-15 22:38:15
Bug 11770054: NEED CLARIFICATION ON NETWORK REQUIREMENT OF PRIVATE INTERCONNECT
Type: D - Documentation
Fixed in Product Version: -
Severity: 2 - Severe Loss of Service
Product Version: 11.2.0.2.0
Status: 92 - Closed, Not a Bug
Platform: 46 - Linux x86
Created: 14-Feb-2011
Platform Version: NO DATA
Updated: 18-Mar-2011
Base Bug: -
Database Version: 11.2.0.2.0
Affects Platforms: Generic
Product Source: Oracle
Related Products:
Line: Oracle Database Products
Family: Oracle Database
Area: Oracle Database
Product: 5 - Oracle Server - Enterprise Edition
Hdr: 11770054 11.2.0.2.0 PCW 11.2.0.2.0 PRODID-5 PORTID-46
Abstract: NEED CLARIFICATION ON NETWORK REQUIREMENT OF PRIVATE INTERCONNECT
PROBLEM:
--------
Starting with 11.2.0.2, we support HAIP, which uses multiple private NICs
natively (no bonding required). In this environment, we have seen several
bugs showing that when
1). the multiple NICs are on the same subnet, and
2). the cable is pulled from one of these NICs,
the OS is rebooted, which is not expected.
These bugs are: bug 10638686, bug 10389682, and bug 10277115.
DIAGNOSTIC ANALYSIS:
--------------------
In bug 10277115, Dev AHABBAS commented the following, and the bug was closed
with status 32.
============================
The problem is that if you have multiple private NICs on the same subnet,
then there are requirements for the way the hardware is cabled. This is not
specifically an HAIP or CRS issue, but a networking issue in general.
Since the movement of packets is going to be based on routing tables and
routes, when multiple nics with the same subnet are used, it is required that
either nic should be able to communicate to either nic on the other box.
That or some solution to ensure that the routing tables are moved properly.
Otherwise, when any cable is pulled things will fail.
If on the other hand we are using separate subnets, then traffic is routed
more "naturally".
=============================
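The routing argument in the comment above can be illustrated with a small, simplified model. This is a sketch under stated assumptions, not how Oracle Clusterware or the Linux kernel is implemented: the interface names eth1/eth2 and the 192.168.x.x subnets are hypothetical, route selection is reduced to "longest prefix wins, and on a tie the route listed first wins", and a pulled cable is modelled as "link down, but the route is still in the table".

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    network: ipaddress.IPv4Network  # destination subnet of the route
    iface: str                      # interface the route sends packets out of


def pick_route(routes, dest):
    """Longest-prefix match; on a tie the route listed first wins.

    Simplified model only: metrics, policy routing and ARP are ignored.
    """
    dest_ip = ipaddress.ip_address(dest)
    best = None
    for route in routes:
        if dest_ip in route.network:
            # Strict ">" keeps the earlier route when prefix lengths are equal.
            if best is None or route.network.prefixlen > best.network.prefixlen:
                best = route
    return best


def reachable(routes, dest, link_up):
    """Packets get through only if the chosen route's interface still has link."""
    route = pick_route(routes, dest)
    return route is not None and link_up.get(route.iface, False)


if __name__ == "__main__":
    net = ipaddress.ip_network
    # Hypothetical addressing: both private NICs on one subnet.
    same_subnet = [Route(net("192.168.10.0/24"), "eth1"),
                   Route(net("192.168.10.0/24"), "eth2")]
    # Hypothetical addressing: one subnet per private NIC.
    separate = [Route(net("192.168.10.0/24"), "eth1"),
                Route(net("192.168.20.0/24"), "eth2")]
    # Cable pulled from eth1: no link, but its route is still listed first.
    links = {"eth1": False, "eth2": True}
    print(reachable(same_subnet, "192.168.10.2", links))  # False: eth1 still chosen
    print(reachable(separate, "192.168.20.2", links))     # True: eth2 path unaffected
```

In the same-subnet case the surviving NIC is never used for that destination, because the stale route on the failed NIC still matches first. With separate subnets, each path has its own route, so losing one link does not hide the other.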
Our documentation does not mention any of the above. The purpose of this
documentation bug is therefore to clarify:
1). Should we ask customers to use a different subnet for each NIC with HAIP?
2). If development considers the same subnet acceptable, but with the extra
requirement that "when multiple nics with the same subnet are used, it is
required that either nic should be able to communicate to either nic on the
other box", then we need to make this clear in the documentation. We also
need to explain what "either nic should be able to communicate to either nic
on the other box" really means, and what customers need to check to make sure
that "either nic should be able to communicate to either nic on the other
box". Is there an example command to check this?
We have run into a situation where a customer asked exactly this question:
how to check that "either nic should be able to communicate to either nic on
the other box".
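One possible way to read "either nic should be able to communicate to either nic on the other box" is as a full mesh: every private NIC on this node must reach every private address on the other node. The sketch below, which is not an Oracle-documented procedure, builds one Linux iputils `ping -I <iface>` command per pair; `-I` with an interface name forces the outgoing packets through that interface rather than the route the kernel would normally pick. The interface names and addresses are hypothetical. Replies follow the other node's routing table, so the same matrix should be run from both nodes.

```python
import subprocess
from itertools import product


def mesh_checks(local_ifaces, remote_ips, count=3):
    """One ping command (as an argv list) per (local NIC, remote private IP) pair."""
    return [["ping", "-c", str(count), "-I", iface, ip]
            for iface, ip in product(local_ifaces, remote_ips)]


def run_checks(commands):
    """Run each command; map its text form to True if ping exited with status 0."""
    results = {}
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = proc.returncode == 0
    return results


if __name__ == "__main__":
    # Hypothetical: this node's private NICs and the other node's private IPs.
    for cmd in mesh_checks(["eth1", "eth2"], ["192.168.10.2", "192.168.10.3"]):
        print(" ".join(cmd))
```

Any `False` entry from `run_checks` points at a NIC/cable combination that cannot reach the other box, which is exactly the cabling condition the quoted comment says must hold when the NICs share a subnet.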
Bug 10389682: HAIP INTERCONNECT DOES NOT WORK IF IP ADDRESSES ARE IN THE SAME SUBNET
Type: B - Defect
Fixed in Product Version: -
Severity: 2 - Severe Loss of Service
Product Version: 11.2.0.2
Status: 96 - Closed, Duplicate Bug
Platform: 226 - Linux x86-64
Created: 10-Dec-2010
Platform Version: NO DATA
Updated: 16-May-2011
Base Bug: 11770054
Database Version: 11.2.0.2
Affects Platforms: Generic
Product Source: Oracle
Related Products:
Line: Oracle Database Products
Family: Oracle Database
Area: Oracle Database
Product: 5 - Oracle Server - Enterprise Edition
Hdr: 10389682 11.2.0.2 PCW 11.2.0.2 GIPC PRODID-5 PORTID-226 11770054
Abstract: HAIP INTERCONNECT DOES NOT WORK IF IP ADDRESSES ARE IN THE SAME SUBNET
*** 12/10/10 05:45 am ***
*** 12/10/10 05:45 am *** (CHG: RDBMS Ver.-> NULL -> 11.2.0.2)
*** 12/10/10 05:45 am ***
BUG TYPE CHOSEN
===============
Supportability
Component: Portable ClusterWare
===============================
DETAILED PROBLEM DESCRIPTION
============================
With 11.2.0.2 you can create a redundant interconnect by classifying
multiple (up to 4) interfaces as private.
The clusterware (ora.cluster_interconnect.haip) will then create virtual
addresses on top of these interfaces. When an interface fails, its virtual
interface is relocated to one of the other interfaces.
For this to work, each interface must be in a separate subnet.
If this is not the case, the clusterware will still relocate the virtual
interface, but if the failed interface is listed first in the routing table,
the other node(s) will still not be able to ping or connect to any of the
private addresses, resulting in a cluster node failure.
Note that you will not see this behaviour when the interface is stopped with
ifdown, because ifdown also removes its entry from the routing table.
However, this requirement is not specified in the documentation, nor does
oifcfg give an error when multiple interconnect interfaces on the same
subnet are configured.
We need to confirm whether using different subnets for different
interconnect interfaces is indeed a requirement.
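Since oifcfg does not warn about this configuration, a simple check on its output can flag it. The sketch below assumes the whitespace-separated `oifcfg getif` layout of interface name, subnet, scope and type (for example `eth1  192.168.10.0  global  cluster_interconnect`); the sample output and interface names are hypothetical, and the function name is illustrative rather than part of any Oracle tool.

```python
from collections import defaultdict


def shared_interconnect_subnets(getif_output):
    """Return {subnet: [ifaces]} for interconnect subnets used by 2+ interfaces."""
    by_subnet = defaultdict(list)
    for line in getif_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        iface, subnet, _scope, if_type = fields[:4]
        # The type column may list several roles separated by commas.
        if "cluster_interconnect" in if_type.split(","):
            by_subnet[subnet].append(iface)
    return {subnet: ifaces for subnet, ifaces in by_subnet.items() if len(ifaces) > 1}


if __name__ == "__main__":
    # Hypothetical `oifcfg getif` output: two private NICs on one subnet.
    sample = """\
eth0  10.1.1.0      global  public
eth1  192.168.10.0  global  cluster_interconnect
eth2  192.168.10.0  global  cluster_interconnect
"""
    print(shared_interconnect_subnets(sample))
    # {'192.168.10.0': ['eth1', 'eth2']}
```

An empty result means every interconnect interface has its own subnet; a non-empty result identifies the configuration this bug is about.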
DIAGNOSTIC ANALYSIS
===================
We applied the following test:
- Collect the output below while all the interconnect interfaces are working
fine:
  oifcfg getif
  oifcfg iflist -p -n
  ifconfig -a
  route -n
- Reproduce the issue by making one of the interconnect interfaces fail, and
collect the output below:
  ifconfig -a
  route -n
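The `route -n` output collected after the failure is what shows the problem described above: a pulled cable leaves the failed interface's route in place, and if it is listed first for the shared subnet, traffic keeps going to it. The sketch below parses the standard eight-column `route -n` table and reports whether the failed NIC still heads the list for a given subnet. The sample table, subnet and interface names are hypothetical, and treating "listed first" as "used" follows the bug report's description rather than a full model of kernel route selection.

```python
def parse_route_n(text):
    """Parse `route -n` output into dicts with destination, genmask and iface."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns; skip the title line and the column header.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        dest, _gw, genmask, _flags, _metric, _ref, _use, iface = fields
        rows.append({"dest": dest, "genmask": genmask, "iface": iface})
    return rows


def ifaces_for_subnet(rows, dest, genmask):
    """Interfaces holding a route to dest/genmask, in the order route -n lists them."""
    return [r["iface"] for r in rows if r["dest"] == dest and r["genmask"] == genmask]


def failed_iface_still_first(route_n_after, dest, genmask, failed_iface):
    """True if the failed NIC is still listed first for the shared subnet."""
    ifaces = ifaces_for_subnet(parse_route_n(route_n_after), dest, genmask)
    return bool(ifaces) and ifaces[0] == failed_iface


if __name__ == "__main__":
    # Hypothetical `route -n` output captured after pulling eth1's cable.
    after = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 eth1
192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 eth2
"""
    print(failed_iface_still_first(after, "192.168.10.0", "255.255.255.0", "eth1"))
    # True
```

Running the same check on output taken after `ifdown eth1` would be expected to return False, since ifdown removes the route.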
The customer's feedback:
I have uploaded the results of the requested tests, together with the
log files from the clusterware.
I have also included some additional comments in the test results.
The following observations were made:
When multiple subnets are used, a disconnect on one node results in the
HAIP VIP being relocated to another NIC on all the nodes.
When only one subnet is used, this is not the case (unless the NIC is
unplumbed).