Posted on 2012-1-13 12:26:09
The ONS Daemon Explained In Oracle RAC / Oracle Clusterware Environment
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 and later [Release: 10.2 and later]
Information in this document applies to any platform.
Purpose
This note is intended to explain the purpose of the ONS daemon, how it is configured, and what needs to be checked when troubleshooting an ONS-related problem in an Oracle Clusterware installation.
Scope and Application
DBAs and Oracle Clusterware installers
1. purpose of the ons daemon
The Oracle Notification Service (ONS) daemon is a daemon started by Oracle Clusterware as part of the nodeapps. One ONS daemon is started on each cluster node.
The ONS daemon receives a subset of the published Clusterware events via the local evmd and racgimon Clusterware daemons and forwards those events to application subscribers and to the local listeners, in order to support:
a. FAN (Fast Application Notification), the feature that allows applications to respond
to database state changes. Fast Connection Failover (FCF) is the client mechanism that uses
FAN to achieve this. In 10gR2, the FCF clients/subscribers are JDBC, OCI, and ODP.NET.
b. the 10gR2 Load Balancing Advisory (the RLB feature), which permits load balancing
across the RAC nodes depending on the load on each node. The RDBMS MMON background process creates
an advisory for the distribution of work every 30 seconds and forwards it via racgimon and ONS
to listeners and applications.
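The forwarding role described above is a publish/subscribe fan-out. The Python sketch below is a toy, in-process model of that idea only, not ONS code: the class ToyOns, the function matches and the rule that a subscription also covers '/'-separated sub-types are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical toy model of the ONS fan-out; the real daemon exchanges
# events over TCP with evmd/racgimon, listeners and FCF clients.
Handler = Callable[[str, str], None]


def matches(subscription: str, event_type: str) -> bool:
    """Assumed rule: the exact type, or any '/'-separated sub-type of it."""
    return event_type == subscription or event_type.startswith(subscription + "/")


class ToyOns:
    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, subscription: str, handler: Handler) -> None:
        self._subs[subscription].append(handler)

    def publish(self, event_type: str, body: str) -> int:
        """Deliver one event (as posted by racgimon or evmd) to every matching subscriber."""
        delivered = 0
        for subscription, handlers in self._subs.items():
            if matches(subscription, event_type):
                for handler in handlers:
                    handler(event_type, body)
                    delivered += 1
        return delivered
```

Under this assumed rule, a listener subscribed to "database/event" receives both FAN ("database/event/service") and RLB ("database/event/servicemetrics/ALL") events, while a subscription to "database/event/service" alone does not match the servicemetrics type, even though the strings share a prefix.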
2. launching the ons daemon
The ONS daemon is started as part of the nodeapps from the $ORA_CRS_HOME environment, as user oracle, e.g.:
crs_stat -p ora.<hostname>.ons | grep ACTION_SCRIPT
ACTION_SCRIPT=/u01/app/oracle/product/crs/bin/racgwrap
crs_getperm ora.hostname.ons
Name: ora.hostname.ons
owner:oracle:rwx,pgrp:dba:r-x,other::r--,
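When scripting checks around the ONS resource, the crs_getperm permission line can be split into its owner, primary-group and other entries. A minimal Python sketch, assuming the comma-separated kind:name:mode layout shown above; the function name parse_crs_perm is hypothetical.

```python
from typing import Dict, Tuple


def parse_crs_perm(line: str) -> Dict[str, Tuple[str, str]]:
    """Parse 'owner:oracle:rwx,pgrp:dba:r-x,other::r--,' into {kind: (name, mode)}."""
    perms: Dict[str, Tuple[str, str]] = {}
    for entry in line.strip().split(","):
        if not entry:
            continue  # the trailing comma yields an empty entry
        # Assumption: every entry is kind:name:mode; 'other' has an empty name.
        kind, name, mode = entry.split(":", 2)
        perms[kind] = (name, mode)
    return perms
```

A check such as parse_crs_perm(line)["owner"][0] == "oracle" then confirms that the resource belongs to the expected Clusterware owner.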
The commands used by the Clusterware to start, stop, and ping the ONS daemon are 'onsctl start', 'onsctl stop', and 'onsctl ping'.
For debugging purposes, the ONS daemon can be started or stopped on a single node with the Clusterware commands:
crs_start ora.<hostname>.ons
crs_stop ora.<hostname>.ons
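A debugging wrapper around these two commands might look like the Python sketch below. It assumes that the node name in the resource name ora.<hostname>.ons is the short OS hostname, and it only prints the command unless dry_run is False; the helper names are hypothetical.

```python
import socket
import subprocess
from typing import List, Optional

CRS_ACTIONS = {"start": "crs_start", "stop": "crs_stop"}


def ons_resource(node: str) -> str:
    """Clusterware resource name of the ONS nodeapp on one node."""
    return f"ora.{node}.ons"


def crs_ons_command(action: str, node: Optional[str] = None) -> List[str]:
    if action not in CRS_ACTIONS:
        raise ValueError(f"unsupported action {action!r}; use 'start' or 'stop'")
    # Assumption: the node name is the short form of the OS hostname.
    node = node or socket.gethostname().split(".")[0]
    return [CRS_ACTIONS[action], ons_resource(node)]


def run(cmd: List[str], dry_run: bool = True) -> None:
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, run(crs_ons_command("stop", "hostname1"), dry_run=False) followed by run(crs_ons_command("start", "hostname1"), dry_run=False) restarts ONS on hostname1 only.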
3. configuration of the ons daemon
The configuration is stored in the $ORA_CRS_HOME/opmn/conf/ons.config file on every node. Its parameters are:
a. the tcp listening port parameters, e.g.
localport=6101
remoteport=6200
The localport is used to communicate with local clients, e.g. the listeners on the server itself. The remoteport is used to communicate with remote ONS daemons, e.g. the ONS daemons running on the other node(s) of the cluster, or with remote ONS clients (e.g. applications or listeners).
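To check whether something accepts connections on these ports (localport from the node itself, remoteport from another node), a plain TCP connect is enough. A minimal Python sketch; port_accepts is a hypothetical name, and a successful connect only proves that some process is listening on the port, not that it is a healthy ONS daemon.

```python
import socket


def port_accepts(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, name resolution failure or timeout
        return False
```

For example, port_accepts("127.0.0.1", 6101) run on a node checks the local port, and port_accepts("hostname2", 6200) run on hostname1 checks the remote port of the other node.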
b. the "useocr" parameter, i.e. useocr=on
When useocr=on is set, the ONS configuration stored in the OCR is read to determine which servers the ONS daemon contacts in order to receive ONS events from them.
It is set by the ONS configuration assistant launched during the initial
Clusterware installation, through a command run as root, e.g.:
racgons add_config hostname1:6200 hostname2:6200
The hostname used must match the name returned by the OS command "hostname"
(see note:744849.1), and the port must match the remoteport set in point a.
The remote ONS daemons are normally the Oracle RAC ONS daemons, one running on each node of the cluster,
together with any other servers running an ONS daemon (e.g. a remote iAS installation).
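The two rules above (node names exactly as returned by "hostname", cluster ports equal to remoteport) can be enforced by building the racgons arguments from ons.config values rather than typing them. A Python sketch that only builds the argument list, which must still be run as root; the function name is hypothetical, the caller is assumed to pass each node's exact "hostname" output, and extra non-cluster ONS servers (such as a remote iAS) are passed through with their own ports unchecked.

```python
from typing import Iterable, List, Tuple


def racgons_add_config_args(
    cluster_nodes: Iterable[str],
    remoteport: int,
    other_servers: Iterable[Tuple[str, int]] = (),
) -> List[str]:
    """Build 'racgons add_config host:port ...' for the cluster plus other ONS servers."""
    args = ["racgons", "add_config"]
    for node in cluster_nodes:
        # A colon or whitespace in the name would break the host:port syntax.
        if not node or ":" in node or any(c.isspace() for c in node):
            raise ValueError(f"invalid node name {node!r}")
        # Every cluster node uses the remoteport from ons.config (point a).
        args.append(f"{node}:{remoteport}")
    for host, port in other_servers:
        args.append(f"{host}:{port}")
    return args
```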
The command "onsctl debug" displays the OCR configuration, e.g.:
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = hostname1, port = 6200}
Adding remote host hostname1:6200
onscfg[1]
{node = hostname2, port = 6200}
Adding remote host hostname2:6200
The command "racgons remove_config hostname1" deletes the OCR configuration entry for hostname1; to replace the entry, follow it
with "racgons add_config hostname1:port1".
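The OCR entries reported by 'onsctl debug' can be extracted and compared with the expected node list. A Python sketch, assuming the exact "{node = ..., port = ...}" and "numcfg = N" layout of the sample output above; parse_onscfg is a hypothetical name.

```python
import re
from typing import List, Tuple

NUMCFG_RE = re.compile(r"numcfg\s*=\s*(\d+)")
ONSCFG_RE = re.compile(r"\{node\s*=\s*([^,\s}]+),\s*port\s*=\s*(\d+)\}")


def parse_onscfg(debug_output: str) -> List[Tuple[str, int]]:
    """Extract the (node, port) pairs that 'onsctl debug' reports from the OCR."""
    pairs = [(node, int(port)) for node, port in ONSCFG_RE.findall(debug_output)]
    # Cross-check against the declared count, if the header line is present.
    declared = NUMCFG_RE.search(debug_output)
    if declared and int(declared.group(1)) != len(pairs):
        raise ValueError(f"numcfg={declared.group(1)} but parsed {len(pairs)} entries")
    return pairs
```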
All remote server connections are listed in the "Server connections" part of the 'onsctl debug'
output; e.g. on a two-node RAC cluster, the remote node appears as:
Server connections:
ID IP PORT FLAGS SENDQ WORKER BUSY SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
6 140.087.216.062 6200 00010025 0 1 0
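The connection tables of 'onsctl debug' can be parsed the same way. Two details of the sample output matter: the IP addresses are zero-padded per octet (140.087.216.062 is 140.87.216.62, the unpadded form used in ons.log lines), and the sample rows have one value fewer than there are column headings. This Python sketch therefore reads ID, IP, PORT and FLAGS from the left and SUBS from the right; that column mapping is an assumption based on the samples in this note, and the function names are hypothetical. The same parser applies to the "Client connections" table.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[int, str]]


def normalize_ip(padded: str) -> str:
    """'140.087.216.062' -> '140.87.216.62' (octets are zero-padded in the sample)."""
    return ".".join(str(int(octet)) for octet in padded.split("."))


def parse_connections(table: str) -> List[Row]:
    """Parse the rows of the 'Server connections' or 'Client connections' table."""
    rows: List[Row] = []
    for line in table.splitlines():
        fields = line.split()
        # Skip the section title, the column headings and the dashed separator.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "ip": normalize_ip(fields[1]),
            "port": int(fields[2]),
            "flags": fields[3],
            # Assumption: SUBS is the last value even when a middle column is blank.
            "subs": int(fields[-1]),
        })
    return rows
```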
c. "loglevel" and "logfile", e.g.
loglevel=3
logfile=<fullpath_filename>
loglevel specifies the level of messages logged by ONS: loglevel=3 is
the default, loglevel=6 is intermediate, and loglevel=9 is the most verbose.
logfile specifies the location of the ONS log file (default: $ORA_CRS_HOME/opmn/logs/ons.log).
d. the optional parameter "usesharedinstall", which permits ONS to start when all nodes of the Oracle Clusterware use a shared $CRS_HOME. ONS then appends the OS hostname to the names of files such as the log file and the .formfactor file (ons.log.<hostname>, .formfactor.<hostname>):
usesharedinstall=true
e. the optional parameter "allowgroup", which permits installations owned by an Oracle user other than the CRS installation owner to communicate with the Oracle Clusterware ONS. For example, when the RDBMS is installed as user orardbms and CRS as user oracrs, "allowgroup" needs to be set to true so that the orardbms listener can communicate with the oracrs ONS daemon:
allowgroup=true
f. the optional parameter "walletfile", used to set up SSL to secure ONS communication with a wallet file.
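The parameters a. to f. can be read and sanity-checked with a small parser. A Python sketch under stated assumptions: lines starting with # are treated as comments, on/true are read as booleans, and any key not among the parameters listed above is reported (which catches misspelled names but would also flag legitimate parameters this note does not cover). The function names are hypothetical.

```python
from typing import Dict, List, Union

Value = Union[int, bool, str]
INT_KEYS = {"localport", "remoteport", "loglevel"}
BOOL_KEYS = {"useocr", "usesharedinstall", "allowgroup"}
STR_KEYS = {"logfile", "walletfile"}


def parse_ons_config(text: str) -> Dict[str, Value]:
    cfg: Dict[str, Value] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # assumption: '#' starts a comment line
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()
        if key in INT_KEYS:
            cfg[key] = int(value)
        elif key in BOOL_KEYS:
            cfg[key] = value.lower() in ("on", "true")
        else:
            cfg[key] = value
    return cfg


def check_ons_config(cfg: Dict[str, Value]) -> List[str]:
    problems: List[str] = []
    for key in ("localport", "remoteport"):
        port = cfg.get(key)
        if not isinstance(port, int) or not 0 < port < 65536:
            problems.append(f"{key} missing or out of range: {port!r}")
    for key in cfg:
        if key not in INT_KEYS | BOOL_KEYS | STR_KEYS:
            problems.append(f"parameter not described in this note: {key}")
    return problems
```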
4. ons clients/subscribers
Clients or subscribers connected to ONS are shown in the SUBS column of the "Client connections"
part of the 'onsctl debug' output, e.g.:
Client connections:
ID IP PORT FLAGS SENDQ WORKER BUSY SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
2 127.000.000.001 6101 0001001a 0 1 0
5 127.000.000.001 6101 0001001a 0 1 1
a. the listeners are subscribers of the ONS daemon
When ONS is started, the listeners register with it as client subscribers to all FAN and RLB events.
The parameter SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=ON needs to be set in the
listener.ora files. When that parameter is set together with TRACE_LEVEL_<listener_name>=16,
a failure to subscribe to the locally running ONS shows up in the listener.log with messages like:
WARNING: Subscription for node down event still pending
This is normally due to note:284602.1 and bug:4417761. Prior to 10.2.0.4, when the listener is started with lsnrctl, the environment variable ORACLE_CONFIG_HOME = {Oracle Clusterware HOME}
needs to be set (it can be set in the $ORACLE_HOME/bin/racgwrap script).
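The two listener prerequisites above can be pre-checked before looking at listener.log. A Python sketch, assuming parameter names in listener.ora are matched case-insensitively and that lines starting with # are comments; the function name and the listener name LISTENER_NODE1 in the usage are hypothetical, and the ORACLE_CONFIG_HOME check only matters before 10.2.0.4.

```python
import re
from typing import List, Mapping


def listener_ons_problems(listener_ora: str, listener_name: str,
                          env: Mapping[str, str]) -> List[str]:
    """Return reasons why the listener may fail to subscribe to the local ONS."""
    problems: List[str] = []
    name = re.escape(listener_name)
    # Only an uncommented SUBSCRIBE_FOR_NODE_DOWN_EVENT_<name>=ON line counts.
    pattern = re.compile(rf"^\s*SUBSCRIBE_FOR_NODE_DOWN_EVENT_{name}\s*=\s*ON\s*$",
                         re.IGNORECASE | re.MULTILINE)
    if not pattern.search(listener_ora):
        problems.append(f"SUBSCRIBE_FOR_NODE_DOWN_EVENT_{listener_name}=ON not set")
    if not env.get("ORACLE_CONFIG_HOME"):
        problems.append("ORACLE_CONFIG_HOME not set (needed before 10.2.0.4)")
    return problems
```

Typical usage passes the file contents and os.environ, e.g. listener_ons_problems(text, "LISTENER_NODE1", os.environ).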
b. application clients/subscribers
With Oracle Database 10g Release 1, JDBC clients (both thick and thin drivers) are integrated
with FAN through FCF. Oracle Database 10g Release 2 added ODP.NET and OCI
clients. See note:433827.1 to set up an FCF client.
5. the FAN and RLB events
ONS handles two types of events. FAN events (or HA events) are meant for FAN processing; RLB events are meant for workload management. With loglevel set to 9, these events can be checked in the $ORA_CRS_HOME/opmn/logs/ons.log file.
a. the FAN events (event type=database/event/service) are forwarded to the ONS daemon by the racgimon and evmd Clusterware processes
1. FAN events forwarded from the racgimon daemons to the ons daemon
racgimon forwards instance and service up/down events to the ONS daemon.
e.g. ../opmn/logs> grep -E "body|VERSION" ons.log (loglevel=9)
09/01/07 13:56:24 [8] Connection 2,127.0.0.1,6101 body:
VERSION=1.0 service= instance=ASM1 database= host=hostname1 status=up reason=boot
09/01/07 13:56:45 [8] Connection 4,127.0.0.1,6101 body:
VERSION=1.0 service=H102 instance=H1021 database=H102 host=hostname1 status=up reason=boot
09/01/07 13:56:46 [8] Connection 4,127.0.0.1,6101 body:
VERSION=1.0 service=ALL instance=H1021 database=H102 host=hostname1 status=up card=1 reason=boot
...
09/01/07 14:20:26 [8] Connection 5,140.87.216.64,6200 body:
VERSION=1.0 service=H102 instance=H1022 database=H102 host=hostname2 status=down reason=user
09/01/07 14:20:26 [8] Connection 5,140.87.216.64,6200 body:
VERSION=1.0 service=ALL instance=H1022 database=H102 host=hostname2 status=down reason=failure
09/01/07 14:20:27 [8] Connection 5,140.87.216.64,6200 body:
VERSION=1.0 service=ALL instance=H1022 database=H102 host=hostname2 status=not_restarting reason=UNKNOWN
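FAN event bodies like those above are space-separated key=value pairs, so they can be turned into dictionaries for scripted analysis of an ons.log excerpt. A Python sketch, assuming the grep output format shown above; the function names are hypothetical, empty values such as "service=" are kept as empty strings, and lines containing "{" are skipped because RLB payloads have a nested layout.

```python
from typing import Dict, Iterator


def parse_fan_body(body: str) -> Dict[str, str]:
    """Split a FAN event body into key/value pairs; empty values (service=) stay ''."""
    fields: Dict[str, str] = {}
    for token in body.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def fan_events(log_text: str) -> Iterator[Dict[str, str]]:
    """Yield parsed FAN bodies from 'grep -E "body|VERSION" ons.log' output."""
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("VERSION=") and "{" not in line:  # skip RLB payloads
            yield parse_fan_body(line)
```

The same parse_fan_body call also handles the evmd node down body in the next subsection (host=..., incarn=..., status=nodedown, reason=member_leave).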
2. FAN events forwarded from the evmd daemon to the ons daemon
These are the node down and public network down events. The main bug that needs to be fixed in this area is bug:6083726 (see note:6083726.8).
When a node down event or a public VIP network down event occurs, evmd posts an event to
ONS. For example, when the VIP is stopped on a preferred node, a public network down event originates from the failing node. This EVM event is received by evmd on all surviving nodes via the interconnect, and evmd on each of those nodes then publishes the event to its local ONS daemon.
e.g. ons.log with loglevel=9 shows:
VERSION=1.0 host=hostname incarn=100 status=nodedown reason=member_leave
The FAN events are a subset of the EVM events (logged in the $CRS_HOME/evm/log/<hostname>_evmlog.<date> files). All EVM events can be viewed with:
evmshow -t "@timestamp @@" <hostname>_evmlog.<date>
b. the RLB events (event type=database/event/servicemetrics/<service_name>), sent by racgimon at the request of the MMON background process
e.g. the notification type "database/event/servicemetrics/ALL", set via:
exec DBMS_SERVICE.MODIFY_SERVICE (service_name => 'ALL', goal => DBMS_SERVICE.GOAL_THROUGHPUT, clb_goal => DBMS_SERVICE.CLB_GOAL_SHORT)
Querying SYS$SERVICE_METRICS_TAB shows the MMON events logged every 30 seconds, e.g.:
SELECT user_data from SYS.SYS$SERVICE_METRICS_TAB order by 1 ;
USER_DATA(SRV, PAYLOAD)
--------------------------------------------------------------------------------
SYS$RLBTYP('ALL', 'VERSION=1.0 database=H102 service=ALL { {instance=H1021 percent=100 flag=UNKNOWN} } timestamp=2009-01-07 21:39:18')
grep -E 'body|percent' ons.log (with loglevel=9) shows the same events:
VERSION=1.0 database=H102 { {instance=H1021 percent=100 flag=UNKNOWN} } timestamp=2009-01-07 21:39:48
09/01/07 21:39:48 [9] Worker Thread 2 sending body [20:1073838448]: connection 5,140.87.216.64,6200
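Consumers of these RLB events use the per-instance percent values to distribute work. The Python sketch below parses a payload in the format shown above and picks an instance with probability proportional to its percent. It is an illustration under assumptions, not the algorithm of the JDBC, OCI or ODP.NET connection pools: additional instances are assumed to appear as further {instance=... percent=... flag=...} groups, service= is treated as optional (the ons.log body above does not carry it), and the function names are hypothetical.

```python
import random
import re
from typing import Dict, List, Optional

GROUP_RE = re.compile(r"\{([^{}]*)\}")  # innermost {...} groups, one per instance
TIMESTAMP_RE = re.compile(r"timestamp=(\S+ \S+)")


def _pairs(text: str) -> Dict[str, str]:
    out: Dict[str, str] = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            out[key] = value
    return out


def parse_rlb_payload(payload: str) -> Dict[str, object]:
    # Fields before the first '{': VERSION, database and, when present, service.
    result: Dict[str, object] = dict(_pairs(payload.split("{", 1)[0]))
    instances = []
    for group in GROUP_RE.findall(payload):
        fields = _pairs(group)
        instances.append({"instance": fields.get("instance", ""),
                          "percent": int(fields.get("percent", "0")),
                          "flag": fields.get("flag", "")})
    result["instances"] = instances
    ts = TIMESTAMP_RE.search(payload)
    result["timestamp"] = ts.group(1) if ts else None
    return result


def pick_instance(instances: List[Dict[str, object]], rng: random.Random) -> Optional[str]:
    """Choose an instance with probability proportional to its advisory percent."""
    weights = [int(i["percent"]) for i in instances]
    if sum(weights) <= 0:
        return None  # no usable advice in this event
    return rng.choices([str(i["instance"]) for i in instances], weights=weights)[0]
```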