Confusion about the LMON process

1^#

Posted 2012-5-3 09:30:44 | Views: 6978 | Replies: 4

* Lock monitor (LMON) process: The LMON process monitors all instances in a cluster to detect the failure of an instance. It then facilitates the recovery of the global locks held by the failed instance. It is also responsible for reconfiguring locks and other resources when instances leave or are added to the cluster (as they fail and come back online, or as new instances are added to the cluster in real time).

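This says LMON is responsible for monitoring node status: "The LMON process monitors all instances in a cluster to detect the failure of an instance". But as I recall, isn't node monitoring done by the cssd process through its heartbeat mechanism? Which process actually monitors node health, LMON or cssd?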


gdpr-dba

2^#

Posted 2012-5-8 10:43:59

Liu, could you please take a look at this question? Thanks.


Maclean Liu (刘相兵)

3^#

Posted 2012-5-8 12:32:23

ODM FINDING:

Group Services (clssgs.c) – CSS provides group services by notifying clients (such as lmon) of cluster membership information and changes. When an instance joins the cluster it will join a group via GM. When applications connect to CSS they join their group and all members of the group share some private data such as IPC endpoints. The application will then use these IPC endpoints for communication. There is also some public data accessible to non-group members. The global data store (not persistent) is available for bootstrap and initial contact. The oldest node or the node with the lowest number in the cluster is considered the source of this bitmap although the data is distributed for recovery reasons.

CSS provides group services by notifying clients (such as lmon) of cluster membership information and changes.

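LMON is a client of CSS.

CSS is responsible for Node Monitoring, but it also keeps track of grock member resources such as the RDBMS instance.
The instance's RAC background process LMON monitors the other instances in the cluster, so that an instance crash is detected as early as possible and Reconfiguration can begin.

CSS also sends cluster membership information to LMON.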
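To make the client relationship concrete, here is a minimal Python sketch of a group service that pushes membership changes to subscribed clients. It is only a toy model of the idea described above, not Oracle's code or API: the names GroupService, LmonClient, the group name "DB_ORCL", and the instance names are all made up for illustration.

```python
from typing import Callable, Dict, List, Set

MembershipCallback = Callable[[str, Set[str]], None]


class GroupService:
    """Toy stand-in for CSS group services (hypothetical, not Oracle's API)."""

    def __init__(self) -> None:
        self._members: Dict[str, Set[str]] = {}
        self._listeners: Dict[str, List[MembershipCallback]] = {}

    def subscribe(self, group: str, callback: MembershipCallback) -> None:
        # A client registers interest in one group's membership changes.
        self._listeners.setdefault(group, []).append(callback)

    def join(self, group: str, member: str) -> None:
        self._members.setdefault(group, set()).add(member)
        self._notify(group)

    def leave(self, group: str, member: str) -> None:
        self._members.setdefault(group, set()).discard(member)
        self._notify(group)

    def _notify(self, group: str) -> None:
        # Each listener gets its own copy of the current membership.
        for callback in self._listeners.get(group, []):
            callback(group, set(self._members.get(group, set())))


class LmonClient:
    """Toy client that records every membership view it is told about."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.views: List[List[str]] = []

    def on_membership_change(self, group: str, members: Set[str]) -> None:
        # A real LMON would start reconfiguration here; the sketch only logs.
        self.views.append(sorted(members))


css = GroupService()
lmon = LmonClient("lmon_inst1")
css.subscribe("DB_ORCL", lmon.on_membership_change)
css.join("DB_ORCL", "inst1")
css.join("DB_ORCL", "inst2")
css.leave("DB_ORCL", "inst2")  # e.g. inst2 crashed and was removed
print(lmon.views)
```

The point of the sketch is the direction of the flow: the membership service owns the member list and notifies its clients, while the client decides what to do with each new view.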

For instance recovery, see: http://www.oracledatabase12g.com ... tance-recovery.html

For CSS NM, see: http://www.oracledatabase12g.com ... vice-internals.html


311ybb

4^#

Posted 2013-4-19 13:26:22

I have the same confusion, mainly because of terms such as NM and group service.
Below is the explanation of LMON in the Glossary of the official Oracle 10g documentation:
The background LMON process monitors the entire cluster to manage global resources. LMON manages instance deaths and the associated recovery for any failed instance. In particular, LMON handles the part of recovery associated with global resources. LMON-provided services are also known as Cluster Group Services. The Global Enqueue Service Monitor (LMON), a background process that monitors the health of the cluster database environment and registers and de-registers from CSS. (This last sentence is from the explanation of CSSD.)

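Is the Cluster Group Services mentioned here a different concept from the group service of the CSS process that Liu described?
The CSS process provides node monitor/group management services, yet many sources also mention an NM function inside LMON.
As for the confusion about NM, much of what is online comes from this passage in '大话RAC':
Oracle RAC's LMON process is given a self-check capability, which is the CGS service that LMON provides. In summary, this service has the following key points.
(1) LMON provides a Node Monitor function: it records the health of each node at the application layer. Node health is recorded in a bitmap stored in the GRD, one bit per node, where 0 means the node is shut down and 1 means the node is running normally. The LMON processes on the nodes communicate with each other to confirm that this bitmap is consistent.
(2) The LMON processes on the nodes communicate with each other periodically. This communication can go through the CM layer, or bypass the CM layer and go directly over the network layer.
(3) LMON can cooperate with the underlying Clusterware or work on its own. When LMON detects an instance-level "split brain", it first notifies the underlying Clusterware, expecting the Clusterware to resolve it. However, RAC does not assume that the Clusterware will definitely solve the problem, so the LMON process does not wait indefinitely for the Clusterware layer's result. If the wait times out, the LMON process automatically triggers IMR (Instance Membership Recovery, also called Instance Membership Reconfiguration). The IMR function provided by the LMON process can be regarded as the "split brain" and "I/O fencing" mechanism that Oracle provides at the database layer.
(4) LMON also relies mainly on two heartbeat mechanisms to perform health checks.
Network heartbeat between nodes (Network Heartbeat): think of it as nodes periodically sending ping packets to check each other's status; if a response arrives within the specified time, the peer is considered healthy.
Disk heartbeat through the control file (Controlfile Heartbeat): the CKPT process on each node updates one data block of the control file every 3 seconds. This block is called the Checkpoint Progress Record. Because the control file is shared, instances can check whether the others are updating it on time to determine their status.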
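To check my own reading of points (1) and (4), here is a rough Python sketch of how two heartbeat timestamps could be folded into a one-bit-per-instance bitmap. It is only an illustration of the description quoted above, not how LMON is actually implemented: the InstanceBeat type, the function name, the rule that both heartbeats must be fresh, and the timeout values are all my assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class InstanceBeat:
    """Last time (in seconds) each heartbeat was seen for one instance."""
    last_network_reply: float       # last network heartbeat reply
    last_controlfile_update: float  # last control file heartbeat update


def membership_bitmap(beats: Dict[int, InstanceBeat], now: float,
                      network_timeout: float, disk_timeout: float) -> int:
    """Return a bitmap with bit N set to 1 when instance slot N looks healthy.

    Assumption of this sketch: an instance counts as healthy only when
    both heartbeats are within their timeouts.
    """
    bitmap = 0
    for slot, beat in beats.items():
        network_ok = now - beat.last_network_reply <= network_timeout
        disk_ok = now - beat.last_controlfile_update <= disk_timeout
        if network_ok and disk_ok:
            bitmap |= 1 << slot
    return bitmap


# Illustrative timestamps and timeouts, not Oracle defaults.
beats = {
    0: InstanceBeat(last_network_reply=100.0, last_controlfile_update=99.0),
    1: InstanceBeat(last_network_reply=100.0, last_controlfile_update=80.0),
    2: InstanceBeat(last_network_reply=98.0, last_controlfile_update=100.0),
}
bitmap = membership_bitmap(beats, now=101.0, network_timeout=10.0, disk_timeout=9.0)
print(format(bitmap, "03b"))  # prints 101: slot 1 missed its disk heartbeat
```

In this toy version every instance would compute its own bitmap and then compare it with the others, which matches the "confirm that this bitmap is consistent" step in point (1).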

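The NM description here feels to me like it comes from DSI408. The DSI material covers 9i RAC, when there was no Clusterware and no CSS process. The 10g Oracle documentation does not seem to mention an NM function in LMON, and few articles describe what changed from 9i RAC to 10g RAC, so this is where people get confused when they read these passages. Liu, could you please clear this up and correct these concepts, or whatever is wrong in the material online?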


311ybb

5^#

Posted 2013-4-19 13:49:04

Last edited by 311ybb on 2013-4-19 16:05

Is the question too simple?
