(1) Instance failure is detected by the Cluster Manager and GCS
(2) Reconfiguration of GES resources (enqueues); the global resource directory is frozen. During the first phase of recovery, Global Enqueue Services (GES) remasters the enqueues.
(3) Reconfiguration of GCS resources, which involves redistribution among the surviving instances. The Global Cache Services (GCS) remasters its resources.
(4) One of the surviving instances becomes the “recovering instance”
(5) The SMON process of the recovering instance starts the first pass of the redo log read on the failed instance’s redo log thread
(6) SMON finds BWRs (block written records) in the redo and drops the corresponding blocks from recovery, because their PIs (past images) have already been written to disk
(7) SMON prepares the recovery set: the blocks modified by the failed instance but not written to disk
(8) Entries in the recovery list are sorted by first dirty SCN
(9) SMON informs each block’s master node to take ownership of the block for recovery
(10) The second pass of the log read begins.
(11) Redo is applied to the data files.
(12) The Global Resource Directory is unfrozen
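To make steps (5) through (11) concrete, here is a toy sketch of the two-pass redo read in Python. It is only an illustration under simplifying assumptions, not how Oracle implements recovery: the `RedoRecord` type, the `first_pass` and `second_pass` functions, and the dictionary standing in for the data files are all hypothetical names invented for this example, and global resource handling (steps 2, 3, 9 and 12) is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class RedoRecord:
    """Hypothetical, simplified redo entry (not Oracle's real format)."""
    scn: int
    block: int
    kind: str        # "change" or "bwr" (block written record)
    value: str = ""  # new block contents for a "change" record


def first_pass(redo):
    """Build the recovery set as (block, first dirty SCN) pairs."""
    first_dirty = {}
    for rec in redo:
        if rec.kind == "bwr":
            # The block reached disk here, so earlier changes need no recovery.
            first_dirty.pop(rec.block, None)
        elif rec.block not in first_dirty:
            first_dirty[rec.block] = rec.scn
    # Step (8): sort the recovery list by first dirty SCN.
    return sorted(first_dirty.items(), key=lambda item: item[1])


def second_pass(redo, recovery_set, datafile):
    """Apply redo only to blocks that are in the recovery set."""
    first_dirty = dict(recovery_set)
    for rec in redo:
        if (rec.kind == "change"
                and rec.block in first_dirty
                and rec.scn >= first_dirty[rec.block]):
            datafile[rec.block] = rec.value
    return datafile


# Redo thread of the failed instance (toy data).
redo = [
    RedoRecord(10, 1, "change", "a1"),
    RedoRecord(11, 3, "change", "c1"),
    RedoRecord(12, 1, "bwr"),           # block 1 was written to disk
    RedoRecord(13, 2, "change", "b1"),
    RedoRecord(14, 2, "change", "b2"),
]
on_disk = {1: "a1", 2: "b0", 3: "c0"}   # state of the data files at the crash

recovery_set = first_pass(redo)
recovered = second_pass(redo, recovery_set, dict(on_disk))
print(recovery_set)  # [(3, 11), (2, 13)]
print(recovered)     # {1: 'a1', 2: 'b2', 3: 'c1'}
```

In this toy run, block 1 is dropped from the recovery set because a BWR follows its only change, so the second pass touches only blocks 2 and 3.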
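Those are the steps of RAC instance recovery. There are a few points I don't understand:
1. “Reconfiguration of GCS resources” presumably refers to remastering block resources, for example remastering the blocks whose master node was the crashed instance onto the other nodes. In that case, what does “Reconfiguration of GES resources (enqueues)” mean, and what exactly does it do?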