
wkc168

Data Guard: switching the standby to primary hangs

1^#

Posted 2012-2-2 22:01:52 | Views: 9731 | Replies: 10

DB:
SQL> select * from v$version
2 /
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

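OS: Windows 2003

In my Data Guard setup I simulated a primary outage by stopping all of its services.

I then tried to switch the standby over to the primary role.
On the standby:
alter database commit to switchover to primary with session shutdown

The statement hangs at this point.

The alert log shows: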
Switchover: Media recovery is still active
If media recovery active, switchover will wait 900 seconds
Thu Feb 02 21:36:04 2012
Switchover: Media recovery is still active
If media recovery active, switchover will wait 900 seconds
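
The 900-second message indicates that the switchover is waiting for managed recovery (MRP) to finish. As a minimal sketch, assuming a second SQL*Plus session connected as SYSDBA on the standby, the queries below show the current role, the switchover state, and what MRP is doing. The column selection is only a suggestion.

```sql
-- Run on the standby from a separate session while the switchover statement is waiting.
-- Role and switchover state of this database.
select database_role, switchover_status from v$database;

-- State of the managed recovery process. WAIT_FOR_LOG together with a sequence#
-- means MRP is still waiting for redo that has not arrived.
select process, status, thread#, sequence#, block#
from v$managed_standby
where process like 'MRP%';
```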


Maclean Liu(刘相兵

2^#

Posted 2012-2-2 22:06:01

Run this script:

SELECT DECODE(os_backup.backup + rman_backup.backup, 0, 'FALSE', 'TRUE') backup
FROM (SELECT COUNT(*) backup FROM gv$backup WHERE status = 'ACTIVE') os_backup,
(SELECT COUNT(*) backup
FROM gv$session
WHERE status = 'ACTIVE'
AND client_info like '%rman%') rman_backup
/


wkc168

3^#

Posted 2012-2-2 22:07:28

Reply to post 2#

SQL> SELECT DECODE(os_backup.backup + rman_backup.backup, 0, 'FALSE', 'TRUE') backup
  2  FROM (SELECT COUNT(*) backup FROM gv$backup WHERE status = 'ACTIVE') os_backup,
  3  (SELECT COUNT(*) backup FROM gv$session WHERE status = 'ACTIVE' AND client_info like '%rman%') rman_backup
  4  /

BACKU
-----
FALSE


Maclean Liu(刘相兵

4^#

Posted 2012-2-2 22:11:44

Run this on the physical standby:

- - - - - - - - - - - - - - - - Script begins here - - - - - - - - - - - - - - - -
-- NAME: DG_phy_stby_diag.sql
-- ------------------------------------------------------------------------
-- AUTHOR:
-- Michael Smith - Oracle Support Services - DataServer Group
-- Copyright 2002, Oracle Corporation
-- ------------------------------------------------------------------------
-- PURPOSE:
-- This script is to be used to assist in collecting information to help
-- troubleshoot Data Guard issues.
-- ------------------------------------------------------------------------
-- DISCLAIMER:
-- This script is provided for educational purposes only. It is NOT
-- supported by Oracle World Wide Technical Support.
-- The script has been tested and appears to work as intended.
-- You should always run new scripts on a test instance initially.
-- ------------------------------------------------------------------------
-- Script output is as follows:
set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hhmi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool dgdiag_phystby_&&dbname&&timestamp&&suffix
set lines 200
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = 'MON-DD-YYYY HH24:MI:SS';
set feedback on
select to_char(sysdate) time from dual;
set echo on
--
-- ARCHIVER can be (STOPPED | STARTED | FAILED). FAILED means that the archiver failed
-- to archive a log last time, but will try again within 5 minutes. LOG_SWITCH_WAIT is
-- the ARCHIVE LOG/CLEAR LOG/CHECKPOINT event that log switching is waiting for. Note that
-- if ALTER SYSTEM SWITCH LOGFILE is hung, but there is room in the current online
-- redo log, then the value is NULL.
column host_name format a20 tru
column version format a9 tru
select instance_name,host_name,version,archiver,log_switch_wait from v$instance;
-- The following select will give us the generic information about how this standby is
-- setup. The database_role should be standby as that is what this script is intended
-- to be run on. If protection_level is different than protection_mode then for some
-- reason the mode listed in protection_mode experienced a need to downgrade. Once the
-- error condition has been corrected the protection_level should match the protection_mode
-- after the next log switch.
column ROLE format a7 tru
select name,database_role,log_mode,controlfile_type,protection_mode,protection_level
from v$database;
-- Force logging is not mandatory but is recommended. Supplemental logging should be enabled
-- on the standby if a logical standby is in the configuration. During normal
-- operations it is acceptable for SWITCHOVER_STATUS to be SESSIONS ACTIVE or NOT ALLOWED.
column force_logging format a13 tru
column remote_archive format a14 tru
column dataguard_broker format a16 tru
select force_logging,remote_archive,supplemental_log_data_pk,supplemental_log_data_ui,
switchover_status,dataguard_broker from v$database;
-- This query produces a list of all archive destinations and shows if they are enabled,
-- what process is servicing that destination, if the destination is local or remote,
-- and if remote what the current mount ID is. For a physical standby we should have at
-- least one remote destination that points the primary set but it should be deferred.
COLUMN destination FORMAT A35 WRAP
column process format a7
column archiver format a8
column ID format 99
select dest_id "ID",destination,status,target,
archiver,schedule,process,mountid
from v$archive_dest;
-- If the protection mode of the standby is set to anything higher than max performance
-- then we need to make sure the remote destination that points to the primary is set
-- with the correct options else we will have issues during switchover.
select dest_id,process,transmit_mode,async_blocks,
net_timeout,delay_mins,reopen_secs,register,binding
from v$archive_dest;
-- The following select will show any errors that occurred the last time an attempt was
-- made to archive to the destination. If ERROR is blank and status is VALID then
-- the archive completed correctly.
column error format a55 tru
select dest_id,status,error from v$archive_dest;
-- Determine if any error conditions have been reached by querying the v$dataguard_status
-- view (view only available in 9.2.0 and above):
column message format a80
select message, timestamp
from v$dataguard_status
where severity in ('Error','Fatal')
order by timestamp;
-- The following query is run to get the status of the SRL's on the standby. If the
-- primary is archiving with the LGWR process and SRL's are present (in the correct
-- number and size) then we should see a group# active.
select group#,sequence#,bytes,used,archived,status from v$standby_log;
-- The above SRL's should match in number and in size with the ORL's returned below:
select group#,thread#,sequence#,bytes,archived,status from v$log;
-- Query v$managed_standby to see the status of processes involved in the
-- configuration.
select process,status,client_process,sequence#,block#,active_agents,known_agents
from v$managed_standby;
-- Verify the last sequence# received and the last sequence# applied on the standby
-- database.
select al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied"
from (select thread# thrd, max(sequence#) almax
from v$archived_log
where resetlogs_change#=(select resetlogs_change# from v$database)
group by thread#) al,
(select thread# thrd, max(sequence#) lhmax
from v$log_history
where first_time=(select max(first_time) from v$log_history)
group by thread#) lh
where al.thrd = lh.thrd;
-- The V$ARCHIVE_GAP fixed view on a physical standby database only returns the next
-- gap that is currently blocking redo apply from continuing. After resolving the
-- identified gap and starting redo apply, query the V$ARCHIVE_GAP fixed view again
-- on the physical standby database to determine the next gap sequence, if there is
-- one.
select * from v$archive_gap;
-- Non-default init parameters.
set numwidth 5
column name format a30 tru
column value format a50 wra
select name, value
from v$parameter
where isdefault = 'FALSE';
spool off
- - - - - - - - - - - - - - - - Script ends here - - - - - - - - - - - - - - - -


tail -5000 alert*.log

Also upload the trace file of the MRP process.


wkc168

5^#

Posted 2012-2-2 22:38:12

Alert log:

Thu Feb 02 21:04:35 2012
alter database recover managed standby database disconnect from session
MRP0 started with pid=18, OS id=3480
Managed Standby Recovery not using Real Time Apply
Media Recovery Log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\LOG1_65_771764791.ARC
Thu Feb 02 21:04:41 2012
Completed: alter database recover managed standby database disconnect from session
Thu Feb 02 21:04:45 2012
Media Recovery Log D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCL\ORCL02_ARCHIVE\LOG1_66_771764791.ARC
Media Recovery Waiting for thread 1 sequence 67
Thu Feb 02 21:06:04 2012
alter database commit to switchover to primary with session shutdown
Thu Feb 02 21:06:04 2012
If media recovery active, switchover will wait 900 seconds
Thu Feb 02 21:18:25 2012
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Feb 02 21:21:04 2012
Switchover: Media recovery is still active
If media recovery active, switchover will wait 900 seconds
Thu Feb 02 21:36:04 2012
Switchover: Media recovery is still active
If media recovery active, switchover will wait 900 seconds
Thu Feb 02 21:51:04 2012
Switchover: Media recovery is still active
If media recovery active, switchover will wait 900 seconds
Thu Feb 02 22:06:04 2012
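
The log above shows sequences 65 and 66 being applied and MRP then waiting for thread 1 sequence 67. A small sketch, assuming it is run on the standby, lists which archived logs the standby has registered around that point. The lower bound of 65 is taken from the log excerpt and is only an illustration.

```sql
-- Archived logs registered on the standby around the sequence MRP is waiting for.
-- APPLIED = 'YES' means redo apply has finished with that log.
select thread#, sequence#, applied, completion_time
from v$archived_log
where thread# = 1
and sequence# >= 65
order by sequence#;
```

If no row for sequence 67 comes back, the standby never received that log as an archived log, which would be consistent with the primary having been stopped while 67 was still its current online redo log.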

db_orcl.rar

3.88 KB, downloads: 1479


Maclean Liu(刘相兵

6^#

Posted 2012-2-2 22:49:34

ODM FIND:

SQL> select process,status,client_process,sequence#,block#,active_agents,known_agents
  2  from v$managed_standby;

PROCESS STATUS       CLIENT_P SEQUENCE# BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
------- ------------ -------- --------- ------ ------------- ------------
ARCH    CONNECTED    ARCH             0      0             0            0
ARCH    CONNECTED    ARCH             0      0             0            0
MRP0    WAIT_FOR_LOG N/A             67      0             0            0

MRP is waiting for log sequence 67.

SQL> -- The V$ARCHIVE_GAP fixed view on a physical standby database only returns the next
SQL> -- gap that is currently blocking redo apply from continuing. After resolving the
SQL> -- identified gap and starting redo apply, query the V$ARCHIVE_GAP fixed view again
SQL> -- on the physical standby database to determine the next gap sequence, if there is
SQL> -- one.
SQL>
SQL> select * from v$archive_gap;

no rows selected

There is no archive gap.

SQL> select group#,sequence#,bytes,used,archived,status from v$standby_log;

GROUP# SEQUENCE# BYTES  USED ARC STATUS
------ --------- ----- ----- --- ----------
     5         0 #####   512 NO  UNASSIGNED
     6         0 #####   512 NO  UNASSIGNED
     7         0 #####   512 NO  UNASSIGNED
     8         0 #####   512 YES UNASSIGNED

All standby redo logs are UNASSIGNED, yet you are using LGWR transport mode, which is odd.

db_file_name_convert   D:\oracle\product\10.2.0\oradata\orcl,
                       D:\oracle\product\10.2.0\oradata\orcl
log_file_name_convert  D:\oracle\product\10.2.0\oradata\orcl,
                       D:\oracle\product\10.2.0\oradata\orcl

Why is LOG_FILE_NAME_CONVERT set like this?


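My impression is that the redo in the primary's online redo logs was never shipped to the standby. It is also suspicious that your standby redo logs have never been used.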
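
One way to follow up on this suspicion is sketched below. It assumes the primary is running again and that destination 2 is the one pointing at the standby; the thread does not show the primary's log_archive_dest_n settings, so the destination number is hypothetical.

```sql
-- Run on the primary. dest_id = 2 is an assumed destination number; adjust as needed.
-- ARCHIVER and TRANSMIT_MODE show how redo is shipped; ERROR shows the last failure, if any.
select dest_id, status, archiver, transmit_mode, process, error
from v$archive_dest
where dest_id = 2;

-- Run on the standby while the primary is generating redo.
-- With LGWR transport and usable standby redo logs, an RFS process is expected
-- to be receiving the primary's current sequence.
select process, status, client_process, thread#, sequence#
from v$managed_standby
where process = 'RFS';
```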


wkc168

7^#

Posted 2012-2-2 22:53:50

Reply to post 6#

db_file_name_convert          D:\oracle\product\10.2.0\oradata\orcl,
                                                D:\oracle\product\10.2.0\oradata\orcl

log_file_name_convert       D:\oracle\product\10.2.0\oradata\orcl,
                                                D:\oracle\product\10.2.0\oradata\orcl

Why is LOG_FILE_NAME_CONVERT set like this?

I understand that this setting is redundant.

Is there a way to fix the hang?


Maclean Liu(刘相兵

8^#

Posted 2012-2-2 23:01:29

SQL> set linesize 200 pagesize 1400

SQL> select fnnam,fnonm from x$kccfn where fntyp=3;


wkc168

9^#

Posted 2012-2-2 23:18:44

SQL> select fnnam,fnonm from x$kccfn where fntyp=3;

FNNAM
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------------
FNONM
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------------
D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO03.LOG
D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO03.LOG

D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO02.LOG
D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO02.LOG

D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG
D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO01.LOG

D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\STANDREDO_7.LOG
D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\STANDREDO_7.LOG

D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\STANDREDO_6.LOG
D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\STANDREDO_6.LOG

D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\STANDREDO_5.LOG
D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\STANDREDO_5.LOG

D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\STANDREDO_8.LOG
D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\STANDREDO_8.LOG

7 rows selected.


Maclean Liu(刘相兵

10^#

Posted 2012-2-2 23:28:52

action plan:

1. Cancel the switchover.

2. Check whether your standby redo logs are actually being used:

SQL> select group#,sequence#,bytes,used,archived,status from v$standby_log;

3. Confirm that the standby redo log locations are correct.
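
For step 3, a minimal sketch, assuming it is run on the standby: V$LOGFILE lists every log member with its type, so the standby redo log paths can be compared with the files that actually exist on disk.

```sql
-- List standby redo log members and their paths.
-- TYPE distinguishes ONLINE from STANDBY log files.
select group#, type, member
from v$logfile
where type = 'STANDBY'
order by group#;
```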


wkc168

11^#

Posted 2012-2-2 23:30:18

Reply to post 10#

SQL> select group#,sequence#,bytes,used,archived,status from v$standby_log;

GROUP# SEQUENCE# BYTES  USED ARC STATUS
------ --------- ----- ----- --- ----------
   5       0 ##### 512 NO  UNASSIGNED
   6       0 ##### 512 NO  UNASSIGNED
   7       0 ##### 512 NO  UNASSIGNED
   8       0 ##### 512 YES UNASSIGNED

4 rows selected.
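
The BYTES column prints as ##### because the value is wider than the numeric column width in effect in SQL*Plus. The sketch below widens the column and places the standby redo log sizes next to the online redo log sizes, since the diagnostic script's comments say the two should match in number and size. The format mask is an arbitrary choice.

```sql
-- SQL*Plus display setting only; it does not change anything in the database.
column bytes format 999,999,999,999

-- Online and standby redo log sizes side by side for comparison.
select 'ONLINE' log_type, group#, thread#, bytes from v$log
union all
select 'STANDBY', group#, thread#, bytes from v$standby_log
order by 1, 2;
```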
