rongshiyuan 发表于 2014-11-12 10:47:25

ORA-00600: internal error code, arguments: [kjmvalidate:!pcm], [2], [32], [1]

HP 11.31平台下Oracle 10.2.0.5 RAC 实例1 ORA-00600: internal error code, arguments: , , , , [], [], [], []重启问题

alert部分日志如下:
Tue Nov 11 01:15:12 EAT 2014
Errors in file /oracle/product/10.2.0/admin/dcods/bdump/dcods1_lms3_19665.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
Tue Nov 11 01:15:14 EAT 2014
Trace dumping is performing id=
Tue Nov 11 01:15:14 EAT 2014
Errors in file /oracle/product/10.2.0/admin/dcods/bdump/dcods1_lms3_19665.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
Tue Nov 11 01:15:20 EAT 2014
Errors in file /oracle/product/10.2.0/admin/dcods/bdump/dcods1_lms3_19665.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
Tue Nov 11 01:15:20 EAT 2014
LMS3: terminating instance due to error 484
Tue Nov 11 01:15:22 EAT 2014
Shutting down instance (abort)
License high water mark = 1113
Tue Nov 11 01:15:26 EAT 2014
Instance terminated by LMS3, pid = 19665
Tue Nov 11 01:15:27 EAT 2014
Instance terminated by USER, pid = 2248
Tue Nov 11 01:15:51 EAT 2014
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 lan901 172.16.1.0 configured from OCR for use as a cluster interconnect
Interface type 1 lan900 10.138.1.0 configured from OCR for use as  a public interface
Picked latch-free SCN scheme 3
Tue Nov 11 01:16:03 EAT 2014
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.5.0.
System parameters with non-default values:
   processes                = 2000
   resource_limit           = TRUE
   instance_groups          = dcods, dcods1
   sga_max_size             = 85899345920
   __shared_pool_size       = 17884512256
   __large_pool_size        = 33554432
   __java_pool_size         = 33554432
   __streams_pool_size      = 33554432
   spfile                   = /dev/dcodsvg/rspfile_128m
   _ksmg_granule_size       = 33554432
   sga_target               = 85899345920
   control_files            = /dev/dcodsvg/rcontrol1_128m, /dev/dcodsvg/rcontrol2_128m, /dev/dcodsvg/rcontrol3_128m
   db_block_size            = 8192
   __db_cache_size          = 67880615936
   compatible               = 10.2.0.3.0
   log_archive_dest         = /backup1
   db_files                 = 2000
   db_file_multiblock_read_count= 16
   cluster_database         = TRUE
   cluster_database_instances= 2
   _gc_undo_affinity        = FALSE
   _gc_affinity_time        = 0
   thread                   = 1
   instance_number          = 1
   undo_management          = AUTO
   undo_tablespace          = UNDOTBS2
   recyclebin               = OFF
   remote_login_passwordfile= EXCLUSIVE
   db_domain                =
   service_names            = dcods
   local_listener           = (ADDRESS=(PROTOCOL=tcp)(HOST=10.138.1.148)(PORT=1521))
   remote_listener          = LISTENERS_DCODS
   session_cached_cursors   = 200
   job_queue_processes      = 10
   parallel_instance_group  = dcods1
   background_dump_dest     = /oracle/product/10.2.0/admin/dcods/bdump
   user_dump_dest           = /oracle/product/10.2.0/admin/dcods/udump
   core_dump_dest           = /oracle/product/10.2.0/admin/dcods/cdump
   audit_file_dest          = /oracle/product/10.2.0/admin/dcods/adump
   audit_trail              = DB, EXTENDED
   db_name                  = dcods
   open_cursors             = 1000
   pga_aggregate_target     = 8589934592
   _optimizer_connect_by_cost_based= FALSE
Cluster communication is configured to use the following interface(s) for this instance
   172.16.1.1
Tue Nov 11 01:16:03 EAT 2014
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
......
......
Tue Nov 11 01:17:31 EAT 2014
lmon registered with NM - instance id 1 (internal mem no 0)
Tue Nov 11 01:17:31 EAT 2014
Reconfiguration started (old inc 0, new inc 8)
Dynamic remastering is disabled
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid according to instance 1
* domain 0 valid = 1 according to instance 1
......
......
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=24, OS id=3738
Tue Nov 11 01:17:43 EAT 2014
ALTER DATABASE   MOUNT
Tue Nov 11 01:17:47 EAT 2014
Setting recovery target incarnation to 2
Tue Nov 11 01:17:47 EAT 2014
Successful mount of redo thread 1, with mount id 871606684
Tue Nov 11 01:17:47 EAT 2014
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE   MOUNT
Tue Nov 11 01:17:48 EAT 2014
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
Tue Nov 11 01:17:48 EAT 2014
Sending CIC to internal enable redo thread
Tue Nov 11 01:17:48 EAT 2014
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=26, OS id=3785
Tue Nov 11 01:17:48 EAT 2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=27, OS id=3787
Tue Nov 11 01:17:48 EAT 2014
Thread 1 opened at log sequence 62945
   Current log# 1 seq# 62945 mem# 0: /dev/dcodsvg/rredo1_1a_512m
   Current log# 1 seq# 62945 mem# 1: /dev/dcodsvg/rredo1_1b_512m
Successful open of redo thread 1
Tue Nov 11 01:17:48 EAT 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Nov 11 01:17:48 EAT 2014
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Tue Nov 11 01:17:48 EAT 2014
ARC0: Becoming the heartbeat ARCH
Tue Nov 11 01:17:48 EAT 2014
SMON: enabling cache recovery
Tue Nov 11 01:17:49 EAT 2014
Successfully onlined Undo Tablespace 4.
Tue Nov 11 01:17:49 EAT 2014
SMON: enabling tx recovery
Tue Nov 11 01:17:49 EAT 2014
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=28, OS id=3814
Tue Nov 11 01:17:51 EAT 2014
Completed: ALTER DATABASE OPEN
Tue Nov 11 01:19:49 EAT 2014
Thread 1 advanced to log sequence 62946 (LGWR switch)
   Current log# 2 seq# 62946 mem# 0: /dev/dcodsvg/rredo1_2a_512m
   Current log# 2 seq# 62946 mem# 1: /dev/dcodsvg/rredo1_2b_512m
Tue Nov 11 01:21:50 EAT 2014
......
......
Tue Nov 11 01:32:12 EAT 2014
Errors in file /oracle/product/10.2.0/admin/dcods/bdump/dcods1_j000_13225.trc:
ORA-12012: error on auto execute of job 63
ORA-20001: ORA-01427: single-row subquery returns more than one row
ORA-06512: at "PSDSS_BUSI.UG_RPT_CHECK_INTER", line 674
ORA-01427: single-row subquery returns more than one row
ORA-06512: at line 1
Tue Nov 11 01:32:56 EAT 2014
Thread 1 advanced to log sequence 62952 (LGWR switch)
   Current log# 4 seq# 62952 mem# 0: /dev/dcodsvg/rredo1_4a_512m
   Current log# 4 seq# 62952 mem# 1: /dev/dcodsvg/rredo1_4b_512m
Tue Nov 11 01:36:55 EAT 2014
......
......
Tue Nov 11 02:00:01 EAT 2014
Errors in file /oracle/product/10.2.0/admin/dcods/bdump/dcods1_j005_12845.trc:
ORA-12012: error on auto execute of job 3274441
ORA-00054: resource busy and acquire with NOWAIT specified
Tue Nov 11 02:02:24 EAT 2014
Thread 1 advanced to log sequence 62958 (LGWR switch)
   Current log# 2 seq# 62958 mem# 0: /dev/dcodsvg/rredo1_2a_512m
   Current log# 2 seq# 62958 mem# 1: /dev/dcodsvg/rredo1_2b_512m
Tue Nov 11 02:04:22 EAT 2014
Errors in file /oracle/product/10.2.0/admin/dcods/bdump/dcods1_j004_15798.trc:
ORA-12012: error on auto execute of job 63
ORA-20001: ORA-01427: single-row subquery returns more than one row
ORA-06512: at "PSDSS_BUSI.UG_RPT_CHECK_INTER", line 674
ORA-01427: single-row subquery returns more than one row
ORA-06512: at line 1
Tue Nov 11 02:04:49 EAT 2014
Thread 1 advanced to log sequence 62959 (LGWR switch)
   Current log# 3 seq# 62959 mem# 0: /dev/dcodsvg/rredo1_3a_512m
   Current log# 3 seq# 62959 mem# 1: /dev/dcodsvg/rredo1_3b_512m
Tue Nov 11 02:07:22 EAT 2014
Thread 1 advanced to log sequence 62960 (LGWR switch)
   Current log# 4 seq# 62960 mem# 0: /dev/dcodsvg/rredo1_4a_512m
   Current log# 4 seq# 62960 mem# 1: /dev/dcodsvg/rredo1_4b_512m
Tue Nov 11 02:10:23 EAT 2014
Thread 1 advanced to log sequence 62961 (LGWR switch)
   Current log# 1 seq# 62961 mem# 0: /dev/dcodsvg/rredo1_1a_512m
   Current log# 1 seq# 62961 mem# 1: /dev/dcodsvg/rredo1_1b_512m
Tue Nov 11 02:12:35 EAT 2014
Thread 1 advanced to log sequence 62962 (LGWR switch)
   Current log# 2 seq# 62962 mem# 0: /dev/dcodsvg/rredo1_2a_512m
   Current log# 2 seq# 62962 mem# 1: /dev/dcodsvg/rredo1_2b_512m
Tue Nov 11 02:14:56 EAT 2014
Errors in file /oracle/product/10.2.0/admin/dcods/bdump/dcods1_lms4_3242.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
Tue Nov 11 02:14:57 EAT 2014
Trace dumping is performing id=
Tue Nov 11 02:14:58 EAT 2014
Errors in file /oracle/product/10.2.0/admin/dcods/bdump/dcods1_lms4_3242.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
Tue Nov 11 02:15:08 EAT 2014
Errors in file /oracle/product/10.2.0/admin/dcods/bdump/dcods1_lms4_3242.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
Tue Nov 11 02:15:08 EAT 2014
LMS4: terminating instance due to error 484
Tue Nov 11 02:15:10 EAT 2014
Shutting down instance (abort)
License high water mark = 154
......
......

rongshiyuan 发表于 2014-11-12 10:52:27

附件是相关的alert日志、trace日志、awr、addm、ash报告

rongshiyuan 发表于 2014-11-12 10:57:52

相关alert和trace日志

Maclean Liu(刘相兵 发表于 2014-11-12 11:06:05

kjmvalidate - validate receieve message check the validity of the message prior to processing

Maclean Liu(刘相兵 发表于 2014-11-12 11:15:19


EventWaitsTime(s)Avg Wait(ms)% Total Call TimeWait Class
enq: TX - row lock contention384,405191,12349776.2Application
CPU time45,13118.0
gc current block lost7,1773,3854721.4Cluster
gc cr block lost6,0382,9804931.2Cluster
gc cr failure12,117,0001,8020.7Cluster




Per SecondPer Transaction
Global Cache blocks received:976.85110.30
Global Cache blocks served:3,417.98385.95
GCS/GES messages received:8,401.53948.68
GCS/GES messages sent:9,071.881,024.38
DBWR Fusion writes:49.305.57
Estd Interconnect traffic (KB)38,571.48






Maclean Liu(刘相兵 发表于 2014-11-12 11:17:32

大量 gc current block lost + gc cr block lost + gc cr failure ,  gc流量为 38M 每秒

网络硬件可能存在瓶颈

可能的思路:

优化SQL 减少 GC
优化私有网络

rongshiyuan 发表于 2014-11-12 11:30:15

Maclean Liu(刘相兵 发表于 2014-11-12 11:15 static/image/common/back.gif


感谢回复,可以分析为集群中资源争用导致实例重启吗?

rongshiyuan 发表于 2014-11-13 11:57:02

Maclean Liu(刘相兵 发表于 2014-11-12 11:17 static/image/common/back.gif
大量 gc current block lost + gc cr block lost + gc cr failure ,  gc流量为 38M 每秒

网络硬件可能存 ...

咨询下,从哪些信息可以判断出,网络硬件可能存瓶颈?

rongshiyuan 发表于 2014-11-13 12:35:18

Maclean Liu(刘相兵 发表于 2014-11-12 11:17 static/image/common/back.gif
大量 gc current block lost + gc cr block lost + gc cr failure ,  gc流量为 38M 每秒

网络硬件可能存 ...

是addm 中的 "The instance was consuming 65327 kilo bits per second of interconnect bandwidth"???

Maclean Liu(刘相兵 发表于 2014-11-13 15:51:11

你可以使用ftp 在2个节点间测一下 网络吞吐量 以及 吞吐量高时的网络延迟
页: [1]
查看完整版本: ORA-00600: internal error code, arguments: [kjmvalidate:!pcm], [2], [32], [1]