- 最后登录
- 2014-4-9
- 在线时间
- 95 小时
- 威望
- 0
- 金钱
- 268
- 注册时间
- 2012-6-24
- 阅读权限
- 10
- 帖子
- 78
- 精华
- 0
- 积分
- 0
- UID
- 528
|
8#
发表于 2013-1-12 21:33:06
本帖最后由 fluttersnow 于 2013-1-12 21:36 编辑
Maclean Liu(刘相兵 发表于 2013-1-11 13:53
alter system checkpoint;
1.
怀疑是 某个进程持有dc_rollback_segment =>_SYSSMU93$ 后dead , row cach ...
kill 2个节点所有连接以后,在1节点还是产生 “ROW CACHE ENQUEUE LOCK” 而且1节点用户无法登陆,再次检查其他trc,发现holding似乎是个j001进程,而且几个trc文件都又找到这个holding- PROCESS 318:
- ----------------------------------------
- SO: 700000605389218, type: 2, owner: 0, flag: INIT/-/-/0x00
- (process) Oracle pid=318, calls cur/top: 7000005dab89db8/70000031505d2c8, flag: (0) -
- int error: 0, call error: 0, sess error: 0, txn error 0
- (post info) last post received: 0 0 136
- last post received-location: kclrcvt
- last process to post me: 700000605373798 1 6
- last post sent: 0 0 123
- last post sent-location: kcrfw_redo_gen: wake LGWR after redo copy
- last process posted by me: 7000006033a1c88 1 6
- (latch info) wait_event=0 bits=2
- holding (efd=55) 7000005ed7579f8 Child cache buffers chains level=1 child#=105349
- Location from where latch is held: kcbgtcr: kslbegin excl:
- Context saved from call: 1015236325
- state=busy(exclusive) (val=0x200000000000013e) holder orapid = 318
- waiters [orapid (seconds since: put on list, posted, alive check)]:
- 319 (107033, 1357866339, 1)
- 21 (99663, 1357866339, 1)
- 177 (87863, 1357866339, 1)
- 206 (87214, 1357866339, 1)
- 203 (86635, 1357866339, 1)
- 250 (85992, 1357866339, 1)
- 190 (83029, 1357866339, 1)
- 182 (79437, 1357866339, 1)
- 166 (75837, 1357866339, 1)
- 261 (74970, 1357866339, 1)
- 276 (74949, 1357866339, 1)
- 257 (74901, 1357866339, 1)
- 290 (74878, 1357866339, 1)
- 179 (73446, 1357866339, 1)
- 194 (72235, 1357866339, 1)
- 159 (68636, 1357866339, 1)
- 151 (65694, 1357866339, 1)
- 229 (65041, 1357866339, 1)
- 148 (64555, 1357866339, 1)
- 343 (61440, 1357866339, 1)
- 277 (58257, 1357866339, 1)
- 360 (57837, 1357866339, 1)
- 345 (54223, 1357866339, 1)
- 362 (50641, 1357866339, 1)
- 193 (47035, 1357866339, 1)
- 223 (43444, 1357866339, 1)
- 363 (40495, 1357866339, 1)
- 99 (39832, 1357866339, 1)
- 184 (39789, 1357866339, 1)
- 446 (36234, 1357866339, 1)
- 10 (35589, 1357866339, 1)
- 207 (21845, 1357866339, 1)
- 233 (18256, 1357866339, 1)
- 279 (14656, 1357866339, 0)
- 334 (11044, 1357866339, 1)
- 195 (7438, 1357866339, 1)
- 604 (178, 1357866339, 1)
- waiter count=37
- Process Group: DEFAULT, pseudo proc: 7000006073efb90
- O/S info: user: oracle, term: UNKNOWN, ospid: 2130496
- OSD pid info: Unix process pid: 2130496, image: oracle@oltp1 (J001)
复制代码 查看发现此进程所运行的sql为大量的update操作
(此处有个疑问:j00x进程是job的进程,而这些操作都是一个triger引发的,为什么也在job进程中运行?)
手动kill 这个j001进程,redo file 状态开始改变为inactive,日志可以正常切换。用户也可以正常连接了,dba_data_files也可以正常查询了,昨天弄了一上午总算是解决了。
这里有个疑问:
对于这种故障,为了快速恢复数据库服务,是否可以直接在trc文件中查找holding关键字找到block的进程?这样是否准确,还是说这次只是碰巧找到? |
|