Oracle数据库数据恢复、性能优化

找回密码
注册
搜索
热搜: 活动 交友 discuz
发新帖

0

积分

0

好友

1

主题
1#
发表于 2014-7-3 16:08:51 | 查看: 13813| 回复: 7
本帖最后由 Tornado 于 2014-7-3 21:40 编辑

alert_EPT.rar (52.69 KB, 下载次数: 2268) messages.rar (46.58 KB, 下载次数: 2241) XXX_diag_22877.rar (397.65 KB, 下载次数: 1710)
故障现象:11点42分数据库宕掉,12点07重启后解决。请高手帮忙诊断下是不是Large Pages的问题,还是其它原因导致数据库宕机。
环境:os:Red Hat Enterprise Linux Server release 6.2    oracle:11.2.0.3  单实例+文件系统
alert日志:
Incremental checkpoint up to RBA [0x137.45ba51.0], current log tail at RBA [0x137.45e3b1.0]
Thu Jul 03 09:55:36 2014
Incremental checkpoint up to RBA [0x137.46ca82.0], current log tail at RBA [0x137.46f176.0]
Thu Jul 03 10:25:38 2014
Incremental checkpoint up to RBA [0x137.473e67.0], current log tail at RBA [0x137.474d91.0]
Thu Jul 03 10:55:42 2014
Incremental checkpoint up to RBA [0x137.47f78b.0], current log tail at RBA [0x137.4838d4.0]
Thu Jul 03 11:25:47 2014
Incremental checkpoint up to RBA [0x137.49848e.0], current log tail at RBA [0x137.49b7f9.0]
Thu Jul 03 11:42:28 2014
PMON (ospid: 22867): terminating the instance due to error 471
Thu Jul 03 11:42:28 2014
System state dump requested by (instance=1, osid=22867 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/EPT/saptrace/diag/rdbms/ept/EPT/trace/EPT_diag_22877.trc
Instance terminated by PMON, pid = 22867
Thu Jul 03 12:07:45 2014
Starting ORACLE instance (normal)
****************** Large Pages Information *****************

Total Shared Global Region in Large Pages = 0 KB (0%)

Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 16 MB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB

RECOMMENDATION:
  Total Shared Global Region size is 4290 MB. For optimal performance,
  prior to the next instance restart increase the number
  of unused Large Pages by atleast 2145 2048 KB Large Pages (4290 MB)
  system wide to get 100% of the Shared
  Global Region allocated with Large pages
***********************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
ORACLE_HOME = /oracle/EPT/112_64
System name:    Linux
Node name:      erptest
Release:        2.6.32-220.el6.x86_64
Version:        #1 SMP Wed Nov 9 08:03:13 EST 2011
Machine:        x86_64
VM name:        VMWare Version: 6
Using parameter settings in server-side spfile /oracle/EPT/112_64/dbs/spfileEPT.ora
System parameters with non-default values:
  processes                = 190
  sessions                 = 384
  shared_pool_size         = 2112M
  filesystemio_options     = "setall"
  control_files            = "/oracle/EP0/control/cntrlEP0_a.dbf"
  control_files            = "/oracle/EP0/control/cntrlEP0_b.dbf"
  control_file_record_keep_time= 30
  db_block_size            = 8192
  db_cache_size            = 2112M
  compatible               = "11.2.0.2.0"
  log_archive_dest_1       = "LOCATION=/oracle/EPT/oraarch/EPTarch"
  log_archive_format       = "%t_%s_%r.dbf"
  db_recovery_file_dest    = "/oracle/EPT/oraflash"
  db_recovery_file_dest_size= 30000M
  log_checkpoints_to_alert = TRUE
  replication_dependency_tracking= FALSE
  undo_tablespace          = "PSAPUNDO001"
  _in_memory_undo          = FALSE
  recyclebin               = "off"
  sec_case_sensitive_logon = FALSE
  remote_login_passwordfile= "EXCLUSIVE"
  parallel_execution_message_size= 16384
  _table_lookup_prefetch_size= 0
  audit_file_dest          = "/oracle/EPT/saptrace/audit"
  db_name                  = "EPT"
  open_cursors             = 2000
  _sort_elimination_cost_ratio= 10
  _b_tree_bitmap_plans     = FALSE
  star_transformation_enabled= "true"
  parallel_threads_per_cpu = 1
  query_rewrite_enabled    = "false"
  _index_join_enabled      = FALSE
  _optim_peek_user_binds   = FALSE
  pga_aggregate_target     = 2937984122
  _optimizer_mjc_enabled   = FALSE
  diagnostic_dest          = "/oracle/EPT/saptrace"
  max_dump_file_size       = "20000"
Thu Jul 03 12:07:45 2014
PMON started with pid=2, OS id=26678
Thu Jul 03 12:07:45 2014
PSP0 started with pid=3, OS id=26680
Thu Jul 03 12:07:46 2014
VKTM started with pid=4, OS id=26682 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Thu Jul 03 12:07:46 2014
GEN0 started with pid=5, OS id=26686
Thu Jul 03 12:07:46 2014
DIAG started with pid=6, OS id=26688
Thu Jul 03 12:07:46 2014
DBRM started with pid=7, OS id=26690
Thu Jul 03 12:07:47 2014
DIA0 started with pid=8, OS id=26692
Thu Jul 03 12:07:47 2014
MMAN started with pid=9, OS id=26694
Thu Jul 03 12:07:47 2014
DBW0 started with pid=10, OS id=26696
Thu Jul 03 12:07:47 2014
LGWR started with pid=11, OS id=26698
Thu Jul 03 12:07:47 2014
CKPT started with pid=12, OS id=26700
Thu Jul 03 12:07:47 2014
SMON started with pid=13, OS id=26702
Thu Jul 03 12:07:47 2014
RECO started with pid=14, OS id=26704
Thu Jul 03 12:07:47 2014
MMON started with pid=15, OS id=26706
Thu Jul 03 12:07:47 2014
MMNL started with pid=16, OS id=26708
ORACLE_BASE from environment = /oracle
Thu Jul 03 12:07:47 2014
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 454533267
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Thu Jul 03 12:07:51 2014
ALTER DATABASE OPEN





内存相关配置:
SQL> show parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     FALSE
sga_max_size                         big integer 4288M
sga_target                           big integer 0
SQL> show parameter memory      

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_in_memory_undo                      boolean     FALSE
hi_shared_memory_address             integer     0
memory_max_target                    big integer 0
memory_target                        big integer 0
shared_memory_address                integer     0


trc文件见附件。
2#
发表于 2014-7-3 16:19:10
附件呢?

回复 只看该作者 道具 举报

3#
发表于 2014-7-3 16:24:33
dba_sky 发表于 2014-7-3 16:19
附件呢?

@dba_sky,附件已上传

回复 只看该作者 道具 举报

4#
发表于 2014-7-3 20:42:41
Linux + Vmware + 11.2.0.3.0

*** 2014-07-03 11:42:28.296
REQUEST:system state dump at level 10, requested by (instance=1, osid=22867 (PMON)), summary=[abnormal instance termination].
===================================================
SYSTEM STATE (level=10)
------------
System global information:

PROCESS 2: PMON
  ----------------------------------------
  SO: 0x16b3888f8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x16b3888f8, name=process, file=ksu.h LINE:12616, pg=0
  (process) Oracle pid:2, ser:1, calls cur/top: 0x16b633ca0/0x16b633ca0
            flags : (0xe) SYSTEM
            flags2: (0x0),  flags3: (0x10)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 16
              last post received-location: ksu.h LINE:13945 ID:ksupsc
              last process to post me: 16a3eeda8 3 0
              last post sent: 0 0 26
              last post sent-location: ksa2.h LINE:285 ID:ksasnd
              last process posted by me: 16a3edcf8 60 2
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0x16a449268
    O/S info: user: oraept, term: UNKNOWN, ospid: 22867
    OSD pid info: Unix process pid: 22867, image: oracle@erptest (PMON)
    ----------------------------------------
    SO: 0x16aafcc88, type: 14, owner: 0x16b3888f8, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
     proc=0x16b3888f8, name=channel handle, file=ksr2.h LINE:367, pg=0
    (broadcast handle) 0x16aafcc88 flag: (2) ACTIVE SUBSCRIBER,
                       owner: 0x16b3888f8 - ospid: 22867
                       event: 1, last message event: 1,
                       last message waited event: 1,
                       next message: (nil)(0), messages read: 0
                      channel: (0x16ab0f8b0) PMON actions channel [name: 2]
                               scope: 7, event: 1, last mesage event: 0,
                               publishers/subscribers: 0/1,
                               messages published: 0
                               heuristic msg queue length: 0
    ----------------------------------------
    SO: 0x16aafcaf8, type: 14, owner: 0x16b3888f8, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
     proc=0x16b3888f8, name=channel handle, file=ksr2.h LINE:367, pg=0
    (broadcast handle) 0x16aafcaf8 flag: (2) ACTIVE SUBSCRIBER,
                       owner: 0x16b3888f8 - ospid: 22867
                       event: 1, last message event: 14,
                       last message waited event: 14,
                       next message: (nil)(0), messages read: 1
                      channel: (0x16ab1c8f8) scumnt mount lock [name: 157]
                               scope: 1, event: 48, last mesage event: 14,
                               publishers/subscribers: 0/13,
                               messages published: 1
                               heuristic msg queue length: 0
    ----------------------------------------
    SO: 0x16a52e688, type: 4, owner: 0x16b3888f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x16b3888f8, name=session, file=ksu.h LINE:12624, pg=0
    (session) sid: 97 ser: 1 trans: (nil), creator: 0x16b3888f8
              flags: (0x1000051) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x409) -/-/INC
              DID: 0002-, short-term DID:
              txn branch: (nil)
              oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
      Not in wait; last wait ended 0.488532 sec ago
    Wait State:
      fixed_waits=0 flags=0x21 boundary=(nil)/-1
    Session Wait History:
        elapsed time of 0.507551 sec since last wait
     0: waited for 'pmon timer'
        duration=0x12c, =0x0, =0x0
               
               
               
               
               
               
               
               
                PROCESS 19: W001
  ----------------------------------------
  SO: 0x16a3edcf8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x16a3edcf8, name=process, file=ksu.h LINE:12616, pg=0
  (process) Oracle pid:19, ser:60, calls cur/top: 0x164db8d60/0x16b636b00
            flags : (0x2) SYSTEM
            flags2: (0x30),  flags3: (0x10)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 258 0 2
              last post received-location: ksl2.h LINE:2362 ID:kslpsr
              last process to post me: 16a3f7328 58 2
              last post sent: 0 0 233
              last post sent-location: ktm.h LINE:614 ID:ktmpsm
              last process posted by me: 16a3eaae8 1 22
    (latch info) wait_event=258 bits=0
    Process Group: DEFAULT, pseudo proc: 0x16a449268
    O/S info: user: oraept, term: UNKNOWN, ospid: 12111 (DEAD)
    OSD pid info: Unix process pid: 12111, image: oracle@erptest (W001)
       
       
        pmon          最后post  16a3edcf8   last process posted by me: 16a3edcf8 60 2
       
       
        16a3edcf8指向 W001 一个DEAD PROCESS

回复 只看该作者 道具 举报

5#
发表于 2014-7-3 20:47:28
上传一下 /var/log/messages 和  完整的alert.log

回复 只看该作者 道具 举报

6#
发表于 2014-7-3 21:41:52
Maclean Liu(刘相兵 发表于 2014-7-3 20:47
上传一下 /var/log/messages 和  完整的alert.log

@maclean大师,已上传,不胜感激

回复 只看该作者 道具 举报

7#
发表于 2014-7-3 22:04:57
message显示

Jul  3 11:40:09 erptest kernel: Free swap  = 0kB
Jul  3 11:40:09 erptest kernel: Total swap = 4128760kB
Jul  3 11:40:09 erptest kernel: 4094960 pages RAM
Jul  3 11:40:09 erptest kernel: 109792 pages reserved
Jul  3 11:40:09 erptest kernel: 9916734 pages shared
Jul  3 11:40:09 erptest kernel: 1841302 pages non-shared



OS内存用光了哦??? swap 也没得空闲的了?

回复 只看该作者 道具 举报

8#
发表于 2014-7-3 23:11:34
messages里可以看到大量 oom killer的信息, 我相问题比较明确

ul  3 11:40:49 erptest kernel: [12455]   503 12455  1157967   119466   3       0             0 oracle
Jul  3 11:40:49 erptest kernel: [12456]   502 12456  2554032   112022   4       0             0 EPT_00_DIA_W1
Jul  3 11:40:49 erptest kernel: [12462]   503 12462  1157965    47038   6       0             0 oracle
Jul  3 11:40:49 erptest kernel: Out of memory: Kill process 11763 (EPT_00_DIA_W7) score 104 or sacrifice child
Jul  3 11:40:49 erptest kernel: Killed process 11763, UID 502, (EPT_00_DIA_W7) total-vm:10246688kB, anon-rss:113208kB, file-rss:1963024kB
Jul  3 11:41:08 erptest kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Jul  3 11:41:08 erptest kernel: oracle cpuset=/ mems_allowed=0
Jul  3 11:41:08 erptest kernel: Pid: 12462, comm: oracle Not tainted 2.6.32-220.el6.x86_64 #1
Jul  3 11:41:08 erptest kernel: Call Trace:
Jul  3 11:41:08 erptest kernel: [<ffffffff810c2cb1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Jul  3 11:41:08 erptest kernel: [<ffffffff81113a30>] ? dump_header+0x90/0x1b0
Jul  3 11:41:08 erptest kernel: [<ffffffff8120d98c>] ? security_real_capable_noaudit+0x3c/0x70
Jul  3 11:41:08 erptest kernel: [<ffffffff81113eba>] ? oom_kill_process+0x8a/0x2c0
Jul  3 11:41:08 erptest kernel: [<ffffffff81113dae>] ? select_bad_process+0x9e/0x120
Jul  3 11:41:08 erptest kernel: [<ffffffff81114310>] ? out_of_memory+0x220/0x3c0
Jul  3 11:41:08 erptest kernel: [<ffffffff8112402e>] ? __alloc_pages_nodemask+0x89e/0x940
Jul  3 11:41:08 erptest kernel: [<ffffffff81158b7a>] ? alloc_pages_current+0xaa/0x110
Jul  3 11:41:08 erptest kernel: [<ffffffff81110e57>] ? __page_cache_alloc+0x87/0x90
Jul  3 11:41:08 erptest kernel: [<ffffffff81126a7b>] ? __do_page_cache_readahead+0xdb/0x210
Jul  3 11:41:08 erptest kernel: [<ffffffff81126bd1>] ? ra_submit+0x21/0x30
Jul  3 11:41:08 erptest kernel: [<ffffffff81112123>] ? filemap_fault+0x4c3/0x500
Jul  3 11:41:08 erptest kernel: [<ffffffff8113b2c4>] ? __do_fault+0x54/0x510
Jul  3 11:41:08 erptest kernel: [<ffffffff8113b877>] ? handle_pte_fault+0xf7/0xb50

回复 只看该作者 道具 举报

您需要登录后才可以回帖 登录 | 注册

QQ|手机版|Archiver|Oracle数据库数据恢复、性能优化

GMT+8, 2024-12-21 05:55 , Processed in 0.053829 second(s), 23 queries .

Powered by Discuz! X2.5

© 2001-2012 Comsenz Inc.

回顶部
TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569