记一次ogg extract抽取进程异常abending问题处理
环境版本:aix5309 oracle10.2.04 ogg11.2.1
extract抽取进程报错,进程running一会后自动ABENDING,进程rba号没变化。
错误日志:
2012-08-12 08:09:04 ERROR OGG-00446 error 22 (Invalid argument) opening redo log /bmsdblog/bmsdb/redo05.log for sequence 267279Not able to establish initial positi
on for begin time 2012-08-12 08:07:52.
2012-08-12 08:09:04 ERROR OGG-01668 PROCESS ABENDING.
mgr进程配置:
GGSCI (lpar6) 5> view params mgr
PORT 7839
DYNAMICPORTLIST 7840-7914
extract抽取进程配置:
GGSCI (lpar6) 3> view params bmsext1
EXTRACT bmsext1
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
USERID goldengate@bmsdb, PASSWORD goldengate
DISCARDFILE ./dirrpt/bmsext1.dsc, APPEND,MEGABYTES 1024
EXTTRAIL /ogg/bmsque/lt
obey ./dirsql/bms_table.txt
问题分析:
根据错误日志,判断是在读取/bmsdblog/bmsdb/redo05.log时发生错误,无法建立初始抽取点位置,导致抽取进程ABENDING。检查了ogg用户的属组,/bmsdblog/bmsdb/redo05.log的读写权限等,读取权限方面正常。
$ id
uid=1100(ogg) gid=500(dba) groups=1(staff)
$ ls -al /bmsdblog/bmsdb/redo05.log
-rwxr-xr-x 1 oracle dba 314573312 Aug 12 08:40 /bmsdblog/bmsdb/redo05.log
$ ls -al /bmsdblog/bmsdb
total 3072045
drwxr-xr-x 2 oracle dba 256 Jul 11 2011 .
drwxr-xr-x 4 oracle dba 256 Sep 25 2009 ..
-rwxr-xr-x 1 oracle dba 314573312 Aug 12 10:24 redo01.log
-rwxr-xr-x 1 oracle dba 314573312 Aug 12 11:04 redo02.log
-rwxr-xr-x 1 oracle dba 314573312 Aug 12 06:05 redo03.log
-rwxr-xr-x 1 oracle dba 314573312 Aug 12 06:05 redo04.log
-rwxr-xr-x 1 oracle dba 314573312 Aug 12 08:40 redo05.log
$ ls -al /bmsdblog
total 8
drwxr-xr-x 4 oracle dba 256 Sep 25 2009 .
drwxr-xr-x 33 root system 4096 Aug 09 04:09 ..
drwxr-xr-x 2 oracle dba 256 Jul 11 2011 bmsdb
drwxr-xr-x 2 oracle dba 256 Sep 11 2009 lost+found
查看数据库用户goldengate权限:
SQL> select GRANTEE,GRANTED_ROLE from dba_role_privs where GRANTEE='GOLDENGATE';
GRANTEE GRANTED_ROLE
------------------------------ ------------------------------
GOLDENGATE DBA
GOLDENGATE CONNECT
GOLDENGATE RESOURCE
查看OGG_Error_Messages,根据错误号未发现有用的信息,接着做了以下测试:
尝试使用ogg用户登录,使用命令more /bmsdblog/bmsdb/redo05.log查看结果,发现返回错误如下:
$ more /bmsdblog/bmsdb/redo05.log
/bmsdblog/bmsdb/redo05.log: A system call received a parameter that is not valid.
$ id
uid=1100(ogg) gid=500(dba) groups=1(staff)
切换到oracle用户登录,使用命令more /bmsdblog/bmsdb/redo05.log,发现结果相同:
$ su - oracle
oracle's Password:
$ more /bmsdblog/bmsdb/redo05.log
/bmsdblog/bmsdb/redo05.log: A system call received a parameter that is not valid.
$ id
uid=500(oracle) gid=500(dba) groups=1(staff)
但此刻oracle数据库对日志读写均是正常,也能正常切换日志和归档
$ sqlplus /nolog
SQL*Plus: Release 10.2.0.4.0 - Production on Sun Aug 12 11:08:45 2012
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
SQL> conn /as sysdba
Connected.
SQL> alter system switch logfile;
System altered.
网上查阅相关资料后,发现oracle在aix5L,oracle 10.2.03存在此问题,Doc ID: 428024.1
FILESYSTEMIO_OPTIONS=SETALL or DIRECTIO时,JFS2 file systems 没有使用cio mode挂载,导致External utilities or tools cannot read datafiles on JFS2 file systems
查看数据库的FILESYSTEMIO_OPTIONS设置:
SQL> show parameter FILESYSTEMIO_OPTIONS
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options string setall
查看mount属性
# mount
node mounted mounted over vfs date options
-------- --------------- --------------- ------ ------------ ---------------
/dev/hd4 / jfs2 Feb 07 20:03 rw,log=/dev/hd8
/dev/hd2 /usr jfs2 Feb 07 20:03 rw,log=/dev/hd8
/dev/hd9var /var jfs2 Feb 07 20:03 rw,log=/dev/hd8
/dev/hd3 /tmp jfs2 Feb 07 20:03 rw,log=/dev/hd8
/dev/hd1 /home jfs2 Feb 07 20:04 rw,log=/dev/hd8
/proc /proc procfs Feb 07 20:04 rw
/dev/hd10opt /opt jfs2 Feb 07 20:04 rw,log=/dev/hd8
/dev/lv_bmsdblog /bmsdblog jfs2 Feb 07 20:04 rw,log=/dev/loglv01
/dev/lv_bmsdbarc /oraarch1 jfs2 Feb 07 20:04 rw,log=/dev/loglv01
/dev/lv_bmsdbdata /oradb jfs2 Feb 07 20:04 rw,log=/dev/loglv01
/dev/lv_ora /oracle jfs2 Feb 07 20:04 rw,log=/dev/hd8
/dev/lv_soft /soft jfs2 Feb 07 20:04 rw,log=/dev/hd8
/dev/lf_bms_bak /nbubmsbak jfs2 Feb 07 20:04 rw,log=/dev/loglv02
/dev/lf_ogg /ogg jfs2 Aug 09 01:19 rw,log=/dev/loglv02
解决办法:
1. Enable CIO on JFS2 file system
OR
2. Change FILESYSTEMIO_OPTIONS=asynch
我的oracle版本是10.2.04,很奇怪是不是也存在这个问题?需要明天停库把参数改为asynch验证一下。 Good post, 加精 已对生产库进行修改后验证,问题得到解决 不错。不错。
页:
[1]