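Question about how ASLR leads to zombie processes
This post was last edited by Hank-D on 2014-2-10 14:20
Database: ORACLE 11.2.0.3.0
OS: RHEL 5.8 x86-64
Hello Liu, I ran into a zombie process problem today.
I traced it to the document ORA-00445: Background Process "xxxx" Did Not Start After 120 Seconds (Doc ID 1345364.1).
The cause is a conflict with this RHEL feature: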
CAUSE
Recent linux kernels have a feature called Address Space Layout Randomization (ASLR).
ASLR is a feature that is activated by default on some of the newer linux distributions.
It is designed to load shared memory objects in random addresses.
In Oracle, multiple processes map a shared memory object at the same address across the processes.
With ASLR turned on Oracle cannot guarantee the availability of this shared memory address.
This conflict in the address space means that a process trying to attach a shared memory object to a specific address may not be able to do so, resulting in a failure in shmat subroutine.
However, on subsequent retry (using a new process) the shared memory attachment may work.
The result is a "random" set of failures in the alert log.
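One way to see the randomization described above is to compare where the kernel places a mapping in two freshly started processes. The sketch below is only an illustration and is not part of the Oracle note: the helper name `stack_start` is hypothetical, and it inspects the stack mapping because that is easy to observe. It does not reproduce the shmat failure itself.

```shell
# Hypothetical helper: read /proc/<pid>/maps-style text on stdin and print
# the start address of the [stack] mapping.
stack_start() {
    awk '$NF == "[stack]" { split($1, range, "-"); print range[1] }'
}

# Usage (each cat is a new process, so it reads its own maps):
#   cat /proc/self/maps | stack_start
#   cat /proc/self/maps | stack_start
# Differing addresses across the two runs suggest ASLR is active.
```

If the two addresses are identical on every run, randomization is probably switched off for that process or for the whole system.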
SOLUTION
It should be noted that this problem has only been positively diagnosed in Redhat 5 and Oracle 11.2.0.2.
It is also likely, as per unpublished BUG:8527473, that this issue will reproduce running on Generic Linux platforms running any Oracle 11.2.0.x. or 12.1.0.x on Redhat/OEL kernels which have ASLR.
This issue has been seen in both Single Instance and RAC environments.
ASLR also exists in SLES10 and SLES 11 kernels and by default ASLR is turned on. To date no problem has been seen on SuSE servers running Oracle but Novell confirm ASLR may cause problems. Please refer to
http://www.novell.com/support/kb/doc.php?id=7004855 mmap occasionally infringes on stack
You can verify whether ASLR is being used as follows:
# /sbin/sysctl -a | grep randomize
kernel.randomize_va_space = 1
If the parameter is set to any value other than 0 then ASLR is in use.
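The same check can be scripted. This is a minimal sketch, assuming the value is read from /proc/sys/kernel/randomize_va_space (the file behind the sysctl key above); the function name `aslr_state` is made up for illustration.

```shell
# Hypothetical helper: classify a kernel.randomize_va_space value.
# Per the note, any value other than 0 means ASLR is in use.
aslr_state() {
    case "$1" in
        '') echo "unknown" ;;   # no value could be read
        0)  echo "disabled" ;;
        *)  echo "enabled" ;;
    esac
}

# Usage:
#   aslr_state "$(cat /proc/sys/kernel/randomize_va_space 2>/dev/null)"
```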
On Redhat 5, to permanently disable ASLR, add/modify these parameters in /etc/sysctl.conf:
kernel.randomize_va_space=0
kernel.exec-shield=0
You need to reboot for kernel.exec-shield parameter to take effect.
Note that both kernel parameters are required for ASLR to be switched off.
There may be other reasons for a process failing to start, however, by switching ASLR off, you can quickly discount ASLR being the problem. More and more issues are being identified when ASLR is in operation.
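For anyone scripting the /etc/sysctl.conf change, here is a minimal sketch of an idempotent edit. The function name `set_sysctl_entry` is hypothetical and not from the Oracle note. It rewrites an existing `key = value` line or appends a new one, so it is safest to try it on a copy of the file first.

```shell
# Hypothetical helper: set KEY=VALUE in a sysctl.conf-style FILE,
# replacing an existing entry for KEY or appending one if none exists.
set_sysctl_entry() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        sed "s/^[[:space:]]*${pattern}[[:space:]]*=.*/${key}=${value}/" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (as root, after testing on a copy):
#   set_sysctl_entry /etc/sysctl.conf kernel.randomize_va_space 0
#   set_sysctl_entry /etc/sysctl.conf kernel.exec-shield 0
```

Running `sysctl -p` afterwards reloads the file, although the note above says a reboot is still needed for kernel.exec-shield to take effect.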
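After setting these two kernel parameters and rebooting the system, the problem disappeared.
My question: my other systems have this feature at its normal setting as well, yet only this system hit the problem, so the real cause is unclear to me. Could it be that too little memory was allocated to the instance?
When I install a new database in the future, should I turn this feature off?
Thanks for reading patiently, and thanks for any answers.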
Following~
I checked all servers with ASLR enabled and found no sign of this defect being triggered.
ASLR is a security feature. IBM DB2 before 10.1 Fixpack 3 also recommended disabling it, for the same reason as Oracle.
(Posted by clevernby on 2014-2-10 14:16)
I looked for material on ASLR and read for a long time, but I'm still in a fog :(