Oracle Database Data Recovery and Performance Optimization

#1
Posted 2012-5-14 23:50:38 | Views: 3567 | Replies: 1
Is there a detailed reference describing what each Oracle background process does and what work it is responsible for? Also, how can we tell that a particular process has died?
#2
Posted 2012-5-15 12:36:08
Please refer to the following description of the background processes, taken from Tom Kyte's Expert One-on-One Oracle:


SMON ‐ The System Monitor
SMON is the process that gets to do all of the jobs no one else wants to do. It is a sort of
"garbage collector" for the database. Some of the jobs it is responsible for include:
- Temporary space clean up ‐ With the advent of "true" temporary tablespaces, this
job has lessened, but has not gone away. For example, when building an index, the
extents allocated for the index during the creation are marked as TEMPORARY. If
the CREATE INDEX session is aborted for some reason, SMON is responsible for
cleaning them up. There are other operations that create temporary extents that
SMON would be responsible for as well.


- Crash recovery ‐ SMON is responsible for performing crash recovery of a failed
instance, upon restart.

- Coalescing free space ‐ If you are using dictionary‐managed tablespaces, SMON is
responsible for taking extents that are free in a tablespace and contiguous with
respect to each other, and coalescing them into one "bigger" free extent. This occurs
only on dictionary-managed tablespaces with a default storage clause that has
pctincrease set to a non‐zero value.

- Recovering transactions active against unavailable files ‐ This is similar to its role
during database startup. Here SMON recovers failed transactions that were
skipped during instance/crash recovery due to a file(s) not being available to
recover. For example, the file may have been on a disk that was unavailable or not
mounted. When the file does become available, SMON will recover it.

- Instance recovery of failed node in OPS ‐ In an Oracle Parallel Server
configuration, when a node in the cluster goes down (the machine fails), some other
node in the instance will open that failed node's redo log files, and perform a
recovery of all data for that failed node.
- Cleans up OBJ$ ‐ OBJ$ is a low‐level data dictionary table that contains an entry for
almost every object (table, index, trigger, view, and so on) in the database. Many times
there are entries in it that represent deleted objects, or "not there" objects used in
Oracle's dependency mechanism. SMON is the process
that removes these no longer needed rows.
- Shrinks rollback segments ‐ SMON is the process that will perform the automatic
shrinking of a rollback segment to its optimal size, if it is set.
- "Offlines" rollback segments ‐ It is possible for the DBA to offline, or make
unavailable, a rollback segment that has active transactions. It may be that
active transactions are still using this offlined rollback segment. In this case, the
rollback segment is not really offlined; it is marked as "pending offline". SMON will
periodically try to really offline it in the background until it can.
That should give you a flavor of what SMON does. As evidenced by the ps listing of
processes I introduced above, SMON can accumulate quite a lot of CPU over time (the
instance from which ps was taken was an active instance that had been up for well over a
month). SMON periodically wakes up (or is woken up by the other backgrounds) to
perform these housekeeping chores.
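The effect of SMON's free-space coalescing can be observed from the data dictionary. A minimal sketch; the DBA_FREE_SPACE_COALESCED view is only interesting for dictionary-managed tablespaces:

```sql
-- How thoroughly has SMON coalesced adjacent free extents in each tablespace?
-- 100 percent means every group of neighboring free extents has been merged.
SELECT tablespace_name,
       total_extents,
       extents_coalesced,
       percent_extents_coalesced
  FROM dba_free_space_coalesced
 ORDER BY percent_extents_coalesced;
```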



RECO ‐ Distributed Database Recovery
RECO has a very focused job; it recovers transactions that are left in a prepared state
because of a crash or loss of connection during a two‐phase commit (2PC). A 2PC is a
distributed protocol that allows for a modification that affects many disparate databases to
be committed atomically. It attempts to close the window for distributed failure as much
as possible before committing. In a 2PC between N databases, one of the databases,
typically (but not always) the one the client logged in to initially, will be the coordinator.
This one site will ask the other N‐1 sites if they are ready to commit. In effect, this one site
will go to the N‐1 sites, and ask them to be prepared to commit. Each of the N‐1 sites
reports back their "prepared state" as YES or NO. If any one of the sites votes NO, the entire
transaction is rolled back. If all sites vote YES, then the site coordinator broadcasts a
message to make the commit permanent on each of the N‐1 sites.
If after some site votes YES, they are prepared to commit, but before they get the directive
from the coordinator to actually commit, the network fails or some other error occurs, the
transaction becomes an in‐doubt distributed transaction. The 2PC tries to limit the
window of time in which this can occur, but cannot remove it. If we have a failure right
then and there, the transaction will become the responsibility of RECO. RECO will try to
contact the coordinator of the transaction to discover its outcome. Until it does that, the
transaction will remain in its uncommitted state. When the transaction coordinator can be
reached again, RECO will either commit the transaction or roll it back.
It should be noted that if the outage is to persist for an extended period of time, and you
have some outstanding transactions, you can commit/roll them back manually yourself.
You might want to do this since an in‐doubt distributed transaction can cause writers to
block readers ‐ this is the one time this can happen in Oracle. Your DBA could call the
DBA of the other database and ask them to query up the status of those in‐doubt
transactions. Your DBA can then commit or roll them back, relieving RECO of this task.
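If the DBA does decide to resolve an in-doubt transaction by hand rather than wait for RECO, the pending transactions and their states are visible in DBA_2PC_PENDING. A hedged sketch; the transaction ID shown is a made-up placeholder:

```sql
-- List in-doubt distributed transactions RECO has not yet resolved.
SELECT local_tran_id, state, mixed, advice, fail_time
  FROM dba_2pc_pending;

-- After confirming the outcome with the coordinator's DBA, force a resolution
-- manually ('1.23.4567' is a placeholder for a real LOCAL_TRAN_ID):
COMMIT FORCE '1.23.4567';
-- or: ROLLBACK FORCE '1.23.4567';
```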



CKPT ‐ Checkpoint Process
The checkpoint process doesn't, as its name implies, do a checkpoint (that's mostly the job
of DBWn). It simply assists with the checkpointing process by updating the file headers of
the data files. It used to be that CKPT was an optional process, but starting with version
8.0 of the database it is always started, so if you do a ps on UNIX, you'll always see it there.
The job of updating data files' headers with checkpoint information used to belong to the
LGWR (Log Writer) however, as the number of files increased along with the size of a
database over time, this additional task for LGWR became too much of a burden. If LGWR
had to update dozens, or hundreds, or even thousands of files, there would be a good
chance sessions waiting to commit these transactions would have to wait far too long.
CKPT removes this responsibility from LGWR.
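The checkpoint information CKPT maintains can be seen by comparing the checkpoint SCN recorded in the control file with the one in each data file header. A small illustrative query; on a healthy open database the two numbers agree:

```sql
-- Checkpoint SCN as recorded in the control file vs. the data file header.
SELECT f.name,
       f.checkpoint_change#  AS ctlfile_ckpt_scn,
       h.checkpoint_change#  AS file_hdr_ckpt_scn
  FROM v$datafile f, v$datafile_header h
 WHERE f.file# = h.file#;
```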




DBWn ‐ Database Block Writer
The Database Block Writer (DBWn) is the background process responsible for writing
dirty blocks to disk. DBWn will write dirty blocks from the buffer cache, usually in order
to make more room in the cache (to free buffers for reads of other data), or to advance a
checkpoint (to move forward the position in an online redo log file from which Oracle
would have to start reading, in order to recover the instance in the event of failure). As we
discussed previously, when Oracle switches log files, a checkpoint is signaled. Oracle
needs to advance the checkpoint so that it no longer needs the online redo log file it just
filled up. If it hasn't been able to do that by the time we need to reuse that redo log file, we
get the "checkpoint not complete" message and we must wait.
As you can see, the performance of DBWn can be crucial. If it does not write out blocks
fast enough to free buffers for us, we will see waits for "Free Buffer Waits" and "Write
Complete Waits" start to grow.
We can configure more than one DBWn, up to ten in fact (DBW0 ... DBW9). Most systems
run with one database block writer, but larger, multi‐CPU systems can make use of more
than one. If you do configure more than one DBWn, be sure to also increase the init.ora
parameter, DB_BLOCK_LRU_LATCHES. This controls the number of LRU list latches
(now called touch lists in 8i) ‐ in effect, you want each DBWn to have its own list. If each
DBWn shared the same list of blocks to write out to disk, they would only end up
contending with each other in order to access this list.
Normally, the DBWn uses asynchronous I/O to write blocks to disk. With asynchronous
I/O, DBWn gathers up a batch of blocks to be written, and gives them to the operating
system. DBWn does not wait for the OS to actually write the blocks out, rather it goes back
and collects the next batch to be written. As the OS completes the writes, it
asynchronously notifies DBWn that it completed the write. This allows DBWn to work
much faster than if it had to do everything serially. We'll see later, in the Slave Processes
section, how we can use I/O slaves to simulate asynchronous I/O on platforms or
configurations that do not support it.
I would like to make one final point about DBWn. It will, almost by definition, write out
blocks scattered all over disk ‐ DBWn does lots of scattered writes. When you do an
update, you'll be modifying index blocks that are stored here and there, and data blocks
that are randomly distributed on disk as well. LGWR, on the other hand, does lots of
sequential writes to the redo log. This is an important distinction, and one of the reasons
that Oracle has a redo log and a LGWR process. Scattered writes are significantly slower
than sequential writes. By having the SGA buffer dirty blocks and the LGWR process do
large sequential writes of redo that can recreate these dirty buffers, we achieve an increase in
performance. The fact that DBWn does its slow job in the background while LGWR does
its faster job while the user waits, gives us overall better performance. This is true even
though Oracle may technically be doing more I/O than it needs to (writes to the log and to
the datafile) ‐ the writes to the online redo log could be skipped if, during a commit,
Oracle physically wrote the modified blocks out to disk instead.
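Whether DBWn is keeping up can be judged from the wait events named above. A sketch:

```sql
-- Are sessions waiting on DBWn? Non-trivial growth here suggests it is lagging.
SELECT event, total_waits, time_waited
  FROM v$system_event
 WHERE event IN ('free buffer waits', 'write complete waits');
```

The number of writers itself is set via the init.ora parameter db_writer_processes, raised together with DB_BLOCK_LRU_LATCHES as discussed above.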



LGWR ‐ Log Writer
The LGWR process is responsible for flushing to disk the contents of the redo log buffer,
located in the SGA. It does this:
? Every three seconds, or
? Whenever you commit, or
? When the redo log buffer is a third full or contains 1 MB of buffered data.
For these reasons, having an enormous redo log buffer is not practical ‐ Oracle will never
be able to use it all. The logs are written to with sequential writes as compared to the
scattered I/O DBWn must perform. Doing large batch writes like this is much more
efficient than doing many scattered writes to various parts of a file. This is one of the main
reasons for having a LGWR and redo logs in the first place. The efficiency in just writing
out the changed bytes using sequential I/O outweighs the additional I/O incurred. Oracle
could just write database blocks directly to disk when you commit but that would entail a
lot of scattered I/O of full blocks ‐ this would be significantly slower than letting LGWR
write the changes out sequentially.
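The pressure on LGWR and the redo log buffer can be gauged from the redo statistics. A minimal sketch; in particular, a steadily growing 'redo buffer allocation retries' figure means sessions had to wait for LGWR to free space in the buffer:

```sql
-- Redo generation and log buffer pressure since instance startup.
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('redo size',
                'redo writes',
                'redo buffer allocation retries',
                'redo log space requests');
```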



ARCn ‐ Archive Process
The job of the ARCn process is to copy an online redo log file to another location when
LGWR fills it up. These archived redo log files can then be used to perform media
recovery. Whereas online redo logs are used to "fix" the data files in the event of a power
failure (when the instance is terminated), archived redo logs are used to "fix" data files in the
event of a hard disk failure. If we lose the disk drive containing the data file
/d01/oradata/ora8i/system.dbf, we can go to our backups from last week, restore that old
copy of the file, and ask the database to apply all of the archived and online redo log
generated since that backup took place. This will "catch up" that file with the rest of the
data files in our database, and we can continue processing with no loss of data.
ARCn typically copies online redo log files to at least two other locations (redundancy
being a key to not losing data!). These other locations may be disks on the local machine or,
more appropriately, at least one will be located on another machine altogether, in the
event of a catastrophic failure. In many cases, these archived redo log files are copied off
by some other process to some tertiary storage device, such as tape. They may also be sent
to another machine to be applied to a "standby database", a failover option offered by
Oracle.
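Archive destinations like these are configured in the init.ora file and can be verified from SQL*Plus. A sketch; the paths and the standby service name are hypothetical:

```sql
-- init.ora fragment (illustrative values):
--   log_archive_start  = true
--   log_archive_dest_1 = "location=/d01/oradata/ora8i/arch mandatory"
--   log_archive_dest_2 = "service=standby_db optional"
--   log_archive_format = arch_%t_%s.arc

-- From SQL*Plus, confirm the archiving mode and destination:
ARCHIVE LOG LIST
```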



BSP ‐ Block Server Process
This process is used exclusively in an Oracle Parallel Server (OPS) environment. An OPS is
a configuration of Oracle whereby more than one instance mounts and opens the same
database. Each instance of Oracle in this case is running on a different machine in a cluster,
and they all access in a read‐write fashion the same exact set of database files.
In order to achieve this, the SGA block buffer caches must be kept consistent with respect
to each other. This is the main goal of the BSP. In earlier releases of OPS this was
accomplished via a "ping". That is, if a node in the cluster needed a read-consistent view of
a block that was locked in exclusive mode by another node, the exchange of data was done
via a disk flush (the block was "pinged"). This was a very expensive operation just to read
data. Now, with the BSP, this exchange is done via a very fast cache‐to‐cache exchange over
the cluster's high‐speed connection.




LMON ‐ Lock Monitor Process
This process is used exclusively in an OPS environment. The LMON process monitors all
instances in a cluster to detect the failure of an instance. It then facilitates the recovery of
the global locks held by the failed instance, in conjunction with the distributed lock
manager (DLM) employed by the cluster hardware.


LMD ‐ Lock Manager Daemon
This process is used exclusively in an OPS environment. The LMD process controls the
global locks and global resources for the block buffer cache in a clustered environment.
Other instances will make requests to the local LMD, asking it to release a lock
or to find out who holds a lock. The LMD also handles global deadlock detection and
resolution.



LCKn ‐ Lock Process
The LCKn process is used exclusively in an OPS environment. This process is very similar
in functionality to the LMD described above, but handles requests for all global resources
other than database block buffers.
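Back to the original question of telling whether a background process has died: besides checking ps on UNIX for the ora_<name>_<SID> processes, the live background processes can be listed from inside the database. A sketch using V$BGPROCESS and V$PROCESS (rows with a non-zero PADDR are the processes actually running; the death of a mandatory process such as SMON or LGWR terminates the instance, and the alert log records why):

```sql
-- Which background processes are currently alive in this instance?
SELECT p.spid, b.name, b.description
  FROM v$bgprocess b, v$process p
 WHERE b.paddr = p.addr
   AND b.paddr != '00'
 ORDER BY b.name;
```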
