Oracle Exadata Assessment Report

System Health Score is 85 out of 100

Cluster Summary

Cluster Name: dm01-cluster
OS Version: LINUX X86-64 OELRHEL 5 2.6.32-400.1.1.el5uek
CRS Home - Version: /u01/app/11.2.0.3/grid - 11.2.0.3.0
DB Home - Version - Names: /u01/app/oracle/product/11.2.0.3/dbhome_1 - 11.2.0.3.0 - MCSDB
Exadata Version: 11.2.3.2.0
Number of nodes: 7
   Database Servers: 2
   Storage Servers: 3
   IB Switches: 2
exachk Version: 2.1.5_20120524
Collection: exachk_MCSDB_041614_105402.zip
Collection Date: 16-Apr-2014 11:03:16


Findings Needing Attention

FAIL, WARNING, ERROR and INFO findings should be evaluated. INFO status is considered a significant finding and details for those should be reviewed in light of your environment.

Database Server

Status | Type | Message | Status On
FAIL | OS Check | Database control files are not configured as recommended | All Database Servers
FAIL | Patch Check | System may be exposed to Exadata Critical Issue DB11 | All Homes
FAIL | OS Check | Database Server Physical Drive Configuration does not meet recommendation | All Database Servers
FAIL | SQL Parameter Check | Database parameter USE_LARGE_PAGES is NOT set to recommended value | All Instances
FAIL | SQL Parameter Check | Database parameter GLOBAL_NAMES is NOT set to recommended value | All Instances
FAIL | OS Check | InfiniBand network error counters are non-zero | All Database Servers
FAIL | SQL Check | Some data or temp files are not autoextensible | All Databases
FAIL | SQL Parameter Check | Database parameter _lm_rcvr_hang_allow_time is NOT set to the recommended value | All Instances
FAIL | SQL Parameter Check | Database parameter _kill_diagnostics_timeout is not set to recommended value | All Instances
WARNING | OS Check | All voting disks are not online | All Database Servers
WARNING | SQL Check | Some tablespaces are not using Automatic segment storage management | All Databases
INFO | OS Check | ASM griddisk, diskgroup and Failure group mapping not checked | All Database Servers

Storage Server

Status | Type | Message | Status On
FAIL | Storage Server Check | The griddisk ASM status should match specification | dm01cel01
FAIL | Storage Server Check | The celldisk configuration on disk drives should match Oracle best practices | dm01cel01
FAIL | Storage Server Check | One or more storage servers have open critical alerts | All Storage Servers
FAIL | Storage Server Check | Storage Server alerts are not configured to be sent via email | All Storage Servers
WARNING | Storage Server Check | Free space in root(/) filesystem is less than recommended on one or more storage servers | All Storage Servers


MAA Scorecard

Outage Type | Status | Type | Message | Status On
COMPUTER FAILURE PREVENTION BEST PRACTICES: PASS
Description
Oracle RAC and Oracle Clusterware allow Oracle Database to run any packaged or custom application across a set of clustered servers. This capability provides the highest levels of availability and the most flexible scalability. If a clustered server fails, then Oracle Database continues running on the surviving servers. When more processing power is needed, you can add another server without interrupting access to data. For RAC and clusterware MAA best practices please consult chapter 6 and 7 of the HA Best Practices guide
Links:
PASS | SQL Parameter Check | fast_start_mttr_target has been changed from default | All Instances
STORAGE FAILURES PREVENTION BEST PRACTICES: PASS
Description
The Oracle Storage Grid is implemented using either Oracle Automatic Storage Management (ASM) and Oracle Exadata Storage Server Software or ASM and third-party storage. The Oracle Storage Grid with Exadata seamlessly supports MAA-related technology, improves performance, provides unlimited I/O scalability, is easy to use and manage, and delivers mission-critical availability and reliability to your enterprise. For additional information and best practices consult the Oracle Exadata Storage Server Web site at http://www.oracle.com/exadata
Links:
PASS | SQL Check | At least one high redundancy diskgroup configured | All Databases
DATA CORRUPTION PREVENTION BEST PRACTICES: FAIL
Description
The MAA recommended way to achieve the most comprehensive data corruption prevention and detection is to use Oracle Data Guard and configure the DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, and DB_LOST_WRITE_PROTECT database initialization parameters on the Data Guard primary and standby databases.
Links:
FAIL | SQL Parameter Check | Database parameter DB_BLOCK_CHECKSUM is NOT set to recommended value | All Instances
WARNING | OS Check | Database parameter DB_BLOCK_CHECKING is NOT set to the recommended value | All Database Servers
WARNING | OS Check | Database parameter DB_BLOCK_CHECKING is NOT set to the recommended value | All Database Servers
PASS | SQL Parameter Check | Database parameter DB_LOST_WRITE_PROTECT is set to recommended value | All Instances
PASS | OS Check | Shell limit soft nofile for DB is configured according to recommendation | All Database Servers
LOGICAL CORRUPTION PREVENTION BEST PRACTICES: FAIL
Description
Oracle Flashback Technology enables fast logical failure repair. Oracle recommends that you use automatic undo management with sufficient space to attain your desired undo retention guarantee, enable Oracle Flashback Database, and allocate sufficient space and I/O bandwidth in the fast recovery area. Application monitoring is required for early detection. Effective and fast repair comes from leveraging and rehearsing the most common application specific logical failures and using the different flashback features effectively (e.g flashback query, flashback version query,flashback transaction query, flashback transaction, flashback drop, flashback table, and flashback database)
Links:
FAIL | SQL Check | Flashback is not configured | All Databases
PASS | SQL Parameter Check | Database parameter UNDO_RETENTION is not null | All Instances
DATABASE/CLUSTER/SITE FAILURE PREVENTION BEST PRACTICES: FAIL
Description
Oracle Data Guard is a high availability and disaster-recovery solution that provides very fast automatic failover (referred to as fast-start failover) in database failures, node failures, corruption, and media failures. Furthermore, the standby databases can be used for read-only access and subsequently for reader farms, for reporting, for backups and for testing and development. For zero data loss protection and fastest recovery time, deploy a local Data Guard standby database with Data Guard Fast-Start Failover. For protection against outages impacting both the primary and the local standby, deploy a second Data Guard standby database at a remote location.
Links:
FAIL | OS Check | Oracle Net service name to ship redo to the standby is not configured properly | All Database Servers
FAIL | SQL Check | Remote destination is not using either ASYNC or SYNC transport for redo transport | All Databases
FAIL | SQL Check | Standby is not running in MANAGED REAL TIME APPLY mode | All Databases
FAIL | SQL Check | Standby redo logs are not configured on both sites | All Databases
FAIL | SQL Check | Physical standby status is not valid | All Databases
WARNING | SQL Check | Logical standby unsupported datatypes found | All Databases
PASS | SQL Check | Database parameter LOG_FILE_NAME_CONVERT or DB_CREATE_ONLINE_LOG_DEST_1 is not null | All Databases
NETWORK FAILURE PREVENTION BEST PRACTICES: INFO
Description
Redundant InfiniBand network connectivity using dual-ported Quad Data Rate (QDR) Host Channel Adapters (HCAs) and redundant switches are pre-configured. Configuring the same redundancy for client access by network bonding is recommended and can be done at deployment time. Additional servers such as media and ETL servers should also be configured with redundant networks.
Links:
CLIENT FAILOVER OPERATIONAL BEST PRACTICES: FAIL
Description
A highly available architecture requires the ability of the application tier to transparently fail over to a surviving instance or database advertising the required service. This ensures that applications are generally available or minimally impacted in the event of node failure, instance failure, data corruption, or database failures. For client failover best practices and network best practices, see the MAA white paper "Client Failover Best Practices for Data Guard 11g Release 2" from the MAA Best Practices area for Oracle Database.
Links:
FAIL | OS Check | Data Guard broker configuration does not exist | All Database Servers
PASS | OS Check | Clusterware is running | All Database Servers
OPERATIONAL BEST PRACTICES: INFO
Description
The integration of Oracle Maximum Availability Architecture (MAA) operational and configuration best practices with Oracle Exadata Database Machine (Exadata MAA) provides the most comprehensive high availability solution available for the Oracle Database
Links:
CONSOLIDATION DATABASE PRACTICES: INFO
Description
Managing multiple databases on Exadata requires following additional guidelines.
Links:


Findings Passed

Database Server

Status | Type | Message | Status On
PASS | ASM Check | ASM processes parameter is set to recommended value | All ASM Instances
PASS | SQL Parameter Check | RECYCLEBIN is set to the recommended value | All Instances
PASS | SQL Parameter Check | ASM parameter ASM_POWER_LIMIT is set to the default value | All Instances
PASS | OS Check | DNS Server ping time is in acceptable range | All Database Servers
PASS | OS Check | Database Server Disk Controller Configuration meets recommendation | All Database Servers
PASS | SQL Parameter Check | ASM parameter MEMORY_MAX_TARGET is set according to recommended value | All Instances
PASS | SQL Parameter Check | ASM parameter PGA_AGGREGATE_TARGET is set according to recommended value | All Instances
PASS | SQL Parameter Check | ASM parameter MEMORY_TARGET is set according to recommended value | All Instances
PASS | SQL Parameter Check | ASM parameter SGA_TARGET is set according to recommended value | All Instances
PASS | SQL Check | All bigfile tablespaces have non-default maxbytes values set | All Databases
PASS | OS Check | subnet manager is running on an InfiniBand switch | All Database Servers
PASS | OS Check | Address Resolution Protocol (ARP) is configured properly on database server | All Database Servers
PASS | OS Check | Only one non-ASM instance discovered | All Database Servers
PASS | OS Check | Database parameter Db_create_online_log_dest_n is set to recommended value | All Database Servers
PASS | OS Check | Database parameters log_archive_dest_n with Location attribute are all set to recommended value | All Database Servers
PASS | OS Check | Database parameter COMPATIBLE is set to recommended value | All Database Servers
PASS | OS Check | All Ethernet network cables are connected | All Database Servers
PASS | OS Check | All InfiniBand network cables are connected | All Database Servers
PASS | OS Check | Database parameter db_recovery_file_dest_size is set to recommended value | All Database Servers
PASS | OS Check | Database DB_CREATE_FILE_DEST and DB_RECOVERY_FILE_DEST are in different diskgroups | All Database Servers
PASS | OS Check | Database parameter CLUSTER_INTERCONNECTS is set to the recommended value | All Database Servers
PASS | ASM Check | ASM parameter CLUSTER_INTERCONNECTS is set to the recommended value | All ASM Instances
PASS | SQL Parameter Check | Database parameter PARALLEL_EXECUTION_MESSAGE_SIZE is set to recommended value | All Instances
PASS | SQL Parameter Check | Database parameter SQL92_SECURITY is set to recommended value | All Instances
PASS | SQL Parameter Check | Database parameter OPEN_CURSORS is set to recommended value | All Instances
PASS | SQL Parameter Check | Database parameter OS_AUTHENT_PREFIX is set to recommended value | All Instances
PASS | SQL Parameter Check | Database parameter PARALLEL_THREADS_PER_CPU is set to recommended value | All Instances
PASS | SQL Parameter Check | Database parameter _ENABLE_NUMA_SUPPORT is set to recommended value | All Instances
PASS | SQL Parameter Check | Database parameter PARALLEL_ADAPTIVE_MULTI_USER is set to recommended value | All Instances
PASS | SQL Parameter Check | Database parameter _file_size_increase_increment is set to the recommended value | All Instances
PASS | SQL Parameter Check | Database parameter LOG_BUFFER is set to recommended value | All Instances
PASS | OS Check | Disk cache policy is set to Disabled on database server | All Database Servers
PASS | OS Check | Exadata software version supports Automatic Service Request functionality | All Database Servers
PASS | OS Check | Oracle ASM Communication is using RDS protocol on Infiniband Network | All Database Servers
PASS | OS Check | Database server disk controllers use writeback cache | All Database Servers
PASS | OS Check | Verify-topology executes without any errors or warning | All Database Servers
PASS | OS Check | Hardware and firmware profile check is successful [Database Server] | All Database Servers
PASS | OS Check | Local listener init parameter is set to local node VIP | All Database Servers
PASS | OS Check | ohasd/orarootagent_root Log Ownership is Correct (root root) | All Database Servers
PASS | OS Check | Remote listener is set to SCAN name | All Database Servers
PASS | OS Check | NTP is running with correct setting | All Database Servers
PASS | OS Check | Interconnect is configured on non-routable network addresses | All Database Servers
PASS | SQL Check | SYS.IDGEN1$ sequence cache size >= 1,000 | All Databases
PASS | OS Check | crsd Log Ownership is Correct (root root) | All Database Servers
PASS | OS Check | OSWatcher is running | All Database Servers
PASS | OS Check | ORA_CRS_HOME environment variable is not set | All Database Servers
PASS | SQL Check | GC blocks lost is not occurring | All Databases
PASS | OS Check | NIC bonding mode is not set to Broadcast(3) for cluster interconnect | All Database Servers
PASS | SQL Check | SYS.AUDSES$ sequence cache size >= 10,000 | All Databases
PASS | OS Check | SELinux is not being Enforced | All Database Servers
PASS | OS Check | crsd/orarootagent_root Log Ownership is Correct (root root) | All Database Servers
PASS | OS Check | $ORACLE_HOME/bin/oradism ownership is root | All Database Servers
PASS | OS Check | NIC bonding is configured for interconnect | All Database Servers
PASS | OS Check | $ORACLE_HOME/bin/oradism setuid bit is set | All Database Servers
PASS | OS Check | ohasd Log Ownership is Correct (root root) | All Database Servers
PASS | OS Check | NIC bonding is configured for public network (VIP) | All Database Servers
PASS | OS Check | Open files limit (ulimit -n) for current user is set to recommended value >= 65536 or unlimited | All Database Servers
PASS | OS Check | NIC bonding mode is not set to Broadcast(3) for public network | All Database Servers
PASS | OS Check | Shell limit hard stack for GI is configured according to recommendation | All Database Servers
PASS | OS Check | Shell limit soft nproc for DB is configured according to recommendation | All Database Servers
PASS | OS Check | Shell limit hard stack for DB is configured according to recommendation | All Database Servers
PASS | OS Check | Shell limit hard nofile for DB is configured according to recommendation | All Database Servers
PASS | OS Check | Shell limit hard nproc for DB is configured according to recommendation | All Database Servers
PASS | OS Check | Shell limit hard nproc for GI is configured according to recommendation | All Database Servers
PASS | OS Check | Shell limit hard nofile for GI is configured according to recommendation | All Database Servers
PASS | OS Check | Shell limit soft nproc for GI is configured according to recommendation | All Database Servers
PASS | OS Check | Shell limit soft nofile for GI is configured according to recommendation | All Database Servers
PASS | OS Check | Management network is separate from data network | All Database Servers
PASS | OS Check | RAID controller battery temperature is normal [Database Server] | All Database Servers
PASS | ASM Check | ASM Audit file destination file count <= 100,000 | All ASM Instances
PASS | OS Check | Database Server Virtual Drive Configuration meets recommendation | All Database Servers
PASS | ASM Check | Correct number of FailGroups per ASM DiskGroup are configured | All ASM Instances
PASS | OS Check | System model number is correct | All Database Servers
PASS | OS Check | Number of Mounts before a File System check is set to -1 for system disk | All Database Servers
PASS | OS Check | Free space in root(/) filesystem meets or exceeds recommendation | All Database Servers
PASS | OS Check | Oracle RAC Communication is using RDS protocol on Infiniband Network | All Database Servers
PASS | OS Check | Database Home is properly linked with RDS library | All Database Servers
PASS | OS Check | InfiniBand is the Private Network for Oracle Clusterware Communication | All Database Servers
PASS | SQL Check | RDBMS Version is 11.2.0.2 or higher as expected | All Databases
PASS | SQL Check | All tablespaces are locally managed tablespaces | All Databases
PASS | ASM Check | ASM Version is 11.2.0.2 or higher as expected | All ASM Instances
PASS | OS Check | NUMA is OFF at operating system level | All Database Servers
PASS | OS Check | Database Server InfiniBand network MTU size is 65520 | All Database Servers
PASS | OS Check | Clusterware Home is properly linked with RDS library | All Database Servers
PASS | OS Check | CSS misscount is set to the recommended value of 60 | All Database Servers
PASS | OS Check | Database server InfiniBand network is in "connected" mode | All Database Servers
PASS | ASM Check | All disk groups have compatible.asm parameter set to recommended values | All ASM Instances
PASS | ASM Check | All disk groups have CELL.SMART_SCAN_CAPABLE parameter set to true | All ASM Instances
PASS | ASM Check | All disk groups have compatible.rdbms parameter set to recommended values | All ASM Instances
PASS | ASM Check | All disk groups have allocation unit size set to 4MB | All ASM Instances

Storage Server

Status | Type | Message | Status On
PASS | Storage Server Check | The celldisk configuration on flash memory devices matches Oracle best practices | All Storage Servers
PASS | Storage Server Check | The griddisk count matches across all storage servers where a given prefix name exists | All Storage Servers
PASS | Storage Server Check | The total number of griddisks with a given prefix name is evenly divisible by the number of celldisks | All Storage Servers
PASS | Storage Server Check | The total size of all griddisks fully utilizes celldisk capacity | All Storage Servers
PASS | Storage Server Check | DNS Server ping time is in acceptable range | All Storage Servers
PASS | Storage Server Check | Smart flash log is created on all storage servers | All Storage Servers
PASS | Storage Server Check | Storage Server Flash Memory is configured as Exadata Smart Flash Cache | All Storage Servers
PASS | Storage Server Check | Peripheral component interconnect (PCI) bridge is configured for generation II on all storage servers | All Storage Servers
PASS | Storage Server Check | There are no griddisks configured on flash memory devices | All Storage Servers
PASS | Storage Server Check | No Storage Server conventional or flash disks have a performance problem | All Storage Servers
PASS | Storage Server Check | All InfiniBand network cables are connected on all Storage Servers | All Storage Servers
PASS | Storage Server Check | All Ethernet network cables are connected on all Storage Servers | All Storage Servers
PASS | Storage Server Check | Disk cache policy is set to Disabled on all storage servers | All Storage Servers
PASS | Storage Server Check | Electronic Storage Module (ESM) Lifetime is within specification for all flash cards on all storage servers | All Storage Servers
PASS | Storage Server Check | Management network is separate from data network on all storage servers | All Storage Servers
PASS | Storage Server Check | Ambient temperature is within the recommended range | All Storage Servers
PASS | Storage Server Check | Software profile check is successful on all storage servers | All Storage Servers
PASS | Storage Server Check | Hardware and firmware profile check is successful on all storage servers | All Storage Servers
PASS | Storage Server Check | OSWatcher is running on all storage servers | All Storage Servers
PASS | Storage Server Check | RAID controller battery temperature is normal [Storage Server] | All Storage Servers
PASS | Storage Server Check | All Exadata storage servers meet system model number requirement | All Storage Servers
PASS | Storage Server Check | All storage server disk controllers use writeback cache | All Storage Servers
PASS | Storage Server Check | No celldisks have status of predictive failure | All Storage Servers
PASS | Storage Server Check | RAID controller version matches on all storage servers | All Storage Servers
PASS | Storage Server Check | No Storage Server Memory (ECC) Errors found | All Storage Servers

Cluster Wide

Status | Type | Message | Status On
PASS | Cluster Wide Check | RDBMS home /u01/app/oracle/product/11.2.0.3/dbhome_1 has same number of patches installed across the cluster | Cluster Wide
PASS | Cluster Wide Check | Clusterware active version matches across cluster | Cluster Wide
PASS | Cluster Wide Check | Grid Infrastructure software owner UID matches across cluster | Cluster Wide
PASS | Cluster Wide Check | Timezone matches for current user across cluster | Cluster Wide
PASS | Cluster Wide Check | Private interconnect interface names are the same across cluster | Cluster Wide
PASS | Cluster Wide Check | RDBMS software version matches across cluster | Cluster Wide
PASS | Cluster Wide Check | Public network interface names are the same across cluster | Cluster Wide
PASS | Cluster Wide Check | OS Kernel version (uname -r) matches across cluster | Cluster Wide
PASS | Cluster Wide Check | RDBMS software owner UID matches across cluster | Cluster Wide
PASS | Cluster Wide Check | Time zone matches for Grid Infrastructure software owner across cluster | Cluster Wide
PASS | Cluster Wide Check | Time zone matches for root user across cluster | Cluster Wide
PASS | Cluster Wide Check | Master (Rack) Serial Number matches across database servers and storage servers | Cluster Wide


Best Practices and Other Recommendations

Best Practices and Other Recommendations are generally items documented in various sources which could be overlooked. exachk assesses them and calls attention to any findings.



Clusterware version comparison

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Stability, Availability, Standardization

Risk:

Potential cluster instability due to a clusterware version mismatch on cluster nodes.
If the clusterware versions do not match, incompatibilities may exist that make problems
difficult to diagnose, and bugs fixed in the later clusterware version may still be present
on some nodes but not on others.

Action / Repair:

Unless a rolling upgrade of the clusterware is in progress, the clusterware versions are
expected to match across the cluster. If they do not, a configuration mistake has likely
been made and overlooked. The purpose of this check is to bring the situation to the
customer's attention for action and remedy.
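
A quick way to spot-check this outside of exachk (a minimal sketch; it assumes passwordless SSH as root and uses the Grid home and node names reported in the Cluster Summary):

# Each node should report the same active Clusterware version.
for node in dm01db01 dm01db02; do
  echo -n "$node: "
  ssh root@$node /u01/app/11.2.0.3/grid/bin/crsctl query crs activeversion
done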
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => Clusterware active version matches across cluster.


dm01db01 = 112030
dm01db02 = 112030

UID for GI software owner across cluster

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Availability, stability

Risk:

Potential OCR logical corruptions and permission problems accessing OCR keys, which are difficult to diagnose, when multiple O/S users share the same UID.

Action / Repair:

For GI and RDBMS software owners ensure one unique user ID with a single name is in use across the cluster.
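
For example, a minimal cross-check (a sketch; it assumes the GI owner is named grid, which this report does not show, and passwordless SSH as root):

# The uid= portion of the output should be identical on every node.
for node in dm01db01 dm01db02; do
  echo -n "$node: "
  ssh root@$node id grid
done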
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => Grid Infrastructure software owner UID matches across cluster


dm01db01 = 1000
dm01db02 = 1000

Timezone for current user

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Clusterware deployment requirement

Risk:

Potential cluster instability

Action / Repair:

Oracle Clusterware requires the same time zone setting on all cluster nodes. During installation, the installation process picks up the time zone setting of the Grid installation owner on the node where OUI runs, and uses that on all nodes as the default TZ setting for all processes managed by Oracle Clusterware. This default is used for databases, Oracle ASM, and any other managed processes.

If for whatever reason the time zones have gotten out of sync, the configuration should be corrected.
Consult with Oracle Support about the proper method for correcting the time zones.
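
A simple per-node comparison of the effective time zone (a sketch using the node names from this report):

# Prints the TZ environment variable (if set) and the abbreviated zone on each node.
for node in dm01db01 dm01db02; do
  ssh $node 'echo "$(hostname): TZ=${TZ:-unset}, zone=$(date +%Z)"'
done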
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => Timezone matches for current user across cluster.


dm01db01 = CST
dm01db02 = CST

Grid Infrastructure - Private interconnect interface name check

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Stability, Availability, Standardization

Risk:

Potential cluster or application instability due to incorrectly named network interfaces.

Action / Repair:

Oracle Clusterware requires that the network interfaces used for the cluster interconnect
be named the same on all nodes of the cluster.
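
One way to confirm which interfaces Clusterware has registered on each node (a sketch; oifcfg lives in the Grid home shown in the Cluster Summary, and its output also covers the public interface checked later in this report):

# The cluster_interconnect entry should name the same interface on every node.
for node in dm01db01 dm01db02; do
  echo "== $node =="
  ssh root@$node /u01/app/11.2.0.3/grid/bin/oifcfg getif
done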
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => Private interconnect interface names are the same across cluster


dm01db01 = bondib0
dm01db02 = bondib0

RDBMS software version comparison

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Stability, Availability, Standardization

Risk:

Potential database or application instability due to a version mismatch between database homes.
If the versions of related RDBMS homes on the cluster nodes do not match, incompatibilities may
exist that make problems difficult to diagnose, and bugs fixed in the later RDBMS version may
still be present on some nodes but not on others.

Action / Repair:

The RDBMS versions of related database homes are expected to match across the cluster.
If they do not, a configuration mistake has likely been made and overlooked. The purpose
of this check is to bring the situation to the customer's attention for action and remedy.
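
A lightweight way to spot-check the installed release on each node (a sketch; it uses the database home from the Cluster Summary and assumes SSH access as the database software owner, shown here as oracle):

# Both nodes should report the same release, for example 11.2.0.3.0.
for node in dm01db01 dm01db02; do
  echo -n "$node: "
  ssh oracle@$node /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/sqlplus -V
done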
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => RDBMS software version matches across cluster.


dm01db01 = 112030
dm01db02 = 112030

Grid Infrastructure - Public interface name check (VIP)

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Stability, Availability, Standardization

Risk:

Potential application instability due to incorrectly named network interfaces used for node VIP.

Action / Repair:

Oracle Clusterware requires that the public network interfaces used for the node VIP
be named the same on all nodes of the cluster.
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => Public network interface names are the same across cluster


dm01db01 = bondeth0
dm01db02 = bondeth0

Kernel version comparison across cluster

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Stability, Availability, Standardization

Risk:

Potential cluster instability due to a kernel version mismatch on cluster nodes.
If the kernel versions do not match, incompatibilities may exist that make problems
difficult to diagnose, and bugs fixed in the later kernel may still be present on
some nodes but not on others.

Action / Repair:

Unless a rolling upgrade of the cluster node kernels is in progress, the kernel versions
are expected to match across the cluster. If they do not, a configuration mistake has
likely been made and overlooked. The purpose of this check is to bring the situation to
the customer's attention for action and remedy.
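
On Exadata this comparison can also be made in one pass with dcli (a sketch; dbs_group here is a hypothetical file listing the database server hostnames, one per line):

# Every line should show the same kernel release string.
dcli -g dbs_group -l root 'uname -r'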
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => OS Kernel version(uname -r) matches across cluster.


dm01db01 = 2632-40011el5uek
dm01db02 = 2632-40011el5uek

RDBMS software owner UID across cluster

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Availability, stability

Risk:

Potential OCR logical corruptions and permission problems accessing OCR keys, which are difficult to diagnose, when multiple O/S users share the same UID.

Action / Repair:

For GI and Oracle software owners ensure one unique user ID with a single name is in use across the cluster.
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => RDBMS software owner UID matches across cluster


dm01db01 = 1001
dm01db02 = 1001

Grid Infrastructure software owner time zone

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Clusterware deployment requirement

Risk:

Potential cluster instability

Action / Repair:

Oracle Clusterware requires the same time zone setting on all cluster nodes. During installation, the installation process picks up the time zone setting of the Grid installation owner on the node where OUI runs, and uses that on all nodes as the default TZ setting for all processes managed by Oracle Clusterware. This default is used for databases, Oracle ASM, and any other managed processes.

If for whatever reason the time zones have gotten out of sync, the configuration should be corrected.
Consult with Oracle Support about the proper method for correcting the time zones.
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => Time zone matches for Grid Infrastructure software owner across cluster


dm01db01 = CST
dm01db02 = CST

Root time zone

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => Time zone matches for root user across cluster


dm01db01 = CST
dm01db02 = CST

Verify Master (Rack) Serial Number is Set [Database Server and storage server]

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Setting the Master Serial Number (MSN) (aka Rack Serial Number) assists Oracle Support Services to resolve entitlement issues which may arise. The MSN is listed on a label on the front and the rear of the chassis but is not electronically readable unless this value is set.

The impact to set the MSN is minimal.

Risk:

Not having the MSN set for the system may hinder entitlement when opening Service Requests.

Action / Repair:

Use the following command to verify that all the MSN's are set correctly and all match:

# ipmitool sunoem cli "show /SP system_identifier" 

The output should resemble one of the following:

EV2: Sun Oracle Database Machine xxxxAKyyyy

X2-2: Exadata Database Machine X2-2 xxxxAKyyyy

X2-8: Exadata Database Machine X2-8 xxxAKyyyy

(MSNs will be of the format either 4 numbers, then the letters 'AK', then 4 more numbers or letters A-F; or the letters 'AK' followed by 8 numbers or letters A-F)
On any server where the MSN is not set correctly, use the following command as the "root" userid to set it:
ipmitool sunoem cli 'set /SP system_identifier="text_identifier_string serial_number"'
Where "text_identifier_string" is one of:
For X2-2(4170): "Sun Oracle Database Machine"
For X2-2: "Exadata Database Machine X2-2"
For X2-8: "Exadata Database Machine X2-8"
and "serial_number" is the MSN from the label attached to the rack.
NOTE: The label with the Master Serial Number is located on the top left side wall (viewed from rear) inside the rack on the rear of the chassis.
NOTE: In the command to set the Master Serial Number there is a space between the "text_identifier_string" and the "serial_number".
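
To run the same verification against every server in the rack in one pass, dcli can be used (a sketch; all_group here is a hypothetical file listing the database and storage server hostnames, one per line):

# Every server should report an identical system_identifier value.
dcli -g all_group -l root 'ipmitool sunoem cli "show /SP system_identifier"' | grep -i system_identifier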
 
Needs attention on: -
Passed on: Cluster Wide

Status on Cluster Wide: PASS => Master (Rack) Serial Number matches across database servers and storage servers


dm01db01 = AK00030945
dm01cel01 = AK00030945
dm01cel02 = AK00030945
dm01cel03 = AK00030945
dm01db02 = AK00030945

High Redundancy Controlfile

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact. 

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value. 

Action / Repair: 

A high redundancy diskgroup optimizes availability.

1. Control file should be in a high redundancy disk group.
2. One control file member is recommended for high redundancy, and two members in separate disk groups are recommended for normal redundancy


Query to find the number of control files and their respective disk group and redundancy:
 
select substr(cf.name,1,instr(cf.name,'/',1,1) - 1) cf_name,
       dg.name dg_name,
       dg.type redundancy
  from v$controlfile cf, v$asm_diskgroup dg
 where replace(substr(cf.name,1,instr(cf.name,'/',1,1) - 1),'+') = dg.name;
 
Needs attention on: dm01db01, dm01db02
Passed on: -

Status on dm01db01: FAIL => Database control files are not configured as recommended


DATA FROM DM01DB01 - MCSDB DATABASE - HIGH REDUNDANCY CONTROLFILE



Number of Control files = 	  3
High redundancy diskgroups = 	  1
High redundancy diskgroups where control files are multiplexed = 		       1
Normal redundancy diskgroups where control files are multiplexed = 		       0

Status on dm01db02: FAIL => Database control files are not configured as recommended


Number of Control files = 	  3
High redundancy diskgroups = 	  1
High redundancy diskgroups where control files are multiplexed = 		       1
Normal redundancy diskgroups where control files are multiplexed = 		       0

Check ORACLE_PATCH 13667791 13734832 for RDBMS home

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
Issues: Bug 13257247 - AWR Snapshot collection hangs due to slow inserts into WRH$_TEMPSTATXS.

Please follow the guidelines in the following link, section DB11, for corrective action.
 
Links
Needs attention on: dm01db01:/u01/app/oracle/product/11.2.0.3/dbhome_1, dm01db02:/u01/app/oracle/product/11.2.0.3/dbhome_1
Passed on: -

Status on dm01db01:/u01/app/oracle/product/11.2.0.3/dbhome_1: FAIL => System may be exposed to Exadata Critical Issue DB11


Oracle Interim Patch Installer version 11.2.0.3.0
Copyright (c) 2012, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/11.2.0.3/dbhome_1
Central Inventory : /u01/app/oraInventory
from           : /u01/app/oracle/product/11.2.0.3/dbhome_1/oraInst.loc
OPatch version    : 11.2.0.3.0
OUI version       : 11.2.0.3.0
Log file location : /u01/app/oracle/product/11.2.0.3/dbhome_1/cfgtoollogs/opatch/opatch2014-04-16_11-03-29AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/11.2.0.3/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2014-04-16_11-03-29AM.txt

------------------------------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Database 11g                                                  11.2.0.3.0
There are 1 products installed in this Oracle Home.


...More

Status on dm01db02:/u01/app/oracle/product/11.2.0.3/dbhome_1: FAIL => System may be exposed to Exadata Critical Issue DB11


Oracle Interim Patch Installer version 11.2.0.3.0
Copyright (c) 2012, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/11.2.0.3/dbhome_1
Central Inventory : /u01/app/oraInventory
from           : /u01/app/oracle/product/11.2.0.3/dbhome_1/oraInst.loc
OPatch version    : 11.2.0.3.0
OUI version       : 11.2.0.3.0
Log file location : /u01/app/oracle/product/11.2.0.3/dbhome_1/cfgtoollogs/opatch/opatch2014-04-16_11-17-07AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/11.2.0.3/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2014-04-16_11-17-07AM.txt

------------------------------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Database 11g                                                  11.2.0.3.0
There are 1 products installed in this Oracle Home.


...More

Processes parameter for ASM instance

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact: 

Experience and testing has shown that certain ASM initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these ASM initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are specific to the ASM instances. Unless otherwise specified, the value is for both X2-2 and X2-8 Database Machines. The impact of setting these parameters is minimal. 

Risk:

If the ASM initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value. 

Action / Repair:

This avoids issues observed when ASM hits the maximum number of processes.
For < 10 instances per node: 50 * (DB instances per node + 1)
For > 10 instances per node: 50 * MIN(# DB instances per node + 1, 11) + 10 * MAX(# DB instances per node - 10, 0)
This new formula accommodates the consolidation case where there are many instances per node.
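
As a worked example of the formula (a sketch, not exachk code; with one RDBMS instance per node, as reported below, the result is 100):

db_per_node=1                                             # RDBMS instances per node
a=$(( db_per_node + 1 )); [ $a -gt 11 ] && a=11           # MIN(instances + 1, 11)
b=$(( db_per_node - 10 )); [ $b -lt 0 ] && b=0            # MAX(instances - 10, 0)
echo "Recommended ASM processes: $(( 50 * a + 10 * b ))"  # prints 100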
 
Needs attention on: -
Passed on: dm01db01, dm01db02

Status on dm01db01: PASS => ASM processes parameter is set to recommended value


DATA FROM DM01DB01 - MCSDB DATABASE - PROCESSES PARAMETER FOR ASM INSTANCE



Processes parameter set on +ASM1 = 1024
Number of RDBMS instances running = 1
Recommended value for Processes parameter for +ASM1 = 100

Status on dm01db02: PASS => ASM processes parameter is set to recommended value


Processes parameter set on +ASM2 = 1024
Number of RDBMS instances running = 1
Recommended value for Processes parameter for +ASM2 = 100

Check for parameter recyclebin

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact: 
  
Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice  values set at deployment time. By setting these database initialization  parameters as recommended, known problems may be avoided and performance  maximized. 
The parameters are common to all database instances. The impact of setting  these parameters is minimal.  The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance  settings can be done after careful performance evaluation and clear understanding of the performance impact. 
  
Risk: 
  
If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization  parameter is not set as recommended, and the actual set value. 
  
Action / Repair: 
  
"RECYCLEBIN = ON" provides higher availability by enabling the Flashback Drop  feature. "ON" is the default value and should not be changed. 

 
Needs attention on: -
Passed on: MCSDB1, MCSDB2

Status on MCSDB1: PASS => RECYCLEBIN is set to the recommended value

MCSDB1.recyclebin = on                                                          

Status on MCSDB2: PASS => RECYCLEBIN is set to the recommended value

MCSDB2.recyclebin = on                                                          

Verify celldisk configuration on flash memory devices

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The definition and maintenance of storage server celldisks is critical for optimal performance and outage avoidance. Each storage server contains four Sun Flash Accelerator F20 PCIe cards, each with four flash modules (FMods), for a total of 16 flash memory devices. On a storage server configured according to Oracle best practices, there should be 16 celldisks on flash memory devices with a status of "normal".

The impact of verifying the basic storage server celldisk configuration is
minimal. Correcting any abnormalities is dependent upon the reason for the
anomaly, so the impact cannot be estimated here.

Risk:

If the basic storage server celldisk configuration is not verified, poor performance or unexpected outages may occur.

Action / Repair:

To verify the basic storage server celldisk configuration on flash memory
devices, execute the following command as the "celladmin" user on each
storage server:

cellcli -e "list celldisk where disktype=flashdisk and status=normal" | wc -l

The output should be:

16

If the output is not as expected, execute the following command as the
"celladmin" user:

cellcli -e "list celldisk where disktype=flashdisk and status!=normal"

Perform your root cause analysis and corrective actions starting with the status key words returned. Please reference the following:

The "Maintaining Flash Disks" section of "Oracle® Exadata Database Machine, Owner's Guide 11g Release 2 (11.2), E13874-24"
 
Links
Needs attention on: -
Passed on: dm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => The celldisk configuration on flash memory devices matches Oracle best practices


DATA FROM DM01CEL01 FOR VERIFY CELLDISK CONFIGURATION ON FLASH MEMORY DEVICES



name:              	 FD_00_dm01cel01
comment:
creationTime:      	 2012-11-28T10:30:51+08:00
deviceName:        	 /dev/sdr
devicePartition:   	 /dev/sdr
diskType:          	 FlashDisk
errorCount:        	 0
freeSpace:         	 0
id:                	 6ba2320c-94d9-4553-bb70-6dc56de0afba
interleaving:      	 none
lun:               	 1_0
physicalDisk:      	 1112M07TJA
size:              	 22.875G
status:            	 normal

name:              	 FD_01_dm01cel01
...More

Verify griddisk count matches across all storage servers where a given prefix name exists

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The definition and maintenance of storage server griddisks is critical for optimal performance and outage avoidance.
The impact of verifying the basic storage server griddisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.

Risk:

If the storage server griddisk configuration as designed is not verified, poor performance or unexpected outages may occur.

Action / Repair:

To verify the storage server griddisk count matches across all storage server where a given prefix name exists, execute the following command as the "root" userid on the database server from which the onecommand script was executed during initial deployment:

for GD_PREFIX in `cellcli -e "list griddisk attributes name" | cut -d" " -f2 | gawk -F "_CD_" '{print $1}' | sort -u`
do
  # count of griddisks carrying this prefix; a single distinct count means the prefix is consistent
  GD_PREFIX_RESULT=`cellcli -e "list griddisk where name like '$GD_PREFIX\_.*'" | wc -l | cut -d" " -f2 | sort -u | wc -l`
  if [ $GD_PREFIX_RESULT = 1 ]; then
    echo -e "$GD_PREFIX: SUCCESS"
  else
    echo -e "$GD_PREFIX: FAILURE"
    cellcli -e "list griddisk where name like '$GD_PREFIX\_.*'" | wc -l
  fi
done

The output should be similar to:
DATA_SLCC16: SUCCESS
DBFS_DG: SUCCESS
RECO_SLCC16: SUCCESS
If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.
NOTE: On a storage server configured according to Oracle best practices, the total number of griddisks per storage server for a given prefix name (e.g: DATA) should match across all storage servers where the given prefix name exists.

NOTE: Not all storage servers are required to have all prefix names in use. This is possible where for security reasons a customer has segregated the storage servers, is using a data lifecycle management methodology, or an Oracle Storage Expansion Rack is in use. For example, when an Oracle Storage Expansion Rack is in use for data lifecycle management, those storage servers will likely have griddisks with unique names that differ from the griddisk names used on the storage servers that contain real time data, yet all griddisks are visible to the same cluster.
 
Needs attention on: -
Passed on: dm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => The griddisk count matches across all storage servers where a given prefix name exists


DATA FROM DM01CEL01 FOR VERIFY GRIDDISK COUNT MATCHES ACROSS ALL STORAGE SERVERS WHERE A GIVEN PREFIX NAME EXISTS




DATA_DM01_CD_00_dm01cel01	 active
DATA_DM01_CD_01_dm01cel01	 active
DATA_DM01_CD_02_dm01cel01	 not present
DATA_DM01_CD_03_dm01cel01	 active
DATA_DM01_CD_04_dm01cel01	 active
DATA_DM01_CD_05_dm01cel01	 active
DATA_DM01_CD_06_dm01cel01	 active
DATA_DM01_CD_07_dm01cel01	 active
DATA_DM01_CD_08_dm01cel01	 active
DATA_DM01_CD_09_dm01cel01	 active
DATA_DM01_CD_10_dm01cel01	 active
DATA_DM01_CD_11_dm01cel01	 active
DBFS_DG_CD_02_dm01cel01	 not present
DBFS_DG_CD_03_dm01cel01	 active
DBFS_DG_CD_04_dm01cel01	 active
...More

Verify total number of griddisks with a given prefix name is evenly divisible by the number of celldisks

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The definition and maintenance of storage server griddisks is critical for optimal performance and outage avoidance.
The impact of verifying the storage server griddisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.

Risk:

If the storage server griddisk configuration as designed is not verified, poor performance or unexpected outages may occur.

Action / Repair:

To verify the total number of griddisks with a given prefix name is evenly divisible by the number of celldisks, execute the following command as the "celladmin" user on each storage server:
CELLDISK_COUNT=`cellcli -e "list celldisk where disktype=harddisk" | wc -l`;
for GD_SHORT_NAME in `cellcli -e "list griddisk attributes name" | awk -F "_CD_" '{print $1}' | sort -u | egrep -v "DBFS|SYSTEM"`; 
do export GD_COUNT=`cellcli -e "list griddisk where name like '$GD_SHORT_NAME.*'" | wc -l`; 
if [ `expr $GD_COUNT % $CELLDISK_COUNT` = 0 ]; 
then echo -e "$GD_SHORT_NAME:  SUCCESS"
else
echo -e "$GD_SHORT_NAME:  FAILURE:";
cellcli -e "list griddisk attributes name where name like '$GD_SHORT_NAME\_.*'";
fi; 
done;
The output should be similar to:
DATA_SLCC16:  SUCCESS
RECO_SLCC16:  SUCCESS 
If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.

NOTE: On a storage server configured according to Oracle best practices, the total number of all griddisks with a given name prefix (e.g: DATA) should be evenly divisible by the number of celldisks.
 
Needs attention on: -
Passed on: dm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => The total number of griddisks with a given prefix name is evenly divisible by the number of celldisks


DATA FROM DM01CEL01 FOR VERIFY TOTAL NUMBER OF GRIDDISKS WITH A GIVEN PREFIX NAME IS EVENLY DIVISIBLE OF CELLDISKS




DATA_DM01:  SUCCESS
RECO_DM01:  SUCCESS




DATA FROM DM01CEL02 FOR VERIFY TOTAL NUMBER OF GRIDDISKS WITH A GIVEN PREFIX NAME IS EVENLY DIVISIBLE OF CELLDISKS




DATA_DM01:  SUCCESS
RECO_DM01:  SUCCESS


...More

Verify griddisk ASM status

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The definition and maintenance of storage server griddisks is critical for optimal performance and outage avoidance.
The impact of verifying the storage server griddisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.

Risk:

If the storage server griddisk configuration as designed is not verified, poor performance or unexpected outages may occur.

Action / Repair:

To verify the storage server griddisk ASM status, execute the following command as the "celladmin" user on each storage server:
# healthy griddisks (status "active", asmmodestatus "ONLINE", asmdeactivationoutcome "Yes") are filtered out;
# any remaining lines indicate a griddisk that needs attention
ASM_STAT_RESLT=`cellcli -e "list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome" | egrep -v "active.*ONLINE.*Yes" | wc -l`
if [ $ASM_STAT_RESLT = 0 ]
then
  echo -e "\nSUCCESS\n"
else
  echo -e "\nFAILURE:"
  cellcli -e "list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome" | egrep -v "active.*ONLINE.*Yes"
  echo -e "\n"
fi
The output should be:
SUCCESS
If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.

NOTE: On a storage server configured according to Oracle best practices, all griddisks should have "status" of "active", "asmmodestatus" of "online" and "asmdeactivationoutcome" of "yes".
 
Needs attention on: dm01cel01
Passed on: dm01cel03, dm01cel02

Status on dm01cel01: FAIL => The griddisk ASM status should match specification


DATA FROM DM01CEL01 FOR VERIFY GRIDDISK ASM STATUS



DATA_DM01_CD_02_dm01cel01	 not present	 DROPPED	 Yes
DBFS_DG_CD_02_dm01cel01  	 not present	 DROPPED	 Yes
RECO_DM01_CD_02_dm01cel01	 not present	 DROPPED	 Yes





Status on dm01cel03, dm01cel02: PASS


DATA FROM DM01CEL02 FOR VERIFY GRIDDISK ASM STATUS



DATA_DM01_CD_00_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_01_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_02_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_03_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_04_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_05_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_06_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_07_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_08_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_09_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_10_dm01cel02	 active	 ONLINE	 Yes
DATA_DM01_CD_11_dm01cel02	 active	 ONLINE	 Yes
DBFS_DG_CD_02_dm01cel02  	 active	 ONLINE	 Yes
DBFS_DG_CD_03_dm01cel02  	 active	 ONLINE	 Yes
DBFS_DG_CD_04_dm01cel02  	 active	 ONLINE	 Yes
DBFS_DG_CD_05_dm01cel02  	 active	 ONLINE	 Yes
...More

Verify celldisk configuration on disk drives

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The definition and maintenance of storage server celldisks is critical for optimal performance and outage avoidance.
The impact of verifying the basic storage server celldisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.

Risk:

If the basic storage server celldisk configuration is not verified, poor performance or unexpected outages may occur.

Action / Repair:

To verify the basic storage server celldisk configuration on disk drives, execute the following command as the "celladmin" user on each storage server:
cellcli -e "list celldisk where disktype=harddisk and status=normal" | wc -l
The output should be:
12
If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.

NOTE: On a storage server configured according to Oracle best practices, there should be 12 celldisks on disk drives with a status of "normal".
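
If the count is not 12, the cell disks that are not "normal" can be listed with the same filter used for the flash disk check earlier in this report (a sketch, run as the "celladmin" user):

cellcli -e "list celldisk where disktype=harddisk and status!=normal"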
 
Needs attention on: dm01cel01
Passed on: dm01cel03, dm01cel02

Status on dm01cel01: FAIL => The celldisk configuration on disk drives should match Oracle best practices


DATA FROM DM01CEL01 FOR VERIFY CELLDISK CONFIGURATION ON DISK DRIVES



name:              	 CD_00_dm01cel01
comment:
creationTime:      	 2012-11-28T10:30:40+08:00
deviceName:        	 /dev/sda
devicePartition:   	 /dev/sda3
diskType:          	 HardDisk
errorCount:        	 0
freeSpace:         	 0
id:                	 5e35f429-e6b8-422a-bde4-2705c82fe9bc
interleaving:      	 none
lun:               	 0_0
physicalDisk:      	 L45WSN
raidLevel:         	 0
size:              	 1832.59375G
status:            	 normal

...More

Status on dm01cel03, dm01cel02: PASS


DATA FROM DM01CEL02 FOR VERIFY CELLDISK CONFIGURATION ON DISK DRIVES



name:              	 CD_00_dm01cel02
comment:
creationTime:      	 2012-11-28T10:30:41+08:00
deviceName:        	 /dev/sda
devicePartition:   	 /dev/sda3
diskType:          	 HardDisk
errorCount:        	 0
freeSpace:         	 0
id:                	 6901754a-39ff-4d2a-8777-d5a1ac70f914
interleaving:      	 none
lun:               	 0_0
physicalDisk:      	 L45T67
raidLevel:         	 0
size:              	 1832.59375G
status:            	 normal

...More

Verify total size of all griddisks fully utilizes celldisk capacity

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The definition and maintenance of storage server griddisks is critical for optimal performance and outage avoidance.
The impact of verifying the storage server griddisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.

Risk:

If the storage server griddisk configuration as designed is not verified, poor performance or unexpected outages may occur.

Action / Repair:

To verify the storage server griddisk configuration fully utilizes celldisk capacity, execute the following command as the "celladmin" user on each storage server:
echo `cellcli -e "list griddisk attributes size" | awk '{ SUM += $1} END { print SUM}'` `cellcli -e "list celldisk attributes size where disktype=harddisk" | awk '{ SUM += $1} END { print SUM}'` | awk '{ printf("%f\n", ($1/$2)*100) }'
The output returned will be similar to:
99.991561
If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.

NOTE: On a storage server configured according to Oracle best practices, the total size of all configured griddisks will be greater than 99.5% of the total of celldisks configured on disk drives.
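
As a worked example using the figures reported below for these cells (a sketch; bc is assumed to be available on the server):

# 22282.2 of griddisk capacity over 22281.7 of celldisk capacity is about 100.002 percent,
# which is above the 99.5% threshold described in the NOTE above.
echo "scale=6; 22282.2 / 22281.7 * 100" | bc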
 
Needs attention on: -
Passed on: dm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => The total size of all griddisks fully utilizes celldisk capacity


DATA FROM DM01CEL01 FOR VERIFY TOTAL SIZE OF ALL GRIDDISKS FULLY UTILIZES CELLDISK CAPACITY



Cell Disks Size = 22281.7
Grid Disks Size = 22282.2




DATA FROM DM01CEL02 FOR VERIFY TOTAL SIZE OF ALL GRIDDISKS FULLY UTILIZES CELLDISK CAPACITY



Cell Disks Size = 22281.7
Grid Disks Size = 22282.2




...More

Check for parameter asm_power_limit

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain ASM initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these ASM initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are specific to the ASM instances. Unless otherwise specified, the value is for Database Machine V2, X2-2 and X2-8 Database Machines. The impact of setting these parameters is minimal.

Risk: 

If the ASM initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

Default ASM_POWER_LIMIT value is 1 for an Exadata quarter rack, 2 for all other rack sizes.

The Exadata default is chosen to mitigate application performance impact during ASM rebalance. Please evaluate application performance impact before using a higher ASM_POWER_LIMIT. 
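
To confirm the current setting on a node outside of exachk (a sketch; it assumes the environment is set for the local ASM instance, for example ORACLE_SID=+ASM1, and uses the Grid home shown in the Cluster Summary):

# Prints the in-memory value of asm_power_limit for the local ASM instance.
echo "show parameter asm_power_limit" | /u01/app/11.2.0.3/grid/bin/sqlplus -S / as sysasm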
 
Needs attention on: -
Passed on: +ASM1, +ASM2

Status on +ASM1: PASS => ASM parameter ASM_POWER_LIMIT is set to the default value.

+ASM1.asm_power_limit = 1                                                       

Status on +ASM2: PASS => ASM parameter ASM_POWER_LIMIT is set to the default value.

+ASM2.asm_power_limit = 1                                                       

Verify average ping times to DNS nameserver

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Secure Shell (SSH) remote login procedures require communication between the remote target device and the DNS nameserver. Verifying that the ping times between Exadata storage servers, database servers, and InfiniBand switches are minimal improves remote SSH login times.
The impact of verifying minimal ping times to the DNS server is minimal. Because the effort required to minimize ping times will vary by configuration and root cause, the impact cannot be estimated here.

Risk:

If ping times between remote SSH targets and the active DNS server become too long, delays in remote login operations and/or timeouts in applications may be observed.

Action / Repair:

To verify the ping times between the active DNS server and Exadata database servers, enter the following command as the "root" userid on each database server:

HOST_NAME=`hostname`; DNS_SERVER=`nslookup $HOST_NAME | head -1 | cut -d: -f2 | sed -e 's/^[ \t]*//'`; echo -e "Active DNS Server IP:\t\t$DNS_SERVER"; echo -n -e "Average for 10 pings in ms:\t"; ping -c10 $DNS_SERVER | grep -E ^64| cut -d"=" -f 4 | cut -d" " -f 1 | awk '{ SUM += $1} END { print SUM/10}'

The output should be similar to the following:
Active DNS Server IP:           130.35.249.52
Average for 10 pings in ms:     1.794
As with any network response measurement, the response time should be "minimal". If the reported average ping response times are not "minimal", and especially if the environment is experiencing long SSH login times or application timeouts, investigate the condition and take corrective action.
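
As an alternative, hedged sketch (it assumes the first nameserver in /etc/resolv.conf is the active DNS server, which may differ from what nslookup reports), the Linux ping summary line gives the same average directly:
DNS=$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)   # first nameserver in resolv.conf
ping -c10 $DNS | tail -1                                       # "rtt min/avg/max/mdev = ..." summary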
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => DNS Server ping time is in acceptable range


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY AVERAGE PING TIMES TO DNS NAMESERVER



Active DNS Server IP:		10.187.4.86
Average for 10 pings in ms:	0.1506

Status on dm01db02: PASS => DNS Server ping time is in acceptable range


Active DNS Server IP:		10.187.4.86
Average for 10 pings in ms:	0.1839
Top

Top

Verify average ping times to DNS nameserver

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Secure Shell (SSH) remote login procedures require communication between the remote target device and the DNS nameserver. Verifying that the ping times between Exadata storage servers, database servers, and InfiniBand switches are minimal improves remote SSH login times.
The impact of verifying minimal ping times to the DNS server is minimal. Because the effort required to minimize ping times will vary by configuration and root cause, the impact cannot be estimated here.

Risk:

If ping times between remote SSH targets and the active DNS server become too long, delays in remote login operations and/or timeouts in applications may be observed.

Action / Repair:

To verify the ping times between the active DNS server and Exadata database servers, enter the following command as the "root" userid on each database server:

HOST_NAME=`hostname`; DNS_SERVER=`nslookup $HOST_NAME | head -1 | cut -d: -f2 | sed -e 's/^[ \t]*//'`; echo -e "Active DNS Server IP:\t\t$DNS_SERVER"; echo -n -e "Average for 10 pings in ms:\t"; ping -c10 $DNS_SERVER | grep -E ^64| cut -d"=" -f 4 | cut -d" " -f 1 | awk '{ SUM += $1} END { print SUM/10}'

The output should be similar to the following:
Active DNS Server IP:           130.35.249.52
Average for 10 pings in ms:     1.794
As with any network response measurement, the response time should be "minimal". If the reported average ping response times are not "minimal", and especially if the environment is experiencing long SSH login times or application timeouts, investigate the condition and take corrective action.
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => DNS Server ping time is in acceptable range


DATA FROM DM01CEL01 FOR VERIFY AVERAGE PING TIMES TO DNS NAMESERVER



Active DNS Server IP:		10.187.4.86
Average for 10 pings in ms:	0.2088




DATA FROM DM01CEL02 FOR VERIFY AVERAGE PING TIMES TO DNS NAMESERVER



Active DNS Server IP:		10.187.4.86
Average for 10 pings in ms:	0.3




...More
Top

Top

Verify Database Server Disk Controller Configuration

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

For X2-2, there are 4 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 3 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the RAID devices is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.

Risk:

Not verifying the RAID devices increases the chance of a performance degradation or an outage.

Action / Repair:

To verify the database server disk controller configuration, use the following command:
/opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Device Present" -A 8 
For X2-2, the output will be similar to:
                Device Present
                ================
  Virtual Drives    : 1
    Degraded        : 0
    Offline         : 0
  Physical Devices  : 5
    Disks           : 4
    Critical Disks  : 0
    Failed Disks    : 0  
The expected output is 1 virtual drive, none degraded or offline, 5 physical devices (controller + 4 disks), 4 disks, and no critical or failed disks.
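
If dcli is set up on the first database server, the same check can be run against both database servers in one pass; the dbs_group file (one database server hostname per line) is assumed to exist locally:
# Assumed: ~/dbs_group lists both database servers, one hostname per line
dcli -g dbs_group -l root '/opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Device Present" -A 8'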

 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database Server Disk Controller Configuration meets recommendation


DATA FROM DM01DB01 FOR VERIFY DATABASE SERVER DISK CONTROLLER CONFIGURATION



Device Present
================
Virtual Drives    : 1
Degraded        : 0
Offline         : 0
Physical Devices  : 5
Disks           : 4
Critical Disks  : 0
Failed Disks    : 0

Status on dm01db02: PASS => Database Server Disk Controller Configuration meets recommendation


DATA FROM DM01DB02 FOR VERIFY DATABASE SERVER DISK CONTROLLER CONFIGURATION



Device Present
================
Virtual Drives    : 1
Degraded        : 0
Offline         : 0
Physical Devices  : 5
Disks           : 4
Critical Disks  : 0
Failed Disks    : 0
Top

Top

Verify Database Server Physical Drive Configuration

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

For X2-2, there are 4 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 3 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.

For X2-8, there are 8 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 7 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.

The impact of validating the physical drives is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.

Risk:

Not verifying the physical drives increases the chance of a performance degradation or an outage.

Action / Repair:

To verify the database server physical drive configuration, use the following command:
/opt/MegaRAID/MegaCli/MegaCli64 PDList -aALL | grep "Firmware state"
The output for X2-2 will be similar to:
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Hotspare, Spun down
There should be three lines of output showing a state of "Online, Spun Up", and one line showing a state of "Hotspare, Spun down". The ordering of the output lines is not significant and may vary based upon a given database server's physical drive replacement history.

NOTE: Modified 03/21/12

Occasionally in normal operation, the "Hotspare" physical drive may be brought to a state of "Online, Spun Up". Thirty minutes (default) after the operation that brought the drive to "Online, Spun Up" has completed, the drive should spin down due to the powersaving feature. There is no harm for the drive to be "Online, Spun Up" if there are no other errors reported in the disk drive configuration checks.

 For additional information, please reference My Oracle Support note "Exadata: Hot Spares Not Spinning Down (Doc ID 1403613.1)" 
 
Needs attention ondm01db01, dm01db02
Passed on-

Status on dm01db01: FAIL => Database Server Physical Drive Configuration does not meet recommendation


DATA FROM DM01DB01 FOR VERIFY DATABASE SERVER PHYSICAL DRIVE CONFIGURATION



Firmware state: Hotspare, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up

Status on dm01db02: FAIL => Database Server Physical Drive Configuration does not meet recommendation


DATA FROM DM01DB02 FOR VERIFY DATABASE SERVER PHYSICAL DRIVE CONFIGURATION



Firmware state: Hotspare, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Top

Top

Check for parameter memory_max_target

Success FactorLINUX DATA COLLECTIONS AND AUDIT CHECKS
Recommendation
 Benefit / Impact: 
 
Experience and testing has shown that certain ASM initialization parameters 
should be set at specific values. These are the best practice values set at 
deployment time. By setting these ASM initialization parameters as 
recommended, known problems may be avoided and performance maximized. The 
parameters are specific to the ASM instances. Unless otherwise specified, the 
value is for both X2-2 and X2-8 Database Machines. The impact of setting 
these parameters is minimal. 

Risk: 
 
If the ASM initialization parameters are not set as recommended, a variety of 
issues may be encountered, depending upon which initialization parameter is 
not set as recommended, and the actual set value. 
 
Action / Repair: 
 
Setting MEMORY_MAX_TARGET to 0 disables MEMORY_MAX_TARGET for the ASM instance
 
NOTE: The proper way to implement the memory related parameters is as 
follows. This is important as it works around an issue where memory_target 
remains set despite setting it to 0. 
 
    * alter system set sga_target=1250M sid='*' scope=spfile; 
    * alter system set pga_aggregate_target=400M sid='*' scope=spfile; 
    * alter system set memory_target=0 sid='*' scope=spfile; 
    * alter system set memory_max_target=0 sid='*' scope=spfile; 
    * alter system reset memory_max_target sid='*' scope=spfile; 
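
A minimal verification sketch, assuming the grid infrastructure owner's environment with ORACLE_SID set to the local ASM instance, to confirm the four memory-related parameters after the changes above:
sqlplus -s / as sysasm <<'EOF'
show parameter sga_target
show parameter pga_aggregate_target
show parameter memory_target
show parameter memory_max_target
EOF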
 
Links
Needs attention on-
Passed on+ASM1, +ASM2

Status on +ASM1: PASS => ASM parameter MEMORY_MAX_TARGET is set according to recommended value

+ASM1.memory_max_target = 0                                                     

Status on +ASM2: PASS => ASM parameter MEMORY_MAX_TARGET is set according to recommended value

+ASM2.memory_max_target = 0                                                     
Top

Top

Check for parameter pga_aggregate_target

Success FactorLINUX DATA COLLECTIONS AND AUDIT CHECKS
Recommendation
 Benefit / Impact: 
 
Experience and testing has shown that certain ASM initialization parameters 
should be set at specific values. These are the best practice values set at 
deployment time. By setting these ASM initialization parameters as 
recommended, known problems may be avoided and performance maximized. The 
parameters are specific to the ASM instances. Unless otherwise specified, the 
value is for both X2-2 and X2-8 Database Machines. The impact of setting 
these parameters is minimal. 
 
Risk: 
 
If the ASM initialization parameters are not set as recommended, a variety of 
issues may be encountered, depending upon which initialization parameter is 
not set as recommended, and the actual set value. 
 
Action / Repair: 
 
This disables memory_target for the ASM instance and setting PGA_AGGREGATE_TARGET to 400M provides the ASM instance sufficient PGA memory
 
NOTE: The proper way to implement the memory related parameters is as 
follows. This is important as it works around an issue where memory_target 
remains set despite setting it to 0. 
 
    * alter system set sga_target=1250M sid='*' scope=spfile; 
    * alter system set pga_aggregate_target=400M sid='*' scope=spfile; 
    * alter system set memory_target=0 sid='*' scope=spfile; 
    * alter system set memory_max_target=0 sid='*' scope=spfile; 
    * alter system reset memory_max_target sid='*' scope=spfile; 
 
Links
Needs attention on-
Passed on+ASM1, +ASM2

Status on +ASM1: PASS => ASM parameter PGA_AGGREGATE_TARGET is set according to recommended value

+ASM1.pga_aggregate_target = 419430400                                          

Status on +ASM2: PASS => ASM parameter PGA_AGGREGATE_TARGET is set according to recommended value

+ASM2.pga_aggregate_target = 419430400                                          
Top

Top

Check for parameter memory_target

Success FactorLINUX DATA COLLECTIONS AND AUDIT CHECKS
Recommendation
 Benefit / Impact: 
 
Experience and testing has shown that certain ASM initialization parameters 
should be set at specific values. These are the best practice values set at 
deployment time. By setting these ASM initialization parameters as 
recommended, known problems may be avoided and performance maximized. The 
parameters are specific to the ASM instances. Unless otherwise specified, the 
value is for both X2-2 and X2-8 Database Machines. The impact of setting 
these parameters is minimal. 
 
Risk: 
 
If the ASM initialization parameters are not set as recommended, a variety of 
issues may be encountered, depending upon which initialization parameter is 
not set as recommended, and the actual set value. 
 
Action / Repair: 
 
This disables memory_target for the ASM instance. 
 
NOTE: The proper way to implement the memory related parameters is as 
follows. This is important as it works around an issue where memory_target 
remains set despite setting it to 0. 
 
    * alter system set sga_target=1250M sid='*' scope=spfile; 
    * alter system set pga_aggregate_target=400M sid='*' scope=spfile; 
    * alter system set memory_target=0 sid='*' scope=spfile; 
    * alter system set memory_max_target=0 sid='*' scope=spfile; 
    * alter system reset memory_max_target sid='*' scope=spfile; 
 
Links
Needs attention on-
Passed on+ASM1, +ASM2

Status on +ASM1: PASS => ASM parameter MEMORY_TARGET is set according to recommended value

+ASM1.memory_target = 0                                                         

Status on +ASM2: PASS => ASM parameter MEMORY_TARGET is set according to recommended value

+ASM2.memory_target = 0                                                         
Top

Top

Check for parameter sga_target

Success FactorLINUX DATA COLLECTIONS AND AUDIT CHECKS
Recommendation
 Benefit / Impact: 
 
Experience and testing has shown that certain ASM initialization parameters 
should be set at specific values. These are the best practice values set at 
deployment time. By setting these ASM initialization parameters as 
recommended, known problems may be avoided and performance maximized. The 
parameters are specific to the ASM instances. Unless otherwise specified, the 
value is for both X2-2 and X2-8 Database Machines. The impact of setting 
these parameters is minimal. 
 
Risk: 
 
If the ASM initialization parameters are not set as recommended, a variety of 
issues may be encountered, depending upon which initialization parameter is 
not set as recommended, and the actual set value. 
 
Action / Repair: 
 
This disables memory_target for the ASM instance and setting SGA_TARGET to 1250M provides the ASM instance sufficient SGA memory.
 
NOTE: The proper way to implement the memory related parameters is as 
follows. This is important as it works around an issue where memory_target 
remains set despite setting it to 0. 
 
    * alter system set sga_target=1250M sid='*' scope=spfile; 
    * alter system set pga_aggregate_target=400M sid='*' scope=spfile; 
    * alter system set memory_target=0 sid='*' scope=spfile; 
    * alter system set memory_max_target=0 sid='*' scope=spfile; 
    * alter system reset memory_max_target sid='*' scope=spfile;
 
Links
Needs attention on-
Passed on+ASM1, +ASM2

Status on +ASM1: PASS => ASM parameter SGA_TARGET is set according to recommended value.

+ASM1.sga_target = 1325400064                                                   

Status on +ASM2: PASS => ASM parameter SGA_TARGET is set according to recommended value.

+ASM2.sga_target = 1325400064                                                   
Top

Top

Check for parameter fast_start_mttr_target

Success FactorCOMPUTER FAILURE PREVENTION BEST PRACTICES
Recommendation
 The deployment default for fast_start_mttr_target is 60. To optimize run time performance for write/redo generation intensive workloads, increase fast_start_mttr_target to 300. This will reduce checkpoint writes from DBWR processes, making more room for LGWR IO. The trade-off is that instance recovery will run longer, so if instance recovery is more important than performance, then keep fast_start_mttr_target low. Also keep in mind that an application with inadequately sized redo logs will likely not see an effect from this change due to frequent log switches.

Considerations for direct writes in a data warehouse type of application: Even though direct operations aren't using the buffer cache, fast_start_mttr_target is very effective at controlling crash recovery time because it ensures adequate checkpointing for the few buffers that are resident (e.g., undo segment headers). fast_start_mttr_target should be set to the desired RTO (Recovery Time Objective) while still maintaining performance SLAs.
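
A minimal sketch of how the parameter might be checked and, after the trade-off above has been evaluated, raised; the ALTER SYSTEM is commented out and illustrative only:
sqlplus -s / as sysdba <<'EOF'
show parameter fast_start_mttr_target
-- alter system set fast_start_mttr_target=300 scope=both sid='*';
EOF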
 
Links
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => fast_start_mttr_target has been changed from default

MCSDB1.fast_start_mttr_target = 300                                             

Status on MCSDB2: PASS => fast_start_mttr_target has been changed from default

MCSDB2.fast_start_mttr_target = 300                                             
Top

Top

Oracle net services configuration to ship redo

Success FactorDATABASE/CLUSTER/SITE FAILURE PREVENTION BEST PRACTICES
Recommendation
 Optimize network throughput following the best practices described in section 8.2.2 in the HA Best Practice guide:
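
For reference, a hedged sketch of the kind of tnsnames.ora entry (on every database server) that the redo-shipping tnsping check expects to resolve; the alias, host, and service name below are placeholders, not values from this environment:
MCSDB_STBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = standby-scan.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = MCSDB_STBY))
  )
Once the entry resolves (for example with "tnsping MCSDB_STBY"), the alias can be used as the redo transport connect identifier.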
 
Links
Needs attention ondm01db01, dm01db02
Passed on-

Status on dm01db01: FAIL => Oracle Net service name to ship redo to the standby is not configured properly


DATA FROM DM01DB01 - MCSDB DATABASE - ORACLE NET SERVICES CONFIGURATION TO SHIP REDO




TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 16-APR-2014 11:16:01

Copyright (c) 1997, 2011, Oracle.  All rights reserved.

Used parameter files:

TNS-03505: Failed to resolve name

Status on dm01db02: FAIL => Oracle Net service name to ship redo to the standby is not configured properly



TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 16-APR-2014 11:21:40

Copyright (c) 1997, 2011, Oracle.  All rights reserved.

Used parameter files:

TNS-03505: Failed to resolve name
Top

Top

Redo transport protocol

Success FactorDATABASE/CLUSTER/SITE FAILURE PREVENTION BEST PRACTICES
Recommendation
 The ARCH redo transport mode has been deprecated and will be desupported in a future release. Oracle recommends that you switch to the ASYNC transport mode if you are currently using the ARCH transport mode. The ASYNC transport mode is superior to the ARCH transport mode in all respects, and is the new default transport mode.
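
A minimal sketch for checking the current transport and switching to ASYNC; the service and db_unique_name below are placeholders for the standby connect identifier:
sqlplus -s / as sysdba <<'EOF'
show parameter log_archive_dest_2
-- illustrative only; placeholders throughout:
-- alter system set log_archive_dest_2='service=MCSDB_STBY ASYNC valid_for=(online_logfiles,primary_role) db_unique_name=MCSDB_STBY' scope=both sid='*';
EOF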
 
Needs attention onMCSDB
Passed on-

Status on MCSDB: FAIL => Remote destination is not using either ASYNC or SYNC transport for redo


DATA FOR MCSDB FOR REDO TRANSPORT PROTOCOL



Top

Top

Check for parameter undo_retention

Success FactorLOGICAL CORRUPTION PREVENTION BEST PRACTICES
Recommendation
 Oracle Flashback Technology enables fast logical failure repair. Oracle recommends that you use automatic undo management with sufficient space to attain your desired undo retention guarantee, enable Oracle Flashback Database, and allocate sufficient space and I/O bandwidth in the fast recovery area.  Application monitoring is required for early detection.  Effective and fast repair comes from leveraging and rehearsing the most common application-specific logical failures and using the different flashback features effectively (e.g., flashback query, flashback version query, flashback transaction query, flashback transaction, flashback drop, flashback table, and flashback database)
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter UNDO_RETENTION is not null

MCSDB1.undo_retention = 900                                                     

Status on MCSDB2: PASS => Database parameter UNDO_RETENTION is not null

MCSDB2.undo_retention = 900                                                     
Top

Top

Verify all "BIGFILE" tablespaces have non-default "MAXBYTES" values set

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

"MAXBYTES" is the SQL attribute that expresses the "MAXSIZE" value that is used in the DDL command to set "AUTOEXTEND" to "ON". By default, for a bigfile tablespace, the value is "3.5184E+13", or "35184372064256". The benefit of having "MAXBYTES" set at a non-default value for "BIGFILE" tablespaces is that a runaway operation or heavy simultaneous use (e.g., temp tablespace) cannot take up all the space in a diskgroup.

The impact of verifying that "MAXBYTES" is set to a non-default value is minimal. The impact of setting the "MAXSIZE" attribute to a non-default value varies depending upon whether it is done during database creation, during file addition to a tablespace, or on an existing file.

Risk:

The risk of running out of space in a diskgroup varies by application and cannot be quantified here. A diskgroup running out of space may impact the entire database as well as ASM operations (e.g., rebalance operations).

Action / Repair:

To obtain a list of file numbers and bigfile tablespaces that have the "MAXBYTES" attribute at the default value, enter the following sqlplus command logged into the database as sysdba:
select file_id, a.tablespace_name, autoextensible, maxbytes
from (select file_id, tablespace_name, autoextensible, maxbytes from dba_data_files where autoextensible='YES' and maxbytes = 35184372064256) a, (select tablespace_name from dba_tablespaces where bigfile='YES') b
where a.tablespace_name = b.tablespace_name
union
select file_id,a.tablespace_name, autoextensible, maxbytes
from (select file_id, tablespace_name, autoextensible, maxbytes from dba_temp_files where autoextensible='YES' and maxbytes = 35184372064256) a, (select tablespace_name from dba_tablespaces where bigfile='YES') b
where a.tablespace_name = b.tablespace_name;

The output should be: no rows returned

If you see output similar to:

   FILE_ID TABLESPACE_NAME                AUT   MAXBYTES
---------- ------------------------------ --- ----------
         1 TEMP                           YES 3.5184E+13
         3 UNDOTBS1                       YES 3.5184E+13
         4 UNDOTBS2                       YES 3.5184E+13

Investigate and correct the condition.
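
One way to correct a flagged bigfile tablespace is to set an explicit MAXSIZE. A hedged sketch follows; the tablespace name and limit are placeholders and should be sized to leave headroom in the diskgroup:
sqlplus -s / as sysdba <<'EOF'
-- illustrative only; a bigfile tablespace has a single file, so ALTER TABLESPACE applies to it
-- alter tablespace TEMP autoextend on maxsize 200G;
EOF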
 
Needs attention on-
Passed onMCSDB

Status on MCSDB: PASS => All bigfile tablespaces have non-default maxbytes values set


DATA FOR MCSDB FOR VERIFY ALL "BIGFILE" TABLESPACES HAVE NON-DEFAULT "MAXBYTES" VALUES SET




If no rows are returned, the query did not return any rows and the SQL check passed

Top

Top

Verify InfiniBand subnet manager is running on an InfiniBand switch

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The impact of verifying that the InfiniBand subnet manager is running on an InfiniBand switch is minimal. Moving the InfiniBand subnet manager requires stopping the subnet manager on the incorrect InfiniBand fabric node and verifying that it fails over to an InfiniBand switch.

Risk:

If the InfiniBand subnet manager is not running on an InfiniBand switch, the InfiniBand fabric may crash during certain fabric management transitions.

Action / Repair:

To verify the InfiniBand subnet manager is located on an InfiniBand switch, execute the following command as the "root" userid on a database server:
export SUBNET_MGR_GID=`sminfo | cut -d" " -f7 | cut -c3-16`;export SUBNET_MGR_LOC="OTHER";for IB_NODE_GID in `ibswitches | cut -c14-27`; do if [ $SUBNET_MGR_GID = $IB_NODE_GID ]; then export SUBNET_MGR_LOC="IB_SWITCH"; fi; done; echo $SUBNET_MGR_LOC;
The output should be similar to:
IB_SWITCH
If the output is not "IB_SWITCH", investigate and correct the condition.
NOTE: For additional guidance on configuring the InfiniBand subnet manager, please see the "Setting the Subnet Manager Master on Exadata Database Machine Full Rack and Exadata Database Machine Half Rack" section of the "Oracle® Exadata Database Machine Owner's Guide, 11g Release 2 (11.2), E13874-17".
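
The one-liner above, expanded for readability (run as the "root" userid on a database server); the logic is unchanged:
SUBNET_MGR_GID=$(sminfo | cut -d" " -f7 | cut -c3-16)   # GUID of the active subnet manager
SUBNET_MGR_LOC="OTHER"
for IB_NODE_GID in $(ibswitches | cut -c14-27); do      # GUIDs of the InfiniBand switches
  if [ "$SUBNET_MGR_GID" = "$IB_NODE_GID" ]; then
    SUBNET_MGR_LOC="IB_SWITCH"
  fi
done
echo $SUBNET_MGR_LOC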
 
Needs attention on-
Passed ondm01db01

Status on dm01db01: PASS => subnet manager is running on an InfiniBand switch


DATA FROM DM01DB01 FOR VERIFY INFINIBAND SUBNET MANAGER IS RUNNING ON AN INFINIBAND SWITCH



Subnet manager GUID=2128e8af6da0a0 and switches GUID are 2128e8b08ea0a0 2128e8af6da0a0
Top

Top

LOG_FILE_NAME_CONVERT

Success FactorDATABASE/CLUSTER/SITE FAILURE PREVENTION BEST PRACTICES
Recommendation
 As part of a switchover, the standby database must clear the online redo log files on the standby database before opening as a primary database. The time needed to complete the I/O can significantly increase the overall switchover time. By setting the LOG_FILE_NAME_CONVERT parameter, the standby database can pre-create the online redo logs the first time the MRP process is started. You can also pre-create empty online redo logs by issuing the SQL*Plus ALTER DATABASE CLEAR LOGFILE statement on the standby database.  See section 8.4.1 of the HA Best Practice guide for further information.
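
A minimal sketch for checking the two parameters on the standby and, if desired, pre-clearing a log group; the group number is a placeholder:
sqlplus -s / as sysdba <<'EOF'
show parameter log_file_name_convert
show parameter db_create_online_log_dest_1
-- illustrative only:
-- alter database clear logfile group 1;
EOF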
 
Links
Needs attention on-
Passed onMCSDB

Status on MCSDB: PASS => Database parameter LOG_FILE_NAME_CONVERT or DB_CREATE_ONLINE_LOG_DEST_1 is not null


DATA FOR MCSDB FOR LOG_FILE_NAME_CONVERT




log_file_name_convert


db_create_online_log_dest_1
+DATA_DM01

Top

Top

Clusterware status

Success FactorCLIENT FAILOVER OPERATIONAL BEST PRACTICES
Recommendation
 Oracle clusterware is required for complete client failover integration.  Please consult the following whitepaper for further information
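
For reference, the status captured below can be reproduced with standard Clusterware commands, run as the grid owner or root on a database server:
crsctl check crs        # health of the local CRS stack
crsctl stat res -t      # tabular resource status, similar to the data below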
 
Links
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Clusterware is running


DATA FROM DM01DB01 - MCSDB DATABASE - CLUSTERWARE STATUS



--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA_DM01.dg
ONLINE  ONLINE       dm01db01
ONLINE  ONLINE       dm01db02
ora.DBFS_DG.dg
ONLINE  ONLINE       dm01db01
ONLINE  ONLINE       dm01db02
ora.LISTENER.lsnr
ONLINE  ONLINE       dm01db01
ONLINE  ONLINE       dm01db02
ora.RECO_DM01.dg
ONLINE  ONLINE       dm01db01
...More

Status on dm01db02: PASS => Clusterware is running


--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA_DM01.dg
ONLINE  ONLINE       dm01db01
ONLINE  ONLINE       dm01db02
ora.DBFS_DG.dg
ONLINE  ONLINE       dm01db01
ONLINE  ONLINE       dm01db02
ora.LISTENER.lsnr
ONLINE  ONLINE       dm01db01
ONLINE  ONLINE       dm01db02
ora.RECO_DM01.dg
ONLINE  ONLINE       dm01db01
ONLINE  ONLINE       dm01db02
ora.asm
ONLINE  ONLINE       dm01db01                 Started
ONLINE  ONLINE       dm01db02                 Started
...More
Top

Top

Logical standby unsupported datatypes

Success FactorDATABASE/CLUSTER/SITE FAILURE PREVENTION BEST PRACTICES
Recommendation
 You can use a transient logical standby database to perform a rolling database upgrade using your current physical standby database by temporarily converting it to a logical standby database. Use a transient logical standby when your configuration only has a physical standby database.  To determine data types that exist in your database that are not supported for logical standby refer to appendix C of the Data Guard Concepts and Admin guide:
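
A hedged sketch of one way to produce a list similar to the one captured below, using the DBA_LOGSTDBY_UNSUPPORTED view:
sqlplus -s / as sysdba <<'EOF'
select distinct owner, table_name
from dba_logstdby_unsupported
order by owner, table_name;
EOF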
 
Links
Needs attention onMCSDB
Passed on-

Status on MCSDB: WARNING => Logical standby unsupported datatypes found


DATA FOR MCSDB FOR LOGICAL STANDBY UNSUPPORTED DATATYPES




DEV5_SOAINFRA                  AQ$_IP_QTAB_I
DEV5_SOAINFRA                  AQ$_EDN_EVENT_QUEUE_TABLE_H
DEV5_SOAINFRA                  AQ$_EDN_EVENT_QUEUE_TABLE_G
DEV5_SOAINFRA                  AQ$_EDN_OAOO_DELIVERY_TABLE_I
DEV6_BIPLATFORM                SDCLEANUPLIST2
DEV1_SOAINFRA                  AQ$_IP_QTAB_I
DEV1_SOAINFRA                  AQ$_EDN_EVENT_QUEUE_TABLE_G
DEV1_SOAINFRA                  AQ$_EDN_OAOO_DELIVERY_TABLE_G
GIS                            MAP_STATION_POI
DEV1_SOAINFRA                  AQ$_IP_QTAB_S
DEV1_SOAINFRA                  AQ$_IP_QTAB_H
DEV1_SOAINFRA                  AQ$_EDN_EVENT_QUEUE_TABLE_T
DEV1_SOAINFRA                  AQ$_EDN_EVENT_QUEUE_TABLE_I

DEV1_SOAINFRA                  AQ$_EDN_OAOO_DELIVERY_TABLE_T
...More
Top

Top

Standby recovery mode

Success FactorDATABASE/CLUSTER/SITE FAILURE PREVENTION BEST PRACTICES
Recommendation
 Use real-time apply so that redo data is applied to the standby database as soon as it is received.   See section 8.3.8 of the HA Best Practice guide for further information.
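
A minimal sketch of enabling real-time apply on the standby; both statements are illustrative, commented out, and would be run on the standby instance:
sqlplus -s / as sysdba <<'EOF'
-- alter database recover managed standby database cancel;
-- alter database recover managed standby database using current logfile disconnect from session;
EOF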
 
Links
Needs attention onMCSDB
Passed on-

Status on MCSDB: FAIL => Standby is not running in MANAGED REAL TIME APPLY mode


DATA FOR MCSDB FOR STANDBY RECOVERY MODE



Top

Top

Standby redolog status on primary

Success FactorDATABASE/CLUSTER/SITE FAILURE PREVENTION BEST PRACTICES
Recommendation
 You should configure standby redo logs on both sites for improved availability and performance. To determine the recommended number of standby redo logs, use the following formula:
(maximum # of logfile groups +1) * maximum # of threads
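
For example, with 4 online redo log groups per thread and 2 threads, the formula gives (4 + 1) * 2 = 10 standby redo log groups. A hedged sketch of adding them follows; group numbers and the size (which should match the online redo logs) are placeholders:
sqlplus -s / as sysdba <<'EOF'
-- illustrative only; repeat per group and thread up to the count from the formula above
-- alter database add standby logfile thread 1 group 11 size 4G;
-- alter database add standby logfile thread 2 group 16 size 4G;
EOF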

 
Links
Needs attention onMCSDB
Passed on-

Status on MCSDB: FAIL => Standby redo logs are not configured on both sites


DATA FOR MCSDB FOR STANDBY REDOLOG STATUS ON PRIMARY



Top

Top

Data Guard broker configuration

Success FactorCLIENT FAILOVER OPERATIONAL BEST PRACTICES
Recommendation
 Data Guard broker is required for complete client failover integration.  Please consult the following whitepaper for further information
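
The ORA-16525 errors below indicate the broker is not started. A hedged sketch of starting it and creating a configuration follows; all configuration names and connect identifiers are placeholders:
sqlplus -s / as sysdba <<'EOF'
-- alter system set dg_broker_start=true scope=both sid='*';
EOF
# then, in DGMGRL (OS-authenticated on the primary host; placeholders throughout):
# dgmgrl /
# DGMGRL> create configuration 'dg_mcsdb' as primary database is 'MCSDB' connect identifier is MCSDB;
# DGMGRL> add database 'MCSDB_STBY' as connect identifier is MCSDB_STBY maintained as physical;
# DGMGRL> enable configuration;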
 
Links
Needs attention ondm01db01, dm01db02
Passed on-

Status on dm01db01: FAIL => Dataguard broker configuration does not exist


DATA FROM DM01DB01 - MCSDB DATABASE - DATA GUARD BROKER CONFIGURATION



DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
Error:
ORA-16525: the Data Guard broker is not yet available

Configuration details cannot be determined by DGMGRL

Status on dm01db02: FAIL => Dataguard broker configuration does not exist


DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
Error:
ORA-16525: the Data Guard broker is not yet available

Configuration details cannot be determined by DGMGRL
Top

Top

High redundancy diskgroups

Success FactorSTORAGE FAILURES PREVENTION BEST PRACTICES
Recommendation
 Choose Oracle Automatic Storage Management (ASM) high redundancy for DATA and 
RECO disk groups for best tolerance from data and storage failures. ASM high redundancy is the ideal recommendation for business critical databases
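
One way to confirm diskgroup redundancy, consistent with the data captured below, is a quick query against V$ASM_DISKGROUP (run in the ASM or database instance):
sqlplus -s / as sysdba <<'EOF'
select name, type from v$asm_diskgroup order by name;
EOF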
 
Needs attention on-
Passed onMCSDB

Status on MCSDB: PASS => At least one high redundancy diskgroup configured


DATA FOR MCSDB FOR HIGH REDUNDANCY DISKGROUPS




DATA_DM01                      HIGH
DBFS_DG                        NORMAL
RECO_DM01                      NORMAL
Top

Top

Physical standby status

Success FactorDATABASE/CLUSTER/SITE FAILURE PREVENTION BEST PRACTICES
Recommendation
 Oracle Data Guard is a high availability and disaster-recovery solution that provides very fast automatic failover (referred to as fast-start failover) in the event of database failures, node failures, corruption, and media failures. Furthermore, the standby databases can be used for read-only access and subsequently for reader farms, for reporting, for backups and for testing and development.
For zero data loss protection and the fastest recovery time, deploy a local Data Guard standby database with Data Guard Fast-Start Failover. For protection against outages impacting both the primary and the local standby, deploy a second Data Guard standby database at a remote location.
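
A hedged sketch for a first look at standby health from the primary; V$ARCHIVE_DEST_STATUS reports the state and last error of each redo destination:
sqlplus -s / as sysdba <<'EOF'
select dest_id, status, error
from v$archive_dest_status
where status <> 'INACTIVE';
EOF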

 
Needs attention onMCSDB
Passed on-

Status on MCSDB: FAIL => Physical standby status is not valid


DATA FOR MCSDB FOR PHYSICAL STANDBY STATUS



Top

Top

Flashback database on primary

Success FactorLOGICAL CORRUPTION PREVENTION BEST PRACTICES
Recommendation
 Oracle Flashback Technology enables fast logical failure repair. Oracle recommends that you use automatic undo management with sufficient space to attain your desired undo retention guarantee, enable Oracle Flashback Database, and allocate sufficient space and I/O bandwidth in the fast recovery area.  Application monitoring is required for early detection.  Effective and fast repair comes from leveraging and rehearsing the most common application-specific logical failures and using the different flashback features effectively (e.g., flashback query, flashback version query, flashback transaction query, flashback transaction, flashback drop, flashback table, and flashback database)
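
A minimal sketch for enabling Flashback Database on the primary, assuming a fast recovery area is already configured; the retention value is a placeholder and the statements are commented out as illustrative:
sqlplus -s / as sysdba <<'EOF'
show parameter db_recovery_file_dest
-- alter system set db_flashback_retention_target=1440 scope=both sid='*';
-- alter database flashback on;
EOF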
 
Links
Needs attention onMCSDB
Passed on-

Status on MCSDB: FAIL => Flashback is not configured


DATA FOR MCSDB FOR FLASHBACK DATABASE ON PRIMARY




primary_flashback = NO
Top

Top

Database init parameter DB_BLOCK_CHECKING on standby

Success FactorDATA CORRUPTION PREVENTION BEST PRACTICES
Recommendation
 Critical

Benefit / Impact:

Initially, db_block_checking is set to off due to potential performance impact. Performance testing is particularly important given that overhead is incurred on every block change. Block checking typically causes 1% to 10% overhead, but for update and insert intensive applications (such as Redo Apply at a standby database) the overhead can be much higher. OLTP compressed tables also require additional checks that can result in higher overhead depending on the frequency of updates to those tables. Workload specific testing is required to assess whether the performance overhead is acceptable.


Risk:

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair:

Based on performance testing results, set DB_BLOCK_CHECKING on the primary or standby database to either MEDIUM or FULL, depending on the impact. If performance concerns prevent setting DB_BLOCK_CHECKING to either FULL or MEDIUM at a primary database, then it becomes even more important to enable this at the standby database. This protects the standby database from logical corruption that would go undetected at the primary database.
For higher data corruption detection and prevention, enable this setting, but note that the performance impact varies per workload; evaluate the performance impact before enabling.
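
A minimal sketch for checking the current value and, after workload testing, changing it; the target value is illustrative:
sqlplus -s / as sysdba <<'EOF'
show parameter db_block_checking
-- alter system set db_block_checking=MEDIUM scope=both sid='*';
EOF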

 
Links
Needs attention ondm01db01, dm01db02
Passed on-

Status on dm01db01: WARNING => Database parameter DB_BLOCK_CHECKING is NOT set to the recommended value.


DATA FROM DM01DB01 - MCSDB DATABASE - DATABASE INIT PARAMETER DB_BLOCK_CHECKING ON STANDBY



DB_BLOCK_CHECKING = FALSE

Status on dm01db02: WARNING => Database parameter DB_BLOCK_CHECKING is NOT set to the recommended value.


DB_BLOCK_CHECKING = FALSE
Top

Top

Scan storage server alerthistory for open alerts

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
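
The report carries no recommendation body for this check. A hedged sketch for reviewing and acknowledging open alerts on each storage server follows (run as "celladmin"; the examinedBy value is a placeholder):
cellcli -e "list alerthistory where severity = 'critical' and examinedBy = '' detail"
# after the underlying issues are resolved, mark the alerts as reviewed:
cellcli -e "alter alerthistory all examinedBy = 'dbadmin'"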
 
 
Needs attention ondm01cel03, dm01cel02, dm01cel01
Passed on-

Status on dm01cel03, dm01cel02, dm01cel01: FAIL => one or more storage servers have open critical alerts.


DATA FROM DM01CEL01 FOR SCAN STORAGE SERVER ALERTHISTORY FOR OPEN ALERTS



6_1 	 2012-11-28T10:33:05+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_2 	 2013-01-16T15:41:00+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Checking NTP server on 10.6.2.171                                                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_3 	 2013-01-17T15:40:56+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_4 	 2013-02-21T15:48:57+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Checking NTP server on 10.187.4.86                                                               : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_5 	 2013-02-22T15:48:58+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_6 	 2013-05-23T15:49:12+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Checking NTP server on 10.187.4.86                                                               : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_7 	 2013-05-24T15:49:13+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_8 	 2013-08-26T15:51:01+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf Checking DNS server on 10.187.0.206                                                              : FAILED DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_9 	 2013-08-27T15:50:46+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_10	 2014-03-08T15:55:33+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Checking NTP server on 10.6.2.171                                                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_11	 2014-03-11T15:55:29+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_12	 2014-03-30T15:57:11+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf Checking DNS server on 10.187.0.206                                                              : FAILED DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
6_13	 2014-03-31T15:56:56+08:00	 "Cell configuration check discovered the following problems:   Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf DNS server 10.187.0.206 exists only in Exadata configuration file                                : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."
8_1 	 2012-12-03T12:38:31+08:00	 "File system "/" is 80% full, which is above the 80% threshold. Accelerated space reclamation has started.  This alert will be cleared when file system "/" becomes less than 75% full. Top three directories ordered by total space usage are as follows: /opt        : 3.68G /usr        : 2.4G /root        : 1.55G"
60_1	 2013-12-06T18:39:58+08:00	 "Data hard disk entered predictive failure status.  Status        : WARNING - PREDICTIVE FAILURE  Manufacturer  : SEAGATE  Model Number  : ST32000SSSUN2.0T  Size          : 2.0TB  Serial Number : 1108L45RD9  Firmware      : 061A  Slot Number   : 2  Cell Disk     : CD_02_dm01cel01  Grid Disk     : RECO_DM01_CD_02_dm01cel01, DATA_DM01_CD_02_dm01cel01, DBFS_DG_CD_02_dm01cel01"
60_2	 2013-12-06T19:13:08+08:00	 "Data hard disk failed.  Status        : CRITICAL  Manufacturer  : SEAGATE  Model Number  : ST32000SSSUN2.0T  Size          : 2.0T  Serial Number : 1108L45RD9  Firmware      : 061A  Slot Number   : 2  Cell Disk     : CD_02_dm01cel01  Grid Disk     : RECO_DM01_CD_02_dm01cel01, DATA_DM01_CD_02_dm01cel01, DBFS_DG_CD_02_dm01cel01"
...More
Top

Top

Verify Exadata Smart Flash Log is Created

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

When created, Exadata Smart Flash Log uses 512MB of flash memory per storage server by default to help minimize redo log write latency. 

The impact of verifying that Exadata Smart Flash Log is created is minimal.

Risk:

Without Exadata Smart Flash Log, the LGWR process may be delayed causing longer "log file parallel write" and "log file sync" waits.

Action / Repair:

To verify that Exadata Smart Flash Log is created, execute the following cellcli command as the "celladmin" user on each storage server:

list flashlog attributes size,status

The output should be similar to:
    512M    normal

If the output is not as shown, Exadata Smart Flash Log may not be created, or there may be a hardware issue, or there may be a configuration issue. Investigate and correct the condition.
Because they share the same storage server physical flash memory, there is a space usage relationship between Exadata Smart Flash Log and Exadata Smart Flash Cache. Exadata Smart Flash Log should be created before Exadata Smart Flash Cache, because the default configuration for Exadata Smart Flash Cache will use all available storage server flash memory. If Exadata Smart Flash Cache already exists, a subsequent attempt to create Exadata Smart Flash Log will fail because all the available storage server flash memory is in use.
To create Exadata Smart Flash Log when Exadata Smart Flash Cache is not created, execute the following cellcli command as the "celladmin" user:
create flashlog all

To create Exadata Smart Flash Log when Exadata Smart Flash Cache is already enabled, at the default sizings for both, enter the following "cellcli" commands as the "celladmin" user:

drop flashcache 
create flashlog all 
create flashcache all

NOTE: Exadata Smart Flash Log is created by default with Exadata Storage Server Software version 11.2.2.4.0 and above.

NOTE: Exadata Smart Flash Log will be used by Oracle software 11.2.0.2 Bundle Patch 9 (or higher) or 11.2.0.3.0. The recommended Oracle software version levels are 11.2.0.2 Bundle Patch 11 (or higher) or 11.2.0.3 Bundle Patch 1 (or higher).

NOTE: The default Exadata Smart Flash Log size of 512MB is the recommended value.

NOTE: See also "Configure Storage Server Flash Memory as Exadata Smart Flash Cache"
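
Where dcli is available, the flash log check can be run across all storage servers in one pass; the cell_group file (one storage server hostname per line) is assumed to exist locally:
dcli -g cell_group -l celladmin "cellcli -e list flashlog attributes name,size,status"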
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => Smart flash log is created on all storage server


DATA FROM DM01CEL01 FOR VERIFY EXADATA SMART FLASH LOG IS CREATED



name:              	 dm01cel01_FLASHLOG
cellDisk:          	 FD_08_dm01cel01,FD_06_dm01cel01,FD_09_dm01cel01,FD_12_dm01cel01,FD_02_dm01cel01,FD_01_dm01cel01,FD_04_dm01cel01,FD_14_dm01cel01,FD_05_dm01cel01,FD_13_dm01cel01,FD_10_dm01cel01,FD_00_dm01cel01,FD_07_dm01cel01,FD_15_dm01cel01,FD_11_dm01cel01,FD_03_dm01cel01
creationTime:      	 2012-11-28T10:31:16+08:00
degradedCelldisks:
effectiveSize:     	 512M
efficiency:        	 100.0
id:                	 463e749c-cdbf-469d-abad-283c3eb39f0b
size:              	 512M
status:            	 normal




DATA FROM DM01CEL02 FOR VERIFY EXADATA SMART FLASH LOG IS CREATED


...More
Top

Top

Verify Exadata Smart Flash Cache is created

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

For the vast majority of situations, maximum performance is achieved by configuring the storage server flash memory as cache, allowing the Exadata software to determine the content of the cache.
The impact of configuring storage server flash memory as cache at initial deployment is minimal. If there are already grid disks configured in the flash memory, consideration must be given as to the relocation of the data when converting the flash memory back to cache.

Risk:

Not configuring the storage server flash memory as cache may result in a degradation of overall performance.

Action / Repair:

To confirm all storage server flash memory is configured as smart flash cache, execute the command shown below:
cellcli -e "list flashcache detail" | grep size
The output will be similar to:
          size:                   364.75G
Starting with Exadata software version 11.2.2.4, for an environment deployed according to Oracle standards, with the storage server "flashlog" feature in use at the default size of 512M, the size of the storage server "flashcache" is expected to be 364.75G. If the size is less than that, some of the storage server flash memory may be configured as grid disks, or there may be a hardware issue, or there may be a configuration issue. Investigate and correct the condition.

To create Exadata Smart Flash Cache, execute the following cellcli command as the "celladmin" user:
create flashcache all 
 
NOTE: For Exadata Software version 11.2.2.3 or lower, or Exadata Software version 11.2.2.4.0 or higher where a decision was made not to create Exadata Smart Flash Log, the size of the storage server "flashcache" is expected to be 365.25G.

NOTE: See also "Verify Exadata Smart Flash Log is Created".
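
As with the flash log check, dcli can confirm the flash cache size on every storage server at once (cell_group file assumed, as described earlier):
dcli -g cell_group -l celladmin "cellcli -e list flashcache attributes name,size,status"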
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => Storage Server Flash Memory is configured as Exadata Smart Flash Cache


DATA FROM DM01CEL01 FOR VERIFY EXADATA SMART FLASH CACHE IS CREATED




NOTE: Look for size or size:

name:              	 dm01cel01_FLASHCACHE
cellDisk:          	 FD_07_dm01cel01,FD_13_dm01cel01,FD_09_dm01cel01,FD_05_dm01cel01,FD_10_dm01cel01,FD_00_dm01cel01,FD_04_dm01cel01,FD_14_dm01cel01,FD_11_dm01cel01,FD_08_dm01cel01,FD_15_dm01cel01,FD_06_dm01cel01,FD_02_dm01cel01,FD_01_dm01cel01,FD_03_dm01cel01,FD_12_dm01cel01
creationTime:      	 2012-11-28T10:31:41+08:00
degradedCelldisks:
effectiveCacheSize:	 364.75G
id:                	 71c5504c-606e-4142-a5ef-e195dca88900
size:              	 364.75G
status:            	 normal




DATA FROM DM01CEL02 FOR VERIFY EXADATA SMART FLASH CACHE IS CREATED
...More
Top

Top

Verify PCI bridge is configured for generation II on storage servers

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The storage server PCI bridges (19:0.0 and 27:0.0) should be configured for
generation II for maximum performance.

There is minimal impact to verify the PCI Bridges configuration.

Risk:

If the PCI bridges are not configured for generation II, performance will be
sub-optimal.

Action / Repair:

To verify the current PCI bridges configuration, execute the following
command as the root userid on all storage servers:

for BUS_NUM in 19:0.0 27:0.0; do echo $BUS_NUM `lspci -xxx -s $BUS_NUM | grep ^50 | cut -d" " -f4`; done

The output should be similar to:

19:0.0 82
27:0.0 82

If any of the storage server PCI bridges do not return "82", there are three
possible corrective actions:

If the value returned is "81", you may upgrade to Exadata storage server
software version 11.2.2.4.0 or greater, or refer to MOS note 1351559.1.

If neither the value "81" nor "82" is returned, contact Oracle Support for
further assistance.

    NOTE: PCI Bridge generation I will return the value "81".
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => Peripheral component interconnect (PCI) bridge is configured for generation II on all storage servers


DATA FROM DM01CEL01 FOR VERIFY PCI BRIDGE IS CONFIGURED FOR GENERATION II ON STORAGE SERVERS



19:0.0 82
27:0.0 82




DATA FROM DM01CEL02 FOR VERIFY PCI BRIDGE IS CONFIGURED FOR GENERATION II ON STORAGE SERVERS



19:0.0 82
27:0.0 82




...More
Top

Top

Verify InfiniBand Address Resolution Protocol (ARP) Configuration on Database Servers

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact: 

When using multiple InfiniBand interfaces for the private cluster interconnect, there are specific ARP settings required for Real Application Clusters (RAC) to work correctly.

The impact of verifying the ARP configuration is minimal. Correcting a configuration requires editing "/etc/sysctl.conf" and restarting the interface(s).

Risk:

Incorrect ARP configurations for multiple InfiniBand interfaces used as the private cluster interconnect may prevent RAC from starting, or result in dropped packets and inconsistent RAC operation.

Action / Repair:

To verify the InfiniBand interface ARP settings for a database server, use the following command as the "root" userid:
grep -E "rp_filter|accept_local" /etc/sysctl.conf | grep -v max
For an X2-8, the output will be similar to:
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.bondib0.rp_filter = 0
net.ipv4.conf.bondib1.rp_filter = 0
net.ipv4.conf.bondib2.rp_filter = 0
net.ipv4.conf.bondib3.rp_filter = 0
net.ipv4.conf.all.accept_local = 1
net.ipv4.conf.bondib0.accept_local = 1
net.ipv4.conf.bondib1.accept_local = 1
net.ipv4.conf.bondib2.accept_local = 1
net.ipv4.conf.bondib3.accept_local = 1
For an X2-2(4170) or X2-2, the output will be similar to:
net.ipv4.conf.default.rp_filter = 1
If the output is not as shown, edit the "/etc/sysctl.conf" file and restart the relevant interface(s).
NOTE: A customer may have additional settings in the "/etc/sysctl.conf" file depending upon their specific requirements.

NOTE: These settings do not contradict MOS note 1286796.1. The note uses two active IB interfaces in a non-bonded configuration in its example. An X2-2(4170) or X2-2 uses the active/passive bonded "bondib0" interface as the private cluster interconnect, which does not qualify as "multiple private interconnects", so the ARP configuration is left at the defaults. For an X2-8, rp_filter is turned off because multiple active/passive bonded InfiniBand interfaces are used for the private cluster interconnect and it is possible to receive packets on one interface that will be replied to from another interface.
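As an illustration only, the X2-8 settings above could be applied as sketched below; the interface names (bondib0, etc.) are taken from the example output and must be adapted to the interfaces actually present on the server, and only the settings missing for your machine type should be appended.

# Sketch, not a definitive procedure
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.bondib0.rp_filter = 0
net.ipv4.conf.all.accept_local = 1
net.ipv4.conf.bondib0.accept_local = 1
EOF
sysctl -p                       # reload kernel parameters from /etc/sysctl.conf
ifdown bondib0 && ifup bondib0  # restart the affected interface (repeat per interface)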
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Address Resolution Protocol (ARP) is configured properly on database server.


DATA FROM DM01DB01 FOR VERIFY INFINIBAND ADDRESS RESOLUTION PROTOCOL (ARP) CONFIGURATION ON DATABASE SERVERS



net.ipv4.conf.default.rp_filter = 1

Status on dm01db02: PASS => Address Resolution Protocol (ARP) is configured properly on database server.


DATA FROM DM01DB02 FOR VERIFY INFINIBAND ADDRESS RESOLUTION PROTOCOL (ARP) CONFIGURATION ON DATABASE SERVERS



net.ipv4.conf.default.rp_filter = 1
Top

Top

Database init parameter DB_BLOCK_CHECKING

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact:

Initially, db_block_checking is set to OFF due to potential performance impact. Performance testing is particularly important given that overhead is incurred on every block change. Block checking typically causes 1% to 10% overhead, but for update- and insert-intensive applications (such as Redo Apply at a standby database) the overhead can be much higher. OLTP compressed tables also require additional checks that can result in higher overhead depending on the frequency of updates to those tables. Workload-specific testing is required to assess whether the performance overhead is acceptable.


Risk:

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair:

Based on performance testing results, set the primary or standby database to either MEDIUM or FULL depending on the impact. If performance concerns prevent setting DB_BLOCK_CHECKING to either FULL or MEDIUM at a primary database, then it becomes even more important to enable it at the standby database. This protects the standby database from logical corruption that would go undetected at the primary database.
For stronger data corruption detection and prevention, enable this setting, but note that the performance impact varies per workload; evaluate the impact before enabling it in production.
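As a minimal sketch (assuming the change is applied online after workload testing, and that MEDIUM is the level chosen; FULL would be set the same way), the parameter could be changed with:

sqlplus -s / as sysdba <<'EOF'
-- applies to all instances; choose MEDIUM or FULL based on your testing
ALTER SYSTEM SET db_block_checking = 'MEDIUM' SCOPE=BOTH SID='*';
SHOW PARAMETER db_block_checking
EOF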

 
Links
Needs attention ondm01db01, dm01db02
Passed on-

Status on dm01db01: WARNING => Database parameter DB_BLOCK_CHECKING is NOT set to the recommended value.


DATA FROM DM01DB01 - MCSDB DATABASE - DATABASE INIT PARAMETER DB_BLOCK_CHECKING



DB_BLOCK_CHECKING = FALSE

Status on dm01db02: WARNING => Database parameter DB_BLOCK_CHECKING is NOT set to the recommended value.


DB_BLOCK_CHECKING = FALSE
Top

Top

Verify there are no griddisks configured on flash memory devices

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The definition and maintenance of storage server griddisks is critical for optimal performance and outage avoidance.
The impact of verifying the storage server griddisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.

Risk:

If the storage server griddisk configuration is not verified, poor performance or unexpected outages may occur.

Action / Repair:

To verify there are no storage server griddisks configured on flash memory devices, execute the following command as the "celladmin" user on each storage server:
cellcli -e "list griddisk where disktype=flashdisk" | wc -l

The output should be:0

If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.
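To run the same check across every storage server from one database server, a hedged variant using dcli (assuming root ssh equivalence and the standard cell_group file, rather than logging in as "celladmin" on each cell) might look like:

# Sketch only -- each cell should return 0 if no griddisks exist on flash devices
dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root \
  'cellcli -e "list griddisk where disktype=flashdisk" | wc -l'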
Experience has shown that the Oracle recommended Best Practice of using all available flash device space for Smart Flash Log and Smart Flash Cache provides the highest overall performance benefit with lowest maintenance overhead for an Oracle Exadata Database Machine. 

In some very rare cases for certain highly write-intensive applications, there may be some performance benefit to configuring grid disks onto the flash devices for datafile writes only. Flash grid disks should never be configured for redo, which is addressed by the Smart Flash Log feature implemented in release 11.2.2.4 and higher.

The space available to Smart Flash Cache and Smart Flash Log is reduced by the amount of space allocated to the grid disks deployed on flash devices. The usable space in the flash grid disk group is either half or one-third of the space allocated for grid disks on flash devices, depending on whether the flash grid disks are configured with ASM normal or high redundancy.

If after thorough performance and recovery testing, a customer chooses to deploy grid disks on flash devices, it would be a supported, but not Best Practice, configuration.
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => There are no griddisks configured on flash memory devices


DATA FROM DM01CEL01 FOR VERIFY THERE ARE NO GRIDDISKS CONFIGURED ON FLASH MEMORY DEVICES



name:              	 DATA_DM01_CD_00_dm01cel01
asmDiskgroupName:  	 DATA_DM01
asmDiskName:       	 DATA_DM01_CD_00_DM01CEL01
asmFailGroupName:  	 DM01CEL01
availableTo:
cachingPolicy:     	 default
cellDisk:          	 CD_00_dm01cel01
comment:
creationTime:      	 2012-11-28T10:33:54+08:00
diskType:          	 HardDisk
errorCount:        	 0
id:                	 01f726cc-700b-485a-9eaa-694eb0849b2a
offset:            	 32M
size:              	 1562G
status:            	 active

...More
Top

Top

Verify ASM griddisk,diskgroup and Failure group mapping

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

To detect and repair improper configuration on a system

Risk:

If a single storage server is improperly configured to contain grid disks for a disk group that reside in more than one failure group, then it is possible for the disk group to be taken offline, causing the database to crash, if that single storage server is taken offline (e.g. for planned maintenance) or fails.

Action / Repair:

Please download the checkDiskFGMapping.sh script from MOS note 1351036.1 (referenced below), copy it to /opt/oracle.cellos/ on the database server(s) on which exachk will be executed, and run checkDiskFGMapping.sh manually to confirm that all grid disks are mapped to the correct ASM disk group and failure group.

Once checkDiskFGMapping.sh is in place on the database server(s) where exachk is executed, exachk will call checkDiskFGMapping.sh to check the mapping, so it does not need to be run manually again.

Please follow the instructions in MOS note 1351036.1 for executing checkDiskFGMapping.sh and for interpreting its output.

NOTE: exachk will not run the script in repair mode, only check mode. It is up to the customer to schedule the repair, if needed, during a maintenance window as part of a standard change control process.
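For illustration, staging the script could look like the sketch below; the hostname dm01db01 is taken from this report, and the invocation options (check versus repair mode) are described in MOS note 1351036.1 rather than shown here.

# Sketch only -- copy the downloaded script to the expected location and make it executable
scp checkDiskFGMapping.sh root@dm01db01:/opt/oracle.cellos/
ssh root@dm01db01 'chmod +x /opt/oracle.cellos/checkDiskFGMapping.sh'
# run it in check mode per the MOS note, or let the next exachk run call it automatically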
 
Links
Needs attention ondm01db01
Passed on-
Top

Top

Verify storage server metric CD_IO_ST_RQ

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 One or more storage servers have been found that may have a conventional or flash disk with a performance problem.  Please contact Oracle Support for further diagnostic assistance.
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => No Storage Server conventional or flash disks have a performance problem


DATA FROM DM01CEL01 FOR VERIFY STORAGE SERVER METRIC CD_IO_ST_RQ









DATA FROM DM01CEL02 FOR VERIFY STORAGE SERVER METRIC CD_IO_ST_RQ









...More
Top

Top

Verify Platform Configuration and Initialization Parameters for Consolidation

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
When more than one RDBMS instance is running on a database server, please review the following MOS note for consolidation best practices.

 
Links
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Only one non-ASM instance discovered


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY PLATFORM CONFIGURATION AND INITIALIZATION PARAMETERS FOR CONSOLIDATION



oracle   26335     1  0 Feb08 ?        00:47:46 ora_pmon_MCSDB1

Status on dm01db02: PASS => Only one non-ASM instance discovered


oracle    9353     1  0 Feb08 ?        00:48:45 ora_pmon_MCSDB2
Top

Top

High Redundancy Redolog files

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact. 

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value. 

Action / Repair: 

Ensure the db_create_online_log_dest_n is configured for a high redundancy diskgroup

A high redundancy diskgroup optimizes availability.

If a high redundancy disk group is available, use the first high redundancy ASM disk group for all your online redo logs or standby redo logs. Use only one log member to minimize performance impact.

If a high redundancy disk group is not available, multiplex redo log members across DATA and RECO ASM disk groups for additional protection.
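A minimal sketch of pointing online log creation at a high redundancy disk group follows; "+HIGH_REDO_DG" is a placeholder rather than a disk group from this system, and existing log groups would still need to be recreated in that disk group.

sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET db_create_online_log_dest_1 = '+HIGH_REDO_DG' SCOPE=BOTH SID='*';
SHOW PARAMETER db_create_online_log_dest_1
EOF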
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database parameter Db_create_online_log_dest_n is set to recommended value


DATA FROM DM01DB01 - MCSDB DATABASE - HIGH REDUNDANCY REDOLOG FILES



High redundancy disk groups = 	  1
Number of redo log groups with more than 1 member = 	     0
Number of diskgroup where redo log members are multiplexed = 		       1

Status on dm01db02: PASS => Database parameter Db_create_online_log_dest_n is set to recommended value


High redundancy disk groups = 	  1
Number of redo log groups with more than 1 member = 	     0
Number of diskgroup where redo log members are multiplexed = 		       1
Top

Top

log_archive_dest_n

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact. 

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value. 

Action / Repair: 

Ensure the log_archive_dest_n LOCATION attribute is set to USE_DB_RECOVERY_FILE_DEST, similar to:

*.log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST'

Do NOT set it to a specific disk group, because fast recovery area automatic space management is ignored unless "USE_DB_RECOVERY_FILE_DEST" is explicitly used. This is not the same as setting it to the equivalent disk group name from the db_recovery_file_dest parameter.
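A minimal sketch of checking and setting the destination (assuming archiving goes to destination 1, as in the example above):

sqlplus -s / as sysdba <<'EOF'
SHOW PARAMETER log_archive_dest_1
ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=USE_DB_RECOVERY_FILE_DEST' SCOPE=BOTH SID='*';
EOF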
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database parameters log_archive_dest_n with Location attribute are all set to recommended value


DATA FROM DM01DB01 - MCSDB DATABASE - LOG_ARCHIVE_DEST_N




Status on dm01db02: PASS => Database parameters log_archive_dest_n with Location attribute are all set to recommended value

Top

Top

compatible

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact. 

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value. 

Action / Repair: 

Set the database parameter COMPATIBLE to the current RDBMS version in use, out to the fourth digit (for example, 11.2.0.2).
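A sketch of comparing the running version with COMPATIBLE is shown below; raising COMPATIBLE is a one-way change that requires an SPFILE update and an instance restart, so it is shown here as a check only.

sqlplus -s / as sysdba <<'EOF'
SELECT version FROM v$instance;
SHOW PARAMETER compatible
EOF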
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database parameter COMPATIBLE is set to recommended value


DATA FROM DM01DB01 - MCSDB DATABASE - COMPATIBLE



instance_vesion = 11.2.0.3.0 and compatible = 11.2.0.3.0

Status on dm01db02: PASS => Database parameter COMPATIBLE is set to recommended value


instance_vesion = 11.2.0.3.0 and compatible = 11.2.0.3.0
Top

Top

Verify InfiniBand Cable Connection Quality on storage servers

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:
InfiniBand cables require proper connections for optimal efficiency. Verifying the InfiniBand cable connection quality helps to ensure that the InfiniBand network operates at optimal efficiency.
There is minimal impact to verify InfiniBand cable connection quality.
Risk:
InfiniBand cables that are not properly connected may negotiate to a lower speed, work intermittently, or fail.
Action / Repair:
Execute the following command on all database and storage servers:
for ib_cable in `ls /sys/class/net | grep ^ib`; do printf "$ib_cable: "; cat /sys/class/net/$ib_cable/carrier; done 
The output should look similar to:
ib0: 1 
ib1: 1 
If anything other than "1" is reported, investigate that cable connection.
NOTE: Storage servers should report 2 connections. X2-2(4170) and X2-2 database servers should report 2 connections. X2-8 database servers should report 8 connections.
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => All InfiniBand network cables are connected on all Storage Servers


DATA FROM DM01CEL01 FOR VERIFY INFINIBAND CABLE CONNECTION QUALITY ON STORAGE SERVERS



/sys/class/net/ib0/carrier = 1
/sys/class/net/ib1/carrier = 1




DATA FROM DM01CEL02 FOR VERIFY INFINIBAND CABLE CONNECTION QUALITY ON STORAGE SERVERS



/sys/class/net/ib0/carrier = 1
/sys/class/net/ib1/carrier = 1




...More
Top

Top

Verify Ethernet Cable Connection Quality

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:
Ethernet cables require proper connections for optimal efficiency. Verifying the Ethernet cable connection quality helps to ensure that the Ethernet network operates at optimal efficiency.
There is minimal impact to verify Ethernet cable connection quality.
Risk:
Ethernet cables that are not properly connected may negotiate to a lower speed, work intermittently, or fail.
Action / Repair:
Execute the following command as the root userid on all database and storage servers:
for cable in `ls /sys/class/net | grep ^eth`; do  printf "$cable: "; cat /sys/class/net/$cable/carrier; done 
The output should look similar to:
eth0: 1
eth1: cat: /sys/class/net/eth1/carrier: Invalid argument
eth2: cat: /sys/class/net/eth2/carrier: Invalid argument
eth3: cat: /sys/class/net/eth3/carrier: Invalid argument
eth4: 1
eth5: 1 
"Invalid argument" usually indicates the device has not been configured and is not in use. If a device reports "0", investigate that cable connection.
NOTE: Within machine types, the output of this command will vary by customer depending on how the customer chooses to configure the available ethernet cards.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => All Ethernet network cables are connected


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY ETHERNET CABLE CONNECTION QUALITY



/sys/class/net/eth0/carrier = 1
/sys/class/net/eth1/carrier = 1
/sys/class/net/eth2/carrier = 1
/sys/class/net/eth3/carrier =
/sys/class/net/eth4/carrier =
/sys/class/net/eth5/carrier =

Status on dm01db02: PASS => All Ethernet network cables are connected


/sys/class/net/eth0/carrier = 1
/sys/class/net/eth1/carrier = 1
/sys/class/net/eth2/carrier = 1
/sys/class/net/eth3/carrier =
/sys/class/net/eth4/carrier =
/sys/class/net/eth5/carrier =
Top

Top

Verify Ethernet Cable Connection Quality on storage servers

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:
Ethernet cables require proper connections for optimal efficiency. Verifying the Ethernet cable connection quality helps to ensure that the Ethernet network operates at optimal efficiency.
There is minimal impact to verify Ethernet cable connection quality.
Risk:
Ethernet cables that are not properly connected may negotiate to a lower speed, work intermittently, or fail.
Action / Repair:
Execute the following command as the root userid on all database and storage servers:
for cable in `ls /sys/class/net | grep ^eth`; do  printf "$cable: "; cat /sys/class/net/$cable/carrier; done 
The output should look similar to:
eth0: 1
eth1: cat: /sys/class/net/eth1/carrier: Invalid argument
eth2: cat: /sys/class/net/eth2/carrier: Invalid argument
eth3: cat: /sys/class/net/eth3/carrier: Invalid argument
eth4: 1
eth5: 1 
"Invalid argument" usually indicates the device has not been configured and is not in use. If a device reports "0", investigate that cable connection.
NOTE: Within machine types, the output of this command will vary by customer depending on how the customer chooses to configure the available ethernet cards.
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => All Ethernet network cables are connected on all Storage Servers


DATA FROM DM01CEL01 FOR VERIFY ETHERNET CABLE CONNECTION QUALITY ON STORAGE SERVERS



/sys/class/net/eth0/carrier = 1
/sys/class/net/eth1/carrier =
/sys/class/net/eth2/carrier =
/sys/class/net/eth3/carrier =




DATA FROM DM01CEL02 FOR VERIFY ETHERNET CABLE CONNECTION QUALITY ON STORAGE SERVERS



/sys/class/net/eth0/carrier = 1
/sys/class/net/eth1/carrier =
/sys/class/net/eth2/carrier =
/sys/class/net/eth3/carrier =
...More
Top

Top

Verify InfiniBand Cable Connection Quality

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:
InfiniBand cables require proper connections for optimal efficiency. Verifying the InfiniBand cable connection quality helps to ensure that the InfiniBand network operates at optimal efficiency.
There is minimal impact to verify InfiniBand cable connection quality.
Risk:
InfiniBand cables that are not properly connected may negotiate to a lower speed, work intermittently, or fail.
Action / Repair:
Execute the following command on all database and storage servers:
for ib_cable in `ls /sys/class/net | grep ^ib`; do printf "$ib_cable: "; cat /sys/class/net/$ib_cable/carrier; done 
The output should look similar to:
ib0: 1 
ib1: 1 
If anything other than "1" is reported, investigate that cable connection.
NOTE: Storage servers should report 2 connections. X2-2(4170) and X2-2 database servers should report 2 connections. X2-8 database servers should report 8 connections.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => All InfiniBand network cables are connected


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY INFINIBAND CABLE CONNECTION QUALITY



/sys/class/net/ib0/carrier = 1
/sys/class/net/ib1/carrier = 1

Status on dm01db02: PASS => All InfiniBand network cables are connected


/sys/class/net/ib0/carrier = 1
/sys/class/net/ib1/carrier = 1
Top

Top

db_recovery_file_dest_size

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact. 

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value. 

Action / Repair: 

Ensure db_recovery_file_dest_size <= 90% of the Recovery Area diskgroup TOTAL_MB size
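As a sketch, the 90% figure can be derived from V$ASM_DISKGROUP and compared with the current setting; RECO_DM01 is the recovery disk group shown in this report.

sqlplus -s / as sysdba <<'EOF'
SELECT name, ROUND(total_mb * 0.9 / 1024) AS ninety_pct_gb
  FROM v$asm_diskgroup
 WHERE name = 'RECO_DM01';
SHOW PARAMETER db_recovery_file_dest_size
EOF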

 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database parameter db_recovery_file_dest_size is set to recommended value


DATA FROM DM01DB01 - MCSDB DATABASE - DB_RECOVERY_FILE_DEST_SIZE



90% of RECO_DM01 Total Space = 			 8522GB
db_recovery_file_dest_size= 			 2048GB

Status on dm01db02: PASS => Database parameter db_recovery_file_dest_size is set to recommended value


90% of RECO_DM01 Total Space = 			 8522GB
db_recovery_file_dest_size= 			 2048GB
Top

Top

Recovery and Create File Destinations

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact:

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk:

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair:

In order to maximize recoverability and availability, ensure that DB_CREATE_FILE_DEST and DB_RECOVERY_FILE_DEST are located in separate ASM disk groups.
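A minimal sketch of checking (and, if needed, separating) the two destinations follows; the disk group names match those reported elsewhere in this report, and whether to set DB_CREATE_FILE_DEST explicitly is a site decision.

sqlplus -s / as sysdba <<'EOF'
SHOW PARAMETER db_create_file_dest
SHOW PARAMETER db_recovery_file_dest
-- only if the two destinations need to be separated:
ALTER SYSTEM SET db_create_file_dest   = '+DATA_DM01' SCOPE=BOTH SID='*';
ALTER SYSTEM SET db_recovery_file_dest = '+RECO_DM01' SCOPE=BOTH SID='*';
EOF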

 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database DB_CREATE_FILE_DEST and DB_RECOVERY_FILE_DEST are in different diskgroups


DATA FROM DM01DB01 - MCSDB DATABASE - RECOVERY AND CREATE FILE DESTINATIONS



db_recovery_file_dest = +RECO_DM01
db_create_file_dest = NONE SPECIFIED

Status on dm01db02: PASS => Database DB_CREATE_FILE_DEST and DB_RECOVERY_FILE_DEST are in different diskgroups


db_recovery_file_dest = +RECO_DM01
db_create_file_dest = NONE SPECIFIED
Top

Top

cluster_interconnects

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact:

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact. 

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair:

Database parameter CLUSTER_INTERCONNECTS should be a colon-delimited string of the IP addresses returned from /sbin/ifconfig for each cluster_interconnect interface returned by oifcfg. In the case of an X2-2 it is expected that there would be only one interface and therefore one IP address.

This is used to avoid the Clusterware HAIP address. For an X2-8, the four IP addresses should be colon delimited.
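As a sketch (GRID_HOME is a placeholder for the Grid Infrastructure home, and the IP address and SID shown are the ones reported for dm01db01 in this report), the value could be confirmed and set per instance; the parameter is static, so an instance restart is required for it to take effect.

$GRID_HOME/bin/oifcfg getif              # identify the cluster_interconnect interface(s)
/sbin/ifconfig bondib0 | grep 'inet '    # obtain the local IP, e.g. 192.168.10.1
sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET cluster_interconnects = '192.168.10.1' SCOPE=SPFILE SID='MCSDB1';
EOF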
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database parameter CLUSTER_INTERCONNECTS is set to the recommended value


DATA FROM DM01DB01 - MCSDB DATABASE - CLUSTER_INTERCONNECTS




Cluster Interconnect Value from instance = 192.168.10.1

Network card name and IP from oifcfg
bondib0 = 192.168.10.1

Status on dm01db02: PASS => Database parameter CLUSTER_INTERCONNECTS is set to the recommended value



Cluster Interconnect Value from instance = 192.168.10.2

Network card name and IP from oifcfg
bondib0 = 192.168.10.2
Top

Top

cluster_interconnects

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain ASM initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these ASM initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are specific to the ASM instances. Unless otherwise specified, the value is for both X2-2 and X2-8 Database Machines. The impact of setting these parameters is minimal. 

Risk: 

If the ASM initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair:

ASM parameter CLUSTER_INTERCONNECTS should be a colon-delimited string of the IP addresses returned from /sbin/ifconfig for each cluster_interconnect interface returned by oifcfg. In the case of an X2-2 it is expected that there would be only one interface and therefore one IP address.

This is used to avoid the Clusterware HAIP address. For an X2-8, the four IP addresses should be colon delimited.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => ASM parameter CLUSTER_INTERCONNECTS is set to the recommended value


DATA FROM DM01DB01 - MCSDB DATABASE - CLUSTER_INTERCONNECTS




Cluster Interconnect Value from instance = 192.168.10.1

Network card name and IP from oifcfg
bondib0 = 192.168.10.1

Status on dm01db02: PASS => ASM parameter CLUSTER_INTERCONNECTS is set to the recommended value



Cluster Interconnect Value from instance = 192.168.10.2

Network card name and IP from oifcfg
bondib0 = 192.168.10.2
Top

Top

Check for parameter parallel_execution_message_size

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

PARALLEL_EXECUTION_MESSAGE_SIZE = 16384 Improves Parallel Query performance
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter PARALLEL_EXECUTION_MESSAGE_SIZE is set to recommended value

MCSDB1.parallel_execution_message_size = 16384                                  

Status on MCSDB2: PASS => Database parameter PARALLEL_EXECUTION_MESSAGE_SIZE is set to recommended value

MCSDB2.parallel_execution_message_size = 16384                                  
Top

Top

Check for parameter sql92_security

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

SQL92_SECURITY = TRUE is a security optimization
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter SQL92_SECURITY is set to recommended value

MCSDB1.sql92_security = TRUE                                                    

Status on MCSDB2: PASS => Database parameter SQL92_SECURITY is set to recommended value

MCSDB2.sql92_security = TRUE                                                    
Top

Top

Check for parameter use_large_pages

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Memory savings and reduce paging and swapping.

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

USE_LARGE_PAGES = ONLY ensures the entire SGA is stored in HugePages (Linux-based systems only).

Prerequisites: The operating system HugePages setting needs to be correctly configured, and must be adjusted whenever another instance is added or dropped or whenever SGA sizes change. See the referenced MOS notes to configure HugePages.
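A minimal sketch, assuming HugePages have already been sized per the referenced MOS notes (USE_LARGE_PAGES is static, so an instance restart is required):

grep Huge /proc/meminfo                  # confirm HugePages_Total / HugePages_Free first
sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET use_large_pages = 'ONLY' SCOPE=SPFILE SID='*';
EOF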
 
Links
Needs attention onMCSDB1, MCSDB2
Passed on-

Status on MCSDB1: FAIL => Database parameter USE_LARGE_PAGES is NOT set to recommended value

MCSDB1.use_large_pages = TRUE                                                   

Status on MCSDB2: FAIL => Database parameter USE_LARGE_PAGES is NOT set to recommended value

MCSDB2.use_large_pages = TRUE                                                   
Top

Top

Check for parameter open_cursors

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

OPEN_CURSORS >= 300. The initial deployment database uses 1000.
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter OPEN_CURSORS is set to recommended value

MCSDB1.open_cursors = 3000                                                      

Status on MCSDB2: PASS => Database parameter OPEN_CURSORS is set to recommended value

MCSDB2.open_cursors = 3000                                                      
Top

Top

Check for parameter os_authent_prefix

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

OS_AUTHENT_PREFIX = '' (a null string) is a security optimization.
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter OS_AUTHENT_PREFIX is set to recommended value

MCSDB1.os_authent_prefix =                                                      

Status on MCSDB2: PASS => Database parameter OS_AUTHENT_PREFIX is set to recommended value

MCSDB2.os_authent_prefix =                                                      
Top

Top

Check for parameter parallel_threads_per_cpu

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

PARALLEL_THREADS_PER_CPU = 1 properly accounts for hyper threading
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter PARALLEL_THREADS_PER_CPU is set to recommended value

MCSDB1.parallel_threads_per_cpu = 1                                             

Status on MCSDB2: PASS => Database parameter PARALLEL_THREADS_PER_CPU is set to recommended value

MCSDB2.parallel_threads_per_cpu = 1                                             
Top

Top

Check for parameter _enable_NUMA_support

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

_ENABLE_NUMA_SUPPORT = FALSE 

Enable NUMA support on Oracle Database Machine X2-8 only
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter _ENABLE_NUMA_SUPPORT is set to recommended value

_enable_NUMA_support = FALSE                                                    

Status on MCSDB2: PASS => Database parameter _ENABLE_NUMA_SUPPORT is set to recommended value

_enable_NUMA_support = FALSE                                                    
Top

Top

Check for parameter parallel_adaptive_multi_user

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Performance impact: PQ degree will be reduced for some queries especially with concurrent 
workloads.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

Set PARALLEL_ADAPTIVE_MULTI_USER to FALSE.
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter PARALLEL_ADAPTIVE_MULTI_USER is set to recommended value

MCSDB1.parallel_adaptive_multi_user = FALSE                                     

Status on MCSDB2: PASS => Database parameter PARALLEL_ADAPTIVE_MULTI_USER is set to recommended value

MCSDB2.parallel_adaptive_multi_user = FALSE                                     
Top

Top

Check for parameter _file_size_increase_increment

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

_file_size_increase_increment = 2044M (2143289344 bytes) ensures adequately sized RMAN backup allocations
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter _file_size_increase_increment is set to the recommended value

_file_size_increase_increment = 2044M                                           

Status on MCSDB2: PASS => Database parameter _file_size_increase_increment is set to the recommended value

_file_size_increase_increment = 2044M                                           
Top

Top

Check for parameter global_names

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

GLOBAL_NAMES = TRUE is a security optimization
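A minimal sketch of enabling it online (confirm first that existing database links are named to match the remote databases' global names, or they will fail once this is set):

sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET global_names = TRUE SCOPE=BOTH SID='*';
SHOW PARAMETER global_names
EOF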
 
Needs attention onMCSDB1, MCSDB2
Passed on-

Status on MCSDB1: FAIL => Database parameter GLOBAL_NAMES is NOT set to recommended value

MCSDB1.global_names = FALSE                                                     

Status on MCSDB2: FAIL => Database parameter GLOBAL_NAMES is NOT set to recommended value

MCSDB2.global_names = FALSE                                                     
Top

Top

Check for parameter db_lost_write_protect

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

This is important for data block lost write detection and repair. Enable for
primary and standby databases.

Refer to MOS notes 1265884.1 and 1302539.1, including the section on how to address
ORA-752 on the standby database.
 
Links
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter DB_LOST_WRITE_PROTECT is set to recommended value

MCSDB1.db_lost_write_protect = typical                                          

Status on MCSDB2: PASS => Database parameter DB_LOST_WRITE_PROTECT is set to recommended value

MCSDB2.db_lost_write_protect = typical                                          
Top

Top

Check for parameter log_buffer

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact:

Experience and testing has shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk:

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

Database LOG_BUFFER parameter value of at least 134217728 (128M) ensures adequate buffer space for new LGWR transport.
 
Needs attention on-
Passed onMCSDB1, MCSDB2

Status on MCSDB1: PASS => Database parameter LOG_BUFFER is set to recommended value

MCSDB1.log_buffer = 134217728                                                   

Status on MCSDB2: PASS => Database parameter LOG_BUFFER is set to recommended value

MCSDB2.log_buffer = 134217728                                                   
Top

Top

Verify Disk Cache Policy on database server

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

"Disk Cache Policy" is set to "Disabled" by default and should not be changed because the cache created by setting "Disk Cache Policy" to "Enabled" is not battery backed.
The impact of verifying that "Disk Cache Policy" is set to "Disabled" is minimal. The impact of suddenly losing power with "Disk Cache Policy" set to anything other than "Disabled" will vary according to each specific case, and cannot be estimated here.

Risk:

If the "Disk Cache Policy" is not "Disabled", there is a risk of data loss in the event of a sudden power loss because the cache created by "Disk Cache Policy" is not backed up by a battery.

Action / Repair:

To verify that "Disk Cache Policy" is set to "Disabled" on all servers, use the following command as the "root" userid on the first database server in the cluster:
dcli -g /opt/oracle.SupportTools/onecommand/all_group -l root /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL | grep -i 'Disk Cache Policy'

The output will be similar to:

randomdb01: Disk Cache Policy   : Disabled 
randomdb01: Disk Cache Policy   : Disabled 
randomdb01: Disk Cache Policy   : Disabled 
 
randomcel03: Disk Cache Policy   : Disabled 
randomcel03: Disk Cache Policy   : Disabled 
randomcel03: Disk Cache Policy   : Disabled 
randomcel03: Disk Cache Policy   : Disabled 
randomcel03: Disk Cache Policy   : Disabled

If any of the results are other than "Disabled", identify the LUN in question and reset the "Disk Cache Policy" to "Disabled" using the following command (where Lx = the LUN in question, for example L2):

MegaCli64 -LDSetProp -DisDskCache -Lx -a0

    Note: The "Disk Cache Policy" is completely separate from the disk controller caching mode of "WriteBack". Do not
    confuse the two. The cache created by "WriteBack" cache mode is battery-backed, the cache created by "Disk Cache Policy" is not! 
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Disk cache policy is set to Disabled on database server


DATA FROM DM01DB01 FOR VERIFY DISK CACHE POLICY ON DATABASE SERVER



Disk Cache Policy   : Disabled
Slot Number: 1
Slot Number: 2
Slot Number: 3

Status on dm01db02: PASS => Disk cache policy is set to Disabled on database server


DATA FROM DM01DB02 FOR VERIFY DISK CACHE POLICY ON DATABASE SERVER



Disk Cache Policy   : Disabled
Slot Number: 1
Slot Number: 2
Slot Number: 3
Top

Top

Verify Disk Cache Policy on Storage Server

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

"Disk Cache Policy" is set to "Disabled" by default and should not be changed because the cache created by setting "Disk Cache Policy" to "Enabled" is not battery backed.
The impact of verifying that "Disk Cache Policy" is set to "Disabled" is minimal. The impact of suddenly losing power with "Disk Cache Policy" set to anything other than "Disabled" will vary according to each specific case, and cannot be estimated here.

Risk:

If the "Disk Cache Policy" is not "Disabled", there is a risk of data loss in the event of a sudden power loss because the cache created by "Disk Cache Policy" is not backed up by a battery.

Action / Repair:

To verify that "Disk Cache Policy" is set to "Disabled" on all servers, use the following command as the "root" userid on the first database server in the cluster:

Note: The slot number is included in the output to assist in identifying disk drives.

dcli -g /opt/oracle.SupportTools/onecommand/all_group -l root /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL |egrep -i 'Disk Cache Policy|slot number'
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => Disk cache policy is set to Disabled on all storage server


DATA FROM DM01CEL01 FOR VERIFY DISK CACHE POLICY ON STORAGE SERVER



Disk Cache Policy   : Disabled
Slot Number: 0
Disk Cache Policy   : Disabled
Slot Number: 1
Disk Cache Policy   : Disabled
Slot Number: 3
Disk Cache Policy   : Disabled
Slot Number: 4
Disk Cache Policy   : Disabled
Slot Number: 5
Disk Cache Policy   : Disabled
Slot Number: 6
Disk Cache Policy   : Disabled
Slot Number: 7
Disk Cache Policy   : Disabled
Slot Number: 8
...More
Top

Top

Minimum exadata version required for ASR

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Minimum version is 11.2.1.3.1 for ASR support

Risk:

Automated Service Requests (SRs) will not be filed for failures detected on the Database Machine

Action / Repair: 

Upgrade to 11.2.1.3.1 or higher to get this functionality
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Exadata software version supports Automatic Service Request functionality


DATA FROM DM01DB01 FOR MINIMUM EXADATA VERSION REQUIRED FOR ASR




Kernel version: 2.6.32-400.1.1.el5uek #1 SMP Mon Jun 25 20:25:08 EDT 2012 x86_64
Image version: 11.2.3.2.0.120713
Image activated: 2012-11-28 09:53:22 +0800
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1


Status on dm01db02: PASS => Exadata software version supports Automatic Service Request functionality


DATA FROM DM01DB02 FOR MINIMUM EXADATA VERSION REQUIRED FOR ASR




Kernel version: 2.6.32-400.1.1.el5uek #1 SMP Mon Jun 25 20:25:08 EDT 2012 x86_64
Image version: 11.2.3.2.0.120713
Image activated: 2012-11-28 09:53:21 +0800
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

Top

Top

Verify Electronic Storage Module (ESM) Lifetime is within Specification

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The Flash 20 card supports ESM lifetime to enable proactive replacement before failure.
The impact of verifying that the ESM lifetime is within specification is minimal. Replacing an ESM requires a scheduled outage.

Risk:

Failure of the ESM will put the Flash 20 card in WriteThrough mode which has a high impact on performance.

Action / Repair:

To verify the ESM lifetime value, use the following command on the storage servers:
for RISER in RISER1/PCIE1 RISER1/PCIE4 RISER2/PCIE2 RISER2/PCIE5; do ipmitool sunoem cli "show /SYS/MB/$RISER/F20CARD/UPTIME"; done | grep value -A4
The output will be similar to:
        value = 3382.350 Hours
        upper_nonrecov_threshold = 17500.000 Hours
        upper_critical_threshold = 17200.000 Hours
        upper_noncritical_threshold = 16800.000 Hours
        lower_noncritical_threshold = N/A
-- 
If the "value" reported exceeds the "upper_noncritical_threshold" reported, schedule a replacement of the relevant ESM.

NOTE: There is a bug in ILOM firmware version 3.0.9.19.a which may report "Invalid target..." for "RISER1/PCIE4". If that happens, consult your site records to verify the age of the ESM module.

NOTE: For Aura II (F20 M2) cards, the CPLD reports the End of Life indication on the F20 M2 cards, so the thresholds for UPTIME sensor are not needed. The threshold values are replaced with "N/A". The ILOM will fault the system when it's time to replace the F20 M2's ESM. Beginning with 2.1.3, exachk does not execute this check on F20 M2 cards. Beginning with 2.1.5, exachk posts a message in the html report detail that the card is an F20M2 model and the check is not applicable. 

 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => Electronic Storage Module (ESM) Lifetime is within specification for all flash cards on all storage servers


DATA FROM DM01CEL01 FOR VERIFY ELECTRONIC STORAGE MODULE (ESM) LIFETIME IS WITHIN SPECIFICATION



/SYS/MB/RISER1/PCIE1/F20CARD is an F20M2 model and this esm lifetime check does not apply.

/SYS/MB/RISER1/PCIE4/F20CARD is an F20M2 model and this esm lifetime check does not apply.

/SYS/MB/RISER2/PCIE2/F20CARD is an F20M2 model and this esm lifetime check does not apply.

/SYS/MB/RISER2/PCIE5/F20CARD is an F20M2 model and this esm lifetime check does not apply.





DATA FROM DM01CEL02 FOR VERIFY ELECTRONIC STORAGE MODULE (ESM) LIFETIME IS WITHIN SPECIFICATION



...More
Top

Top

Data network is separate from management network on storage server

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical, 03/02/11
Benefit / Impact:
It is a requirement that the management network be on a different non-overlapping sub-net than the InfiniBand network and the client access network. This is necessary for better network security, better client access bandwidths, and for Auto Service Request (ASR) to work correctly.
The management network comprises the eth0 network interface in the database and storage servers, the ILOM network interfaces of the database and storage servers, and the Ethernet management interfaces of the InfiniBand switches and PDUs.
Risk:
Having the management network on the same subnet as the client access network will reduce network security, potentially restrict the client access bandwidth to/from the Database Machine to a single 1GbE link, and will prevent ASR from working correctly.
Action / Repair:
To verify that the management network interface (eth0) is on a separate network from other network interfaces, execute the following command as the "root" userid on both storage and database servers:
grep -i network /etc/sysconfig/network-scripts/ifcfg* | cut -f5 -d"/" | grep -v "#"
 
The output will be similar to:
ifcfg-bondeth0:NETWORK=10.204.77.0
ifcfg-bondib0:NETWORK=192.168.76.0
ifcfg-eth0:NETWORK=10.204.78.0
ifcfg-lo:NETWORK=127.0.0.0
The expected result is that the network values are different. If they are not, investigate and correct the condition.
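As a convenience, the following sketch flags any subnet that appears in more than one interface configuration file; it assumes the ifcfg file layout shown above and is run as root on each server:

grep -i "^NETWORK=" /etc/sysconfig/network-scripts/ifcfg-* | awk -F= '{print $2}' | sort | uniq -d

No output means every configured network is distinct; any subnet printed is shared by two or more interfaces and should be investigated.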
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => Management network is separate from data network on all storage servers


DATA FROM DM01CEL01 FOR DATA NETWORK IS SEPARATE FROM MANAGEMENT NETWORK ON STORAGE SERVER



ifcfg-bondib0:NETWORK=192.168.8.0
ifcfg-eth0:NETWORK=10.187.5.0




DATA FROM DM01CEL02 FOR DATA NETWORK IS SEPARATE FROM MANAGEMENT NETWORK ON STORAGE SERVER



ifcfg-bondib0:NETWORK=192.168.8.0
ifcfg-eth0:NETWORK=10.187.5.0




...More
Top

Top

Ambient Temperature

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical
Benefit / Impact:

Maintaining environmental temperature conditions within design specification for an Oracle Exadata Database Machine helps to achieve maximum efficiency and targeted component service lifetimes.

The impact of verifying the ambient temperature range is minimal. The impact of correcting reported ambient temperatures outside of design specification range will vary depending upon the cause of the environmental conditions.

Risk:

Temperatures outside the design specification range affect all components within the chassis of an Oracle Exadata Database Machine, possibly manifesting performance problems and shortened service lifetimes.

Action / Repair:

To verify the ambient temperature range of all servers, execute the following command as the root userid on the first database server in the cluster:

if [ `cat /opt/oracle.SupportTools/onecommand/cell_group | egrep "01$|02$|03$" | wc -l` = 0 ]
then
echo "SKIPPED: This cluster does not access the first three storage servers.  Ambient temperature check skipped.";
else
export SS_LIST=`awk '{OFS=","; ORS=""}{ print $0 "," }' /opt/oracle.SupportTools/onecommand/cell_group | cut -d"," -f 1-3`
export AVG_TEMP=`dcli -c $SS_LIST -l root 'ipmitool sunoem cli "show /SYS/T_AMB" | grep value' | cut -d" " -f 4 | awk '{ SUM += $1} END { printf "%.0f",SUM/3}'`;
if [ $AVG_TEMP -ge 5 -a $AVG_TEMP -le 32 ]
then
echo "SUCCESS: Average ambient temperature is within range of 5 to 32 degrees Centigrade:  $AVG_TEMP";
else
echo -e "FAILURE: Average ambient temperature is outside range of 5 to 32 degrees Centigrade:  $AVG_TEMP";
fi;
fi

The output should be similar to:

SUCCESS: Average ambient temperature is within range of 5 to 32 degrees Centigrade:  27

If the "SUCCESS" message is not returned, there are two possibilities:

1) The "SKIPPED" message indicates that the cluster in which this command was executed does not have access in cell_group to the first 3 storage servers in the physical rack. This could occur because the physical rack is logically or physically divided into multiple clusters. The corrective action is to execute the command in the cluster on the physical rack that has access to the first three storage servers in cell_group.

2) The "FAILURE" message indicates that the average ambient temperature of the first three storage servers in the physical rack was outside the permitted range of 5C to 32C. The corrective action is to investigate and correct the environmental conditions.

    NOTE: Since there is no one sensor in the physical rack for overall ambient temperature of the data center air, this check reads the ambient temperature from the first three storage servers in the physical rack (deployed according to Oracle standards) which are closest to the air flow from the raised floor tiles and averages the result. The averaging is in case there might be a sensor reading incorrectly. 
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => Ambient temperature is within the recommended range.


DATA FROM DM01CEL01 FOR AMBIENT TEMPERATURE



value = 25.000 degree C




DATA FROM DM01CEL02 FOR AMBIENT TEMPERATURE



value = 26.000 degree C




DATA FROM DM01CEL03 FOR AMBIENT TEMPERATURE

...More
Top

Top

Verify Oracle ASM instance uses RDS Protocol over InfiniBand Network.

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

RDS protocol provides superior performance for Oracle RAC communication because it does not require additional memory buffering operations when moving data from process memory to the network interface for RAC inter-node communication.

There is minimal impact to verify that RDS protocol is in use. If it is not, implementing it will require a relink of the Oracle software, which requires an outage.

Risk:

If the Oracle RAC databases do not use RDS protocol over the InfiniBand network, RAC inter-node communication will be sub-optimal.

Action / Repair:

Perform the following steps on the database servers:

   1. Verify an Oracle RAC database is using the RDS protocol over the InfiniBand Network by checking all alert logs on all nodes: 

Cluster communication is configured to use the following interface(s) for this instance 192.168.20.21 cluster interconnect IPC version:Oracle RDS/IP (generic) 

   2. If it is not running RDS, relink the Oracle binary via the following:

    * (as oracle) Shutdown any process using the Oracle binary
    * (as root) GRID_HOME/crs/install/rootcrs.pl -unlock **only required if relinking in the Grid Infrastructure home**
    * (as oracle) cd $ORACLE_HOME/rdbms/lib
    * (as oracle) make -f ins_rdbms.mk ipc_rds ioracle
    * (as root) GRID_HOME/crs/install/rootcrs.pl -patch only required if relinking in the Grid Infrastructure home 

    Note: Avoid using the relink all command due to various issues. Use the make commands as seen above when relinking oracle binaries.

    Note: The dcli utility can be used to double check that all nodes are configured consistently with respect to the skgxp libraries in use. An example command to perform this check is

    dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l oracle md5sum ${ORACLE_HOME}/lib/libskgxp11.so 
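    As an additional cross-check, the skgxpinfo utility prints the IPC protocol the oracle binary was linked with. A minimal sketch, assuming skgxpinfo is present in the 11.2 database home and that ORACLE_HOME resolves in the remote login environment (otherwise substitute the full path):

    dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l oracle '$ORACLE_HOME/bin/skgxpinfo'

    Every node should report "rds"; a node reporting "udp" needs the relink procedure above.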
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Oracle ASM Communication is using RDS protocol on Infiniband Network


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY ORACLE ASM INSTANCE USE RDS PROTOCOL OVER INFINIBAND NETWORK.



rds

Status on dm01db02: PASS => Oracle ASM Communication is using RDS protocol on Infiniband Network


rds
Top

Top

Verify database server disk controllers use writeback cache

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Database servers use an internal RAID controller with a battery-backed cache to host local filesystems. For maximum performance when writing I/O to local disks, the battery-backed cache should be in "WriteBack" mode.
The impact of configuring the battery-backed cache in "WriteBack" mode is minimal.

Risk:

Not configuring the battery-backed cache in "WriteBack" mode will result in degraded performance when writing I/O to the local database server disks.

Action / Repair:

To verify that the disk controller battery-backed cache is in "WriteBack" mode, run the following set of commands as the "root" userid on all database servers:
unset TMP_CMD;
unset TMP_RSLT;
if [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]
then
#Linux
export TMP_CMD=/opt/MegaRAID/MegaCli/MegaCli64
else
#Solaris
export TMP_CMD=/opt/MegaRAID/MegaCli
fi;
TMP_RSLT=`$TMP_CMD -CfgDsply -a0 | grep -i writethrough | wc -l`;
if [ $TMP_RSLT = 0 ]
then
echo -e "\nSUCCESS\n"
else
echo -e "\nFAILURE:";
$TMP_CMD -CfgDsply -a0 | grep -i writethrough;
echo -e "\n";
fi;

The output should be:
SUCCESS

If the battery-backed cache is not in "WriteBack" mode, run these commands on the affected server to place the battery-backed cache into "WriteBack" mode:

if [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]
then
#Linux
export TMP_CMD=/opt/MegaRAID/MegaCli/MegaCli64
else
#Solaris
export TMP_CMD=/opt/MegaRAID/MegaCli
fi
$TMP_CMD -LDSetProp WB  -Lall  -a0 
$TMP_CMD -LDSetProp NoCachedBadBBU -Lall  -a0 
$TMP_CMD -LDSetProp NORA -Lall  -a0 
$TMP_CMD -LDSetProp Direct -Lall  -a0

NOTE: No settings should be modified on Exadata storage cells. The mode described above applies only to database servers in an Exadata database machine.
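To check every database server in one pass, a minimal sketch using dcli (it assumes the Linux MegaCli64 path used above and the standard dbs_group file; it targets database servers only, per the note above):

dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root '/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0 | grep -i "Current Cache Policy"'

Each server should report "WriteBack"; any "WriteThrough" entry should be corrected with the repair commands above.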
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database server disk controllers use writeback cache


DATA FROM DM01DB01 FOR VERIFY DATABASE SERVER DISK CONTROLLERS USE WRITEBACK CACHE



Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU

Status on dm01db02: PASS => Database server disk controllers use writeback cache


DATA FROM DM01DB02 FOR VERIFY DATABASE SERVER DISK CONTROLLERS USE WRITEBACK CACHE



Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Top

Top

InfiniBand switch counters on all switches

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical
Benefit / Impact:
Verifying that there are no high, persistent InfiniBand network error counters helps to maintain the InfiniBand network at peak efficiency.
The impact of verifying there are no InfiniBand network errors is minimal.
Risk:
Without verifying the InfiniBand network error counters, there is a risk that a component will degrade the InfiniBand network performance, yet may not be sending an alert or error condition.
Action / Repair:
Use the command shown below on one of the database or storage servers:
# ibqueryerrors.pl -rR -s RcvSwRelayErrors,XmtDiscards,XmtWait
There should be no errors reported.
The InfiniBand counters are cumulative and the errors may have occurred at any time in the past. If there are errors, it is recommended to clear the InfiniBand counters with ibclearcounters, let the system run for a few minutes under load, and then re-execute the ibqueryerrors command. Any links reporting persistent errors (especially RcvErrors or SymbolErrors) may indicate a bad or loose cable or port.
Some counters (e.g., RcvErrors, SymbolErrors) can increment when nodes are rebooted. Small values for these counters that are less than the "LinkDowned" counter are generally not a problem. The "LinkDowned" counter indicates the number of times the port has gone down (usually for valid reasons, e.g., a reboot) and is not usually an error indicator by itself.
If there are persistent, high InfiniBand network error counters, investigate and correct the condition.
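A sketch of the clear-and-recheck sequence described above (run as root on one database server with the fabric under normal load; the 10-minute wait is an arbitrary example):

ibclearcounters
sleep 600
ibqueryerrors.pl -rR -s RcvSwRelayErrors,XmtDiscards,XmtWait

Links that again report RcvErrors or SymbolErrors after the counters were cleared are candidates for a bad or loose cable or port.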
 
Needs attention ondm01db01
Passed on-

Status on dm01db01: FAIL => InfiniBand network error counters are non-zero


DATA FROM DM01DB01 FOR INFINIBAND SWITCH COUNTERS ON ALL SWITCHS



Suppressing: RcvSwRelayErrors XmtDiscards XmtWait
Errors for 0x2128e8b08ea0a0 "SUN DCS 36P QDR dm01sw-ib3.mcsdb.com"
GUID 0x2128e8b08ea0a0 port 7: [RcvErrors == 1]
Link info:      2   7[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x0021280001cf4886      8    2[  ] "SUN DCS 36P QDR dm01sw-ib3.mcsdb.com" ( )
Errors for 0x2128e8af6da0a0 "SUN DCS 36P QDR dm01sw-ib2.mcsdb.com"
GUID 0x2128e8af6da0a0 port 7: [RcvErrors == 1]
Link info:      1   7[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x0021280001cf4886      7    1[  ] "SUN DCS 36P QDR dm01sw-ib2.mcsdb.com" ( )
GUID 0x2128e8af6da0a0 port 10: [RcvErrors == 1]
Link info:      1  10[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x0021280001cf4862      3    1[  ] "SUN DCS 36P QDR dm01sw-ib2.mcsdb.com" ( )
GUID 0x2128e8af6da0a0 port 13: [LinkDowned == 1]
Link info:      1  13[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128e8b08ea0a0      2   14[  ] "SUN DCS 36P QDR dm01sw-ib2.mcsdb.com" ( )
GUID 0x2128e8af6da0a0 port 14: [LinkDowned == 1]
Link info:      1  14[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128e8b08ea0a0      2   13[  ] "SUN DCS 36P QDR dm01sw-ib2.mcsdb.com" ( )
GUID 0x2128e8af6da0a0 port 15: [LinkDowned == 1]
Link info:      1  15[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128e8b08ea0a0      2   16[  ] "SUN DCS 36P QDR dm01sw-ib2.mcsdb.com" ( )
GUID 0x2128e8af6da0a0 port 16: [LinkDowned == 1]
...More
Top

Top

Verify InfiniBand Fabric Topology (verify-topology)

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical
Benefit / Impact:
Verifying that the InfiniBand network is configured with the correct topology for an Oracle Exadata Database Machine helps to ensure that the InfiniBand network operates at maximum efficiency.
Risk:
An incorrect InfiniBand topology will cause the InfiniBand network to operate at degraded efficiency, intermittently, or fail to operate.
Action / Repair:
Execute the verify-topology command as shown below:
# /opt/oracle.SupportTools/ibdiagtools/verify-topology -t fattree
        [ DB Machine InfiniBand Cabling Topology Verification Tool ]
Is every external switch connected to every internal switch..........[SUCCESS]
Are any external switches connected to each other....................[SUCCESS]
Are any hosts connected to spine switch..............................[SUCCESS]
Check if all hosts have 2 CAs to different switches..................[SUCCESS]
Leaf switch check: cardinality and even distribution.................[SUCCESS]
Check if each rack has an valid internal ring........................[SUCCESS]
If anything other than "SUCCESS" is reported, investigate and correct the condition.

 
Needs attention on-
Passed ondm01db01

Status on dm01db01: PASS => Verify-topology executes without any errors or warning


DATA FROM DM01DB01 FOR VERIFY INFINIBAND FABRIC TOPOLOGY (VERIFY-TOPOLOGY)




[ DB Machine Infiniband Cabling Topology Verification Tool ]
[Version IBD VER 2.c ]

--------------- Quarter Rack Exadata V2 Cabling Check---------

Check if all hosts have 2 CAs to different switches..................[SUCCESS]
Leaf switch check: cardinality and even distribution.................[SUCCESS]
Check if each rack has an valid internal ring........................[SUCCESS]

Top

Top

Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile) [Database Server]

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact:

The Oracle Exadata Database Machine is tightly integrated, and verifying the hardware and firmware before the Oracle Exadata Database Machine is placed into or returned to production status can avoid problems related to the hardware or firmware modifications.

The impact for these verification steps is minimal.

Risk:

If the hardware and firmware are not validated, inconsistencies between database and storage servers can lead to problems and outages.

Action / Repair:

Verify the hardware and firmware configuration by executing the /opt/oracle.cellos/CheckHWnFWProfile script as the root userid as shown below:

# /opt/oracle.cellos/CheckHWnFWProfile
[SUCCESS] The hardware and firmware profile matches one of the supported profiles

If any result other than "SUCCESS" is returned, investigate and correct the condition.

    NOTE: CheckHWnFWProfile is also executed at each boot of the storage and database servers. 
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Hardware and firmware profile check is successful. [Database Server]


DATA FROM DM01DB01 FOR VERIFY HARDWARE AND FIRMWARE ON DATABASE AND STORAGE SERVERS (CHECKHWNFWPROFILE) [DATABASE SERVER]



[SUCCESS] The hardware and firmware profile matches one of the supported profiles

Status on dm01db02: PASS => Hardware and firmware profile check is successful. [Database Server]


DATA FROM DM01DB02 FOR VERIFY HARDWARE AND FIRMWARE ON DATABASE AND STORAGE SERVERS (CHECKHWNFWPROFILE) [DATABASE SERVER]



[SUCCESS] The hardware and firmware profile matches one of the supported profiles
Top

Top

Verify Software on Storage Servers (CheckSWProfile.sh)

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Verifying the software configuration after initial deployment, upgrades, or patching and before the Oracle Exadata Database Machine is placed into or returned to production status can avoid problems related to the software modifications.

The overhead for these verification steps is minimal.

Risk:

If the software is not validated, inconsistencies between database and storage servers can lead to problems and outages.

Action / Repair:

Verify the storage server software configuration with the /opt/oracle.SupportTools/CheckSWProfile.sh script, as shown in the following example:

[root@node1 oracle.SupportTools]# ./CheckSWProfile.sh -c
[INFO] SUCCESS: Meets requirements of operating platform and installed software for
[INFO] below listed releases and patches of Exadata and of corresponding Database.
[INFO] Check does NOT verify correctness of configuration for installed software.
[The_ExadataAndDatabaseReleases]
Exadata: 11.2.2.1.0 OracleDatabase: 11.2.0.2+Patches
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => Software profile check is successful on all storage servers.


DATA FROM DM01CEL01 FOR VERIFY SOFTWARE ON STORAGE SERVERS (CHECKSWPROFILE.SH)




[INFO] SUCCESS: Meets requirements of operating platform and InfiniBand software.
[INFO] Check does NOT verify correctness of configuration for installed software.





DATA FROM DM01CEL02 FOR VERIFY SOFTWARE ON STORAGE SERVERS (CHECKSWPROFILE.SH)




[INFO] SUCCESS: Meets requirements of operating platform and InfiniBand software.
[INFO] Check does NOT verify correctness of configuration for installed software.

...More
Top

Top

Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile) [Storage Server]

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact:

The Oracle Exadata Database Machine is tightly integrated, and verifying the hardware and firmware before the Oracle Exadata Database Machine is placed into or returned to production status can avoid problems related to the hardware or firmware modifications.

The impact for these verification steps is minimal.

Risk:

If the hardware and firmware are not validated, inconsistencies between database and storage servers can lead to problems and outages.

Action / Repair:

Verify the hardware and firmware configuration by executing the /opt/oracle.cellos/CheckHWnFWProfile script as the root userid as shown below:

# /opt/oracle.cellos/CheckHWnFWProfile
[SUCCESS] The hardware and firmware profile matches one of the supported profiles

If any result other than "SUCCESS" is returned, investigate and correct the condition.

    NOTE: CheckHWnFWProfile is also executed at each boot of the storage and database servers. 
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => Hardware and firmware profile check is successful on all storage servers.


DATA FROM DM01CEL01 FOR VERIFY HARDWARE AND FIRMWARE ON DATABASE AND STORAGE SERVERS (CHECKHWNFWPROFILE) [STORAGE SERVER]



[SUCCESS] The hardware and firmware profile matches one of the supported profiles




DATA FROM DM01CEL02 FOR VERIFY HARDWARE AND FIRMWARE ON DATABASE AND STORAGE SERVERS (CHECKHWNFWPROFILE) [STORAGE SERVER]



[SUCCESS] The hardware and firmware profile matches one of the supported profiles




DATA FROM DM01CEL03 FOR VERIFY HARDWARE AND FIRMWARE ON DATABASE AND STORAGE SERVERS (CHECKHWNFWPROFILE) [STORAGE SERVER]

...More
Top

Top

Local listener set to node VIP

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 The LOCAL_LISTENER parameter should be set to the node VIP. If you need fully qualified domain names, ensure that LOCAL_LISTENER is set to the fully qualified domain name (node-vip.mycompany.com). By default a local listener is created during cluster configuration that runs out of the Grid Infrastructure home and listens on the specified port (default is 1521) of the node VIP.
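A minimal sketch for comparing LOCAL_LISTENER with the node VIP (the node name dm01db01 is taken from this report; a working sqlplus environment on the database server is assumed):

srvctl config vip -n dm01db01
sqlplus -s / as sysdba <<'EOF'
show parameter local_listener
EOF

The HOST in the LOCAL_LISTENER address should match the VIP address reported by srvctl.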
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Local listener init parameter is set to local node VIP


DATA FROM DM01DB01 - MCSDB DATABASE - LOCAL LISTENER SET TO NODE VIP



Local Listener= (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.187.4.92)(PORT=1521)))) VIP Names=dm0101-vip VIP IPs=10.187.4.92

Status on dm01db02: PASS => Local listener init parameter is set to local node VIP


Local Listener= (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.187.4.93)(PORT=1521)))) VIP Names=dm0102-vip VIP IPs=10.187.4.93
Top

Top

Voting disk status

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Stability, Availability

Risk:

Cluster instability

Action / Repair:

Voting disks that are not online would indicate a problem with the clusterware
and should be investigated as soon as possible.  All voting disks are expected to be ONLINE.

Use the following command to list the status of the voting disks

$CRS_HOME/bin/crsctl query css votedisk|sed 's/^ //g'|grep ^[0-9]

The output should look similar to the following, with one row per voting disk; all disks should indicate ONLINE:

1. ONLINE   192c8f030e5a4fb3bf77e43ad3b8479a (o/192.168.10.102/DBFS_DG_CD_02_sclcgcel01) [DBFS_DG]
2. ONLINE   2612d8a72d194fa4bf3ddff928351c41 (o/192.168.10.104/DBFS_DG_CD_02_sclcgcel03) [DBFS_DG]
3. ONLINE   1d3cceb9daeb4f0bbf23ee0218209f4c (o/192.168.10.103/DBFS_DG_CD_02_sclcgcel02) [DBFS_DG]
 
Needs attention ondm01db01
Passed on-

Status on dm01db01: WARNING => All voting disks are not online


DATA FROM DM01DB01 - MCSDB DATABASE - VOTING DISK STATUS



##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. OFFLINE  2a642ed3cc384febbf1f14ee69ab1523 (o/192.168.10.5/DBFS_DG_CD_02_dm01cel03) [DBFS_DG]
2. ONLINE   403c030ed70c4f91bf758a919f42e695 (o/192.168.10.4/DBFS_DG_CD_02_dm01cel02) [DBFS_DG]
3. ONLINE   5e4c17f237424fcfbf7edbd2cc0379ae (o/192.168.10.3/DBFS_DG_CD_03_dm01cel01) [DBFS_DG]
Located 3 voting disk(s).
Top

Top

Non-autoextensible data and temp files

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The benefit of having "AUTOEXTEND" on is that applications may avoid out of space errors.
The impact of verifying that the "AUTOEXTEND" attribute is "ON" is minimal. The impact of setting "AUTOEXTEND" to "ON" varies depending upon whether it is done during database creation, when adding a file to a tablespace, or when altering an existing file.

Risk:

The risk of running out of space in either the tablespace or diskgroup varies by application and cannot be quantified here. A tablespace that runs out of space will interfere with an application, and a diskgroup running out of space could impact the entire database as well as ASM operations (e.g., rebalance operations).

Action / Repair:

To obtain a list of tablespaces that are not set to "AUTOEXTEND", enter the following sqlplus command logged into the database as sysdba:
select file_id, file_name, tablespace_name from dba_data_files where autoextensible <>'YES'
union
select file_id, file_name, tablespace_name from dba_temp_files where autoextensible <> 'YES'; 
The output should be:
no rows selected
If any rows are returned, investigate and correct the condition.
NOTE: Configuring "AUTOEXTEND" to "ON" requires comparing space utilization growth projections at the tablespace level to space available in the diskgroups to permit the expected projected growth while retaining sufficient storage space in reserve to account for ASM rebalance operations that occur either as a result of planned operations or component failure. The resulting growth targets are implemented with the "MAXSIZE" attribute that should always be used in conjunction with the "AUTOEXTEND" attribute. The "MAXSIZE" settings should allow for projected growth while minimizing the prospect of depleting a disk group. The "MAXSIZE" settings will vary by customer and a blanket recommendation cannot be given here.

NOTE: When configuring a file for "AUTOEXTEND" to "ON", the size specified for the "NEXT" attribute should cover all disks in the diskgroup to optimize balance. For example, with a 4MB AU size and 168 disks, the size of the "NEXT" attribute should be a multiple of 672M (4*168).
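As an illustration only, the statement below enables AUTOEXTEND for one of the files reported further down; the NEXT and MAXSIZE values are hypothetical and must come from your own growth projections as described in the notes above:

sqlplus -s / as sysdba <<'EOF'
ALTER DATABASE DATAFILE '+DATA_DM01/mcsdb/datafile/dev5_apm.391.801068887'
  AUTOEXTEND ON NEXT 672M MAXSIZE 32767M;
EOF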
 
Needs attention onMCSDB
Passed on-

Status on MCSDB: FAIL => Some data or temp files are not autoextensible


DATA FOR MCSDB FOR NON-AUTOEXTENSIBLE DATA AND TEMP FILES




+DATA_DM01/mcsdb/datafile/dev1_oim.447.802540729
+DATA_DM01/mcsdb/datafile/dev5_apm.391.801068887
+DATA_DM01/mcsdb/datafile/dev5_brsadata.381.801068881
+DATA_DM01/mcsdb/datafile/dev5_brsaindx.369.801068877
+DATA_DM01/mcsdb/datafile/dev5_ias_iau.393.801068887
+DATA_DM01/mcsdb/datafile/dev5_ias_oif.417.801068897
+DATA_DM01/mcsdb/datafile/dev5_ias_orasdpm.416.801068897
+DATA_DM01/mcsdb/datafile/dev5_mds.419.801068897
+DATA_DM01/mcsdb/datafile/dev5_oam.418.801068897
+DATA_DM01/mcsdb/datafile/dev5_oim.413.801068895
+DATA_DM01/mcsdb/datafile/dev5_oim_lob.387.801068885
+DATA_DM01/mcsdb/datafile/dev5_soainfra.374.801068879
+DATA_DM01/mcsdb/datafile/dev5_tbs_oaam_data.384.801068883

+DATA_DM01/mcsdb/datafile/dev5_tbs_oaam_data_apr.371.801068877
...More
Top

Top

ohasd/orarootagent_root Log File Ownership

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Supportability, diagnostics

Risk:

If for any reason the ownership of certain clusterware-related log files is changed incorrectly, important diagnostics may not be available when needed by Support. These logs are rotated periodically to keep them from growing unmanageably large; if the ownership of the files is incorrect when it is time to rotate the logs, that operation can fail. While this does not affect the operation of the clusterware itself, it does affect the logging and therefore problem diagnostics.


Action / Repair:

It is wise to verify that the ownership of the following files is root:root:

$ls -l $GRID_HOME/log/`hostname`/crsd/*
$ls -l $GRID_HOME/log/`hostname`/ohasd/*
$ls -l $GRID_HOME/log/`hostname`/agent/crsd/orarootagent_root/*
$ls -l $GRID_HOME/log/`hostname`/agent/ohasd/orarootagent_root/*

If any of those files' ownership is NOT root:root then you should change the ownership of the files individually as follows (as root):

# chown root:root $GRID_HOME/log/`hostname`/crsd/*
# chown root:root $GRID_HOME/log/`hostname`/ohasd/*
# chown root:root $GRID_HOME/log/`hostname`/agent/crsd/orarootagent_root/*
# chown root:root $GRID_HOME/log/`hostname`/agent/ohasd/orarootagent_root/*
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => ohasd/orarootagent_root Log Ownership is Correct (root root)


DATA FROM DM01DB01 - MCSDB DATABASE - OHASD/ORAROOTAGENT_ROOT LOG FILE OWNERSHIP



total 108408
-rw-r--r-- 1 root root 10594175 Apr 15 19:26 orarootagent_root.l01
-rw-r--r-- 1 root root 10594162 Apr 14 09:00 orarootagent_root.l02
-rw-r--r-- 1 root root 10594176 Apr 12 22:33 orarootagent_root.l03
-rw-r--r-- 1 root root 10594278 Apr 11 12:07 orarootagent_root.l04
-rw-r--r-- 1 root root 10594207 Apr 10 01:39 orarootagent_root.l05
-rw-r--r-- 1 root root 10594252 Apr  8 15:11 orarootagent_root.l06
-rw-r--r-- 1 root root 10594219 Apr  7 04:45 orarootagent_root.l07
-rw-r--r-- 1 root root 10594237 Apr  5 18:18 orarootagent_root.l08
-rw-r--r-- 1 root root 10594181 Apr  4 07:50 orarootagent_root.l09
-rw-r--r-- 1 root root 10594197 Apr  2 21:23 orarootagent_root.l10
-rw-r--r-- 1 root root  4864965 Apr 16 11:15 orarootagent_root.log
-rw-r--r-- 1 root root        0 Nov 28  2012 orarootagent_rootOUT.log
-rw-r--r-- 1 root root        6 Feb  8 20:12 orarootagent_root.pid

Status on dm01db02: PASS => ohasd/orarootagent_root Log Ownership is Correct (root root)


total 105496
-rw-r--r-- 1 root root 10594087 Apr 16 05:08 orarootagent_root.l01
-rw-r--r-- 1 root root 10593994 Apr 14 18:20 orarootagent_root.l02
-rw-r--r-- 1 root root 10594035 Apr 13 07:33 orarootagent_root.l03
-rw-r--r-- 1 root root 10594080 Apr 11 20:46 orarootagent_root.l04
-rw-r--r-- 1 root root 10594063 Apr 10 09:59 orarootagent_root.l05
-rw-r--r-- 1 root root 10594129 Apr  8 23:12 orarootagent_root.l06
-rw-r--r-- 1 root root 10594094 Apr  7 12:24 orarootagent_root.l07
-rw-r--r-- 1 root root 10594104 Apr  6 01:38 orarootagent_root.l08
-rw-r--r-- 1 root root 10594122 Apr  4 14:51 orarootagent_root.l09
-rw-r--r-- 1 root root 10594078 Apr  3 04:04 orarootagent_root.l10
-rw-r--r-- 1 root root  1889469 Apr 16 11:20 orarootagent_root.log
-rw-r--r-- 1 root root        0 Nov 28  2012 orarootagent_rootOUT.log
-rw-r--r-- 1 root root        5 Feb  8 20:40 orarootagent_root.pid
Top

Top

Remote listener set to scan name

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 For Oracle Database 11g Release 2, the REMOTE_LISTENER parameter should be set to the SCAN. This allows the instances to register with the SCAN listeners to provide information on what services are being provided by the instance, the current load, and a recommendation on how many incoming connections should be directed to the instance.
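A minimal sketch to confirm the setting (assumes the 11.2 srvctl syntax and a working sqlplus environment on a database server):

srvctl config scan
sqlplus -s / as sysdba <<'EOF'
show parameter remote_listener
EOF

The remote_listener value should be the SCAN name (and port) reported by srvctl config scan.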
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Remote listener is set to SCAN name


DATA FROM DM01DB01 - MCSDB DATABASE - REMOTE LISTENER SET TO SCAN NAME



remote listener name=dm01-scan scan name= dm01-scan

Status on dm01db02: PASS => Remote listener is set to SCAN name


remote listener name=dm01-scan scan name= dm01-scan
Top

Top

NTP with correct setting

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Cluster stability

Risk:

Cluster instability due to time synchronization issues

Action / Repair:

Make sure machine clocks are synchronized on all nodes to the same NTP source.
Implement NTP (Network Time Protocol) on all nodes.
Prevents evictions and helps to facilitate problem diagnosis.

Also use the -x option (i.e., ntpd -x, xntp -x) if available to prevent time from moving backwards in large amounts. Slewing spreads a correction across multiple small changes so that it does not impact CRS. Enterprise Linux: see /etc/sysconfig/ntpd; Solaris: set "slewalways yes" and "disable pll" in /etc/inet/ntp.conf.
For example, in /etc/sysconfig/ntpd on Enterprise Linux:
# Drop root to id 'ntp:ntp' by default.
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
# Set to 'yes' to sync hw clock after successful ntpdate
SYNC_HWCLOCK=no
# Additional options for ntpdate
NTPDATE_OPTIONS=""

Time servers operate in a pyramid structure in which the top of the NTP stack is usually an external time source (such as a GPS clock). Time then trickles down through the network switch stack to the connected servers.
This NTP stack acts as the NTP server; ensuring that all RAC nodes act as clients of this server, in slewing mode, keeps any individual time change to a minute amount.

Changes in global time that reconcile atomic clock accuracy with the Earth's rotational wobble are therefore absorbed with minimal effect. This is sometimes referred to as a "leap second" epoch (for example, one second was inserted between UTC 12/31/2008 23:59:59 and 01/01/2009 00:00:00).

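Minimal checks on Enterprise Linux (a sketch; paths per the example above):

ps -ef | grep '[n]tpd'
grep "^OPTIONS" /etc/sysconfig/ntpd

The running daemon and the OPTIONS line should both include the -x flag, as shown in the data below.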
 
Links
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => NTP is running with correct setting


DATA FROM DM01DB01 - MCSDB DATABASE - NTP WITH CORRECT SETTING



ntp       5477     1  0 Feb08 ?        00:05:56 ntpd -u ntp:ntp -p /var/run/ntpd.pid -x

Status on dm01db02: PASS => NTP is running with correct setting


ntp       5462     1  0 Feb08 ?        00:05:50 ntpd -u ntp:ntp -p /var/run/ntpd.pid -x
Top

Top

Non-routable network for interconnect

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 The interconnect should be configured on a non-routable private LAN. The interconnect IP addresses should not be accessible outside that LAN.
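A quick way to see which interface and subnet the clusterware uses for the interconnect is oifcfg (a sketch; run from the Grid Infrastructure home as the grid software owner):

$CRS_HOME/bin/oifcfg getif

The interface flagged cluster_interconnect should be on a private, non-routable address range (for example 192.168.x.x, as in the data below).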
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Interconnect is configured on non-routable network addresses


DATA FROM DM01DB01 - MCSDB DATABASE - NON-ROUTABLE NETWORK FOR INTERCONNECT



bondib0  192.168.8.0  global  cluster_interconnect

Status on dm01db02: PASS => Interconnect is configured on non-routable network addresses


bondib0  192.168.8.0  global  cluster_interconnect
Top

Top

IDGEN$ sequence cache size

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Sequence contention (SQ enqueue) can occur if the SYS.IDGEN1$ sequence cache size is not at least 1000. This condition can lead to performance issues in RAC. 1000 is the default starting in version 11.2.0.1.
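A sketch to check, and if necessary raise, the cache size (run as sysdba; the ALTER is commented out and should only be used when the reported value is below 1000):

sqlplus -s / as sysdba <<'EOF'
SELECT cache_size FROM dba_sequences
 WHERE sequence_owner = 'SYS' AND sequence_name = 'IDGEN1$';
-- ALTER SEQUENCE sys.idgen1$ CACHE 1000;
EOF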
 
Needs attention on-
Passed onMCSDB

Status on MCSDB: PASS => SYS.IDGEN1$ sequence cache size >= 1,000


DATA FOR MCSDB FOR IDGEN$ SEQUENCE CACHE SIZE




idgen1$.cache_size = 1000
Top

Top

OSWatcher status on storage servers

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 It is considered a best practice to run OSWatcher proactively on the compute nodes as well as on the Exadata storage servers to capture first-failure diagnostics.
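A quick cross-cell check (a sketch; assumes the standard cell_group file and password-less dcli access as root). A cell that returns no OSWatcher processes is not running it:

dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'ps -ef | grep -i osw | grep -v grep'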
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => OSWatcher is running on all storage servers


DATA FROM DM01CEL01 FOR OSWATCHER STATUS ON STORAGE SERVERS




NOTE: No output would indicate OSWatcher not running

root     15556     1  0 04:02 ?        00:00:04 /bin/ksh ./OSWatcher.sh 15 168 bzip2 3
root     16301 15556  0 11:00 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_cellsrvstat.sh
root     24275 24273  0 11:05 ?        00:00:00 grep -i osw
root     26519 15556  0 04:02 ?        00:00:04 /bin/ksh ./OSWatcherFM.sh 168 3
root     26539 15556  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_vmstat.sh
root     26540 15556  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_mpstat.sh
root     26541 15556  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_netstat.sh
root     26542 15556  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_iostat.sh
root     26543 15556  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_diskstats.sh
root     26548 15556  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_top.sh
root     26559 15556  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh
root     26579 26559  0 04:02 ?        00:00:04 /bin/bash /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh HighFreq

...More
Top

Top

Automatic segment storage management

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Starting with Oracle 9i, Automatic Segment Space Management (ASSM) can be used by specifying the SEGMENT SPACE MANAGEMENT clause, set to AUTO, in the CREATE TABLESPACE statement. Implementing the ASSM feature allows Oracle to use bitmaps to manage the free space within segments. The bitmap describes the status of each data block within a segment with respect to the amount of space in the block available for inserting rows. The current status of the space available in a data block is reflected in the bitmap, allowing Oracle to manage free space automatically. ASSM tablespaces automate freelist management and remove the requirement/ability to specify PCTUSED, FREELISTS, and FREELIST GROUPS storage parameters for individual tables and indexes created in these tablespaces.
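A sketch to list tablespaces that are not using ASSM (run as sysdba); SYSTEM, temporary and undo tablespaces typically report MANUAL and are not the concern here:

sqlplus -s / as sysdba <<'EOF'
SELECT tablespace_name, segment_space_management
  FROM dba_tablespaces
 WHERE segment_space_management <> 'AUTO';
EOF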
 
Needs attention onMCSDB
Passed on-

Status on MCSDB: WARNING => Some tablespaces are not using Automatic segment storage management.


DATA FOR MCSDB FOR AUTOMATIC SEGMENT STORAGE MANAGEMENT




OLTS_BATTRSTORE
OLTS_SVRMGSTORE
Top

Top

crsd Log File Ownership

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Supportability, diagnostics 

Risk:

If for any reason the ownership of certain clusterware-related log files is changed incorrectly, important diagnostics may not be available when needed by Support. These logs are rotated periodically to keep them from growing unmanageably large; if the ownership of the files is incorrect when it is time to rotate the logs, that operation can fail. While this does not affect the operation of the clusterware itself, it does affect the logging and therefore problem diagnostics.


Action / Repair:

It is wise to verify that the ownership of the following files is root:root:

$ls -l $GRID_HOME/log/`hostname`/crsd/*
$ls -l $GRID_HOME/log/`hostname`/ohasd/*
$ls -l $GRID_HOME/log/`hostname`/agent/crsd/orarootagent_root/*
$ls -l $GRID_HOME/log/`hostname`/agent/ohasd/orarootagent_root/*

If any of those files' ownership is NOT root:root then you should change the ownership of the files individually as follows (as root):

# chown root:root $GRID_HOME/log/`hostname`/crsd/*
# chown root:root $GRID_HOME/log/`hostname`/ohasd/*
# chown root:root $GRID_HOME/log/`hostname`/agent/crsd/orarootagent_root/*
# chown root:root $GRID_HOME/log/`hostname`/agent/ohasd/orarootagent_root/*
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => crsd Log Ownership is Correct (root root)


DATA FROM DM01DB01 - MCSDB DATABASE - CRSD LOG FILE OWNERSHIP



total 106308
-rw-r--r-- 1 root root 10575713 Apr 15 23:48 crsd.l01
-rw-r--r-- 1 root root 10575446 Apr 14 05:05 crsd.l02
-rw-r--r-- 1 root root 10575401 Apr 12 10:20 crsd.l03
-rw-r--r-- 1 root root 10575472 Apr 10 15:46 crsd.l04
-rw-r--r-- 1 root root 10575587 Apr  8 21:03 crsd.l05
-rw-r--r-- 1 root root 10575186 Apr  7 02:44 crsd.l06
-rw-r--r-- 1 root root 10575315 Apr  5 08:24 crsd.l07
-rw-r--r-- 1 root root 10575544 Apr  3 13:40 crsd.l08
-rw-r--r-- 1 root root 10575475 Apr  1 19:03 crsd.l09
-rw-r--r-- 1 root root 10575395 Mar 31 00:15 crsd.l10
-rw-r--r-- 1 root root  2920707 Apr 16 11:14 crsd.log
-rw-r--r-- 1 root root     4119 Feb  8 20:12 crsdOUT.log

Status on dm01db02: PASS => crsd Log Ownership is Correct (root root)


total 107284
-rw-r--r-- 1 root root 10586297 Apr  7 19:54 crsd.l01
-rw-r--r-- 1 root root 10585922 Mar 15 12:55 crsd.l02
-rw-r--r-- 1 root root 10536518 Feb 20 09:16 crsd.l03
-rw-r--r-- 1 root root 10569543 Feb  7 11:01 crsd.l04
-rw-r--r-- 1 root root 10569623 Feb  4 02:18 crsd.l05
-rw-r--r-- 1 root root 10569637 Jan 31 16:38 crsd.l06
-rw-r--r-- 1 root root 10570177 Jan 28 06:28 crsd.l07
-rw-r--r-- 1 root root 10569960 Jan 24 20:16 crsd.l08
-rw-r--r-- 1 root root 10569814 Jan 21 10:09 crsd.l09
-rw-r--r-- 1 root root 10569840 Jan 17 23:54 crsd.l10
-rw-r--r-- 1 root root  3963099 Apr 16 11:20 crsd.log
-rw-r--r-- 1 root root     4906 Feb  8 20:40 crsdOUT.log
Top

Top

OSWatcher status

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Improved first failure diagnostics if run proactively

Risk:

Degraded ability to provide first failure diagnostics in case of problems

Action / Repair:

Operating System Watcher (OSW) is a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid diagnosing performance issues. OSW is designed to run continuously and to write the metrics to ASCII files which are saved to an archive directory. The amount of archived data saved and frequency of collection are based on user parameters set when starting OSW.
 
Links
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => OSWatcher is running


DATA FROM DM01DB01 - MCSDB DATABASE - OSWATCHER STATUS



root     13687     1  0 04:05 ?        00:00:02 /bin/ksh ./OSWatcher.sh 15 168 bzip2 3
root     13852 13687  0 04:06 ?        00:00:02 /bin/ksh ./OSWatcherFM.sh 168 3
root     13874 13687  0 04:06 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_vmstat.sh
root     13875 13687  0 04:06 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_mpstat.sh
root     13876 13687  0 04:06 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_netstat.sh
root     13877 13687  0 04:06 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_iostat.sh
root     13878 13687  0 04:06 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_diskstats.sh
root     13879 13687  0 04:06 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_top.sh
root     13888 13687  0 04:06 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh
root     13902 13888  0 04:06 ?        00:00:15 /bin/bash /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh HighFreq

Status on dm01db02: PASS => OSWatcher is running


root     14369     1  0 04:05 ?        00:00:02 /bin/ksh ./OSWatcher.sh 15 168 bzip2 3
root     14498 14369  0 04:05 ?        00:00:02 /bin/ksh ./OSWatcherFM.sh 168 3
root     14530 14369  0 04:05 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_vmstat.sh
root     14531 14369  0 04:05 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_mpstat.sh
root     14532 14369  0 04:05 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_netstat.sh
root     14533 14369  0 04:05 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_iostat.sh
root     14534 14369  0 04:05 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_diskstats.sh
root     14536 14369  0 04:05 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_top.sh
root     14544 14369  0 04:05 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh
root     14556 14544  0 04:05 ?        00:00:15 /bin/bash /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh HighFreq
Top

Top

No CRS HOME env variable

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Avoid unexpected results running various Oracle utilities

Risk:

Setting this variable can cause problems for various Oracle components, and it is never necessary for CRS programs because they all have wrapper scripts.

Action / Repair:

Unset ORA_CRS_HOME in the execution environment. If a variable is needed for automation purposes or convenience, use a different variable name (e.g., GI_HOME).
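A minimal sketch for checking the current session and the usual profile files (the profile file names are common defaults, not a complete list; edit any file reported to remove the setting):

env | grep "^ORA_CRS_HOME=" || echo "ORA_CRS_HOME is not set"
grep -l "ORA_CRS_HOME" ~/.bash_profile ~/.bashrc 2>/dev/null
unset ORA_CRS_HOME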
 
Links
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => ORA_CRS_HOME environment variable is not set


DATA FROM DM01DB01 - MCSDB DATABASE - NO CRS HOME ENV VARIABLE



SUDOCMD=/usr/bin/sudo
HOSTNAME=dm01db01.mcsdb.com
SHELL=/bin/bash
TERM=xterm
HISTSIZE=1000
CRS_HOME=/u01/app/11.2.0.3/grid
ORACLE_UNQNAME=MCSDB
USER=oracle
LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0.3/dbhome_1/jdk/lib:/u01/app/oracle/product/11.2.0.3/dbhome_1/lib:/u01/app/11.2.0.3/grid/lib
LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:
ORACLE_SID=MCSDB1
ORACLE_BASE=/u01/app/oracle
MAIL=/var/spool/mail/oracle
PATH=/u01/app/oracle/product/11.2.0.3/dbhome_1/bin:/u01/app/oracle/product/11.2.0.3/dbhome_1/jdk/bin:/u01/app/oracle/product/11.2.0.3/dbhome_1/bin:/u01/app/oracle/product/11.2.0.3/dbhome_1/jdk/bin:/usr/local/bin:/bin:/usr/bin:.:/u01/app/oracle/product/11.2.0.3/dbhome_1/bin
INPUTRC=/etc/inputrc
PWD=/opt/oracle.SupportTools/exachk
...More

Status on dm01db02: PASS => ORA_CRS_HOME environment variable is not set


SHELL=/bin/bash
CRS_HOME=/u01/app/11.2.0.3/grid
SSH_CLIENT=10.187.5.2 52273 22
USER=oracle
LD_LIBRARY_PATH=:/u01/app/oracle/product/11.2.0.3/dbhome_1/lib:/u01/app/11.2.0.3/grid/lib
LS_COLORS=
ORACLE_SID=MCSDB2
ORACLE_BASE=/u01/app/oracle
PATH=/u01/app/oracle/product/11.2.0.3/dbhome_1/bin:/usr/local/bin:/bin:/usr/bin
MAIL=/var/mail/oracle
PWD=/home/oracle
LANG=en_US.UTF-8
HOME=/home/oracle
SHLVL=2
LOGNAME=oracle
SSH_CONNECTION=10.187.5.2 52273 10.187.5.3 22
LESSOPEN=|/usr/bin/lesspipe.sh %s
ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/dbhome_1
G_BROKEN_FILENAMES=1
_=/bin/env
Top

Top

GC block lost

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Optimal global cache performance 

Risk:

The RDBMS reports global cache lost block statistics ("gc cr block lost" and/or "gc current block lost") which could indicate a negative impact on interconnect performance and global cache processing.

Action / Repair:

The vast majority of escalations attributed to RDBMS global cache lost blocks can be directly related to faulty or misconfigured interconnects. GC lost blocks diagnostics guide serves as a starting point for evaluating common (and sometimes obvious) causes.
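A sketch to view the cumulative lost-block statistics across all instances (run as sysdba; the counters accumulate from instance startup, not per day):

sqlplus -s / as sysdba <<'EOF'
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name IN ('gc cr block lost', 'gc current block lost');
EOF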

 
Links
Needs attention on-
Passed onMCSDB

Status on MCSDB: PASS => GC blocks lost is not occurring


DATA FOR MCSDB FOR GC BLOCK LOST




No of GC lost block in last 24 hours = 0
Top

Top

NIC Bonding Mode interconnect

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Optimal interconnect performance

Risk:

Even though the mode 3 option is available for the Linux bonding driver, testing has proven that it duplicates all UDP packets and transmits them on every path. This increases CPU overhead for processing data from the interconnect thereby making the interconnect less efficient. Mode 3 is not needed for fault tolerance as other modes are available for that purpose and which do not duplicate the packets.

While there have been some enhancements to deal more effectively with duplicate packets, it is still not a good idea to generate a large number of duplicate packets that will simply be thrown away and that impose additional overhead on the interconnect and CPUs. If you must use mode 3 bonding, you should at least have the patches for the two bugs mentioned below.

A couple of relevant bugs:

Bug 7238620 - ORA-600 [2032]

REDISCOVERY INFORMATION:
If you are using a RAC IPC module over an unreliable protocol,
like ipc_g link targets, and your network is duplicating packets
at a high rate, you may have hit this bug.

WORKAROUND:
Ensure network is not duplicating any packets.

Bug 9081436 - GC CR REQUEST WAIT CAUSING SESSIONS TO WAIT

This bug is a side effect of the fix for bug 7238620, which allowed an invalid/corrupt packet to make it through to higher layers in Oracle code instead of being discarded and re-requested. The bad packet was not discarded and re-requested because the new code attempts to ignore duplicate packets. That is now fixed: Oracle still discards duplicate packets, but it also remains on the lookout for bad/corrupt packets, and any that are received are thrown away and re-requested instead of being allowed through, where they could potentially corrupt or overwrite other buffers in memory.

Action / Repair:

Use a different NIC bonding mode, usually mode 1 (Active/Passive)
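A sketch to confirm the interconnect bond is not in broadcast (mode 3); the interface name bondib0 is taken from the data below and may differ on other systems:

grep -i "Bonding Mode" /proc/net/bonding/bondib0

Expected output is "Bonding Mode: fault-tolerance (active-backup)" (mode 1), not a broadcast mode.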



 
Links
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => NIC bonding mode is not set to Broadcast(3) for cluster interconnect


DATA FROM DM01DB01 - MCSDB DATABASE - NIC BONDING MODE INTERCONNECT



Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 5000

Slave Interface: ib0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:00:48:fe:80
Slave queue ID: 0

...More

Status on dm01db02: PASS => NIC bonding mode is not set to Broadcast(3) for cluster interconnect


Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 5000

Slave Interface: ib0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:00:48:fe:80
Slave queue ID: 0

Slave Interface: ib1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:00:49:fe:80
...More
Top

Top

AUDSES$ sequence cache size

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Application scalability

Risk:

Problems have been reported with AUDSES$, ORA_TQ_BASE$ and IDGEN1$, which are all internal sequences, when the cache size is too small (e.g., during login storms or very high concurrency). These problems can manifest as waits in the row cache on "dc_sequences", which is a row cache type for sequences.

Action / Repair:

Increase AUDSES$ and ORA_TQ_BASE$ to 10,000
Increase IDGEN1$ to a value of 1000

See referenced Notes.
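A sketch to check both internal sequences (run as sysdba); the ALTER statements are commented out and should only be used when the reported cache sizes are below the targets above:

sqlplus -s / as sysdba <<'EOF'
SELECT sequence_name, cache_size FROM dba_sequences
 WHERE sequence_owner = 'SYS'
   AND sequence_name IN ('AUDSES$', 'ORA_TQ_BASE$');
-- ALTER SEQUENCE sys.audses$ CACHE 10000;
-- ALTER SEQUENCE sys.ora_tq_base$ CACHE 10000;
EOF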

 
Links
Needs attention on-
Passed onMCSDB

Status on MCSDB: PASS => SYS.AUDSES$ sequence cache size >= 10,000


DATA FOR MCSDB FOR AUDSES$ SEQUENCE CACHE SIZE




audses$.cache_size = 10000
Top

Top

SELinux status

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Exadata standards

Risk:

Deviation from standard deployment, potential unforeseen problems.

Action / Repair:

When Exadata Database Machine is initially deployed SELinux is configured in "permissive" mode.
Customers should only deviate from this configuration if directed to do so by Oracle Support.
Consult Oracle Support about correcting the configuration.
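Minimal checks of the current and configured SELinux state (a sketch):

getenforce
grep "^SELINUX=" /etc/selinux/config

Expect "Permissive" (or "Disabled", as in the data below); do not switch to enforcing mode unless directed by Oracle Support.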
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => SELinux is not being Enforced.


DATA FROM DM01DB01 - MCSDB DATABASE - SELINUX STATUS



Disabled

Status on dm01db02: PASS => SELinux is not being Enforced.


Disabled
Top

Top

crsd/orarootagent_root Log File Ownership

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Supportability, diagnostics

Risk:

If for any reason the ownership of certain clusterware-related log files is changed incorrectly, important diagnostics may not be available when needed by Support. These logs are rotated periodically to keep them from growing unmanageably large; if the ownership of the files is incorrect when it is time to rotate the logs, that operation can fail. While this does not affect the operation of the clusterware itself, it does affect the logging and therefore problem diagnostics.


Action / Repair:

It is wise to verify that the ownership of the following files is root:root:

$ls -l $GRID_HOME/log/`hostname`/crsd/*
$ls -l $GRID_HOME/log/`hostname`/ohasd/*
$ls -l $GRID_HOME/log/`hostname`/agent/crsd/orarootagent_root/*
$ls -l $GRID_HOME/log/`hostname`/agent/ohasd/orarootagent_root/*

If any of those files' ownership is NOT root:root then you should change the ownership of the files individually as follows (as root):

# chown root:root $GRID_HOME/log/`hostname`/crsd/*
# chown root:root $GRID_HOME/log/`hostname`/ohasd/*
# chown root:root $GRID_HOME/log/`hostname`/agent/crsd/orarootagent_root/*
# chown root:root $GRID_HOME/log/`hostname`/agent/ohasd/orarootagent_root/*
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => crsd/orarootagent_root Log Ownership is Correct (root root)


DATA FROM DM01DB01 - MCSDB DATABASE - CRSD/ORAROOTAGENT_ROOT LOG FILE OWNERSHIP



total 109248
-rw-r--r-- 1 root root 10572528 Mar 28 10:54 orarootagent_root.l01
-rw-r--r-- 1 root root 10525003 Feb 23 12:34 orarootagent_root.l02
-rw-r--r-- 1 root root 10572171 Jan 22 17:24 orarootagent_root.l03
-rw-r--r-- 1 root root 10572279 Dec 22 01:23 orarootagent_root.l04
-rw-r--r-- 1 root root 10572246 Nov 19 13:03 orarootagent_root.l05
-rw-r--r-- 1 root root 10572174 Oct 18 08:49 orarootagent_root.l06
-rw-r--r-- 1 root root 10572955 Sep 16  2013 orarootagent_root.l07
-rw-r--r-- 1 root root 10491452 Aug 14  2013 orarootagent_root.l08
-rw-r--r-- 1 root root 10560602 Jul 12  2013 orarootagent_root.l09
-rw-r--r-- 1 root root 10517359 Jun 10  2013 orarootagent_root.l10
-rw-r--r-- 1 root root  6130017 Apr 16 11:14 orarootagent_root.log
-rw-r--r-- 1 root root        0 Nov 28  2012 orarootagent_rootOUT.log
-rw-r--r-- 1 root root        6 Feb  8 20:13 orarootagent_root.pid

Status on dm01db02: PASS => crsd/orarootagent_root Log Ownership is Correct (root root)


total 107120
-rw-r--r-- 1 root root 10572802 Apr  4 09:02 orarootagent_root.l01
-rw-r--r-- 1 root root 10545310 Mar  3 04:09 orarootagent_root.l02
-rw-r--r-- 1 root root 10572300 Jan 29 14:40 orarootagent_root.l03
-rw-r--r-- 1 root root 10572408 Dec 27 10:20 orarootagent_root.l04
-rw-r--r-- 1 root root 10572378 Nov 24 06:58 orarootagent_root.l05
-rw-r--r-- 1 root root 10572372 Oct 21 23:50 orarootagent_root.l06
-rw-r--r-- 1 root root 10572950 Sep 18  2013 orarootagent_root.l07
-rw-r--r-- 1 root root 10496762 Aug 16  2013 orarootagent_root.l08
-rw-r--r-- 1 root root 10572712 Jul 14  2013 orarootagent_root.l09
-rw-r--r-- 1 root root 10521468 Jun 11  2013 orarootagent_root.l10
-rw-r--r-- 1 root root  3916231 Apr 16 11:19 orarootagent_root.log
-rw-r--r-- 1 root root        0 Nov 28  2012 orarootagent_rootOUT.log
-rw-r--r-- 1 root root        5 Feb  8 20:40 orarootagent_root.pid
Top

Top

oradism executable ownership

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Changes the scheduling priority of LMS and other background processes to the realtime scheduling class to maximize the ability of these key processes to be scheduled on the CPU at times of high CPU utilization.

Risk:

The oradism executable should be owned by root and the owner s-bit should be set, e.g. -rwsr-x---, where the s is the setuid bit (s-bit) for root in this case. If the LMS process is not running at the proper scheduling priority, it can lead to instance evictions due to IPC send timeouts or ORA-29740 errors. oradism must be owned by root with its s-bit set in order to be able to change the scheduling priority. If oradism is not owned by root or the s-bit is not set, then something went wrong in the installation process, or the ownership or permission was otherwise changed.

Action / Repair:

Please check with Oracle Support to determine the best course to take for your platform to correct the problem.
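A sketch to check all database servers in one pass (the RDBMS home path is the one shown in the data below; password-less dcli access as root is assumed):

dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root 'ls -l /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oradism'

The owner should be root with the s-bit set, e.g. -rwsr-x---.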
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => $ORACLE_HOME/bin/oradism ownership is root


DATA FROM DM01DB01 - MCSDB DATABASE - ORADISM EXECUTABLE OWNERSHIP



-rwsr-x--- 1 root oinstall 71758 Sep 17  2011 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oradism

Status on dm01db02: PASS => $ORACLE_HOME/bin/oradism ownership is root


-rwsr-x--- 1 root oinstall 71758 Nov 28  2012 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oradism
Top

Top

Interconnect NIC bonding config.

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Interconnect network interface redundancy, increased node availability

Risk:

Lack of NIC bonding for the interconnect network interface can result in a single point of failure for cluster nodes, thereby creating the potential for database instance and cluster node availability issues.

Action / Repair:

Oracle highly recommends configuring a redundant network for the interconnect using NIC bonding.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => NIC bonding is configured for interconnect


DATA FROM DM01DB01 - MCSDB DATABASE - INTERCONNECT NIC BONDING CONFIG.



bondib0  192.168.8.0  global  cluster_interconnect

Status on dm01db02: PASS => NIC bonding is configured for interconnect


bondib0  192.168.8.0  global  cluster_interconnect
Top

Top

oradism executable permission

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The oradism executable is invoked after database startup to change the scheduling priority of LMS and other database background processes to the realtime scheduling class in order to maximize the ability of these key processes to be scheduled on the CPU in a timely way at times of high CPU utilization.

Risk:

The oradism executable should be owned by root and the owner s-bit should be set, e.g. -rwsr-x---, where the s is the setuid bit (s-bit) for root in this case. If the LMS process is not running at the proper scheduling priority, it can lead to instance evictions due to IPC send timeouts or ORA-29740 errors. oradism must be owned by root with its s-bit set in order to be able to change the scheduling priority. If oradism is not owned by root or the s-bit is not set, then something went wrong in the installation process, or the ownership or permission was otherwise changed.

Action / Repair:

Please check with Oracle Support to determine the best course to take for your platform to correct the problem.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => $ORACLE_HOME/bin/oradism setuid bit is set


DATA FROM DM01DB01 - MCSDB DATABASE - ORADISM EXECUTABLE PERMISSION



-rwsr-x--- 1 root oinstall 71758 Sep 17  2011 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oradism

Status on dm01db02: PASS => $ORACLE_HOME/bin/oradism setuid bit is set


-rwsr-x--- 1 root oinstall 71758 Nov 28  2012 /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/oradism
Top

Top

ohasd Log File Ownership

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Supportability, diagnostics

Risk:

If for any reason the ownership of certain clusterware related log files is changed incorrectly, important diagnostics may not be available when needed by Support. These logs are rotated periodically to keep them from growing unmanageably large. If the ownership of the files is incorrect when it is time to rotate the logs, that operation could fail; while that does not affect the operation of the clusterware itself, it does affect the logging and therefore problem diagnostics.


Action / Repair:

It is wise to verify that the ownership of the following files is root:root:

$ ls -l $GRID_HOME/log/`hostname`/crsd/*
$ ls -l $GRID_HOME/log/`hostname`/ohasd/*
$ ls -l $GRID_HOME/log/`hostname`/agent/crsd/orarootagent_root/*
$ ls -l $GRID_HOME/log/`hostname`/agent/ohasd/orarootagent_root/*

If any of those files' ownership is NOT root:root then you should change the ownership of the files individually as follows (as root):

# chown root:root $GRID_HOME/log/`hostname`/crsd/*
# chown root:root $GRID_HOME/log/`hostname`/ohasd/*
# chown root:root $GRID_HOME/log/`hostname`/agent/crsd/orarootagent_root/*
# chown root:root $GRID_HOME/log/`hostname`/agent/ohasd/orarootagent_root/*
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => ohasd Log Ownership is Correct (root root)


DATA FROM DM01DB01 - MCSDB DATABASE - OHASD LOG FILE OWNERSHIP



total 108132
-rw-r--r-- 1 root root 10577866 Apr 14 17:25 ohasd.l01
-rw-r--r-- 1 root root 10577707 Apr 10 19:59 ohasd.l02
-rw-r--r-- 1 root root 10577670 Apr  6 22:37 ohasd.l03
-rw-r--r-- 1 root root 10577866 Apr  3 01:15 ohasd.l04
-rw-r--r-- 1 root root 10578394 Mar 30 03:48 ohasd.l05
-rw-r--r-- 1 root root 10577876 Mar 26 05:57 ohasd.l06
-rw-r--r-- 1 root root 10577956 Mar 22 08:32 ohasd.l07
-rw-r--r-- 1 root root 10578163 Mar 18 11:03 ohasd.l08
-rw-r--r-- 1 root root 10578016 Mar 14 13:32 ohasd.l09
-rw-r--r-- 1 root root 10578027 Mar 10 15:54 ohasd.l10
-rw-r--r-- 1 root root  4744544 Apr 16 11:13 ohasd.log
-rw-r--r-- 1 root root     3720 Feb  8 20:12 ohasdOUT.log

Status on dm01db02: PASS => ohasd Log Ownership is Correct (root root)


total 107524
-rw-r--r-- 1 root root 10578869 Apr 14 12:19 ohasd.l01
-rw-r--r-- 1 root root 10578936 Apr  9 11:33 ohasd.l02
-rw-r--r-- 1 root root 10578581 Apr  4 10:32 ohasd.l03
-rw-r--r-- 1 root root 10578607 Mar 30 10:01 ohasd.l04
-rw-r--r-- 1 root root 10578885 Mar 25 09:35 ohasd.l05
-rw-r--r-- 1 root root 10578796 Mar 20 08:54 ohasd.l06
-rw-r--r-- 1 root root 10579078 Mar 15 08:11 ohasd.l07
-rw-r--r-- 1 root root 10579057 Mar 10 06:57 ohasd.l08
-rw-r--r-- 1 root root 10578728 Mar  5 05:59 ohasd.l09
-rw-r--r-- 1 root root 10579056 Feb 28 05:11 ohasd.l10
-rw-r--r-- 1 root root  4132524 Apr 16 11:19 ohasd.log
-rw-r--r-- 1 root root     2232 Feb  8 20:39 ohasdOUT.log
Top

Top

VIP NIC bonding config.

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Public network interface (VIP) redundancy, increased application availability

Risk:

Lack of NIC bonding for the public network interface can result in a single point of failure for VIPs, thereby creating the potential for application availability issues.

Action / Repair:

Oracle highly recommends configuring a redundant network for VIPs using NIC bonding.
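The public bond configuration can be reviewed directly from the bonding driver (a verification sketch; bondeth0 is the interface name seen in the data collected below):

grep -E "Bonding Mode|Slave Interface|MII Status" /proc/net/bonding/bondeth0

The output should show a redundant bonding mode (active-backup on this system) and two slave interfaces, each with an MII status of up.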
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => NIC bonding is configured for public network (VIP)


DATA FROM DM01DB01 - MCSDB DATABASE - VIP NIC BONDING CONFIG.



bondeth0  Link encap:Ethernet  HWaddr 00:21:28:E7:98:DF
inet addr:10.187.4.90  Bcast:10.187.4.255  Mask:255.255.255.0
inet6 addr: fe80::221:28ff:fee7:98df/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
RX packets:8257391698 errors:0 dropped:0 overruns:0 frame:0
TX packets:9581043446 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2074115133099 (1.8 TiB)  TX bytes:5842294339942 (5.3 TiB)


Status on dm01db02: PASS => NIC bonding is configured for public network (VIP)


bondeth0  Link encap:Ethernet  HWaddr 00:21:28:E7:C0:A5
inet addr:10.187.4.91  Bcast:10.187.4.255  Mask:255.255.255.0
inet6 addr: fe80::221:28ff:fee7:c0a5/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
RX packets:7443677732 errors:0 dropped:0 overruns:0 frame:0
TX packets:8714433867 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1911704716665 (1.7 TiB)  TX bytes:5486788385307 (4.9 TiB)

Top

Top

User Open File Limit

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Please consult 

Oracle Database Installation Guide for Linux
Configure Oracle Installation Owner Shell Limits
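The effective limit for the current user can be checked directly with ulimit, and the persistent setting lives in /etc/security/limits.conf (a sketch; the entries shown match those captured later in this report and should follow the installation guide):

$ ulimit -n
65536

/etc/security/limits.conf:
oracle soft nofile 65536
oracle hard nofile 65536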
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Open files limit (ulimit -n) for current user is set to recommended value >= 65536 or unlimited


DATA FROM DM01DB01 - MCSDB DATABASE - USER OPEN FILE LIMIT



core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 773848
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 131072
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Status on dm01db02: PASS => Open files limit (ulimit -n) for current user is set to recommended value >= 65536 or unlimited


core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 773848
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 131072
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Top

Top

NIC Bonding Mode Public

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
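This check confirms that the public bond is not running in broadcast mode. The bonding mode can be read directly from the bonding driver (a one-line check sketch; bondeth0 is the public bond name on this system):

grep "Bonding Mode" /proc/net/bonding/bondeth0

Any mode other than broadcast, such as the fault-tolerance (active-backup) mode shown in the data below, satisfies this check.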
 
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => NIC bonding mode is not set to Broadcast(3) for public network


DATA FROM DM01DB01 - MCSDB DATABASE - NIC BONDING MODE PUBLIC




NOTE: Look for Bonding Mode:

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 5000

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
...More

Status on dm01db02: PASS => NIC bonding mode is not set to Broadcast(3) for public network



NOTE: Look for Bonding Mode:

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 5000

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:21:28:e7:c0:a5
Slave queue ID: 0

Slave Interface: eth2
...More
Top

Top

GI shell limits hard stack

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 The hard stack shell limit for the Oracle Grid Infrastructure software install owner as defined in /etc/security/limits.conf should be >= 10240.

What's being checked here is the /etc/security/limits.conf file as documented in 11gR2 Grid Infrastructure Installation Guide, section 2.15.3 Setting Resource Limits for the Oracle Software Installation Users.  

If the /etc/security/limits.conf file is not configured as described in the documentation, check the hard stack configuration while logged into the software owner account (e.g. grid):

$ ulimit -Hs
10240

As long as the hard stack limit is 10240 or above, the configuration is acceptable.

 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit hard stack for GI is configured according to recommendation


DATA FROM DM01DB01 FOR CRS USER LIMITS CONFIGURATION



Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            1572864              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             773848               773848               processes
Max open files            65536                65536                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       773848               773848               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
...More

Status on dm01db02: PASS => Shell limit hard stack for GI is configured according to recommendation


DATA FROM DM01DB02 FOR CRS USER LIMITS CONFIGURATION



Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            1572864              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             773848               773848               processes
Max open files            65536                65536                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       773848               773848               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
...More
Top

Top

DB shell limits soft nproc

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 The soft nproc shell limit for the Oracle DB software install owner as defined in /etc/security/limits.conf should be >= 2047.
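Beyond inspecting /etc/security/limits.conf (the collected data below shows the relevant line), the effective limits can be verified at runtime as root; the same approach applies to the nofile and stack checks that follow (a verification sketch; the owner name oracle is assumed):

# su - oracle -c 'ulimit -Su; ulimit -Hu; ulimit -Sn; ulimit -Hn; ulimit -Hs'

The first two values are the soft and hard nproc limits, the next two the soft and hard nofile limits, and the last the hard stack limit (in KB).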
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit soft nproc for DB is configured according to recommendation


DATA FROM DM01DB01 - MCSDB DATABASE - DB SHELL LIMITS SOFT NPROC



oracle soft nproc 131072

Status on dm01db02: PASS => Shell limit soft nproc for DB is configured according to recommendation


oracle soft nproc 131072
Top

Top

DB shell limits hard stack

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The documented value is in the /etc/security/limits.conf file, as described in the 11gR2 Grid Infrastructure Installation Guide, section 2.15.3 Setting Resource Limits for the Oracle Software Installation Users.

If the /etc/security/limits.conf file is not configured as described in the documentation, log in to the system as the database software owner (e.g. oracle) and check the hard stack configuration as described below.

Risk:

The hard stack shell limit for the Oracle DB software install owner as defined in /etc/security/limits.conf should be >= 10240. As long as the hard stack limit is 10240 or above, the configuration is acceptable.

Action / Repair:

Change the DB software install owner hard stack shell limit in /etc/security/limits.conf if needed, then verify it as that user:

$ ulimit -Hs
10240


 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit hard stack for DB is configured according to recommendation


DATA FROM DM01DB01 - MCSDB DATABASE - DB SHELL LIMITS HARD STACK



oracle hard stack unlimited

Status on dm01db02: PASS => Shell limit hard stack for DB is configured according to recommendation


oracle hard stack unlimited
Top

Top

DB shell limits hard nofile

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Documented value, cluster stability

The hard nofile shell limit for the Oracle DB software install owner as defined in /etc/security/limits.conf should be >= 65536.

Risk:

Resource starvation (file descriptors) leading to node instability

Action / Repair:

Change DB software install owner hard nofile shell limit
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit hard nofile for DB is configured according to recommendation


DATA FROM DM01DB01 - MCSDB DATABASE - DB SHELL LIMITS HARD NOFILE



oracle hard nofile 65536

Status on dm01db02: PASS => Shell limit hard nofile for DB is configured according to recommendation


oracle hard nofile 65536
Top

Top

DB shell limits soft nofile

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Documented value, cluster stability

The soft nofile shell limit for the Oracle DB software install owner as defined in /etc/security/limits.conf should be >= 1024.

Risk:

Resource starvation leading to node instability

Action / Repair:

Change DB software install owner soft nofile shell limit
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit soft nofile for DB is configured according to recommendation


DATA FROM DM01DB01 - MCSDB DATABASE - DB SHELL LIMITS SOFT NOFILE



oracle soft nofile 65536

Status on dm01db02: PASS => Shell limit soft nofile for DB is configured according to recommendation


oracle soft nofile 65536
Top

Top

DB shell limits hard nproc

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Documented value, cluster stability

The hard nproc shell limit for the Oracle DB software install owner as defined in /etc/security/limits.conf should be >= 16384.

Risk:

Resource starvation (processes) leading to node instability

Action / Repair:

Change DB software install owner hard nproc shell limit

 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit hard nproc for DB is configured according to recommendation


DATA FROM DM01DB01 - MCSDB DATABASE - DB SHELL LIMITS HARD NPROC



oracle hard nproc 131072

Status on dm01db02: PASS => Shell limit hard nproc for DB is configured according to recommendation


oracle hard nproc 131072
Top

Top

GI shell limits hard nproc

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 The hard nproc shell limit for the Oracle GI software install owner as defined in /etc/security/limits.conf should be >= 16384.
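The same runtime verification can be performed for the Grid Infrastructure software owner; the /etc/security/limits.conf content captured below already shows the configured grid entries (a sketch; the owner name grid is assumed):

# su - grid -c 'ulimit -Su; ulimit -Hu; ulimit -Sn; ulimit -Hn'

The reported soft/hard nproc and nofile values should meet or exceed the documented minimums (2047/16384 for nproc and 1024/65536 for nofile).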
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit hard nproc for GI is configured according to recommendation


DATA FROM DM01DB01 - MCSDB DATABASE - GI SHELL LIMITS HARD NPROC






oracle     soft    core       unlimited
oracle     hard    core       unlimited
oracle     soft    nproc       131072
oracle     hard    nproc       131072
oracle     soft    nofile       65536
oracle     hard    nofile       65536
oracle     soft    memlock      unlimited
oracle     hard    memlock      unlimited

grid     soft    core       unlimited
grid     hard    core       unlimited
grid     soft    nproc       131072
grid     hard    nproc       131072
...More

Status on dm01db02: PASS => Shell limit hard nproc for GI is configured according to recommendation





oracle     soft    core       unlimited
oracle     hard    core       unlimited
oracle     soft    nproc       131072
oracle     hard    nproc       131072
oracle     soft    nofile       65536
oracle     hard    nofile       65536
oracle     soft    memlock      unlimited
oracle     hard    memlock      unlimited

grid     soft    core       unlimited
grid     hard    core       unlimited
grid     soft    nproc       131072
grid     hard    nproc       131072
grid     soft    nofile       65536
grid     hard    nofile       65536
grid     soft    memlock      unlimited
grid     hard    memlock      unlimited
...More
Top

Top

GI shell limits hard nofile

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 The hard nofile shell limit for the Oracle GI software install owner as defined in /etc/security/limits.conf should be >= 65536.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit hard nofile for GI is configured according to recommendation


DATA FROM DM01DB01 - MCSDB DATABASE - GI SHELL LIMITS HARD NOFILE






oracle     soft    core       unlimited
oracle     hard    core       unlimited
oracle     soft    nproc       131072
oracle     hard    nproc       131072
oracle     soft    nofile       65536
oracle     hard    nofile       65536
oracle     soft    memlock      unlimited
oracle     hard    memlock      unlimited

grid     soft    core       unlimited
grid     hard    core       unlimited
grid     soft    nproc       131072
grid     hard    nproc       131072
...More

Status on dm01db02: PASS => Shell limit hard nofile for GI is configured according to recommendation





oracle     soft    core       unlimited
oracle     hard    core       unlimited
oracle     soft    nproc       131072
oracle     hard    nproc       131072
oracle     soft    nofile       65536
oracle     hard    nofile       65536
oracle     soft    memlock      unlimited
oracle     hard    memlock      unlimited

grid     soft    core       unlimited
grid     hard    core       unlimited
grid     soft    nproc       131072
grid     hard    nproc       131072
grid     soft    nofile       65536
grid     hard    nofile       65536
grid     soft    memlock      unlimited
grid     hard    memlock      unlimited
...More
Top

Top

GI shell limits soft nproc

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 The soft nproc shell limit for the Oracle GI software install owner as defined in /etc/security/limits.conf should be >= 2047.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit soft nproc for GI is configured according to recommendation


DATA FROM DM01DB01 - MCSDB DATABASE - GI SHELL LIMITS SOFT NPROC






oracle     soft    core       unlimited
oracle     hard    core       unlimited
oracle     soft    nproc       131072
oracle     hard    nproc       131072
oracle     soft    nofile       65536
oracle     hard    nofile       65536
oracle     soft    memlock      unlimited
oracle     hard    memlock      unlimited

grid     soft    core       unlimited
grid     hard    core       unlimited
grid     soft    nproc       131072
grid     hard    nproc       131072
...More

Status on dm01db02: PASS => Shell limit soft nproc for GI is configured according to recommendation





oracle     soft    core       unlimited
oracle     hard    core       unlimited
oracle     soft    nproc       131072
oracle     hard    nproc       131072
oracle     soft    nofile       65536
oracle     hard    nofile       65536
oracle     soft    memlock      unlimited
oracle     hard    memlock      unlimited

grid     soft    core       unlimited
grid     hard    core       unlimited
grid     soft    nproc       131072
grid     hard    nproc       131072
grid     soft    nofile       65536
grid     hard    nofile       65536
grid     soft    memlock      unlimited
grid     hard    memlock      unlimited
...More
Top

Top

GI shell limits soft nofile

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 The soft nofile shell limit for the Oracle GI software install owner as defined in /etc/security/limits.conf should be >= 1024.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Shell limit soft nofile for GI is configured according to recommendation


DATA FROM DM01DB01 - MCSDB DATABASE - GI SHELL LIMITS SOFT NOFILE






oracle     soft    core       unlimited
oracle     hard    core       unlimited
oracle     soft    nproc       131072
oracle     hard    nproc       131072
oracle     soft    nofile       65536
oracle     hard    nofile       65536
oracle     soft    memlock      unlimited
oracle     hard    memlock      unlimited

grid     soft    core       unlimited
grid     hard    core       unlimited
grid     soft    nproc       131072
grid     hard    nproc       131072
...More

Status on dm01db02: PASS => Shell limit soft nofile for GI is configured according to recommendation





oracle     soft    core       unlimited
oracle     hard    core       unlimited
oracle     soft    nproc       131072
oracle     hard    nproc       131072
oracle     soft    nofile       65536
oracle     hard    nofile       65536
oracle     soft    memlock      unlimited
oracle     hard    memlock      unlimited

grid     soft    core       unlimited
grid     hard    core       unlimited
grid     soft    nproc       131072
grid     hard    nproc       131072
grid     soft    nofile       65536
grid     hard    nofile       65536
grid     soft    memlock      unlimited
grid     hard    memlock      unlimited
...More
Top

Top

Verify Data Network is Separate from Management Network

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical, 03/02/11
Benefit / Impact:
It is a requirement that the management network be on a different, non-overlapping subnet from the InfiniBand network and the client access network. This is necessary for better network security, better client access bandwidth, and for Auto Service Request (ASR) to work correctly.
The management network comprises the eth0 network interface in the database and storage servers, the ILOM network interfaces of the database and storage servers, and the Ethernet management interfaces of the InfiniBand switches and PDUs.
Risk:
Having the management network on the same subnet as the client access network will reduce network security, potentially restrict the client access bandwidth to/from the Database Machine to a single 1GbE link, and will prevent ASR from working correctly.
Action / Repair:
To verify that the management network interface (eth0) is on a separate network from other network interfaces, execute the following command as the "root" userid on both storage and database servers:
grep -i network /etc/sysconfig/network-scripts/ifcfg* | cut -f5 -d"/" | grep -v "#"
 
The output will be similar to:
ifcfg-bondeth0:NETWORK=10.204.77.0
ifcfg-bondib0:NETWORK=192.168.76.0
ifcfg-eth0:NETWORK=10.204.78.0
ifcfg-lo:NETWORK=127.0.0.0
The expected result is that the network values are different. If they are not, investigate and correct the condition.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Management network is separate from data network


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY DATA NETWORK IS SEPARATE FROM MANAGEMENT NETWORK



ifcfg-bondeth0:NETWORK=10.187.4.0
ifcfg-bondib0:NETWORK=192.168.8.0
ifcfg-eth0:NETWORK=10.187.5.0

Status on dm01db02: PASS => Management network is separate from data network


ifcfg-bondeth0:NETWORK=10.187.4.0
ifcfg-bondib0:NETWORK=192.168.8.0
ifcfg-eth0:NETWORK=10.187.5.0
Top

Top

Verify RAID Controller Battery Temperature [Storage Server]

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Maintaining proper temperature ranges maximizes RAID controller battery life.
The impact of verifying RAID controller battery temperature is minimal.

Risk:

A reported temperature of 60C or higher causes the battery to suspend charging until the temperature drops. Sustained high temperature also shortens the service life of the battery, causing it to fail prematurely and putting the RAID controller into WriteThrough mode, which significantly impacts write I/O performance.

Action / Repair:

To verify the RAID controller battery temperature, execute the following command on all servers:
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep BatteryType; /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep -i temper
The output will be similar to:
BatteryType: iBBU08
Temperature: 38 C
  Temperature                  : OK
  Over Temperature        : No
 
If the battery temperature is equal to or greater than 55C, investigate and correct the environmental conditions.
NOTE: Replace Battery Module after 3 Year service life assuming the battery temperature has not exceeded 55C. If the temperature has exceeded 55C (battery temp shall not exceed 60C), replace the battery every 2 years.
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => RAID controller battery temperature is normal [Storage Server]


DATA FROM DM01CEL01 FOR VERIFY RAID CONTROLLER BATTERY TEMPERATURE [STORAGE SERVER]



Temperature: 39 C
Temperature                             : OK
Over Temperature        : No




DATA FROM DM01CEL02 FOR VERIFY RAID CONTROLLER BATTERY TEMPERATURE [STORAGE SERVER]



Temperature: 38 C
Temperature                             : OK
Over Temperature        : No


...More
Top

Top

Verify RAID Controller Battery Temperature [Database Server]

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Maintaining proper temperature ranges maximizes RAID controller battery life.
The impact of verifying RAID controller battery temperature is minimal.

Risk:

A reported temperature of 60C or higher causes the battery to suspend charging until the temperature drops. Sustained high temperature also shortens the service life of the battery, causing it to fail prematurely and putting the RAID controller into WriteThrough mode, which significantly impacts write I/O performance.

Action / Repair:

To verify the RAID controller battery temperature, execute the following command on all servers:
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep BatteryType; /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep -i temper
The output will be similar to:
BatteryType: iBBU08
Temperature: 38 C
  Temperature                  : OK
  Over Temperature        : No
 
If the battery temperature is equal to or greater than 55C, investigate and correct the environmental conditions.
NOTE: Replace Battery Module after 3 Year service life assuming the battery temperature has not exceeded 55C. If the temperature has exceeded 55C (battery temp shall not exceed 60C), replace the battery every 2 years.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => RAID controller battery temperature is normal [Database Server]


DATA FROM DM01DB01 FOR VERIFY RAID CONTROLLER BATTERY TEMPERATURE [DATABASE SERVER]



Temperature: 41 C
Temperature                             : OK
Over Temperature        : No

Status on dm01db02: PASS => RAID controller battery temperature is normal [Database Server]


DATA FROM DM01DB02 FOR VERIFY RAID CONTROLLER BATTERY TEMPERATURE [DATABASE SERVER]



Temperature: 38 C
Temperature                             : OK
Over Temperature        : No
Top

Top

Manage ASM Audit File Directory Growth with cron

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The audit file destination directories for an ASM instance can grow to contain a very large number of files if they are not regularly maintained. Use the Linux cron(8) utility and the find(1) command to manage the number of files in the audit file destination directories.

The impact of using cron(8) and find(1) to manage the number of files in the audit file destination directories is minimal.

Risk:

Having a very large number of files can cause the file system to run out of free disk space or inodes, or can cause Oracle to run very slowly due to file system directory scaling limits, which can have the appearance that the ASM instance is hanging on startup.

Action / Repair:

Refer to MOS Note 1298957.1. 
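As a concrete illustration of the cron(8)/find(1) approach (a sketch only; the retention period is an example and the exact policy should come from MOS Note 1298957.1; the audit path is the one reported for this system, and any additional audit_file_dest locations need the same treatment):

# crontab entry: purge ASM audit files older than 30 days, daily at 02:30
30 2 * * * /usr/bin/find /u01/app/11.2.0.3/grid/rdbms/audit -name '*.aud' -mtime +30 -delete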
 
Links
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => ASM Audit file destination file count <= 100,000


DATA FROM DM01DB01 - MCSDB DATABASE - MANAGE ASM AUDIT FILE DIRECTORY GROWTH WITH CRON



Number of audit files at /u01/app/11.2.0.3/grid/rdbms/audit = 139

Status on dm01db02: PASS => ASM Audit file destination file count <= 100,000


Number of audit files at /u01/app/11.2.0.3/grid/rdbms/audit = 38
Top

Top

Verify Database Server Virtual Drive Configuration

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

For X2-2, there are 4 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 3 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
For X2-8, there are 8 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 7 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the virtual drives is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.

Risk:

Not verifying the virtual drives increases the chance of a performance degradation or an outage.

Action / Repair:

To verify the database server virtual drive configuration, use the following command:
/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Virtual Drive:";/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Number Of Drives";/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "^State" 
For X2-2 the output should be similar to:
Virtual Drive: 0 (Target Id: 0)
Number Of Drives    : 3
State               : Optimal
The expected result is that the virtual device has 3 drives and a state of optimal.
NOTE: The virtual device number reported may vary depending upon configuration and version levels.
NOTE: If a bare metal restore procedure is performed on a database server without using the "dualboot=no" configuration, that database server may be left with three virtual devices for X2-2 and 7 for X2-8. Please see My Oracle Support note 1323309.1 for additional information and correction instructions.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database Server Virtual Drive Configuration meets recommendation


DATA FROM DM01DB01 FOR VERIFY DATABASE SERVER VIRTUAL DRIVE CONFIGURATION



Virtual Drive: 1 (Target Id: 1)
State               : Optimal
Number Of Drives    : 3

Status on dm01db02: PASS => Database Server Virtual Drive Configuration meets recommendation


DATA FROM DM01DB02 FOR VERIFY DATABASE SERVER VIRTUAL DRIVE CONFIGURATION



Virtual Drive: 1 (Target Id: 1)
State               : Optimal
Number Of Drives    : 3
Top

Top

DBM FailGroups

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 There should be one failgroup per diskgroup for each Database Machine cell.  So for instance, assuming 3 cells and two diskgroups, each diskgroup should have 3 failgroups, one on each cell.

SQL> select distinct group_number,failgroup from v$asm_disk;

GROUP_NUMBER FAILGROUP
------------ ------------------------------
           1 DSCBAS01S
           1 DSCBAS02S
           1 DSCBAS03S
           2 DSCBAS01S
           2 DSCBAS02S
           2 DSCBAS03S

6 rows selected.
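A variation of the same query counts failgroups per diskgroup directly, which corresponds to the summary shown in the collected data below (a sketch; run against the ASM instance):

SQL> select g.name diskgroup, count(distinct d.failgroup) failgroups
  2  from v$asm_diskgroup g, v$asm_disk d
  3  where g.group_number = d.group_number
  4  group by g.name;

Each diskgroup should report one failgroup per storage cell, i.e. three for the cells in this system.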
 
Needs attention on-
Passed ondm01db01

Status on dm01db01: PASS => Correct number of FailGroups per ASM DiskGroup are configured


DATA FROM DM01DB01 - MCSDB DATABASE - DBM FAILGROUPS




Disk group DATA_DM01 has 3 failgroups
Disk group DBFS_DG has 3 failgroups
Disk group RECO_DM01 has 3 failgroups

Top

Top

Exadata storage server system model number

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => All Exadata storage server meet system model number requirement


DATA FROM DM01CEL01 FOR EXADATA STORAGE SERVER SYSTEM MODEL NUMBER




NOTE: Look for system_descritpion =

Connected. Use ^D to exit.
-> show /SP system_description

/SP
Properties:
system_description = SUN FIRE X4270 M2 SERVER, ILOM v3.0.16.10.d, r74499


-> Session closed
Disconnected



...More
Top

Top

Database server system model number

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => System model number is correct


DATA FROM DM01DB01 FOR EXADATA SYSTEM MODEL NUMBER




NOTE: Look for system_description =

Connected. Use ^D to exit.
-> show /SP system_description

/SP
Properties:
system_description = SUN FIRE X4170 M2 SERVER, ILOM v3.0.16.10.d, r74499


-> Session closed
Disconnected

Status on dm01db02: PASS => System model number is correct


DATA FROM DM01DB02 FOR EXADATA SYSTEM MODEL NUMBER




NOTE: Look for system_description =

Connected. Use ^D to exit.
-> show /SP system_description

/SP
Properties:
system_description = SUN FIRE X4170 M2 SERVER, ILOM v3.0.16.10.d, r74499


-> Session closed
Disconnected
Top

Top

Number of Mounts before a File System check

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
A filesystem will be checked (fsck) after a specified number of times it is mounted, typically at reboot time. This maximum number of mounts before a check can be determined via the tune2fs -l <device> command (look for Maximum mount count) and is -1 by default on the database servers of the database machine, meaning no fsck will be run. If you would like to change this maximum, it can be done with the tune2fs -c <count> <device> command.
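For example, to check and, if desired, reset the maximum mount count for a system disk filesystem (a sketch; the device name is illustrative and should be taken from /etc/fstab or df output on the server being checked):

# tune2fs -l /dev/mapper/VGExaDb-LVDbSys1 | grep -i 'maximum mount count'
Maximum mount count:      -1
# tune2fs -c -1 /dev/mapper/VGExaDb-LVDbSys1

The second command sets the maximum mount count to -1, which keeps mount-count based fsck disabled (the default on the database servers).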
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Number of Mounts before a File System check is set to -1 for system disk


DATA FROM DM01DB01 FOR NUMBER OF MOUNTS BEFORE A FILE SYSTEM CHECK




NOTE: Look for Maximum mount count

tune2fs 1.39 (29-May-2006)
Filesystem volume name:   BOOT
Last mounted on:          <not available>
Filesystem UUID:          d8d92dff-d116-403a-8476-9cb0cce12bd4
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery sparse_super
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              32128
Block count:              128488
...More

Status on dm01db02: PASS => Number of Mounts before a File System check is set to -1 for system disk


DATA FROM DM01DB02 FOR NUMBER OF MOUNTS BEFORE A FILE SYSTEM CHECK




NOTE: Look for Maximum mount count

tune2fs 1.39 (29-May-2006)
Filesystem volume name:   BOOT
Last mounted on:          <not available>
Filesystem UUID:          6ab16708-dba7-43ae-8629-e05ce8cfce1a
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery sparse_super
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              32128
Block count:              128488
...More
Top

Top

Free space in root file system

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Best Practice, proactive problem avoidance

Risk:

If the root filesystem has less than 10% free space remaining, action should be taken to free up space in order to avoid potential problems caused by lack of space in the root filesystem.

Action / Repair:

Free up space in the root filesystem
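A quick way to locate the largest space consumers on the root filesystem before deciding what to remove or relocate (a sketch; adjust the directory list to the local layout):

# du -xsm /var /opt /root /tmp 2>/dev/null | sort -n

The -x flag keeps du on the root filesystem, so separately mounted filesystems such as /u01 are not counted; sizes are reported in megabytes.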
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Free space in root(/) filesystem meets or exceeds recommendation.


DATA FROM DM01DB01 - MCSDB DATABASE - FREE SPACE IN ROOT FILE SYSTEM



Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
30G   17G   12G  59% /

Status on dm01db02: PASS => Free space in root(/) filesystem meets or exceeds recommendation.


Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
30G  9.5G   19G  34% /
Top

Top

Exadata storage server root filesystem free space

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
The root filesystem has less than 10% free space remaining on one or more storage servers. Action should be taken to free up space.
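Since this warning applies to the storage servers, current usage on all cells can be checked in one pass with dcli from a database server (a sketch; the cell_group file path is an assumption based on the standard onecommand layout referenced elsewhere in this report):

# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root 'df -h /'

Space on cells is commonly consumed by old trace, log and core files; review candidates carefully before removing anything and follow the relevant MOS guidance when in doubt.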
 
Needs attention ondm01cel03, dm01cel02, dm01cel01
Passed on-

Status on dm01cel03, dm01cel02, dm01cel01: WARNING => Free space in root(/) filesystem is less than recommended on one or more storage servers.


DATA FROM DM01CEL01 FOR EXADATA STORAGE SERVER ROOT FILESYSTEM FREE SPACE



Filesystem            Size  Used Avail Use% Mounted on
/dev/md6              9.9G  8.8G  593M  94% /




DATA FROM DM01CEL02 FOR EXADATA STORAGE SERVER ROOT FILESYSTEM FREE SPACE



Filesystem            Size  Used Avail Use% Mounted on
/dev/md6              9.9G  8.5G  884M  91% /




...More
Top

Top

Verify Oracle RAC Databases use RDS Protocol over InfiniBand Network.

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:
The RDS protocol over InfiniBand provides superior performance because it avoids additional memory buffering operations when moving data from process memory to the network interface for IO operations. This includes both IO operations between the Oracle instance and the storage servers, as well as instance to instance block transfers via Cache Fusion.
There is minimal impact to verify that the RDS protocol is in use. Implementing the RDS protocol over InfiniBand requires an outage to relink the Oracle software.
Risk:
If the Oracle RAC databases do not use RDS protocol over the InfiniBand network, IO operations will be sub-optimal.
Action / Repair:
To verify the RDS protocol is in use by a given Oracle instance, set the ORACLE_HOME and LD_LIBRARY_PATH variables properly for the instance and execute the following command as the oracle userid on each database server where the instance is running:
$ORACLE_HOME/bin/skgxpinfo
The output should be:
rds
Note: For Oracle software versions below 11.2.0.2, the skgxpinfo command is not present. For 11.2.0.1, you can copy over skgxpinfo to the proper path in your 11.2.0.1 environment from an available 11.2.0.2 environment and execute it against the 11.2.0.1 database home(s) using the provided command.
Note: An alternative check (regardless of Oracle software version) is to scan each instance's alert log (must contain a startup sequence!) for the following line:
Cluster communication is configured to use the following interface(s) for this instance 192.168.20.21 cluster interconnect IPC version:Oracle RDS/IP (generic)
If the instance is not using the RDS protocol over InfiniBand, relink the Oracle binary using the following commands (with variables properly defined for each home being linked):
(as oracle) Shutdown any processes using the Oracle binary
If and only if relinking the grid infrastructure home, then (as root) GRID_HOME/crs/install/rootcrs.pl -unlock
(as oracle) cd $ORACLE_HOME/rdbms/lib
(as oracle) make -f ins_rdbms.mk ipc_rds ioracle
If and only if relinking the Grid Infrastructure home, then (as root) GRID_HOME/crs/install/rootcrs.pl -patch
Note: Avoid using the relink all command due to various issues. Use the make commands provided.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Oracle RAC Communication is using RDS protocol on Infiniband Network


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY ORACLE RAC DATABASES USE RDS PROTOCOL OVER INFINIBAND NETWORK.



rds

Status on dm01db02: PASS => Oracle RAC Communication is using RDS protocol on Infiniband Network


rds
Top

Top

Verify Oracle RAC Databases use RDS Protocol over InfiniBand Network. [Database Home]

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

RDS protocol provides superior performance for Oracle RAC communication because it does not require additional memory buffering operations when moving data from process memory to the network interface for RAC inter-node communication.

There is minimal impact to verify that RDS protocol is in use. If it is not, implementing it will require a relink of the Oracle software, which requires an outage.

Risk:

If the Oracle RAC databases do not use RDS protocol over the InfiniBand network, RAC inter-node communication will be sub-optimal.

Action / Repair:

Perform the following steps on the database servers:

   1. Verify an Oracle RAC database is using the RDS protocol over the InfiniBand Network by checking all alert logs on all nodes: 

Cluster communication is configured to use the following interface(s) for this instance 192.168.20.21 cluster interconnect IPC version:Oracle RDS/IP (generic) 

   2. If it is not running RDS, relink the Oracle binary via the following:

    * (as oracle) Shutdown any process using the Oracle binary
    * (as root) GRID_HOME/crs/install/rootcrs.pl -unlock (only required if relinking in the Grid Infrastructure home)
    * (as oracle) cd $ORACLE_HOME/rdbms/lib
    * (as oracle) make -f ins_rdbms.mk ipc_rds ioracle
    * (as root) GRID_HOME/crs/install/rootcrs.pl -patch (only required if relinking in the Grid Infrastructure home)

    Note: Avoid using the relink all command due to various issues. Use the make commands as seen above when relinking oracle binaries.

    Note: The dcli utility can be used to double check that all nodes are configured consistently with respect to the skgxp libraries in use. An example command to perform this check is

    dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l oracle md5sum ${ORACLE_HOME}/lib/libskgxp11.so 
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => Database Home is properly linked with RDS library


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY ORACLE RAC DATABASES USE RDS PROTOCOL OVER INFINIBAND NETWORK. [DATABASE HOME]




NOTE: md5sum of RDS protocol libraries should match for Database to properly use RDS protocol for Infiniband

352f95d3be1aed7796f11e7cfa45699d  /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
352f95d3be1aed7796f11e7cfa45699d  /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxpr.so

Status on dm01db02: PASS => Database Home is properly linked with RDS library



NOTE: md5sum of RDS protocol libraries should match for Database to properly use RDS protocol for Infiniband

352f95d3be1aed7796f11e7cfa45699d  /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
352f95d3be1aed7796f11e7cfa45699d  /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxpr.so
Top

Top

Verify InfiniBand is the Private Network for Oracle Clusterware Communication

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The InfiniBand network in an Oracle Exadata Database Machine provides superior performance and throughput characteristics that allow Oracle Clusterware to operate at optimal efficiency.
The overhead for these verification steps is minimal.

Risk:

If the InfiniBand network is not used for Oracle Clusterware communication, performance will be sub-optimal.

Action / Repair:

The InfiniBand network is preconfigured on the storage servers. Perform the following on the database servers:
Verify the InfiniBand network is the private network used for Oracle Clusterware communication with the following command:
$GI_HOME/bin/oifcfg getif -type cluster_interconnect
For X2-2 the output should be similar to:
bondib0  192.168.8.0  global  cluster_interconnect
For X2-8 the output should be similar to:
bondib0  192.168.8.0  global  cluster_interconnect 
bondib1  192.168.8.0  global  cluster_interconnect 
bondib2  192.168.8.0  global  cluster_interconnect 
bondib3  192.168.8.0  global  cluster_interconnect
If the InfiniBand network is not the private network used for Oracle Clusterware communication, configure it following the instructions in MOS note 1073502.1, "How to Modify Private Network Interface in 11.2 Grid Infrastructure".
NOTE: It is important to ensure that your public interface is properly marked as public and not private. This can be checked with the oifcfg getif command. If it is inadvertently marked private, you can get errors such as "OS system dependent operation:bind failed with status" and "OS failure message: Cannot assign requested address". It can be corrected with a command like oifcfg setif -global eth0/:public
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => InfiniBand is the Private Network for Oracle Clusterware Communication


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY INFINIBAND IS THE PRIVATE NETWORK FOR ORACLE CLUSTERWARE COMMUNICATION



bondib0 = InfiniBand

Status on dm01db02: PASS => InfiniBand is the Private Network for Oracle Clusterware Communication


bondib0 = InfiniBand
Top

Top

Verify storage server disk controllers use writeback cache

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Database servers use an internal RAID controller with a battery-backed cache to host local filesystems. For maximum performance when writing I/O to local disks, the battery-backed cache should be in "WriteBack" mode.

The impact of configuring the battery-backed cache in "WriteBack" mode is minimal.

Risk:

Not configuring the battery-backed cache in "WriteBack" mode will result in degraded performance when writing I/O to the local database server disks.

Action / Repair:

To verify that the database server disk controller battery-backed cache is in "WriteBack" mode, run the following command:

# /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -a0 | grep -i writethrough
# 

There should be no output returned.

If the battery-backed cache is not in "WriteBack" mode, run these commands on the database server to place the battery-backed cache into "WriteBack" mode:

/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WB  -Lall  -a0
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NoCachedBadBBU -Lall  -a0
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NORA -Lall  -a0
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp Direct -Lall  -a0

    NOTE: No settings should be modified on Exadata storage cells. The mode described above applies only to database servers in an Exadata database machine. 
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => All storage server disk controllers use writeback cache


DATA FROM DM01CEL01 FOR VERIFY STORAGE SERVER DISK CONTROLLERS USE WRITEBACK CACHE



Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU




DATA FROM DM01CEL02 FOR VERIFY STORAGE SERVER DISK CONTROLLERS USE WRITEBACK CACHE
...More
Top

Top

Configure Storage Server alerts to be sent via email

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Oracle Exadata Storage Servers can send various levels of alerts and clear messages via email, SNMP, or both. Sending these messages via email at a minimum helps to ensure that a problem is detected and corrected.

There is little impact to storage server operation to send these messages via email.

Risk:

If the storage servers are not configured to send alerts and clear messages via email at a minimum, there is an increased risk of a problem not being detected in a timely manner.

Action / Repair:

Configure a storage server to send email alerts using the following cellcli command (tailored to your environment):

ALTER CELL smtpServer='mailserver.maildomain.com',           -
smtpFromAddr='firstname.lastname@maildomain.com',        -
smtpToAddr='firstname.lastname@maildomain.com',        -
smtpFrom='Exadata cell', -
smtpPort='', -
smtpUseSSL='TRUE', -
notificationPolicy='critical,warning,clear',  -
notificationMethod='mail';

Use the following cellcli command to validate the email configuration by sending a test email:

alter cell validate mail;

    NOTE: The recommended best practice to monitor an Oracle Exadata Database Machine is with Oracle Enterprise Manager (OEM) and the suite of OEM plugins developed for the Oracle Exadata Database Machine. Please reference My Oracle Support (MOS) note 1110675.1 for details. 
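To run the validation (or, with care, the ALTER CELL statement itself) on all cells in one pass, dcli can be used from a database server (a sketch; the cell_group file path is an assumption based on the standard onecommand layout referenced elsewhere in this report):

# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e alter cell validate mail"

For the full ALTER CELL statement it is usually simpler to run cellcli on each cell, since the quoted SMTP attribute values are awkward to escape through dcli and ssh.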
 
Needs attention ondm01cel03, dm01cel02, dm01cel01
Passed on-

Status on dm01cel03, dm01cel02, dm01cel01: FAIL => Storage Server alerts are not configured to be sent via email


DATA FROM DM01CEL01 FOR CONFIGURE STORAGE SERVER ALERTS TO BE SENT VIA EMAIL







DATA FROM DM01CEL02 FOR CONFIGURE STORAGE SERVER ALERTS TO BE SENT VIA EMAIL







DATA FROM DM01CEL03 FOR CONFIGURE STORAGE SERVER ALERTS TO BE SENT VIA EMAIL



...More
Top

Top

Exadata celldisk predictive failures

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Any disks in predictive failure status should be attended to immediately. 
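Celldisk status can be listed on an individual cell with cellcli, or across all cells with dcli (a sketch; the cell_group file path is an assumption based on the standard onecommand layout referenced elsewhere in this report):

# cellcli -e "list celldisk attributes name,status,size"
# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e list celldisk attributes name,status,size"

Any celldisk reporting a predictive failure status should be investigated immediately; the underlying drives can be checked the same way with list physicaldisk attributes name,status.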
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => No celldisks have status of predictive failure


DATA FROM DM01CEL01 FOR EXADATA CELLDISK PREDICTIVE FAILURES




NOTE: Celldisk Name, Status, Size Attributes

CD_00_dm01cel01	 normal     	 1832.59375G
CD_01_dm01cel01	 normal     	 1832.59375G
CD_02_dm01cel01	 not present	 1861.703125G
CD_03_dm01cel01	 normal     	 1861.703125G
CD_04_dm01cel01	 normal     	 1861.703125G
CD_05_dm01cel01	 normal     	 1861.703125G
CD_06_dm01cel01	 normal     	 1861.703125G
CD_07_dm01cel01	 normal     	 1861.703125G
CD_08_dm01cel01	 normal     	 1861.703125G
CD_09_dm01cel01	 normal     	 1861.703125G
CD_10_dm01cel01	 normal     	 1861.703125G
CD_11_dm01cel01	 normal     	 1861.703125G
FD_00_dm01cel01	 normal     	 22.875G
...More
Top

Top

RAID controller version on storage servers

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
The RAID controller firmware version should match on all storage cells. Use the following command to find the RAID controller version on a storage cell:

/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -a0  | grep 'FW Package Build' |awk -F: '{print $2}'
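To compare the firmware package across all cells in one step, the same command can be run through dcli from a database server (a sketch; the cell_group file path is an assumption based on the standard onecommand layout referenced elsewhere in this report):

# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -a0 | grep 'FW Package Build'"

All cells should report the same FW Package Build value (12.12.0-0079 in the data collected below).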
 
Needs attention on-
Passed ondm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => RAID controller version matches on all storage servers


DATA FROM DM01CEL01 FOR RAID CONTROLLER VERSION ON STORAGE SERVERS




NOTE: Look for FW Package Build


Adapter #0

==============================================================================
Versions
================
Product Name    : LSI MegaRAID SAS 9261-8i
Serial No       : SV10914745
FW Package Build: 12.12.0-0079

Mfg. Data
================
Mfg. Date       : 03/02/11
...More
Top

Top

RDBMS Version

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
The RDBMS version for X2-2 is expected to be 11.2.0.2 or higher.
 
Needs attention on-
Passed onMCSDB

Status on MCSDB: PASS => RDBMS Version is 11.2.0.2 or higher as expected


DATA FOR MCSDB FOR RDBMS VERSION




RDBMS Version = 11.2.0.3.0
Top

Top

Locally managed tablespaces

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
In order to reduce contention on the data dictionary and rollback data, and to reduce the amount of generated redo, locally managed tablespaces should be used rather than dictionary-managed tablespaces.
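The extent management setting can be confirmed per tablespace with a simple dictionary query, which is what the collected data below reflects (a sketch; run as a suitably privileged user):

SQL> select tablespace_name, extent_management from dba_tablespaces;

Every row should report LOCAL; any tablespace reporting DICTIONARY should be migrated to locally managed.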
 
Needs attention on-
Passed onMCSDB

Status on MCSDB: PASS => All tablespaces are locally managed tablespaces


DATA FOR MCSDB FOR LOCALLY MANAGED TABLESPACES




SYSTEM                         LOCAL
SYSAUX                         LOCAL
UNDOTBS1                       LOCAL
TEMP                           LOCAL
UNDOTBS2                       LOCAL
USERS                          LOCAL
MCSDW                          LOCAL
MCSODS                         LOCAL
IDX_MCSODS                     LOCAL
IDX_MCSDW                      LOCAL
MCSSTG                         LOCAL
MCSAPP                         LOCAL
INFO_BASE                      LOCAL

INFO_DATA                      LOCAL
...More
Top

Top

ASM Version

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The components in the I/O stack are tightly integrated in Exadata. You must use the proper versions of software both on the storage servers and the database servers.
There is minimal impact to verify the ASM version.

Risk:

If the ASM version is not correct, the database(s) will not be able to communicate with the storage servers via ASM.

Action / Repair:

Verify the Oracle ASM software is version 11.2.0.2 or higher
If it is not, correct the condition.
 
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => ASM Version is 11.2.0.2 or higher as expected


DATA FROM DM01DB01 - MCSDB DATABASE - ASM VERSION



asm instance version = 11.2.0.3.0

Status on dm01db02: PASS => ASM Version is 11.2.0.2 or higher as expected


asm instance version = 11.2.0.3.0
Top

Top

Turn NUMA Off [Operating System]

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact:

Database servers in Oracle Exadata Database Machine by default are booted with operating system NUMA support enabled. Commands that manipulate large files without using direct I/O on ext3 file systems will cause low memory conditions on the NUMA node (Xeon 5500 processor) currently running the process.

By turning NUMA off, a potential local node low memory condition and subsequent performance drop is avoided.

The impact of turning NUMA off is minimal.

Risk:

Once local node memory is depleted, system performance as a whole will be severely impacted.

Action / Repair:

Follow the instructions in the referenced MOS Note to turn NUMA off in the kernel for the database servers.
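The current state can be confirmed on each database server without a reboot (a verification sketch; the commands below correspond to the kernel command line and numactl-style output captured in the data for this system):

# grep -o 'numa=off' /proc/cmdline
numa=off
# numactl --hardware

With NUMA off, the kernel command line contains numa=off and numactl reports a single node, as in the collected data below.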
 
Links
Needs attention on-
Passed ondm01db01, dm01db02

Status on dm01db01: PASS => NUMA is OFF at operating system level.


DATA FROM DM01DB01 - MCSDB DATABASE - TURN NUMA OFF [OPERATING SYSTEM]




NOTE: Look for numa=off

root=LABEL=DBSYS ro bootarea=dbsys loglevel=7 panic=60 debug rhgb numa=off console=ttyS0,115200n8 console=tty1 crashkernel=128M@64M audit=1 processor.max_cstate=1 nomce
available: 1 nodes (0)
node 0 size: 98295 MB
node 0 free: 6919 MB
No distance information available.

Status on dm01db02: PASS => NUMA is OFF at operating system level.



NOTE: Look for numa=off

root=LABEL=DBSYS ro bootarea=dbsys loglevel=7 panic=60 debug rhgb numa=off console=ttyS0,115200n8 console=tty1 crashkernel=128M@64M audit=1 processor.max_cstate=1 nomce
available: 1 nodes (0)
node 0 size: 98295 MB
node 0 free: 9796 MB
No distance information available.
Top

Top

Verify database server InfiniBand network MTU size is 65520

Success FactorDBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Setting the database server InfiniBand network Maximum Transmission Unit (MTU) to 65520 ensures a high transfer rate to external devices that are using TCP/IP over IB such as Media Servers / Data Mules or NFS Servers.
There is no impact to verifying the InfiniBand network MTU size. Changing the database server InfiniBand network MTU size requires a reboot.

Risk:

If the InfiniBand network MTU size is not set to 65520, performance when communicating with devices that use TCP/IP over InfiniBand will be sub-optimal.

Action / Repair:

To verify the database server InfiniBand network MTU size is 65520, use the following command:
for ib_intrfce in `ifconfig | grep ib | grep -v ":1" | cut -c1-7`; do printf "$ib_intrfce: "; ifconfig $ib_intrfce | grep MTU | cut -c49-58; done

For X2-2, the output will be similar to:

bondib0:  MTU:65520 
ib0: MTU:65520  
ib1: MTU:65520

For X2-8, the output will be similar to:

bondib0:  MTU:65520 
bondib1:  MTU:65520 
bondib2:  MTU:65520 
bondib3:  MTU:65520 
ib0: MTU:65520 
ib1: MTU:65520 
ib2: MTU:65520 
ib3: MTU:65520 
ib4: MTU:65520 
ib5: MTU:65520 
ib6: MTU:65520 
ib7: MTU:65520 

The expected output is that MTU size is 65520. If the MTU size is not 65520, correct it using the following steps:

1) Edit the appropriate files (for example, /etc/sysconfig/network-scripts/ifcfg-bondib0, /etc/sysconfig/network-scripts/ifcfg-ib0, and /etc/sysconfig/network-scripts/ifcfg-ib1 for the bondib0 interface) and add an entry for MTU=65520. For example:
MTU=65520

2) Reboot the database server and verify the InfiniBand network MTU size is 65520.
NOTE: The InfiniBand fabric must be in "connected" mode before MTU size can be set.
NOTE: Devices that communicate with the database server using TCP/IP over InfiniBand should also be configured with an MTU size of 65520. Both the device and the database server must use the same MTU setting to send transmission units of that size.
NOTE: The MTU Size for the IB Interfaces on the Exadata Cells should be left unchanged at 1500 because the communication between the DB Nodes and the Exadata Cells uses RDS and not IP protocols.
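To run the same verification across both database servers in one step, a hedged sketch using dcli and the dbs_group file referenced elsewhere in this report (assuming dcli user equivalence is configured for root) is:

dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root \
  "ifconfig bondib0 | grep MTU"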
 
Needs attention on: -
Passed on: dm01db01, dm01db02

Status on dm01db01: PASS => Database Server InfiniBand network MTU size is 65520


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY DATABASE SERVER INFINIBAND NETWORK MTU SIZE IS 65520




NOTE: Look for MTU:

bondib0   Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.10.1  Bcast:192.168.11.255  Mask:255.255.252.0
inet6 addr: fe80::221:2800:1cf:4887/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1

Status on dm01db02: PASS => Database Server InfiniBand network MTU size is 65520



NOTE: Look for MTU:

bondib0   Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.10.2  Bcast:192.168.11.255  Mask:255.255.252.0
inet6 addr: fe80::221:2800:1cf:4863/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
Top

Verify Oracle RAC Databases use RDS Protocol over InfiniBand Network. [Cluster Home]

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

RDS protocol provides superior performance for Oracle RAC communication because it does not require additional memory buffering operations when moving data from process memory to the network interface for RAC inter-node communication.

There is minimal impact to verify that RDS protocol is in use. If it is not, implementing it will require a relink of the Oracle software, which requires an outage.

Risk:

If the Oracle RAC databases do not use RDS protocol over the InfiniBand network, RAC inter-node communication will be sub-optimal.

Action / Repair:

Perform the following steps on the database servers:

   1. Verify an Oracle RAC database is using the RDS protocol over the InfiniBand Network by checking all alert logs on all nodes: 

Cluster communication is configured to use the following interface(s) for this instance 192.168.20.21 cluster interconnect IPC version:Oracle RDS/IP (generic) 

   2. If it is not running RDS, relink the Oracle binary via the following:

    * (as oracle) Shutdown any process using the Oracle binary
    * (as root) GRID_HOME/crs/install/rootcrs.pl -unlock (only required if relinking in the Grid Infrastructure home)
    * (as oracle) cd $ORACLE_HOME/rdbms/lib
    * (as oracle) make -f ins_rdbms.mk ipc_rds ioracle
    * (as root) GRID_HOME/crs/install/rootcrs.pl -patch (only required if relinking in the Grid Infrastructure home)

    Note: Avoid using the relink all command due to various issues. Use the make commands as seen above when relinking oracle binaries.

    Note: The dcli utility can be used to double check that all nodes are configured consistently with respect to the skgxp libraries in use. An example command to perform this check is

    dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l oracle md5sum ${ORACLE_HOME}/lib/libskgxp11.so 
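    As an additional illustrative check, the RDS banner can be pulled from an instance alert log; the path below assumes the default ADR layout for instance MCSDB1 and is only an example:

    grep -i "IPC version" \
      /u01/app/oracle/diag/rdbms/mcsdb/MCSDB1/trace/alert_MCSDB1.log | tail -1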
 
Needs attention on: -
Passed on: dm01db01, dm01db02

Status on dm01db01: PASS => Clusterware Home is properly linked with RDS library


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY ORACLE RAC DATABASES USE RDS PROTOCOL OVER INFINIBAND NETWORK. [CLUSTER HOME]




NOTE: md5sum should match for Clusterware to properly use RDS Protocol for Infiniband

352f95d3be1aed7796f11e7cfa45699d  /u01/app/11.2.0.3/grid/lib/libskgxp11.so
352f95d3be1aed7796f11e7cfa45699d  /u01/app/11.2.0.3/grid/lib/libskgxpr.so

Status on dm01db02: PASS => Clusterware Home is properly linked with RDS library



NOTE: md5sum should match for Clusterware to properly use RDS Protocol for Infiniband

352f95d3be1aed7796f11e7cfa45699d  /u01/app/11.2.0.3/grid/lib/libskgxp11.so
352f95d3be1aed7796f11e7cfa45699d  /u01/app/11.2.0.3/grid/lib/libskgxpr.so
Top

Verify Cluster Synchronization Services (CSS) misscount = 60

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

By default in 11.2, the CSS misscount is 30 for new Oracle Clusterware Repository (OCR) installations built at install time (not a previous OCR that has been upgraded to 11.2); however, onecommand sets misscount to 60 at deployment time. A value of 60 is a conservative setting and the current recommended default for Database Machine. For mission-critical applications that require shorter brownouts after a database node failure, 30 may be used. Misscount should not be set lower than 30 seconds.
The impact of setting CSS misscount to 60 is minimal.

Risk:

Setting misscount lower than 60 creates a higher risk that an unexpected temporary stall (for example, one caused by component failure or performance degradation) results in a node eviction.

Action / Repair:

To verify that the CSS misscount is set to 60, execute the following command as the "root" userid on a database server, substituting your environment-specific Grid Infrastructure home path for <GRID_HOME>:
<GRID_HOME>/bin/crsctl get css misscount
The output will be similar to:
CRS-4678: Successful get misscount 60 for Cluster Synchronization Services
The expected output is that the CSS misscount is set to 60.
If it is not, set it by executing the following command as the "root" userid on a database server (this takes effect on all nodes in the cluster without an Oracle Clusterware restart), substituting your environment-specific Grid Infrastructure home path for <GRID_HOME>:
<GRID_HOME>/bin/crsctl set css misscount 60
The output will be similar to:
CRS-4684: Successful set of parameter misscount to 60 for Cluster Synchronization Services

NOTE: Prior to 11.2, the CSS diagwait parameter was set to aid in debugging by ensuring log files had timely information. In 11.2, this parameter is no longer required and should not be set.
 
Needs attention on: -
Passed on: dm01db01

Status on dm01db01: PASS => CSS misscount is set to the recommended value of 60


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY CLUSTER SYNCHRONIZATION SERVICES (CSS) MISSCOUNT = 60



CRS-4678: Successful get misscount 60 for Cluster Synchronization Services.
Top

Check for parameter _lm_rcvr_hang_allow_time

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing have shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

Database initialization parameter _lm_rcvr_hang_allow_time = 140 protects against corner-case timeouts lower in the stack and helps prevent instance evictions.
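A minimal sketch of reviewing and correcting the setting, assuming an spfile is in use and that a rolling instance restart will follow (the parameter is static):

sqlplus -S / as sysdba <<'EOF'
-- Hidden parameters appear in V$PARAMETER only once they are explicitly set
SELECT name, value FROM v$parameter WHERE name = '_lm_rcvr_hang_allow_time';
-- Record the recommended value for every instance; it takes effect after restart
ALTER SYSTEM SET "_lm_rcvr_hang_allow_time" = 140 SCOPE=SPFILE SID='*';
EOF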
 
Needs attention on: MCSDB1, MCSDB2
Passed on: -

Status on MCSDB1: FAIL => Database parameter _lm_rcvr_hang_allow_time is NOT set to the recommended value

_lm_rcvr_hang_allow_time = 70                                                   

Status on MCSDB2: FAIL => Database parameter _lm_rcvr_hang_allow_time is NOT set to the recommended value

_lm_rcvr_hang_allow_time = 70                                                   
Top

Check for parameter _kill_diagnostics_timeout

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing have shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

Database initialization parameter _kill_diagnostics_timeout = 140 protects against corner-case timeouts lower in the stack and helps prevent instance evictions.
 
Needs attention on: MCSDB1, MCSDB2
Passed on: -

Status on MCSDB1: FAIL => Database parameter _kill_diagnostics_timeout is not set to recommended value

_kill_diagnostics_timeout = 60                                                  

Status on MCSDB2: FAIL => Database parameter _kill_diagnostics_timeout is not set to recommended value

_kill_diagnostics_timeout = 60                                                  
Top

Verify There Are No Storage Server Memory (ECC) Errors

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Memory modules that have logged corrected (ECC) memory errors can show degraded performance, IPMI driver timeouts, and BMC error messages in the /var/log/messages file.

Correcting the condition restores optimal performance.

The impact of checking for memory ECC errors is slight. Correction will likely require a firmware upgrade and reboot, or hardware repair downtime.

Risk:

If not corrected, the faulty memory will lead to performance degradation and other errors.

Action / Repair:

To check for memory ECC errors, run the following command as the root userid on the storage server:

# ipmitool sel list | grep ECC | cut -f1 -d : | sort -u

If any errors are reported, take the following actions in order:

   1. Upgrade to the latest BIOS as it addresses a potential cause.
   2. Reseat the DIMM.
   3. Open an SR for hardware replacement. 
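To sweep all storage servers at once, a hedged sketch using dcli (assuming a cell_group file alongside the dbs_group file referenced elsewhere in this report) is:

dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root \
  "ipmitool sel list | grep ECC | cut -f1 -d: | sort -u"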
 
Needs attention on: -
Passed on: dm01cel03, dm01cel02, dm01cel01

Status on dm01cel03, dm01cel02, dm01cel01: PASS => No Storage Server Memory (ECC) Errors found.


DATA FROM DM01CEL01 FOR VERIFY THERE ARE NO STORAGE SERVER MEMORY (ECC) ERRORS




NOTE: No output means no errors were found






DATA FROM DM01CEL02 FOR VERIFY THERE ARE NO STORAGE SERVER MEMORY (ECC) ERRORS




NOTE: No output means no errors were found


...More
Top

Check for parameter db_block_checksum

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact: 

Experience and testing have shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized.
The parameters are common to all database instances. The impact of setting these parameters is minimal.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.

Risk: 

If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.

Action / Repair: 

DB_BLOCK_CHECKSUM = FULL aids in block corruption detection. Enable it for both primary and standby databases.
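A minimal sketch of checking and applying the recommendation; DB_BLOCK_CHECKSUM is a dynamic parameter, so (assuming an spfile is in use) the change can be made online on all instances:

sqlplus -S / as sysdba <<'EOF'
-- Current value on every instance
SELECT inst_id, value FROM gv$parameter WHERE name = 'db_block_checksum';
-- Apply the recommended value cluster-wide, both in memory and in the spfile
ALTER SYSTEM SET db_block_checksum = 'FULL' SCOPE=BOTH SID='*';
EOF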
 
Links
Needs attention on: MCSDB1, MCSDB2
Passed on: -

Status on MCSDB1: FAIL => Database parameter DB_BLOCK_CHECKSUM is NOT set to recommended value

MCSDB1.db_block_checksum = typical                                              

Status on MCSDB2: FAIL => Database parameter DB_BLOCK_CHECKSUM is NOT set to recommended value

MCSDB2.db_block_checksum = typical                                              
Top

Verify the database server InfiniBand network is in "connected" mode.

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

Having the database server InfiniBand network in "connected" mode increases performance when communicating with devices that use TCP/IP over InfiniBand.

There is no impact to verifying the InfiniBand network mode. Changing the InfiniBand network mode requires a reboot.

NOTE: Devices that communicate with the database server using TCP/IP over InfiniBand should also be configured in "connected" mode. Both the device and the database server must be in "connected" mode to achieve the performance benefit.

Risk:

If the database server InfiniBand network is not in "connected" mode, the performance of devices using TCP/IP over InfiniBand to communicate with the database server will be sub-optimal.

Action / Repair:

To verify that the database server InfiniBand network is in "connected" mode, use the following command:

# grep [a-z] /sys/class/net/ib*/mode
/sys/class/net/ib0/mode:connected
/sys/class/net/ib1/mode:connected

The expected output is that both ib0 and ib1 report "connected". If they do not, correct the condition using the following steps:

1) Edit the /etc/ofed/openib.conf file, search for the line specifying SET_IPOIB_CM, and change its value to "yes". For example:

# Enable IPoIB Connected Mode
SET_IPOIB_CM=yes

2) Reboot the database server and verify that the InfiniBand network is in "connected" mode. 
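A hedged sketch of step 1 above, assuming the SET_IPOIB_CM line already exists in /etc/ofed/openib.conf (append it manually if it does not), run as root:

# Keep a dated backup, then force connected mode persistently
cp -p /etc/ofed/openib.conf /etc/ofed/openib.conf.$(date +%Y%m%d)
sed -i 's/^SET_IPOIB_CM=.*/SET_IPOIB_CM=yes/' /etc/ofed/openib.conf
grep SET_IPOIB_CM /etc/ofed/openib.conf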
 
Needs attention on: -
Passed on: dm01db01, dm01db02

Status on dm01db01: PASS => Database server InfiniBand network is in "connected" mode.


DATA FROM DM01DB01 - MCSDB DATABASE - VERIFY THE DATABASE SERVER INFINIBAND NETWORK IS IN "CONNECTED" MODE.



/sys/class/net/ib0/mode:connected
/sys/class/net/ib1/mode:connected

Status on dm01db02: PASS => Database server InfiniBand network is in "connected" mode.


/sys/class/net/ib0/mode:connected
/sys/class/net/ib1/mode:connected
Top

ASM disk group compatible.asm parameter

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The components in the I/O stack are tightly integrated in Exadata. You must use the proper versions of software both on the storage servers and the database servers. Setting compatible attributes defines available functionality. Setting CELL.SMART_SCAN_CAPABLE enables the offloading of certain query work to the storage servers. Setting AU_SIZE maximizes available disk technology and throughput by reading 4MB of data before performing a disk seek to a new sector location.
There is minimal impact to verify and configure these settings.

Risk:

If these attributes are not set as directed, performance will be sub-optimal.

Action / Repair:

For the ASM disk group containing Oracle Exadata Storage Server grid disks, 
verify the attribute settings as follows: 

     * COMPATIBLE.ASM attribute is set to 11.2.0.2 or higher. 
     * COMPATIBLE.RDBMS attribute is set to the minimum Oracle database software version in use. 
     * CELL.SMART_SCAN_CAPABLE attribute is TRUE. 
     * AU_SIZE attribute is 4M. 
If these attributes are not set properly, correct the condition.
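One way to verify all four attributes at once is the following sketch, run against the local ASM instance as the Grid Infrastructure owner:

sqlplus -S / as sysasm <<'EOF'
-- Lists the attributes of interest for every mounted disk group
SELECT g.name AS diskgroup, a.name AS attribute, a.value
FROM   v$asm_diskgroup g
JOIN   v$asm_attribute a ON a.group_number = g.group_number
WHERE  a.name IN ('compatible.asm', 'compatible.rdbms',
                  'cell.smart_scan_capable', 'au_size')
ORDER  BY g.name, a.name;
EOF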
 
Needs attention on: -
Passed on: dm01db01

Status on dm01db01: PASS => All disk groups have compatible.asm parameter set to recommended values


DATA FROM DM01DB01 - MCSDB DATABASE - ASM DISK GROUP COMPATIBLE.ASM PARAMETER



ASM DATA_DM01.compatible.asm = 11.2.0.3.0
ASM DBFS_DG.compatible.asm = 11.2.0.3.0
ASM RECO_DM01.compatible.asm = 11.2.0.3.0
Top

ASM Cell smart scan

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Critical

Benefit / Impact:

The components in the I/O stack are tightly integrated in Exadata. You must use the proper versions of software both on the storage servers and the database servers. Setting compatible attributes defines available functionality. Setting CELL.SMART_SCAN_CAPABLE enables the offloading of certain query work to the storage servers. Setting AU_SIZE maximizes available disk technology and throughput by reading 4MB of data before performing a disk seek to a new sector location.
There is minimal impact to verify and configure these settings.

Risk:

If these attributes are not set as directed, performance will be sub-optimal.

Action / Repair:

For the ASM disk group containing Oracle Exadata Storage Server grid disks, verify the attribute settings as follows:
COMPATIBLE.ASM attribute is set to the Oracle ASM software version in use.
COMPATIBLE.RDBMS attribute is set to the Oracle database software version in use.
CELL.SMART_SCAN_CAPABLE attribute is TRUE.
AU_SIZE attribute is 4M.
If these attributes are not set properly, correct the condition.
 
Needs attention on: -
Passed on: dm01db01

Status on dm01db01: PASS => All disk groups have CELL.SMART_SCAN_CAPABLE parameter set to true


DATA FROM DM01DB01 - MCSDB DATABASE - ASM CELL SMART SCAN



ASM DATA_DM01.cell.smart_scan_capable = TRUE
ASM DBFS_DG.cell.smart_scan_capable = TRUE
ASM RECO_DM01.cell.smart_scan_capable = TRUE
Top

ASM disk group compatible.rdbms parameter

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

The components in the I/O stack are tightly integrated in Exadata. You must use the proper versions of software both on the storage servers and the database servers. Setting compatible attributes defines available functionality. Setting CELL.SMART_SCAN_CAPABLE enables the offloading of certain query work to the storage servers. Setting AU_SIZE maximizes available disk technology and throughput by reading 4MB of data before performing a disk seek to a new sector location.

There is minimal impact to verify and configure these settings.

Risk:

If these attributes are not set as directed, performance will be sub-optimal.

Action / Repair:

For the ASM disk group containing Oracle Exadata Storage Server grid disks, 
verify the attribute settings as follows: 

     * COMPATIBLE.ASM attribute is set to the Oracle ASM software version in use. 
     * COMPATIBLE.RDBMS attribute is set to the minimum Oracle database software version in use. 
     * CELL.SMART_SCAN_CAPABLE attribute is TRUE. 
     * AU_SIZE attribute is 4M. 

If these attributes are not set properly, correct the condition.

NOTE: If the compatible.rdbms disk group parameter is higher than the version of any database instance, then that instance cannot access those higher-compatibility disk groups. If compatible.rdbms is not set properly, then the RDBMS software version and database must be upgraded, or the disk group must be recreated.
 
 
Needs attention on: -
Passed on: dm01db01

Status on dm01db01: PASS => All disk groups have compatible.rdbms parameter set to recommended values


DATA FROM DM01DB01 - MCSDB DATABASE - ASM DISK GROUP COMPATIBLE.RDBMS PARAMETER



lowest rdbms version = 112030
ASM DATA_DM01.compatible.rdbms = 11.2.0.2.0
ASM DBFS_DG.compatible.rdbms = 11.2.0.2.0
ASM RECO_DM01.compatible.rdbms = 11.2.0.2.0
Top

ASM allocation unit size for all disk groups

Success Factor: DBMACHINE X2-2 AND X2-8 AUDIT CHECKS
Recommendation
 Benefit / Impact:

In order to achieve fast disk scan rates with today's disk technology, it is important that segments be laid out on disk with at least 4MB of contiguous disk space. This allows disk scans to read 4MB of data from disk before having to perform a seek to another location on disk and therefore ensures that most of the time during a scan is spent transferring data from disk.

Risk:

Time could be spent seeking between disk locations for data. 

Action / Repair:

To ensure that segments are laid out with 4MB of contiguous data on disk, you will need to set the ASM allocation unit (AU) size to 4MB and ensure that data file extents are at least 4MB in size. The ASM allocation unit can be specified when a disk group is created. For Exadata, we recommend setting the AU size to 4MB. The ASM allocation unit size (AU_SIZE) can be set at disk group creation time, as seen in the following example:

CREATE diskgroup data normal redundancy 
DISK 'o/*/DATA*'
ATTRIBUTE 
          'AU_SIZE' = '4M',
          'cell.smart_scan_capable'='TRUE',
          'compatible.rdbms'='11.2.0.0', 
          'compatible.asm'='11.2.0.0';
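For an existing disk group, the allocation unit size can be confirmed with a query such as this sketch (run against the ASM instance); 4194304 bytes corresponds to the recommended 4MB:

sqlplus -S / as sysasm <<'EOF'
SELECT name, allocation_unit_size FROM v$asm_diskgroup;
EOF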
 
Needs attention on: -
Passed on: dm01db01

Status on dm01db01: PASS => All disk groups have allocation unit size set to 4MB


DATA FROM DM01DB01 - MCSDB DATABASE - ASM ALLOCATION UNIT SIZE FOR ALL DISK GROUPS



ASM DATA_DM01.au_size = 4194304
ASM DBFS_DG.au_size = 4194304
ASM RECO_DM01.au_size = 4194304
Top

Systemwide firmware and software versions

Please compare these versions against Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions (Doc ID 888828.1) in My Oracle Support.
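One illustrative way to gather the key image versions from all servers in a single pass is with dcli; the group files below are assumed to be the standard onecommand ones referenced elsewhere in this report:

dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root "imageinfo | grep -i 'image version'"
dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "imageinfo | grep -i 'image version'"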

Database server dm01db01



Clusterware and RDBMS software version

dm01db01.CRS_ACTIVE_VERSION = 11.2.0.3.0
dm01db01.MCSDB.INSTANCE_VERSION = 112030


Clusterware home(/u01/app/11.2.0.3/grid) patch inventory
Patch  14275572     : applied on Wed Nov 28 10:45:33 CST 2012
Patch  14307915     : applied on Wed Nov 28 10:46:19 CST 2012
Patch  14474780     : applied on Wed Nov 28 10:44:21 CST 2012
Patch description:  "QUARTERLY CRS PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14275572)"
Patch description:  "QUARTERLY DATABASE PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14474780)"
Patch description:  "QUARTERLY DISKMON PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14307915)"

Exadata Server software version
version:11.2.3.2.0.120713

Infiniband HCA firmware version
Firmware version: 2.7.8130

OpenFabrics Enterprise Distribution (OFED) Software version
1.5.1
1.5.1
1.5.1
1.5.1
1.5.1

Operating system and Kernel version
Red Hat Enterprise Linux Server release 5.8 (Tikanga) kernel=2.6.32-400.1.1.el5uek

Database server dm01db01


RDBMS home(/u01/app/oracle/product/11.2.0.3/dbhome_1) patch inventory

Patch  14275572     : applied on Wed Nov 28 11:23:12 CST 2012
Patch  14307915     : applied on Wed Nov 28 11:23:28 CST 2012
Patch  14474780     : applied on Wed Nov 28 11:22:20 CST 2012
Patch description:  "QUARTERLY CRS PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14275572)"
Patch description:  "QUARTERLY DATABASE PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14474780)"
Patch description:  "QUARTERLY DISKMON PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14307915)"

Database server dm01db02



Clusterware and RDBMS software version

dm01db02.CRS_ACTIVE_VERSION = 11.2.0.3.0
dm01db02.MCSDB.INSTANCE_VERSION = 112030


Clusterware home(/u01/app/11.2.0.3/grid) patch inventory
Patch  14275572     : applied on Wed Nov 28 10:48:03 CST 2012
Patch  14307915     : applied on Wed Nov 28 10:48:50 CST 2012
Patch  14474780     : applied on Wed Nov 28 10:46:50 CST 2012
Patch description:  "QUARTERLY CRS PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14275572)"
Patch description:  "QUARTERLY DATABASE PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14474780)"
Patch description:  "QUARTERLY DISKMON PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14307915)"

Exadata Server software version
version:11.2.3.2.0.120713

Infiniband HCA firmware version
Firmware version: 2.7.8130

OpenFabrics Enterprise Distribution (OFED) Software version
1.5.1
1.5.1
1.5.1
1.5.1
1.5.1

Operating system and Kernel version
Red Hat Enterprise Linux Server release 5.8 (Tikanga) kernel=2.6.32-400.1.1.el5uek

Database server dm01db02


RDBMS home(/u01/app/oracle/product/11.2.0.3/dbhome_1) patch inventory

Patch  14275572     : applied on Wed Nov 28 11:23:15 CST 2012
Patch  14307915     : applied on Wed Nov 28 11:23:32 CST 2012
Patch  14474780     : applied on Wed Nov 28 11:22:21 CST 2012
Patch description:  "QUARTERLY CRS PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14275572)"
Patch description:  "QUARTERLY DATABASE PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14474780)"
Patch description:  "QUARTERLY DISKMON PATCH FOR EXADATA (OCT 2012 - 11.2.0.3.11) : (14307915)"

Storage server dm01cel01


Exadata Server software version
version:11.2.3.2.0.120713

Infiniband HCA firmware version
Firmware version: 2.7.8130

OpenFabrics Enterprise Distribution (OFED) Software version
1.5.1

Operating system and Kernel version
Red Hat Enterprise Linux Server release 5.8 (Tikanga) kernel=2.6.32-400.1.1.el5uek

Storage server dm01cel02


Exadata Server software version
version:11.2.3.2.0.120713

Infiniband HCA firmware version
Firmware version: 2.7.8130

OpenFabrics Enterprise Distribution (OFED) Software version
1.5.1

Operating system and Kernel version
Red Hat Enterprise Linux Server release 5.8 (Tikanga) kernel=2.6.32-400.1.1.el5uek

Storage server dm01cel03


Exadata Server software version
version:11.2.3.2.0.120713

Infiniband HCA firmware version
Firmware version: 2.7.8130

OpenFabrics Enterprise Distribution (OFED) Software version
1.5.1

Operating system and Kernel version
Red Hat Enterprise Linux Server release 5.8 (Tikanga) kernel=2.6.32-400.1.1.el5uek

Top

Skipped Checks

skipping Infiniband switch HOSTNAME configuration (checkid:- 9AD56124DDFE9FCCE040E50A1EC038A6) on dm01sw-ib3 because s_sysconfig_network_dm01sw-ib3.out not found
skipping Infiniband switch HOSTNAME configuration (checkid:- 9AD56124DDFE9FCCE040E50A1EC038A6) on dm01sw-ib2 because s_sysconfig_network_dm01sw-ib2.out not found
skipping Infiniband Switch NTP configuration (checkid:- 9AD59DE0898D0513E040E50A1EC03EEA) on dm01sw-ib3 because s_ntp_dm01sw-ib3.out not found
skipping Infiniband Switch NTP configuration (checkid:- 9AD59DE0898D0513E040E50A1EC03EEA) on dm01sw-ib2 because s_ntp_dm01sw-ib2.out not found
skipping Infiniband switch sminfo_polling_timeout configuration (checkid:- 9AD8CC2B50B63DEBE040E50A1EC0529A) on dm01sw-ib3 because s_opensm_dm01sw-ib3.out not found
skipping Infiniband switch sminfo_polling_timeout configuration (checkid:- 9AD8CC2B50B63DEBE040E50A1EC0529A) on dm01sw-ib2 because s_opensm_dm01sw-ib2.out not found
skipping Infiniband switch routing_engine configuration (checkid:- 9AD8F72CFE0AC95BE040E50A1EC050D0) on dm01sw-ib3 because s_opensm_dm01sw-ib3.out not found
skipping Infiniband switch routing_engine configuration (checkid:- 9AD8F72CFE0AC95BE040E50A1EC050D0) on dm01sw-ib2 because s_opensm_dm01sw-ib2.out not found
skipping sm_priority configuration on Infiniband switch (checkid:- 9AD95A48A426E029E040E50A1EC062A1) on dm01sw-ib3 because s_sm_priority_status_dm01sw-ib3.out not found
skipping sm_priority configuration on Infiniband switch (checkid:- 9AD95A48A426E029E040E50A1EC062A1) on dm01sw-ib2 because s_sm_priority_status_dm01sw-ib2.out not found
skipping Infiniband switch log_flags configuration (checkid:- 9ADA623709086DC5E040E50A1EC0168D) on dm01sw-ib3 because s_opensm_dm01sw-ib3.out not found
skipping Infiniband switch log_flags configuration (checkid:- 9ADA623709086DC5E040E50A1EC0168D) on dm01sw-ib2 because s_opensm_dm01sw-ib2.out not found
skipping Infiniband subnet manager status (checkid:- 9ADA9729FCD46EBBE040E50A1EC02350) on dm01sw-ib3 because s_opensmd_status_dm01sw-ib3.out not found
skipping Infiniband subnet manager status (checkid:- 9ADA9729FCD46EBBE040E50A1EC02350) on dm01sw-ib2 because s_opensmd_status_dm01sw-ib2.out not found
skipping Infiniband switch controlled_handover configuration (checkid:- 9ADAAD73FF532FE4E040E50A1EC0284E) on dm01sw-ib3 because s_opensm_dm01sw-ib3.out not found
skipping Infiniband switch controlled_handover configuration (checkid:- 9ADAAD73FF532FE4E040E50A1EC0284E) on dm01sw-ib2 because s_opensm_dm01sw-ib2.out not found
skipping Infiniband switch polling_retry_number configuration (checkid:- 9ADAAF3071391E94E040E50A1EC028AF) on dm01sw-ib3 because s_opensm_dm01sw-ib3.out not found
skipping Infiniband switch polling_retry_number configuration (checkid:- 9ADAAF3071391E94E040E50A1EC028AF) on dm01sw-ib2 because s_opensm_dm01sw-ib2.out not found
skipping Switch firmware version (checkid:- B0A0A6141D1A39CCE0431EC0E50AB237) on dm01sw-ib3 because s_nm2version_dm01sw-ib3.out not found
skipping Switch firmware version (checkid:- B0A0A6141D1A39CCE0431EC0E50AB237) on dm01sw-ib2 because s_nm2version_dm01sw-ib2.out not found
skipping Hostname in /etc/hosts (checkid:- B0A4363CC03E5EA3E0431EC0E50A3489) on dm01sw-ib3 because s_etc_hostname_dm01sw-ib3.out not found
skipping Hostname in /etc/hosts (checkid:- B0A4363CC03E5EA3E0431EC0E50A3489) on dm01sw-ib2 because s_etc_hostname_dm01sw-ib2.out not found
skipping Verify average ping times to DNS nameserver (checkid:- B81546A46C376C14E0431EC0E50A826D) on dm01sw-ib3 because s_dns_ping_time_dm01sw-ib3.out not found
skipping Verify average ping times to DNS nameserver (checkid:- B81546A46C376C14E0431EC0E50A826D) on dm01sw-ib2 because s_dns_ping_time_dm01sw-ib2.out not found