Tuesday, 10 September 2013

Oracle 10g Real Application Cluster Architecture

Oracle RAC, introduced with Oracle9i in 2001 (the 10g release followed in 2004), provides software for clustering and high availability in Oracle database environments. It is the successor to Oracle Parallel Server (OPS).
What is a Cluster?
A cluster consists of two or more independent, but interconnected, servers.
The common feature of a cluster is that it should appear to an application as if it were a single server.
The interconnect is the physical network used as a means of communication between the nodes of the cluster.
In short, a cluster is a group of independent servers that cooperate as a single system.
·         Multiple servers (nodes) act as a single "clustered" server.

What is RAC?
Real Application Clusters is software that enables you to use clustered hardware by running multiple instances against the same database. The database files are stored on disks that are either physically or logically connected to each node, so that every active instance can read from or write to the database files.
RAC provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out; at the same time, because all nodes access the same database, the failure of one instance does not cause the loss of access to the database.
·         Multiple database instances (one on each server) access the same database over shared storage.

What is the Use of RAC?
High availability: survive node and instance failures.
No scalability limits: add more nodes as you need them in the future.
Pay as you grow: pay for just what you need today.
Key grid computing features: grow and shrink on demand, single-button addition and removal of servers, and automatic workload management for services.
What is Oracle Clusterware?
Oracle Clusterware is the cross-platform cluster software required to run the RAC option for the Oracle database. It provides basic clustering services at the operating system level that enable Oracle software to run in clustered mode.
Oracle Clusterware enables the nodes to communicate with each other, forms the cluster, and makes the nodes appear as a single logical server.
Oracle RAC utilizes Oracle Clusterware for the inter-node communication required in clustered database environments; Oracle Clusterware is the technology that transforms a server farm into a cluster.
A cluster in general is a group of independent servers that cooperate as a single system.
What is Cluster Ready Services (CRS)?
Oracle Clusterware is run by Cluster Ready Services (CRS). CRS was introduced in 10g; it is started from inittab on UNIX/Linux and runs as services (.exe) on Windows.
CRS (Cluster Ready Services) provides a standard cluster interface on all platforms and performs new high availability operations not available in previous versions.
CRS manages cluster database functions including node membership, group services, global resource management, and high availability.
CRS serves as the clusterware software on all platforms; it can be the only clusterware, or it can run on top of vendor clusterware such as Sun Cluster, HP Serviceguard, etc.
Oracle Clusterware (Cluster Ready Services in 10g / Cluster Manager in 9i) provides the infrastructure that binds multiple nodes so that they operate as a single server.
Clusterware monitors all components, such as instances and listeners.

CRS manages the following resources:
-The ASM instances on each node
-Databases
-The instances on each node
-Oracle Services on each node
-The cluster nodes themselves, including the following processes, or "nodeapps":
-VIP (Virtual IP)
-GSD (Global Services Daemon)
-The listener
-The ONS daemon (Oracle Notification Service)

There are three main background processes that you can see when running ps -ef | grep d.bin.
They are normally started by init during the operating system boot process. They can be started and stopped manually by issuing the command /etc/init.d/init.crs {start|stop|enable|disable}.
/etc/rc.d/init.d/init.crsd
/etc/rc.d/init.d/init.cssd 
/etc/rc.d/init.d/init.evmd
If the above CRS background processes are running, the crs_stat command and the crsstat script will work; if they are not running, crs_stat and crsstat will not work.
Once the above processes are running, they will automatically start the following services in the following order, if they are enabled (a command-line sketch follows this list):
-The nodeapps (GSD, VIP, ONS, listener) are brought online.
-The ASM instances are brought online.
-The database instances are brought online.
-Any defined services are brought online.
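For illustration only (the node, database, instance, and service names node1, RACDB, RACDB1, and OLTP_SRV are placeholders, not taken from this article), the checks and the manual startup sequence described above might look like this on a 10g Linux node:
# ps -ef | grep d.bin                        (confirm crsd, cssd and evmd are running)
# /etc/init.d/init.crs stop                  (stop Oracle Clusterware, as root)
# /etc/init.d/init.crs start                 (start Oracle Clusterware, as root)
$ crs_stat -t                                (show the state of all registered resources)
$ srvctl start nodeapps -n node1             (VIP, GSD, ONS and listener)
$ srvctl start asm -n node1
$ srvctl start instance -d RACDB -i RACDB1
$ srvctl start service -d RACDB -s OLTP_SRV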
See the below Image for RAC Software Principles.
CRS Components:
CRS Daemon (CRSD)
Oracle Cluster Synchronization service Daemon (OCSSD)
Event Manager Daemon (EVMD)
Process Monitor Daemon (OPROCd)
CRSD:
CRSd manages resources, such as starting and stopping services and failing over application resources.
CRSd runs as the superuser (root) on UNIX and as a service (LocalSystem) on Windows, and it is automatically restarted in case of failure.
When started in reboot mode, it automatically starts all the resources under its management.
When started in restart mode, it retains the previous state and returns the resources to the state they were in before shutdown.
It manages the Oracle Cluster Registry and stores the current known state in the Oracle Cluster Registry.
CRS requires the public interface, the private interface, and the Virtual IP (VIP) for its operation; all of these interfaces should be up and able to ping each other before starting the CRS installation. Without this network infrastructure, CRS cannot be installed.
* The failure or death of the CRS daemon can cause a node failure; the node is automatically rebooted to avoid data corruption caused by a possible communication failure between the nodes.
In short, 
- Engine for HA operation
- Manages 'application resources'
- Starts, stops, and fails 'application resources' over
- Spawns separate 'actions' to start/stop/check application resources
- Maintains configuration profiles in the OCR (Oracle Cluster Registry)
- Stores current known state in the OCR.
- Runs as root
- Is restarted automatically on failure 
OCSSd:
  Oracle Cluster Synchronization Services Daemon (OCSSd) is the component that provides synchronization services between the nodes.
  OCSSd provides access to node membership; it also enables basic cluster services, including cluster group services and cluster locking. It can also run without integration with vendor clusterware.
  The failure of OCSSd causes the machine to reboot to avoid a split-brain situation.
  OCSSd is also required in a single-instance configuration if Automatic Storage Management (ASM) is used.
   OCSSd runs as the "oracle" user.
   OCSSd notifies members when a node joins or leaves the cluster.
   It uses the OCR to store its data and updates the information during reconfiguration.
   OCSSd is part of RAC and of single-instance installations with ASM.
  In short,
  - Provides access to node membership
  - Provides group services
  - Provides basic cluster locking 
  - Integrates with existing vendor clusterware, when present
  - Can also run without integration with vendor clusterware
  - Runs as oracle.
  - Failure exit causes machine reboot.
  - This is a feature to prevent data corruption in event of a split brain. 
EVMd:
The third component of Oracle Clusterware is the Event Manager (also called the event management logger).
It monitors the message flow between the nodes and logs relevant event information to files.

The Event Manager Daemon (EVMD) is an event-forwarding daemon process that propagates events through the Oracle Notification Service (ONS).
EVMD is the communication bridge between the Cluster-Ready Service Daemon (CRSD) and CSSD. All communications between the CRS and CSS happen via the EVMD.
The event manager runs as the daemon process 'evmd'. The 'evmd' daemon spawns a permanent child process called 'evmlogger' and generates events when things happen. The 'evmlogger' child process spawns new child processes on demand and scans the callout directory to invoke callouts.
EVMd is restarted automatically on failure, and the death of the evmd process does not halt the instance.

It runs as the "oracle" user.
- Generates events when things happen
- Scans callout directory and invokes callouts.
- Restarted automatically on failure
OPROCd:
Oprocd provides the server fencing solution for Oracle Clusterware. It is the process monitor for Oracle Clusterware and uses the hangcheck timer or a watchdog timer (depending on the implementation) to protect cluster integrity.
Oprocd is locked in memory and runs as a real-time process.
It sleeps for a fixed time and runs as the 'root' user.
Failure of the Oprocd process causes the node to restart.

RACGIMON is a database health check monitor and performs the tasks of starting, stopping, and failing over services. It monitors the instances by reading a memory-mapped location in the SGA that is updated by the PMON process on each node. There is only one instance of the RACGIMON process for the entire cluster, and when the node that hosts it fails, the RACGIMON process is started on the master node of the surviving nodes by the CRS process.

We have two main components of CRS: the Oracle Cluster Registry (OCR) and the voting disk.

 Oracle Cluster Registry (OCR)

The OCR maintains the cluster configuration information as well as configuration information about any cluster database within the cluster, including the node list and node membership information.
Some of the main components included in the OCR are:
--Node list, node membership information.
-- Database instance, node, and other mapping information and ASM (if configured)
--Application resource profiles such as VIP addresses, services, etc.
--Service characteristics
--Information about processes that Oracle Clusterware controls
--Information about any third-party applications controlled by CRS (10g R2 and later)
Note:
Oracle Clusterware manages CRS resources (database, instance, service, listener, VIPs and application process) based on the resource configuration information that is stored in Oracle Cluster Registry (OCR).
*  To view the contents of the OCR in a human-readable format, run the ocrdump command. This dumps the contents of the OCR into an ASCII text file named OCRDUMPFILE in the current directory.
The OCR must reside on a shared disk(s) that is accessible by all of the nodes in the cluster. Oracle Clusterware 10g Release2 allows you to multiplex the OCR.
* The OCR can be stored on a raw device or on a cluster file system. It cannot, however, be stored in Automatic Storage Management (ASM) in 10g; that only became possible in Oracle 11g Release 2.
*   In 10g, the minimum size of the OCR is 100MB.
*  In Oracle 10.2 and above, the OCR can be mirrored, eliminating the potential for it to become a single point of failure. A maximum of two copies can be maintained by Oracle Clusterware.
*  Oracle automatically backs up the OCR every 4 hours, and the last 3 backup copies are always retained in 10g.
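As a sketch (run as root; the exact output and backup locations vary per installation), the OCR and its automatic backups can be inspected with the standard clusterware utilities:
# ocrcheck                      (reports the OCR location, size, used space and integrity)
# ocrdump                       (writes the OCR contents to a file named OCRDUMPFILE in the current directory)
# ocrconfig -showbackup         (lists the automatically maintained OCR backups)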
Voting Disk:
The voting disk is simply a file that contains and manages information about all node memberships.
The voting disk is a communication mechanism through which every node reads and writes its heartbeat information. It is also used to evict node(s) when network communication is lost between one or more nodes in the cluster, in order to prevent a split-brain and protect the database information.
Shared storage is also required for the voting (or quorum) disk, which is used to determine the nodes that are currently available within the cluster. The voting disk is used by OCSSd to detect when nodes join and leave the cluster and is therefore also known as the Cluster Synchronization Services (CSS) voting disk.
The voting disk acts as a tiebreaker during communication failures; heartbeat information from all nodes is sent to the voting disk while the cluster is running. It also maintains node membership information.
The Node Monitor (NM) uses the voting disk for the disk heartbeat, which is essential in the detection and resolution of a cluster "split brain".
The voting disk has storage characteristics similar to those of the OCR. It can be stored on a raw device or on a cluster file system. In 10g it cannot be stored in ASM.
In Oracle 10.2 and above, the voting disk can be mirrored, eliminating the potential for it to become a single point of failure. By default, three copies of the voting disk are created. Oracle recommends that an odd number of voting disk copies be maintained.
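As a sketch (the returned paths are installation specific), the configured voting disks and the overall health of the clusterware daemons can be checked from the command line in 10g:
$ crsctl query css votedisk              (lists the configured voting disk locations)
$ crsctl check crs                       (verifies that the CSS, CRS and EVM daemons are healthy)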
RAC Database Architecture:
 See the below images for the Oracle 10g RAC architecture.


Image2:

Oracle RAC 10g uses a shared disk subsystem: all nodes in the cluster must be able to access all of the data files, redo log files, control files, and parameter files for all nodes in the cluster. The data disks must be globally available in order to allow all nodes to access the database. Each node has its own redo log file(s) and undo tablespace, but the other nodes must be able to access them (and the shared control file) in order to recover that node in the event of a system failure.
The main difference between Oracle RAC and OPS is the addition of Cache Fusion. With OPS, a request for data from one node to another required the data to be written to disk first; only then could the requesting node read that data.

With cache fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm. 
With Oracle RAC 10g, the data files, redo log files, control files, and archived log files reside on shared storage on raw devices, a SAN, NAS, ASM, or a clustered file system.
Database Structure:
The physical database consists of the control file, data files, redo log files, archive log files, and parameter file. A RAC database uses the same set of files, but as a minimum, the control files, datafiles, online redo logs, and server parameter file must reside on shared storage.
 There are two RAC-specific files, the OCR and the voting disk, which must also be located on shared storage.
The remaining files may be located on local disks for each node.
However, it is advisable, if not mandatory, to locate any archive log directories on shared storage.
See the below Image for Database Files in a RAC Configuration:

Datafiles:  
In a RAC database there is only one copy of each datafile, which is located on shared storage and can be accessed by all instances.
Control files:  
In both single-instance and RAC environments it is recommended that multiple copies of the
control file are created and maintained. Oracle automatically synchronizes the contents of all files specified by the CONTROL_FILES parameter. Each copy should be identical and can be updated by any instance. The control file copies should all be located on shared storage.
Redo logfiles: 
In RAC, each instance writes to its own set of online redo log groups. The redo managed by an individual instance is called a redo log thread, and each thread must contain at least two redo log groups. Each online redo log group is associated with a particular thread number. While there is a one-to-one mapping between the instance and the redo log thread (i.e., each instance maintains its own redo log thread), it is not necessarily the case that the instance number is the same as the thread number. When the online redo log file is archived, the thread number should be included in the name of the archive log file. This number is also used if it becomes necessary to recover the thread from the archived redo logs.
Each redo log group may contain one or more identical redo log files or members. If more than one member is configured, Oracle will software mirror, or multiplex the redo log file to each separate file. This eliminates a potential single point of failure in environments where hardware mirroring is either unavailable or undesirable.
For example, in a two-node RAC configuration, the following query illustrates the usage of the thread and group numbers:
SQL> SELECT LG.INST_ID, LG.GROUP#, LG.THREAD#, LF.MEMBER
  2  FROM GV$LOG LG, GV$LOGFILE LF
  3  WHERE LG.INST_ID = LF.INST_ID AND LG.GROUP# = LF.GROUP#
  4  ORDER BY LG.INST_ID, LG.GROUP#, LG.THREAD#;
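As an illustration only (the thread, group number, file path, and size below are hypothetical), an additional online redo log group can be added to a specific thread as follows:
SQL> ALTER DATABASE ADD LOGFILE THREAD 2
  2  GROUP 5 ('/u02/oradata/RAC/redo2_05.log') SIZE 50M;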
Undo Tablespace:
Automatic Undo Management (AUM) was introduced in Oracle 9.0.1 and is recommended for RAC databases.
In a RAC environment, each instance participating in the cluster will have its own copy of an undo tablespace. During instance startup, an instance binds an undo tablespace to itself. At instance startup, each undo tablespace will contain 10 undo segments. The number of additional segments brought online during instance startup is based on the SESSIONS parameter. Oracle allocates approximately one undo segment for every transaction.
These are sized according to the autoallocate algorithm for locally managed tablespaces.
 If AUM is implemented, then one undo tablespace is required for each instance.
 If you decide to implement manual undo management using rollback segments, then a single rollback segment tablespace can be used for the entire database.
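A minimal sketch of the per-instance undo configuration in the server parameter file, assuming hypothetical instance names RAC1 and RAC2 and tablespaces UNDOTBS1 and UNDOTBS2:
*.UNDO_MANAGEMENT='AUTO'
RAC1.UNDO_TABLESPACE='UNDOTBS1'
RAC2.UNDO_TABLESPACE='UNDOTBS2'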
Archive Redo logs:
As redo log files store information pertaining to a specific instance, archive files, which are copies of redo log files, also contain information pertaining to that specific instance. As with redo log files, write activity to the archived redo logs happens from only one instance. Archive log files can be on shared or local storage; however, for easy recovery and backup operations, these files should be visible from all instances.
In a RAC environment, each instance maintains its own set of archived redo logs. These may either be located on local file systems on each node or in a shared file system that can be accessed by all nodes. While space considerations may dictate that archived redo logs be stored on local file systems, we recommend that you attempt to locate these files on a shared file system. That is because it may be necessary to restore these files and a remote node may be performing the restore.
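For example (the format string itself is just an illustration), the thread number is embedded in the archived log name through the %t placeholder of LOG_ARCHIVE_FORMAT; in 10g the format must contain %t, %s, and %r:
*.LOG_ARCHIVE_FORMAT='arch_%t_%s_%r.arc'     (%t = thread, %s = log sequence, %r = resetlogs id)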
Other files
Files that contain instance-specific information, such as the alert logs or trace files generated by the various background and foreground processes, are maintained at the instance level.
Instances:  
A RAC database normally consists of two or more instances. Each instance generally resides on a different node and consists of a superset of the shared memory structures and background processes used to support single-instance databases.
Each instance has an area of shared memory called the System Global Area (SGA). All processes attaching to the instance can access data in the SGA. Oracle prevents multiple processes from updating the same area of memory by using mechanisms such as latches and locks. A latch is a lightweight mechanism that is used for very fast accesses. Processes do not queue for latches; if a latch is not immediately available a process will spin (loop repeatedly) for a limited period and then sleep, waking up at regular intervals to check if the latch has become available. On the other hand, locks are obtained for longer periods. Access to locks is managed by structures called enqueues, which maintain queues of processes requiring access to the locks.
 The largest area of memory within the SGA on each instance is usually the buffer cache. Using Cache Fusion, an Oracle RAC environment logically combines each instance's buffer cache so that the instances can process data as if it resided in a single, logically combined cache. The buffer cache is the set of buffers used to hold blocks that have been read from the database. When blocks are modified, the changes are immediately written to the redo buffer and are flushed to the redo log when the transaction commits. The changes are applied to the block in the buffer cache, where it remains until it is subsequently written back to disk by the database writer process.
 In order to maintain consistency between the instances in the database, RAC maintains a virtual structure called the Global Resource Directory (GRD), which is distributed across the SGAs of all active instances and contains information about all blocks and locks held by instances and processes across the cluster. The information in the GRD is maintained by two internal services known as the Global Cache Service (GCS) and the Global Enqueue Service (GES). These services are implemented using background processes on each instance and communicate with the equivalent processes on the other instances across the interconnect.
The GCS and GES maintain records of the status of each datafile and each cached block using the GRD.
Once one instance has cached data, any other instance within the same cluster database can acquire a block image from that instance faster than by reading the block from disk.
BACKGROUND PROCESS (RAC vs Single Instance):
A RAC instance has the same background processes as a single instance, plus some additional RAC-specific background processes.
See the below Image for Background process for single Instance:
In simple way a Single Instance Background Processes + some additional RAC specific Background Processes,
See the below Image for Background process for RAC:


Mandatory Background Processes for Single Instance as well as RAC
Most of these background processes are the same in a single-instance RDBMS and in a RAC instance:
• SMON
• PMON
• LGWR
• DBWn
• CKPT
• MMAN
SMON:
There is one system monitor (SMON) background process per instance. It is responsible for performing recovery at instance startup and for cleaning up temporary segments that are no longer in use.
It also coalesces contiguous free extents in dictionary-managed tablespaces.
In a RAC environment, the SMON process of one instance can perform instance recovery in the event of the failure of another server or instance.
PMON:
There is one process monitor (PMON) background process per instance. In the event of a user process failing, PMON is responsible for cleaning up memory structures, such as locks on resources, and for freeing any database buffers that the user was holding.
The PMON background process is also responsible for registering information about the instance and dispatcher processes with the listener process.
In a RAC environment, this information can optionally be used for connection balancing.
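As an illustration only (the instance names, listener names, and TNS alias are hypothetical), a typical 10g configuration registers each instance with its local listener and with the listeners of the other nodes through the LOCAL_LISTENER and REMOTE_LISTENER parameters:
RAC1.LOCAL_LISTENER='LISTENER_RAC1'
RAC2.LOCAL_LISTENER='LISTENER_RAC2'
*.REMOTE_LISTENER='LISTENERS_RAC'
Here LISTENERS_RAC would be a tnsnames.ora alias listing the VIP addresses of all cluster nodes.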
LGWR:
There is one LGWR process per instance. Every change made to the database must be written to the redo log before it can be written back to the database.
In a RAC environment, there is one LGWR process per instance, and this process can only write to the current online redo log.
DBWn:
The database writer (DBWn) process writes dirty blocks from the buffer cache back to the data files.
Dirty blocks, which are blocks that have been updated by one of the foreground processes, cannot be written to disk until the changes have been written to the current online redo log. However, once the changes have been written to the redo log, it is not necessary for the database writer processes to write blocks back to disk immediately. Therefore, the blocks are written back asynchronously either when additional space is required in the buffer cache or in a RAC instance when a write is requested by the block master.
In Oracle 9.0.1 there can be up to 10 DBWn processes (DBW0 to DBW9). In Oracle 9.2 and above there can be up to 20 DBWn processes (DBW0 to DBW9, DBWa to DBWj).
CKPT:
Performing a checkpoint involves updating the headers of each of the database files. This was originally done by the DBWn process, but for databases with a large number of datafiles, this could lead to significant pauses in processing while the headers were updated.
The purpose of the CKPT process is to update the database file headers; dirty buffers are still written back to disk by the DBWn background processes. There is one checkpoint background process per instance.
MMAN:
The MMAN background process is responsible for the Automatic Shared Memory Management feature. There is one MMAN process per instance.
MMON:
In Oracle 10.1 and above, the MMON background process performs manageability-related tasks, including capturing statistics values for SQL objects that have been recently modified and issuing alerts when metrics exceed their threshold values. MMON also takes database snapshots by spawning additional slave processes.
MMNL:
In Oracle 10.1 and above, the MMNL background process performs manageability-related tasks such
as session history capture and metrics computation.
ARCn:
The Oracle database can optionally be configured to use ARCn background processes to copy completed online redo log files to a separate area of disk following each log switch.
In Oracle 9.0.1 and above, there can be up to 10 ARCn background processes numbered ARC0 to ARC9. In Oracle 10.2 and above, there can be up to 30 archiver processes numbered ARC0 to ARC9 and ARCa to ARCt.
In a RAC environment we recommend that archived log files be written to shared storage, and in Oracle 10.1 and above, to the Flash Recovery Area.
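For illustration (the ASM disk group name and the size are hypothetical), archiving to a Flash Recovery Area on shared storage could be configured with parameters along these lines:
*.DB_RECOVERY_FILE_DEST='+FRA'
*.DB_RECOVERY_FILE_DEST_SIZE=10G
*.LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST'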
RECO:
The RECO background process was introduced in Oracle 7.0 and is used in distributed database configurations to automatically resolve failures in distributed transactions. The RECO background process connects to other databases involved in an in-doubt distributed transaction.
Distributed database configurations enable transactions to be performed on multiple databases.
This is not the same as RAC, where transactions are performed by different instances against the same database.
ASMB:
The ASMB background process runs in the RDBMS instance and connects to a foreground process in the ASM instance.
RBAL:
The RBAL background process in the RDBMS instance performs global opens to the disks in the disk groups in the database instance. Note that there is also an RBAL background process in the ASM instance that has a completely different purpose.
RAC-Specific Background Processes
There are a number of RAC-specific background processes. These background processes implement the Global Cache and Global Enqueue services and manage the diagnostic information related to node failures.
1. Lock Monitor Processes (LMON)
2. Lock Monitor Services (LMS)
3. Lock Monitor Daemon Process (LMD)
4. LCKn (Lock Process)
5. DIAG (Diagnostic Daemon)
LMON:
The Lock Monitor process is also known as the Global Enqueue Service Monitor.
It maintains GCS memory structures.
It handles the abnormal termination of processes and instances.
LMON is similar to PMON in that it also manages instance and process failures and performs recovery processing on global enqueues.
The Global Enqueue Service Monitor (LMON) background process is responsible for managing global enqueues and cluster resources.
Reconfiguration of locks and resources when an instance joins or leaves the cluster is handled by LMON (during reconfiguration LMON generates trace files).
It is responsible for executing dynamic lock remastering every 10 minutes (only in 10g R2 and later versions).
The LMON process manages the global locks and resources.
It monitors all instances in the cluster, primarily for dictionary cache locks, library cache locks, and deadlocks on deadlock-sensitive enqueues and resources.
LMON also provides cluster group services.
In Oracle 10.1 and below there is only one lock monitor background process.
In brief to understand LMON, In a single-instance database, access to database resources is controlled using enqueues that ensure that only one session has access to a resource at a time and that other sessions wait on a first in, first out (FIFO) queue until the resource becomes free. In a single-instance database, all locks are local to the instance. In a RAC database there are global resources, including locks and enqueues that need to be visible to all instances. For example, the database mount lock that is used to control which instances can concurrently mount the database is a global enqueue, as are library cache locks, which are used to signal changes in object definitions that might invalidate objects currently in the library cache.
LMS:
The Lock Monitor Service processes are also called the GCS (Global Cache Service) processes.
LMS is one of the most active background processes.
Its primary job is to transport blocks across the nodes for Cache Fusion requests.
If there is a consistent-read request, the LMS process rolls back the block, makes a consistent-read image of it, and then ships this block across the high-speed interconnect (HSI) to the requesting process on the remote node.
LMS must also check constantly with the LMD background process (the GES process) to pick up the lock requests placed by the LMD process.
The Global Cache Service background processes (LMSn) manage requests for data access between the nodes of the cluster. Each block is assigned to a specific instance using the same hash algorithm that is used for global resources. The instance managing the block is known as the resource master.
When an instance requires access to a specific block, a request is sent to an LMS process on the resource
master requesting access to the block. The LMS process can build a read-consistent image of the block and return it to the requesting instance, or it can forward the request to the instance currently holding the block.
The LMS processes coordinate block updates, allowing only one instance at a time to make changes to a block and ensuring that those changes are made to the most recent version of the block. The LMS process on the resource master is responsible for maintaining a record of the current status of the block, including whether it has been updated.
In Oracle 9.0.1 and Oracle 9.2 there can be up to 10 LMS background processes (LMS0 to LMS9) per instance; in Oracle 10.1 there can be up to 20 LMS background processes (LMS0 to LMS9, LMSa to LMSj) per instance; in Oracle 10.2 there can be up to 36 LMSn background processes (LMS0 to LMS9, LMSa to LMSz). The number of LMSn processes required varies depending on the amount of messaging between the nodes in the cluster.
GCS_SERVER_PROCESSES --> the number of LMS processes, specified as an init.ora parameter.
The parameter value is set based on the number of CPUs (MIN(CPU_COUNT/2, 2)) in 10gR2; on a single-CPU instance, only one LMS process is started.
Increase the parameter value if global cache activity is very high.
Internal View: X$KJMSDP
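As a quick check (a sketch; the commonly used predicate on V$BGPROCESS.PADDR limits the output to running processes), you can see the configured value and the LMS processes actually started:
SQL> SHOW PARAMETER gcs_server_processes
SQL> SELECT NAME FROM V$BGPROCESS WHERE NAME LIKE 'LMS%' AND PADDR <> '00';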
LMDn:
The Lock Monitor Daemon process is also known as the Global Enqueue Service Daemon; it manages global enqueues and global resource access from within each instance.
It manages incoming remote resource requests (i.e., requests for locks that come from other instances in the cluster).
The LMD process performs global lock deadlock detection.
It also monitors for lock conversion timeouts.
The LMD process also handles deadlock detection and remote enqueue requests.
Remote resource requests are requests originating from another instance.
Internal view: X$KJMDDP
The LMD background process is responsible for managing requests for global enqueues and updating the status of the enqueues as requests are granted. Each global resource is assigned to a specific instance using a hash algorithm. When an instance requests a lock, the LMD process of the local instance sends a request to the LMD process of the remote instance managing the resource. If the resource is available, the remote LMD process updates the enqueue status and notifies the local LMD process. If the enqueue is currently in use by another instance, the remote LMD process queues the request until the resource becomes available; it then updates the enqueue status and informs the local LMD process that the lock is available.
The LMD processes also detect and resolve deadlocks that may occur if two or more instances attempt to access two or more enqueues concurrently.
In Oracle 10.1 and below there is only one lock monitor daemon background process named LMD0.
LCKn:
The Lock process (LCKn) manages requests that are not Cache Fusion requests, such as row cache requests and library cache requests.
LCK maintains a list of lock elements and uses the list to validate locks during instance recovery.
It manages instance resource requests and cross-instance calls for shared resources.
                 The instance enqueue background process (LCK0) is part of the GES. It manages requests for resources other than data blocks, for example library and row cache objects. LCK processes handle all resource transfers not requiring Cache Fusion. It also handles cross-instance call operations.
In Oracle 9.0.1 there could be up to ten LCK processes (LCK0 to LCK9).
In Oracle 9.2 and Oracle 10.1 and 10.2 there is only one LCK process (LCK0).
DIAG:
In Oracle 10g this is a new background process (part of the enhanced diagnosability framework).
The Diagnostic Daemon is responsible for capturing information on process failures in a RAC environment and for writing out trace information for failure analysis.
It regularly monitors the health of the instance.
It also checks for instance hangs and deadlocks.
It captures vital diagnostic data for instance and process failures.
The current status of each global enqueue is maintained in a memory structure in the SGA of one of the instances. For each global resource, three lists of locks are held, indicating which instances are granted, converting, and waiting for the lock.
                    The DIAG background process captures diagnostic information when either a process or the entire instance fails. This information is written to a subdirectory within the directory specified by the BACKGROUND_DUMP_DEST initialization parameter. The files generated by this process can be forwarded to Oracle Support for further analysis.
There is one DIAG background process per instance. It should not be disabled or removed.
In the event that the DIAG background process itself fails, it can be automatically restarted by other background processes.
Data Guard–Specific Background Processes
There are a number of additional background processes supporting Data Guard operations:
• DMON: Data Guard Broker monitor process
• INSV: Data Guard Broker instance slave process
• NSV0: Data Guard Broker NetSlave process
• DSMn: Data Guard Broker Resource Guard process
• MRP0: Managed Standby Recovery process
• LSP0: Logical standby
• LSP1: Dictionary builds process for logical standby
• LSP2: Set Guard Standby Information for logical standby
• LNSn: Network server
ASM-Specific Background Processes
Finally, each ASM instance has a handful of ASM-specific background processes:
RBAL: The ASM rebalance master process controls all ASM rebalancing operations, allocating work to the ASM rebalance slaves, which actually perform the rebalancing.
ARBn: The ASM rebalance slaves perform rebalancing of data extents across the disks in the ASM disk groups.
In Oracle 10.2 there are up to ten rebalance slaves named ARB0 to ARB9.
GMON: The disk group monitor process, which was introduced in Oracle 10.2, monitors the ASM disk groups.
PSP0: The process spawner background process is responsible for starting and stopping ASM rebalance slaves. This reduces the workload of the RBAL background process.

RAC cache fusion:
Cache Fusion essentially enables the shipping of blocks between the SGAs of nodes in a cluster via the interconnect. This avoids having to push a block down to disk and reread it into the buffer cache of another instance. When a block is read into the buffer cache of an instance in RAC, a lock resource is assigned to that block (different from a row-level lock) to ensure that other instances are aware that the block is in use. Then, if another instance requests a copy of that same block, which is already in the buffer cache of the first instance, the block can be transferred across the interconnect directly to the SGA of the other instance.
            If the block in memory has been changed but the change has not been committed, a consistent-read (CR) copy is shipped to the requesting instance. This means that, whenever possible, data blocks move between each instance's buffer cache without needing to be written to disk.
Dynamic resource remastering: A new feature in 10g is the concept of dynamic resource remastering. Locks are resources held in the SGA of each instance and are used to control access to database blocks. Each instance generally holds (or masters) a certain number of locks associated with a range of blocks. When an instance requests a block, a lock must be obtained for that block, and it must be obtained from the instance that is currently mastering those locks.
Dynamic remastering essentially means that if a certain instance requests locks for certain blocks more often than the other instances do, those locks will eventually be moved into the SGA of the requesting instance, making future lock requests more efficient.
Reconfiguration: In the case of node death, the process of remastering that node's locks across the remaining instances is referred to as reconfiguration. When a node or instance dies or is taken offline, the locks (resources) that were previously mastered in that instance's SGA are distributed among the remaining instances. When an instance rejoins the cluster, a reconfiguration again takes place, and the new instance ends up mastering a portion of the locks previously held by the other instances. This is known as reconfiguration.
In a RAC database there is also a remote cache, so an instance should look not only in the local cache (the cache local to the instance) but also in the remote cache (the cache on a remote instance). If the block is available in the local cache, it is returned from there; if it is not in the local cache, instead of going to disk the instance first checks, via the interconnect, whether the block is available in a remote instance's cache.
This is because accessing a data block from a remote cache is faster than accessing it from disk.
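A hedged sketch of how this can be observed: in 10g, the GV$SYSSTAT statistics 'gc cr blocks received' and 'gc current blocks received' count blocks shipped over the interconnect, which can be compared with 'physical reads':
SQL> SELECT INST_ID, NAME, VALUE
  2  FROM GV$SYSSTAT
  3  WHERE NAME IN ('gc cr blocks received', 'gc current blocks received', 'physical reads')
  4  ORDER BY INST_ID, NAME;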
 Cache Fusion depends on the GRD and its services (GCS, GES).
Global Cache Service (GCS):
In a RAC database each instance has its own database buffer cache, which is located in the SGA on the local node. However, all instances share the same set of data files. It is therefore possible that one or more instances might attempt to read and/or update the same block at the same time. So access to the data blocks across the cluster must be managed in order to guarantee only one instance can modify the block at a time. In addition, any changes must be made visible to all other instances immediately once the transaction is committed. This is managed by the GCS, which coordinates requests for data access between the instances of the cluster.
Global Enqueue Service (GES):
In a RAC database, the GES is responsible for inter-instance resource coordination. The GES manages all non-Cache Fusion inter-instance resource operations. It tracks the status of all Oracle enqueue mechanisms for resources that are accessed by more than one instance. Oracle uses the GES to manage concurrency for resources operating on transactions, tables, and other structures within a RAC environment.
 NOTE: For more about GRD (GCS, GES), please read the Cache fusion concept.
