Tuesday, July 7, 2009

The voting disk is one of the required components of an Oracle Cluster environment.
Another name for the voting disk is the quorum disk.

The voting disk is used to determine quorum in case of failure and provides a second heartbeat mechanism to validate cluster health: the disk heartbeat is maintained in the voting disk.

The voting disk is used to ascertain cluster state.

If a node eviction needs to take place, the voting disk is updated with the eviction message.

If we look at the structure of the voting disk, each node of the cluster has its own area in the voting disk. When a new node is added, its information is written to a new area of the voting disk.

From Oracle 10g Release 2 onwards you can define multiple voting disks. In Oracle 10g Release 1 you need to make sure the voting disk is mirrored at the storage level so that it is not a single point of failure.

For how to add, remove or back up the voting disk, see:
http://download-east.oracle.com/docs/cd/B19306_01/rac.102/b14197/votocr.htm
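
As an illustration, the voting disk can be queried, multiplexed and backed up in 10g Release 2 as sketched below; the device paths are hypothetical, and adding or deleting a voting disk should follow the documented procedure (run as root, with -force only while CRS is stopped on all nodes).

# list the currently configured voting disks
crsctl query css votedisk

# add or remove a voting disk (hypothetical path)
crsctl add css votedisk /oracrs/oradata/data03/votedisk3 -force
crsctl delete css votedisk /oracrs/oradata/data03/votedisk3 -force

# back up a voting disk with dd (the 10g backup method)
dd if=/oracrs/oradata/data01/votedisk1 of=/backup/votedisk1.bak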

The Oracle Cluster Registry (OCR) is one of the required components of an Oracle Cluster environment.

It is a registry that contains all the information about the cluster environment: node names, IP addresses, and application resources such as the listener, VIP and GSD, but also the databases and instances. Attributes such as auto-start behaviour and dependencies are stored in the OCR as well.


The OCR is created during the CRS installation when the root.sh script is executed. When root.sh runs, it reads the ocr.loc file, which is created during installation and points to the OCR file/device.

To make sure all the nodes in the cluster can read the OCR, the OCR location must be on shared storage.

The location of ocr.loc depends on the platform used.
Linux: /etc/oracle/ocr.loc
AIX: /etc/oracle/ocr.loc
Solaris: /var/opt/oracle
Windows: HKEY_LOCAL_MACHINE\SOFTWARE\Oracle\OCR

If we look in the ocr.loc file, we see the following:

bash-3.00$ cat /etc/oracle/ocr.loc
ocrconfig_loc=/oracrs/oradata/data01/ocrdisk1
ocrmirrorconfig_loc=/oracrs/oradata/data02/ocrdisk2
local_only=FALSE

A value of local_only=TRUE indicates a single-instance (local-only) configuration, while FALSE means the OCR is used for RAC.


The ocr.loc file is where the CRS stack looks for the OCR location during startup. When the OCR is found, it is read for the voting disk location and the other cluster information. If for some reason the ocr.loc file or the location it points to is not available, the cluster will not start.

From Oracle 10g Release 2, it is possible to define multiple OCR locations (an OCR mirror).
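
A minimal sketch of defining (or removing) an OCR mirror with ocrconfig, run as root; the mirror path used here is the one from the ocr.loc example above.

# add a mirror location for the OCR
ocrconfig -replace ocrmirror /oracrs/oradata/data02/ocrdisk2

# remove the mirror again
ocrconfig -replace ocrmirror
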
Clients of the OCR are srvctl, CSS, CRS, DBUA, VIPCA and Enterprise Manager.

Tools which can be used:
ocrconfig – configuration tool for the Oracle Cluster Registry.
ocrdump – utility to dump the contents of the OCR to a file.
ocrcheck – utility to verify the OCR integrity.

Ocrconfig:
http://download-uk.oracle.com/docs/cd/B19306_01/rac.102/b14197/ocrsyntax.htm#RACAD835

ocrdump:
http://download-uk.oracle.com/docs/cd/B19306_01/rac.102/b14197/appsupport.htm#sthref1123

ocrcheck:
http://download-uk.oracle.com/docs/cd/B19306_01/rac.102/b14197/appsupport.htm#BEHJIJIB
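
For example, the OCR can be checked, dumped and backed up with these tools; the output file names are hypothetical and the syntax is the 10g Release 2 syntax documented in the links above.

# verify the integrity and size of the OCR and its mirror
ocrcheck

# dump the OCR contents to a readable text file
ocrdump /tmp/ocr_dump.txt

# list the automatic OCR backups taken by CRS
ocrconfig -showbackup

# take a logical export of the OCR (run as root)
ocrconfig -export /backup/ocr_export.dmp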

VIPCA
=========
Oracle Clusterware requires a virtual IP address for each node in the cluster.

This IP address must be on the same subnet as the node's public IP address and should have a name assigned in DNS, but it must be unused and must not respond to ping before the installation of Oracle Clusterware.

The VIP is a node application (nodeapp) defined in the OCR and managed by Oracle Clusterware. The VIP is configured with the VIPCA utility; the root.sh script calls VIPCA in silent mode.
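
As a quick check, the VIP nodeapp can be inspected with srvctl; the node name node1 is hypothetical.

# status of the nodeapps (VIP, GSD, ONS, listener) on a node
srvctl status nodeapps -n node1

# VIP configuration (name, IP address, netmask, interface) of a node
srvctl config nodeapps -n node1 -a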


Why does Oracle 10g have a VIP?

It protects database clients from long TCP/IP timeouts (>10 minutes).

During normal operation it works the same as the hostname.

During a failure it removes the network timeout from the connection request time; the client fails over immediately to the next address in the list.

Oracle RAC Details
=================

About Virtual IP
Why is there a Virtual IP (VIP) in 10g? Why does it just return a dead connection when its primary node fails?
It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen:

1. The new node re-ARPs the world, indicating a new MAC address for the VIP. For directly connected clients, this usually causes them to see errors on their connections to the old address.
2. Subsequent packets sent to the VIP go to the new node, which sends error RST packets back to the clients. This results in the clients getting errors immediately.

This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.
Going one step further is making use of Transparent Application Failover (TAF). With TAF successfully configured, it is possible to avoid ORA-3113 errors altogether! TAF will be discussed in more detail in Section 28 ("Transparent Application Failover - (TAF)").
Without using VIPs, clients connected to a node that died will often wait a 10-minute TCP timeout period before getting an error. As a result, you don't really have a good HA solution without using VIPs (Source - Metalink Note 220970.1).
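
To illustrate the address-list behaviour described above, here is a minimal tnsnames.ora sketch for a two-node cluster; the VIP host names node1-vip and node2-vip, the service name orcl and the alias ORCL_TAF are hypothetical, and the FAILOVER_MODE section enables basic TAF as mentioned above.

ORCL_TAF =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
      (LOAD_BALANCE = yes)
      (FAILOVER = on)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = orcl)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 5)
      )
    )
  )

If the node behind node1-vip fails, the VIP fails over, the client receives a TCP reset instead of waiting for the timeout, and the connection attempt moves on to node2-vip.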