Installing Oracle 9i Real Application Cluster (RAC) on Red Hat Linux Advanced Server 2.1

www.puschitz.com



The following procedure is a step-by-step guide (Cookbook) with tips and information for installing Oracle 9i Real Application Cluster (RAC) on Red Hat Linux Advanced Server 2.1. It shows how I installed Oracle 9i Real Application Cluster on three RH AS 2.1 servers. Among other things, this article covers Raw Devices, Oracle Cluster File System (OCFS), and FireWire-Based Shared Storage. Note that OCFS is not required for 9i RAC. In fact, I never use OCFS for RAC systems. However, this article covers OCFS since some people want to know how to configure and use OCFS.

The primary goal of this article is to show how to quickly install Oracle9i RAC on RH AS 2.1. I welcome emails from any readers with comments, suggestions, or corrections. You can find my email address at the bottom of this website.

If you have never installed Oracle9i on Linux before, then I recommend that you first try to install an Oracle9i database on Linux by following the steps in my other article Installing Oracle9i on RedHat Linux.

This article covers the following subjects and steps:

Introduction

* General
* Important Notes
* Oracle9i RAC Setup
* Shared Disks Storage
    General
    FireWire-Based Shared Storage for Linux

Pre-Installation Steps for All Clustered RAC Nodes

* Installing Red Hat Advanced Server
    Installing Software Packages (RPMs)
* Upgrading the Linux Kernel
    General
    Upgrading the Linux Kernel for FireWire Shared Disks Only
* Configuring Public and Private Network
* Configuring Shared Storage Devices
    General
    Configuring FireWire-Based Shared Storage
    Automatic Scanning of FireWire-Based Shared Storage
* Creating Oracle User Accounts
* Setting Oracle Environments
* Sizing Oracle Disk Space for Database Software
* Creating Oracle Directories
* Creating Partitions on Shared Storage Devices
    General
    Creating Partitions for OCFS
    Creating Partitions for Raw Devices
* Installing and Configuring Oracle Cluster File Systems (OCFS)
    Installing OCFS
    Installing OCFS for FireWire Shared Disks Only
    Configuring OCFS
    Additional Changes for Configuring OCFS for FireWire Drive
    Creating OCFS File Systems
    Mounting OCFS File Systems
    Configuring the OCFS File Systems to Mount Automatically at Startup
    Additional Changes for FireWire Storage to mount OCFS File Systems Automatically at Startup
* Installing the "hangcheck-timer" Kernel Module
    The hangcheck-timer.o Module
    Installing the hangcheck-timer.o Module
    Configuring and Loading the hangcheck-timer Module
* Setting up RAC Nodes for Remote Access
* Checking Packages (RPMs)
* Sizing Swap Space
* Adjusting Network Settings
* Setting Shared Memory
* Checking /tmp Space
* Setting Semaphores
* Setting File Handles

Setting Up Oracle 9i Cluster Manager

* General
    Checking OCFS and rsh
    Downloading and Extracting Oracle Patch Set
* Installing Oracle 9i Cluster Manager
    Creating the Cluster Manager Quorum File
    Installing Oracle9i Cluster Manager 9.2.0.1.0
    Applying Oracle9i Cluster Manager 9.2.0.4.0 Patch Set
* Configuring Oracle 9i Cluster Manager
* Starting and Stopping Oracle 9i Cluster Manager

Installing Oracle 9i Real Application Cluster

* General
* Installing Oracle 9i Database Software with Real Application Cluster
    Creating the Shared Configuration File for srvctl
    Installing Oracle9i 9.2.0.1.0 Database Software with Oracle9i Real Application Cluster
    Applying Oracle9i 9.2.0.4.0 Patch Set

Creating an Oracle 9i Real Application Cluster (RAC) Database

* Starting Oracle Global Services
* Creating the Database
    Using OCFS for Database Files and Other Files
    Using Raw Devices for Database Files and Other Files
    Running Oracle Database Configuration Assistant

Transparent Application Failover (TAF)

* Introduction
* Testing Transparent Application Failover (TAF) on the New Installed RAC Cluster
    Setup
    Example of a Transparent Application Failover

Appendix

* Oracle 9i RAC Problems and Errors
* References

Introduction

General

Oracle Real Application Cluster (RAC) is a cluster system at the application level. It uses a shared disk architecture that provides scalability for all kinds of applications. Applications can use a RAC database without any modifications.

Since requests in a RAC cluster are spread evenly across the RAC instances, and since all instances access the same shared storage, adding servers requires no architectural changes. A failure of a single RAC node results only in the loss of scalability, not in the loss of data, since a single database image is used.

Important Notes

There are a few important notes that might be useful to know before installing Oracle9i RAC:

(*) If you want to install Oracle9i RAC using FireWire drive(s), make sure to read first
FireWire-Based Shared Storage for Linux!

(*) At the time of this writing, there is a bug that prevents you from successfully installing and running Oracle9i RAC using OCFS on FireWire drives on RH AS 2.1. Note that FireWire-based shared storage for Oracle9i RAC is experimental! See Setting Up Linux with FireWire-Based Shared Storage for Oracle9i RAC for more information. At the time of this writing, the only option is to use raw devices for all partitions on the FireWire drives. However, you might be lucky and get RAC working using OCFS on FireWire drives in RH AS 2.1. And this article will show you how to do it in case you are one of the lucky ones :)

(*) If you want to set up a FireWire shared drive using raw devices for all Oracle files, keep in mind that Linux uses the SCSI layer for FireWire drives. Since /dev/sda16 is really /dev/sdb, a single SCSI disk has only 15 usable partition device nodes, and one of them is needed for the extended partition. Therefore you can create only 14 raw devices on a single FireWire disk, which means that you might have to buy a second FireWire drive.
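A quick way to see this device-node limit is to look at the minor numbers of the SCSI block devices (a minimal check, assuming your FireWire disks show up as /dev/sda and /dev/sdb):
su - root
ls -l /dev/sda /dev/sda15 /dev/sdb    # /dev/sda owns minor numbers 0-15; minor 16 already belongs to /dev/sdb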

(*) See also Oracle 9i RAC Problems and Errors

Oracle9i RAC Setup

For this article I used the following Oracle setup:
RAC node          Database Name   Oracle SID  $ORACLE_BASE   Oracle Datafile Directory
---------------   -------------   ----------  ------------   ----------------------------
rac1pub/rac1prv   orcl            orcl1       /opt/oracle/   /var/opt/oracle/oradata/orcl
rac2pub/rac2prv   orcl            orcl2       /opt/oracle/   /var/opt/oracle/oradata/orcl
rac3pub/rac3prv   orcl            orcl3       /opt/oracle/   /var/opt/oracle/oradata/orcl

Shared Disks Storage

General

A requirement for an Oracle9i RAC cluster is a set of servers with shared disk access and interconnect connectivity. Since each instance in a RAC system must have access to the same database files, shared storage is required that can be accessed from all RAC nodes concurrently.

The shared storage space can be used as raw devices, or by using a cluster file system. This article will address raw devices and Oracle's Cluster File System (OCFS). Note that Oracle9i RAC provides its own locking mechanisms and therefore does not rely on other cluster software or on the operating system for handling locks.

FireWire-Based Shared Storage for Linux

Shared storage can become expensive. If you just want to check out the advanced features of Oracle9i RAC without spending too much money, I'd recommend buying external FireWire-based shared storage. Caveat: You can download a patch from Oracle for FireWire-based shared storage for Oracle9i RAC, but Oracle does not support the patch. It is intended for testing and demonstration only! See
Setting Up Linux with FireWire-Based Shared Storage for Oracle9i RAC for more information.

Note that it is very important to get a FireWire-based shared storage device that allows concurrent access from more than one server. Otherwise the disk(s) and partitions can only be seen by one server at a time. Therefore, make sure the FireWire drive(s) have a chipset that supports concurrent access for at least two servers. If you already have a FireWire drive, you can check the maximum supported logins (concurrent access) by following the steps outlined under Configuring FireWire-Based Shared Storage.

For test purposes I used external 250 GB and 200 GB Maxtor hard drives, which support a maximum of 3 concurrent logins. Here are the technical specifications for these FireWire drives:
- Vendor: Maxtor
- Model: OneTouch
- Mfg. Part No. or KIT No.: A01A200 or A01A250 
- Capacity: 200 GB or 250 GB
- Cache Buffer: 8 MB
- Spin Rate: 7200 RPM
- "Combo" Interface: IEEE 1394 and SPB-2 compliant (100 to 400 Mbits/sec) plus USB 2.0 and USB 1.1 compatible
Here are links where these drives can be bought:
Maxtor 200GB One Touch Personal Storage External USB 2.0/FireWire Hard Drive
Maxtor 250GB One Touch Personal Storage External USB 2.0/FireWire Hard Drive

The FireWire adapter I'm using is a StarTech 4 Port IEEE-1394 PCI Firewire Card. Don't forget that you will also need a FireWire hub if you want to connect more than 2 RAC nodes to the FireWire drive(s).


Pre-Installation Steps for All Clustered RAC Nodes


The following steps need to be performed on all nodes of the RAC cluster unless it says otherwise!

Installing Red Hat Advanced Server

You can find the installation guide for installing Red Hat Linux Advanced Server under Red Hat Enterprise Linux Manuals.

You cannot download Red Hat Linux Advanced Server, you can only download the source code. If you want to get the binary CDs, you have to buy it at http://www.redhat.com/software/rhel/.

Installing Software Packages (RPMs)

You don't have to install all RPMs to run an Oracle9i RAC database on Red Hat Advanced Server. It is sufficient to select the Installation Type "Advanced Server"; you don't even need to select the Package Group "Software Development". Only a few other RPMs are required for installing Oracle9i RAC. These other RPMs are covered in this article.

Make sure that no firewall is selected during the installation.

Upgrading the Linux Kernel


General

Using the right Red Hat Enterprise Linux kernel is very important for an Oracle database. Besides important fixes and improvements, newer Red Hat Enterprise Linux kernels now include the
hangcheck-timer.o module, which is a requirement for a RAC system. Therefore it is important to follow the steps outlined under Upgrading the Linux Kernel unless you are using FireWire shared drives (see below).


Upgrading the Linux Kernel for FireWire Shared Disks ONLY

You can download a patch from Oracle for FireWire-Based Shared Storage for Oracle9i RAC, but Oracle does not support the patch. It is intended for testing and demonstration only! See
Setting Up Linux with FireWire-Based Shared Storage for Oracle9i RAC for more information.

Download the experimental kernel for FireWire shared drives from http://oss.oracle.com/projects/firewire/files/old.

There are two experimental kernels for FireWire shared drives, one for UP machines and one for SMP machines. To install the kernel for a single CPU machine, run the following command:
su - root
rpm -ivh --nodeps kernel-2.4.20-18.10.1.i686.rpm
Note that the above command does not upgrade your existing kernel. This is my preferred method since I always want to have the option to go back to the old kernel in case the new kernel causes problems or doesn't come up.

To make sure that the right kernel is booted, check the /etc/grub.conf file if you use GRUB, and change the "default" attribute if necessary. Here is an example:
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Red Hat Linux (2.4.20-18.10.1)
        root (hd0,0)
        kernel /vmlinuz-2.4.20-18.10.1 ro root=/dev/hda1 hdc=ide-scsi
        initrd /initrd-2.4.20-18.10.1.img
title Red Hat Linux Advanced Server-up (2.4.9-e.25)
        root (hd0,0)
        kernel /boot/vmlinuz-2.4.9-e.25 ro root=/dev/hda1 hdc=ide-scsi
        initrd /boot/initrd-2.4.9-e.25.img

In this example, the "default" attribute is set to "0", which means that the experimental FireWire kernel 2.4.20-18.10.1 will be booted. If the "default" attribute were set to "1", the 2.4.9-e.25 kernel would be booted.

After you installed the new kernel, reboot the server:
su - root
reboot
Once you are sure that you don't need the old kernel anymore, you can remove the old kernel by running:
su - root
rpm -e <OldKernelVersion>
When you remove the old kernel, you don't need to make any changes to the /etc/grub.conf file.

Configuring Public and Private Network

Each RAC node should have at least one static IP address for the public network and one static IP address for the private cluster interconnect.

The private network is a critical component of a RAC cluster. It should only be used by Oracle to carry Cluster Manager and Cache Fusion inter-node traffic. A RAC database does not require a separate private network, but using the public network can degrade database performance (high latency, low bandwidth). Therefore the private network should have high-speed NICs (preferably one gigabit or more) and it should only be used by Oracle9i RAC and Cluster Manager.

It is recommended that private network addresses are managed using the /etc/hosts file. This avoids the problem of making DNS, NIS, etc. a single point of failure for the database cluster.

Here is an example of what the /etc/hosts file could look like:
# Public hostnames - public network

192.168.10.1   rac1pub.puschitz.com rac1pub      # RAC node 1
192.168.10.2   rac2pub.puschitz.com rac2pub      # RAC node 2
192.168.10.3   rac3pub.puschitz.com rac3pub      # RAC node 3

# Private hostnames, private network - interconnect

192.168.1.1    rac1prv.puschitz.com rac1prv      # RAC node 1
192.168.1.2    rac2prv.puschitz.com rac2prv      # RAC node 2
192.168.1.3    rac3prv.puschitz.com rac3prv      # RAC node 3

If you are trying to check out the advanced features of Oracle9i RAC on a cheap system where you don't have two Ethernet adapters, you could assign both server names (public and private) to the same IP address. For example:
192.168.1.1    rac1prv   rac1pub     # RAC node 1 - for server with single network adapter
192.168.1.2    rac2prv   rac2pub     # RAC node 2 - for server with single network adapter
192.168.1.3    rac3prv   rac3pub     # RAC node 3 - for server with single network adapter

To configure the network interfaces, run the following command on each node:
su - root
redhat-config-network

Configuring Shared Storage Devices

General

For instructions on how to setup a shared storage device on Red Hat Advanced Server, see the installation instructions of the manufacturer.

Configuring FireWire-Based Shared Storage

First make sure the experimental kernel for FireWire was installed and the server has been rebooted (see
Upgrading the Linux Kernel for FireWire Shared Disks Only):
# uname -r
2.4.20-18.10.1

To load the kernel modules/drivers with the right options etc., add the following entries to the /etc/modules.conf file:
options sbp2 sbp2_exclusive_login=0
post-install sbp2 insmod sd_mod
post-remove sbp2 rmmod sd_mod
It is important that the parameter sbp2_exclusive_login of the Serial Bus Protocol module sbp2 is set to zero to allow multiple hosts to log into or to access the FireWire storage at the same time. The second line makes sure the SCSI disk driver module sd_mod is loaded as well since sbp2 needs the SCSI layer. The SCSI core support module scsi_mod will be loaded automatically if sd_mod is loaded - there is no need to make an entry for it.

Now try to load the firewire stack:
su - root
modprobe ohci1394  # Load OHCI 1394 kernel module (my FireWire drive is OHCI 1394 compliant)
modprobe sbp2      # Load Serial Bus Protocol 2 kernel module
#
If everything worked fine, the following modules should now be loaded:
su - root
# lsmod |egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sd_mod                 13564   0  (unused)
sbp2                   20000   0  (unused)
scsi_mod              119768   2  [sd_mod sbp2]
ohci1394               28384   0  (unused)
ieee1394               60352   0  [sbp2 ohci1394]
#

If the ieee1394 module was not loaded, then your FireWire adapter might not be supported.
I'm using the StarTech 4 Port IEEE-1394 PCI Firewire Card which works great:
# lspci
...
00:0c.0 FireWire (IEEE 1394): VIA Technologies, Inc. OHCI Compliant IEEE 1394 Host Controller (rev 46)
...

To detect the external FireWire drives, download the rescan-scsi-bus.sh script from http://oss.oracle.com/projects/firewire/files/old. Copy the script to /usr/local/bin and make sure it has the right access permissions:
su - root
cp rescan-scsi-bus.sh /usr/local/bin/    # run this from the directory where you downloaded the script
chmod 755 /usr/local/bin/rescan-scsi-bus.sh

Now run the script to rescan the SCSI bus and to add the FireWire drive to the system:
su - root
# /usr/local/bin/rescan-scsi-bus.sh
Host adapter 0 (sbp2_0) found.
Scanning for device 0 0 0 0 ...
NEW: Host: scsi0 Channel: 00 Id: 00 Lun: 00
      Vendor: Maxtor   Model: OneTouch         Rev: 0200
      Type:   Direct-Access                    ANSI SCSI revision: 06
1 new device(s) found.
0 device(s) removed.
#

When you run dmesg, you should see entries similar to this example:
# dmesg
...
ohci1394_0: OHCI-1394 1.0 (PCI): IRQ=[9]  MMIO=[fedff000-fedff7ff]  Max Packet=[2048]
ieee1394: Device added: Node[00:1023]  GUID[0010b920008c85cb]  [Maxtor]
ieee1394: Device added: Node[01:1023]  GUID[00110600000032a0]  [Linux OHCI-1394]
ieee1394: Host added: Node[02:1023]  GUID[00110600000032c7]  [Linux OHCI-1394]
ieee1394: Device added: Node[04:1023]  GUID[00110600000032d0]  [Linux OHCI-1394]
SCSI subsystem driver Revision: 1.00
scsi0 : SCSI emulation for IEEE-1394 SBP-2 Devices
ieee1394: sbp2: Query logins to SBP-2 device successful
ieee1394: sbp2: Maximum concurrent logins supported: 3
ieee1394: sbp2: Number of active logins: 0
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[00:1023]: Max speed [S400] - Max payload [2048]
scsi singledevice 0 0 0 0
  Vendor: Maxtor    Model: OneTouch          Rev: 0200
  Type:   Direct-Access                      ANSI SCSI revision: 06
blk: queue cc28b214, I/O limit 4095Mb (mask 0xffffffff)
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 490232832 512-byte hdwr sectors (250999 MB)
 sda: sda1 sda2
scsi singledevice 0 0 1 0
scsi singledevice 0 0 2 0
scsi singledevice 0 0 3 0
scsi singledevice 0 0 4 0
scsi singledevice 0 0 5 0
scsi singledevice 0 0 6 0
scsi singledevice 0 0 7 0
...
The kernel reports that the FireWire drive can concurrently be shared by 3 servers (see "Maximum concurrent logins supported:"). It is very important that you have a drive where the chip supports concurrent access for the nodes. The "Number of active logins:" shows how many servers are already sharing the drive before this server added this drive to the system.
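If you want to re-check the login limits of a drive later, a simple grep over the kernel messages is enough (a minimal sketch; the exact wording of the messages may differ between sbp2 versions):
su - root
dmesg | grep -i "logins"    # shows the maximum and currently active SBP-2 logins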

If everything worked fine, you should now be able to see the partitions of the FireWire drives:
su - root
# fdisk -l

Disk /dev/sda: 255 heads, 63 sectors, 30515 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1      2550  20482843+  83  Linux
...

Automatic Scanning of FireWire-Based Shared Storage

To have the FireWire drives added to the system automatically after each reboot, I wrote a small service script called "fwocfs". This service script also mounts OCFS filesystems if configured, see
Configuring the OCFS File Systems to Mount Automatically at Startup for more information. Therefore, this service script can be used for OCFS filesystems or for raw devices. It is also very useful for reloading the kernel modules for the FireWire drives and for rescanning the SCSI bus if your FireWire drives were not recognized. The "fwocfs" script can be downloaded from here.

To install this service script, run the following commands:
su - root
# cp fwocfs /etc/rc.d/init.d
# chmod 755 /etc/rc.d/init.d/fwocfs
# chkconfig --add fwocfs
# chkconfig --list fwocfs
fwocfs          0:off   1:off   2:on    3:on    4:on    5:on    6:off
# 
Now start the new fwocfs service:
su - root
# service fwocfs start
Loading ohci1394:                                          [  OK  ]
Loading sbp2:                                              [  OK  ]
Rescanning SCSI bus:                                       [  OK  ]
#
The next time you reboot the server, the FireWire drives should be added to the system automatically.

If for any reason the FireWire drives have not been recognized, try to restart the "fwocfs" service script with the following command:
su - root
service fwocfs restart

Creating Oracle User Accounts

If you use OCFS, it is important that the UID of "oracle" and the GID of "oinstall" are the same across all RAC nodes. Otherwise the Oracle files on the OCFS filesystems would appear "unowned" on some nodes, or could even be owned by another user account. In my setup the UID of oracle is 700, the GID of dba is 700, and the GID of oinstall is 701.
su - root
groupadd -g 700 dba          # group of users to be granted with SYSDBA system privilege
groupadd -g 701 oinstall     # group owner of Oracle files
useradd -c "Oracle software owner" -u 700 -g oinstall -G dba oracle
passwd oracle
To verify the oracle account, enter the following command:
# id oracle
uid=700(oracle) gid=701(oinstall) groups=701(oinstall),700(dba)

For more information on the "oinstall" group account, see
When to use "OINSTALL" group during install of oracle.

Setting Oracle Environments

When you set the Oracle environment variables for the RAC nodes, make sure to assign each RAC node a unique Oracle SID!

In my test setup, the database name is "orcl" and the Oracle SIDs are "orcl1" for RAC node one, "orcl2" for RAC node two, etc.
# Oracle Environment
export ORACLE_BASE=/opt/oracle
export ORACLE_HOME=/opt/oracle/product/9.2.0
export ORACLE_SID=orcl1      # Each RAC node must have a unique Oracle SID! E.g. orcl1, orcl2,...
export ORACLE_TERM=xterm
# export TNS_ADMIN= Set if sqlnet.ora, tnsnames.ora, etc. are not in $ORACLE_HOME/network/admin
export NLS_LANG=AMERICAN;
export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LD_LIBRARY_PATH

# Set shell search paths
export PATH=$PATH:$ORACLE_HOME/bin

# Specify that native threads should be used when running Java software
export THREADS_FLAG=native
Native threads are implemented using pthreads (POSIX threads) which take full advantage of multiprocessors. Java also supports green threads. Green threads are user-level threads, implemented within a single Unix process, and run on a single processor.

Make sure to add these environment settings to the ~oracle/.bash_profile file if you use bash. This will make sure that the Oracle environment variables are always set when you login as "oracle", or when you switch to the user "oracle" by running "su - oracle".
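For example, a quick way to check that the settings are picked up on a node (a minimal sketch; it assumes you appended the environment block above to ~oracle/.bash_profile):
su - oracle
echo $ORACLE_SID $ORACLE_HOME    # should print the node-specific SID, e.g. orcl1, and /opt/oracle/product/9.2.0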

Sizing Oracle Disk Space for Database Software

You will need about 2.5 GB for the Oracle9i RAC database software.

Here is the estimated file space usage on one of my RAC servers:
$ du -s -m /opt/oracle
2344    /opt/oracle

Creating Oracle Directories

At the time of this writing, OCFS only supports Oracle datafiles and a few other files. Therefore OCFS should not be used for shared Oracle Home installs; see Installing and Configuring Oracle Cluster File Systems (OCFS) for more information. Instead, I'm creating a separate, individual ORACLE_HOME directory on each and every RAC server for the Oracle binaries.


To create the Oracle directories, go to:
Creating Oracle Directories

To create the Oracle datafile directory for the ORCL database, run the following commands:
su - oracle
mkdir -p /var/opt/oracle/oradata/orcl
chmod -R 775 /var/opt/oracle

Some directories are not replicated properly to other nodes when RAC is installed. Therefore the following commands must be run on all cluster nodes:
su - oracle

# For Cluster Manager
mkdir -p $ORACLE_HOME/oracm/log

# For SQL*Net Listener
mkdir -p $ORACLE_HOME/network/log
mkdir -p $ORACLE_HOME/network/trace

# For database instances
mkdir -p $ORACLE_HOME/rdbms/log
mkdir -p $ORACLE_HOME/rdbms/audit

# For Oracle Intelligent Agent
mkdir -p $ORACLE_HOME/network/agent/log
mkdir -p $ORACLE_HOME/network/agent/reco

Creating Partitions on Shared Storage Devices

The partitioning of a shared disk needs to be performed on only one RAC node!

General

Note that it is important for the Redo Log files to be on the shared disks as well.

To partition the disks, you can use the fdisk utility:
su - root
fdisk <device_name>
For SCSI disks (including FireWire disks), <device_name> stands for device names like /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, etc. Be careful to use the right device name!

Here is an example of how to create a new 50 GB partition on drive /dev/sda:
su - root
# fdisk /dev/sda

The number of cylinders for this disk is set to 30515.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 255 heads, 63 sectors, 30515 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1      6375  51207156   83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (6376-30515, default 6376):
Using default value 6376
Last cylinder or +size or +sizeM or +sizeK (6376-30515, default 30515): +50GB

Command (m for help): p

Disk /dev/sda: 255 heads, 63 sectors, 30515 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1      6375  51207156   83  Linux
/dev/sda2          6376     12750  51207187+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: If you have created or modified any DOS 6.x
partitions, please see the fdisk manual page for additional
information.
Syncing disks.
#
For more information on fdisk, see the fdisk(8) man page.


Creating Partitions for OCFS

If you use OCFS for database files and other Oracle files, you can create several partitions on your shared storage for the OCFS filesystems. If you use a FireWire disk, you could create one large partition on the disk which should make things easier.

For more information on how to install OCFS and how to mount OCFS filesystems on partitions, see
Installing and Configuring Oracle Cluster File Systems (OCFS).

After you have finished creating the partitions, I recommend that you reboot all RAC nodes to make sure all partitions are recognized by the kernel on every RAC node:
su - root
reboot

Creating Partitions for Raw Devices

In the following example I will show how to set up raw devices on FireWire disk(s) for all Oracle files, including the
Cluster Manager Quorum File and the Shared Configuration File for srvctl. Using raw devices for Oracle datafiles requires more administrative work, and using a FireWire drive makes it even more complicated: because the FireWire drive uses the SCSI layer, there is a hard limit of 15 partitions per drive.

In the following example I will set up 19 partitions for an Oracle9i RAC database using raw devices on two FireWire disks. Keep in mind that we can only use 14 partitions on a single FireWire drive, not counting the extended partition.

Using 2 MB for the Cluster Manager Quorum File (raw device) should be more than sufficient. And using 20 MB for the Shared Configuration File (raw device) for the Oracle Global Services daemon should be more than sufficient as well.

After I created the following 19 partitions on one RAC node, I bound the raw devices by running the raw command on all RAC nodes:
su - root

 # /dev/sda1: Left it untouched since I want to keep it for my OCFS filesystem

 raw /dev/raw/raw1  /dev/sda2         # Used for the Cluster Manager Quorum File
 raw /dev/raw/raw2  /dev/sda3         # Used for the Shared Configuration file for srvctl

 # /dev/sda4: Used for creating the Extended Partition which starts as /dev/sda5.

 raw /dev/raw/raw3  /dev/sda5         # spfileorcl.ora
 raw /dev/raw/raw4  /dev/sda6         # control01.ctl
 raw /dev/raw/raw5  /dev/sda7         # control02.ctl
 raw /dev/raw/raw6  /dev/sda8         # indx01.dbf
 raw /dev/raw/raw7  /dev/sda9         # system01.dbf
 raw /dev/raw/raw8  /dev/sda10        # temp01.dbf
 raw /dev/raw/raw9  /dev/sda11        # tools01.dbf
 raw /dev/raw/raw10 /dev/sda12        # undotbs01.dbf
 raw /dev/raw/raw11 /dev/sda13        # undotbs02.dbf
 raw /dev/raw/raw12 /dev/sda14        # undotbs03.dbf
 raw /dev/raw/raw13 /dev/sda15        # users01.dbf

 # /dev/sda16: Cannot be used for a partition. /dev/sda16 is really the same device node as /dev/sdb.

 # /dev/sdb1 - /dev/sdb3: Left it unused.
 # /dev/sdb4:  Used for creating the Extended Partition which starts as /dev/sdb5.

 raw /dev/raw/raw14 /dev/sdb5         # redo01.log        (Group# 1 Thread# 1)
 raw /dev/raw/raw15 /dev/sdb6         # redo02.log        (Group# 2 Thread# 1)
 raw /dev/raw/raw16 /dev/sdb7         # redo03.log        (Group# 3 Thread# 2)
 raw /dev/raw/raw17 /dev/sdb8         # orcl_redo2_2.log  (Group# 4 Thread# 2)
 raw /dev/raw/raw18 /dev/sdb9         # orcl_redo3_1.log  (Group# 5 Thread# 3)
 raw /dev/raw/raw19 /dev/sdb10        # orcl_redo3_2.log  (Group# 6 Thread# 3)

To see all bindings, run:
su - root
raw -qa

NOTE: It is important to make sure that the above binding commands are added to the /etc/rc.local file! The binding for raw devices has to be done after each reboot.
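Here is a minimal sketch of what the corresponding /etc/rc.local additions could look like (only the first two bindings are shown; add the remaining raw commands from the list above according to your own partition layout):
su - root
cat >> /etc/rc.local << EOF
# Bind raw devices for the Oracle9i RAC files after each reboot
raw /dev/raw/raw1 /dev/sda2
raw /dev/raw/raw2 /dev/sda3
EOF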


Set the permissions and ownership for the 19 raw devices on all RAC nodes:
su - root

for i in `seq 1 19`
do
    chmod 660 /dev/raw/raw$i
    chown oracle.dba /dev/raw/raw$i
done

Optionally, you can create soft links to the raw devices. If you do this it will be transparent to you whether you use OCFS or raw devices when you run the Oracle Database Assistant. In the following example I use the exact same file names which the Database Configuration Assistant will use for the cluster system by default.

To create the soft links, run the following command on all RAC nodes.
su - oracle
ln -s /dev/raw/raw1 /var/opt/oracle/oradata/orcl/CMQuorumFile
ln -s /dev/raw/raw2 /var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile

ln -s /dev/raw/raw3 /var/opt/oracle/oradata/orcl/spfileorcl.ora

ln -s /dev/raw/raw4  /var/opt/oracle/oradata/orcl/control01.ctl
ln -s /dev/raw/raw5  /var/opt/oracle/oradata/orcl/control02.ctl
ln -s /dev/raw/raw6  /var/opt/oracle/oradata/orcl/indx01.dbf
ln -s /dev/raw/raw7  /var/opt/oracle/oradata/orcl/system01.dbf
ln -s /dev/raw/raw8  /var/opt/oracle/oradata/orcl/temp01.dbf
ln -s /dev/raw/raw9  /var/opt/oracle/oradata/orcl/tools01.dbf
ln -s /dev/raw/raw10 /var/opt/oracle/oradata/orcl/undotbs01.dbf
ln -s /dev/raw/raw11 /var/opt/oracle/oradata/orcl/undotbs02.dbf
ln -s /dev/raw/raw12 /var/opt/oracle/oradata/orcl/undotbs03.dbf
ln -s /dev/raw/raw13 /var/opt/oracle/oradata/orcl/users01.dbf

ln -s /dev/raw/raw14 /var/opt/oracle/oradata/orcl/redo01.log
ln -s /dev/raw/raw15 /var/opt/oracle/oradata/orcl/redo02.log
ln -s /dev/raw/raw16 /var/opt/oracle/oradata/orcl/redo03.log
ln -s /dev/raw/raw17 /var/opt/oracle/oradata/orcl/orcl_redo2_2.log
ln -s /dev/raw/raw18 /var/opt/oracle/oradata/orcl/orcl_redo3_1.log
ln -s /dev/raw/raw19 /var/opt/oracle/oradata/orcl/orcl_redo3_2.log

After you have finished creating the partitions, I recommend that you reboot all RAC nodes to make sure all partitions are recognized by the kernel on every RAC node:
su - root
reboot

Installing and Configuring Oracle Cluster File Systems (OCFS)

Note that OCFS is not required for 9i RAC. In fact, I never use OCFS for RAC systems. However, this article covers OCFS since some people want to know how to configure and use OCFS.

The
Oracle Cluster File System (OCFS) was developed by Oracle to overcome the limits of Raw Devices and Partitions. It also eases administration of database files because it looks and feels just like a regular file system.

At the time of this writing, OCFS only supports Oracle Datafiles and a few other files:
- Redo Log files
- Archive log files
- Control files
- Database datafiles
- Shared quorum disk file for the cluster manager
- Shared init file (srv)

Oracle says that in the later part of 2003 they will support Shared Oracle Home installs. So don't install the Oracle software on OCFS yet. See Oracle Cluster File System for more information. In this article I'm creating a separate, individual ORACLE_HOME directory on local server storage for each and every RAC node.

NOTE:
If files on the OCFS file system need to be moved, copied, tar'd, etc., or if directories need to be created on OCFS, then the standard file system commands mv, cp, tar, ... that come with the OS should not be used. These OS commands can have a major performance impact if they are used on the OCFS file system. Therefore, Oracle's patched file system commands should be used instead.
It is also important to note that some third-party backup tools make use of standard OS commands like tar.


Installing OCFS

You can download the OCFS RPMs for RH AS 2.1 from
http://oss.oracle.com/projects/ocfs/files/RedHat/RHAS2.1/ (make sure to use the latest OCFS version!). For OCFS RPMs for FireWire shared disks, see Installing OCFS for FireWire Shared Disks Only.

To find out which OCFS driver you need for your server, run:
$ uname -a
Linux rac1pub 2.4.9-e.25smp #1 Fri Oct 6 18:27:21 EDT 2003 i686 unknown

For my SMP servers with <=4GB RAM I downloaded the following OCFS RPMs (make sure to use the latest OCFS version!):
  ocfs-2.4.9-e-smp-1.0.9-9.i686.rpm         # OCFS driver for SMP kernels
  ocfs-tools-1.0.9-9.i686.rpm
  ocfs-support-1.0.9-9.i686.rpm
To install the RPMs for SMP kernels on servers with <= 4 GB RAM, run:
su - root
rpm -ivh ocfs-2.4.9-e-smp-1.0.9-9.i686.rpm \
ocfs-tools-1.0.9-9.i686.rpm \
ocfs-support-1.0.9-9.i686.rpm
To install the OCFS RPMs for uniprocessor kernels, run:
su - root
rpm -ivh ocfs-2.4.9-e-1.0.9-9.i686.rpm \
ocfs-tools-1.0.9-9.i686.rpm \
ocfs-support-1.0.9-9.i686.rpm

NOTE: It is also very important to install an updated fileutils package that adds support for the O_DIRECT flag (direct I/O) on file systems such as OCFS. This updated package is required for better performance. If commands like cp, mv, dd, etc. don't support the O_DIRECT flag when used on OCFS file systems, then you can experience a big performance impact. For instance, some third-party products use the dd command for doing the backup.

Therefore, it is recommended to download the fileutils package version 4.1-10.4 (RHSA-2003:310) or a newer version from https://rhn.redhat.com. To upgrade the RPM, run:
su - root
rpm -Uvh fileutils-4.1-10.4.i386.rpm
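To verify which fileutils version is installed, you can simply query the RPM database:
rpm -q fileutils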

Installing OCFS for FireWire Shared Disks Only

For FireWire kernels, download the latest OCFS RPMs for RH AS 2.1 from
http://oss.oracle.com/projects/firewire/files.

To find out which OCFS driver you need for your server, run:
$ uname -a
Linux rac1pub 2.4.9-e.25 #1 Fri Oct 6 18:27:21 EDT 2003 i686 unknown

To install the OCFS RPMs for uniprocessor kernels, run e.g. (make sure to use the latest OCFS version!):
su - root
rpm -Uvh ocfs-2.4.20-18.10-1.0.10-2.i386.rpm \
ocfs-tools-1.0.10-2.i386.rpm \
ocfs-support-1.0.10-2.i386.rpm
I would also recommend to update the fileutils package, see note above.


Configuring OCFS

To generate the /etc/ocfs.conf file, run the ocfstool tool. Before you run this GUI tool, make sure the DISPLAY environment variable is set. You can find a short description of the DISPLAY environment variable here.

Run the ocfstool tool as root to generate the /etc/ocfs.conf file:
su - root
ocfstool
 - Select "Task" - Select "Generate Config"
 - Select the interconnect interface (private network interface), e.g. rac1prv
 - Confirm the values displayed and exit
The generated /etc/ocfs.conf file will appear similar to the following example:
$ cat /etc/ocfs.conf
#
# ocfs config
# Ensure this file exists in /etc
#

        node_name = rac1prv
        node_number =
        ip_address = 192.168.1.1
        ip_port = 7000
        guid = 167045A6AD4E9EAB33620010B5C05E7F
The guid entry is the unique group user ID. This ID has to be unique for each node. You can create the above file without the ocfstool tool by editing the /etc/ocfs.conf file manually and by running ocfs_uid_gen -c to assign/update the guid value in this file.
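Here is a minimal sketch of the manual approach (the node_name and ip_address shown are the values for my first RAC node; adjust them for each node, and let ocfs_uid_gen fill in the guid):
su - root
cat > /etc/ocfs.conf << EOF
#
# ocfs config
# Ensure this file exists in /etc
#
        node_name = rac1prv
        node_number =
        ip_address = 192.168.1.1
        ip_port = 7000
EOF
ocfs_uid_gen -c     # assigns/updates the guid entry in /etc/ocfs.conf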

To load the ocfs.o kernel module, run:
su - root
# /sbin/load_ocfs
/sbin/insmod ocfs node_name=rac1prv ip_address=192.168.1.1 ip_port=7000 cs=1812 guid=582A8C17A7555FA41D350010B5C05E7F
Using /lib/modules/2.4.9-e-ABI/ocfs/ocfs.o
#
Note that the load_ocfs command doesn't have to be executed again once everything has been set up for the OCFS filesystems, see Configuring the OCFS File Systems to Mount Automatically at Startup.


Additional Changes for Configuring OCFS for FireWire Drives

If FireWire storage is being used, OCFS cannot be loaded with the steps described above. Some additional changes need to be made.

When I ran load_ocfs on a system with the experimental FireWire kernel, it returned the following error message:
su - root
# load_ocfs
/sbin/insmod ocfs node_name=rac1prv ip_address=192.168.1.1 ip_port=7000 cs=1841 guid=BB669BEFEA6C470479D10050DA1A2424 comm_voting=1
insmod: ocfs: no module by that name found
load_ocfs: insmod failed
#
The ocfs.o module can be found here:
su - root
# rpm -ql ocfs-2.4.20
/lib/modules/2.4.20-ABI/ocfs
/lib/modules/2.4.20-ABI/ocfs/ocfs.o
#
So for the experimental kernel for FireWire drives, I manually created a link for the ocfs.o file:
su - root
mkdir /lib/modules/`uname -r`/kernel/drivers/addon/ocfs
ln -s `rpm -qa | grep ocfs-2 | xargs rpm -ql | grep "/ocfs.o$"` \
      /lib/modules/`uname -r`/kernel/drivers/addon/ocfs/ocfs.o
Now you should be able to load the OCFS module and the output will look similar to this example:
su - root
# /sbin/load_ocfs
/sbin/insmod /lib/modules/2.4.20-ABI/ocfs/ocfs.o node_name=rac1prv ip_address=192.168.1.1 ip_port=7000 cs=1833 guid=01A553F1FD7B719E9D290010B5C05E7F comm_voting=1
#

Creating OCFS File Systems

Before you continue with the next steps, make sure you've created all needed partitions on your shared storage.

Under
Creating Oracle Directories I created the /var/opt/oracle/oradata/orcl directory for the Oracle data files. In the following example I will create one large OCFS filesystem and mount it on /var/opt/oracle/oradata/orcl.

The following steps for creating the OCFS filesystem(s) should only be executed on one RAC node!

To create the OCFS filesystems, you can run the ocfstool:
su - root
ocfstool
  - Select "Task" - Select "Format"

Alternatively, you can run the "mkfs.ocfs" command to create the OCFS filesystems:
su - root
mkfs.ocfs -F -b 128 -L /var/opt/oracle/oradata/orcl -m /var/opt/oracle/oradata/orcl \
          -u `id -u oracle` -g `id -g oracle` -p 0775 <device_name>
Cleared volume header sectors
Cleared node config sectors
Cleared publish sectors
Cleared vote sectors
Cleared bitmap sectors
Cleared data block
Wrote volume header
#
For SCSI disks (including FireWire disks), <device_name> stands for devices like /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, etc. Be careful to use the right device name! For this article I created a large OCFS filesystem on /dev/sda1.

mkfs.ocfs options:

  -F Forces to format existing OCFS volume
  -b Block size in kB. The block size must be a multiple of the Oracle block size. Oracle recommends to set the block size for OCFS to 128.
  -L Volume label
  -m Mount point for the device (in this article "/var/opt/oracle/oradata/orcl")
  -u UID for the root directory (in this article "oracle")
  -g GID for the root directory (in this article "oinstall")
  -p Permissions for the root directory


Mounting OCFS File Systems

As I mentioned previously, for this article I created one large OCFS filesystem on /dev/sda1. To mount the OCFS filesystem, run:
su - root
# mount -t ocfs /dev/sda1 /var/opt/oracle/oradata/orcl
Now run the ls command on all RAC nodes to check the ownership:
# ls -ld /var/opt/oracle/oradata/orcl
drwxrwxr-x    1 oracle   oinstall   131072 Aug 18 18:07 /var/opt/oracle/oradata/orcl
#
NOTE: If the above ls command does not display the same ownership on all RAC nodes (oracle:oinstall), then the "oracle" UID and the "oinstall" GID are not the same across the RAC nodes, see
Creating Oracle User Accounts for more information.

If you get the following error:
su - root
# mount -t ocfs <device_name> /var/opt/oracle/oradata/orcl
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       or too many mounted file systems
You probably tried to mount the OCFS filesystem on more than one server at a time. Try to wait until the OCFS file system has been mounted on one server before you mount it on the other server.


Configuring the OCFS File Systems to Mount Automatically at Startup

To ensure the OCFS filesystems are mounted automatically during a reboot, the OCFS mount points need to be added to the /etc/fstab file.

Add lines to the /etc/fstab file similar to the following example:
/dev/sda1     /var/opt/oracle/oradata/orcl    ocfs   _netdev 0 0
The "_netdev" option prevents the OCFS filesystem from being mounted until the network has first been enabled on the system (see mount(8)) which provides access to the storage device.

To make sure the ocfs.o kernel module is loaded and the OCFS file systems are mounted during the boot process, run:
su - root
# chkconfig --list ocfs
ocfs            0:off   1:off   2:off   3:on    4:on    5:on    6:off
If run levels 3, 4, and 5 are not set to "on" as shown in the example above, run the following command:
su - root
# chkconfig ocfs on
You can also start the "ocfs" service manually by running:
su - root
# service ocfs start
When you run this command it will not only load the ocfs.o kernel module but it will also mount the OCFS filesystems as configured in /etc/fstab.
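As a quick check, you can list the mounted OCFS filesystems afterwards:
su - root
mount | grep ocfs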


Additional Changes for FireWire Storage to mount OCFS File Systems Automatically at Startup

If FireWire storage is being used, the OCFS file systems won't mount automatically at startup with the "ocfs" service script. In
Automatic Scanning of FireWire-Based Shared Storage I already introduced the "fwocfs" service script, which can also be used to mount the OCFS file systems automatically for FireWire drives.

If you have not installed the "fwocfs" service script yet, follow the steps outlined in Automatic Scanning of FireWire-Based Shared Storage. Since the "ocfs" service script does not work for FireWire drives, disable it; the "fwocfs" service script will be used instead:
su - root
chkconfig ocfs off
Now start the new fwocfs service:
su - root
# service fwocfs start
Loading ohci1394:                                          [  OK  ]
Loading sbp2:                                              [  OK  ]
Rescanning SCSI bus:                                       [  OK  ]
Loading OCFS:                                              [  OK  ]
Mounting OCFS file systems:                                [  OK  ]
#
To see the mounted OCFS file systems, you can run the following command:
su - root
# service fwocfs status
/dev/sda1 on /var/opt/oracle/oradata/orcl type ocfs (rw,_netdev)
#
The next time you reboot the server, the OCFS filesystems should be mounted automatically for the FireWire drives.

Installing the "hangcheck-timer" Kernel Module

To monitor the system health of the cluster and to reset a RAC node in case of a failure, Oracle 9.0.1 and 9.2.0.1 used a userspace watchdog daemon called watchdogd. Starting with Oracle 9.2.0.2, this daemon has been replaced by a Linux kernel module called hangcheck-timer, which addresses availability and reliability problems much better. This hangcheck timer is loaded into the kernel and checks whether the system hangs: it sets a timer and checks the timer after a certain amount of time. If a configurable threshold is exceeded, it will reboot the machine.
The hangcheck-timer module is not required for Oracle Cluster Manager operation, but Oracle highly recommends it.

The hangcheck-timer.o Module

The hangcheck-timer module uses a kernel-based timer to periodically check the system task scheduler. This timer resets the node when the system hangs or pauses. The module uses the Time Stamp Counter (TSC) CPU register, a counter that is incremented at each clock signal. The TSC offers much more accurate time measurements since this register is updated automatically by the hardware. For more information, see
Project: hangcheck-timer.

Installing the hangcheck-timer.o Module

Originally, hangcheck-timer was shipped by Oracle, but this module now comes with RH AS starting with kernel version 2.4.9-e.12. So hangcheck-timer is part of all newer RH AS kernels. If you upgraded the kernel as outlined in
Upgrading the Linux Kernel, then you should have the hangcheck-timer module on your node:
# find /lib/modules -name "hangcheck-timer.o"
/lib/modules/2.4.9-e.25smp/kernel/drivers/char/hangcheck-timer.o
#
Therefore you don't need to install the Oracle Cluster Manager patch (e.g. patch 2594820) before you install the Oracle9i database patch set.

Configuring and Loading the hangcheck-timer Module

The hangcheck-timer module parameters hangcheck-tick and hangcheck-margin need to be coordinated with the MissCount parameter in the $ORACLE_HOME/oracm/admin/cmcfg.ora file for the Cluster Manager.

The following two hangcheck-timer module parameters can be set:
 hangcheck_tick
   This parameter defines the period of time between checks of system health.
   The default value is 60 seconds. Oracle recommends to set it to 30 seconds.
 hangcheck_margin
   This parameter defines the maximum hang delay that should be tolerated before
   hangcheck-timer resets the RAC node. It defines the margin of error in seconds.
   The default value is 180 seconds. Oracle recommends to set it to 180 seconds.


These two parameters indicate how long a RAC node must hang before the hangcheck-timer module will reset the system. A node reset will occur when the following is true:
system hang time > (hangcheck_tick + hangcheck_margin)
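For example, with the recommended settings (hangcheck_tick=30 and hangcheck_margin=180), a node will only be reset if it hangs for more than 30 + 180 = 210 seconds.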

To load the module with the right parameter settings, you can run the following command:
# su - root
# /sbin/insmod hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
# grep Hangcheck /var/log/messages |tail -1
Oct 18 23:05:36 rac1prv kernel: Hangcheck: starting hangcheck timer (tick is 30 seconds, margin is 180 seconds).
#
But the right way to load modules with the correct parameters is to make entries in the /etc/modules.conf file. To do that, add the following line to the /etc/modules.conf file:
# su - root
# echo "options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180" >> /etc/modules.conf
Now you can run modprobe to load the module with the configured parameters in /etc/modules.conf:
# su - root
# modprobe hangcheck-timer
# grep Hangcheck /var/log/messages |tail -1
Oct 18 23:10:23 rac1prv kernel: Hangcheck: starting hangcheck timer (tick is 30 seconds, margin is 180 seconds).
#
Note: To ensure the hangcheck-timer module is loaded after each reboot, add the modprobe command to the /etc/rc.local file.
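A minimal way to do this (assuming you have not already edited /etc/rc.local for this) is:
su - root
echo "/sbin/modprobe hangcheck-timer" >> /etc/rc.local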

Setting up RAC Nodes for Remote Access

When you run the Oracle Installer on a RAC node, it uses rsh to copy the Oracle software to the other RAC nodes. Therefore, the oracle account on the RAC node where runInstaller is launched must be trusted by all other RAC nodes. This means that you should be able to run rsh, rcp, and rlogin on this RAC node against other RAC nodes without a password. The rsh daemon validates users using the /etc/hosts.equiv file and the .rhosts file found in the user's (oracle's) home directory. Unfortunately, SSH is not supported.

The following steps show how I setup a trusted environment for the "oracle" account on all RAC nodes.

First make sure the "rsh" RPMs are installed on all RAC nodes:
rpm -q rsh rsh-server
If rsh is not installed, run the following command:
su - root
rpm -ivh rsh-0.17-5.i386.rpm rsh-server-0.17-5.i386.rpm
To enable the "rsh" service, the "disable" attribute in the /etc/xinetd.d/rsh file must be set to "no" and xinetd must be refreshed. This can be done by running the following commands:
su - root
chkconfig rsh on
chkconfig rlogin on
service xinetd reload
To allow the "oracle" user account to be trusted among the RAC nodes, create the /etc/hosts.equiv file:
su - root
touch /etc/hosts.equiv
chmod 600 /etc/hosts.equiv
chown root.root /etc/hosts.equiv
And add all RAC nodes to the /etc/hosts.equiv file similar to the following example:
$ cat /etc/hosts.equiv
+rac1prv oracle
+rac2prv oracle
+rac3prv oracle
+rac1pub oracle
+rac2pub oracle
+rac3pub oracle
In the preceding example, the second field permits only the oracle user account to run rsh commands on the specified nodes. For security reasons, the /etc/hosts.equiv file should be owned by root and the permissions should be set to 600. In fact, some systems will only honor the content of this file if the owner of this file is root and the permissions are set to 600.

Now you should be able to run rsh against each RAC node without having to provide the password for the oracle account:
su - oracle
$ rsh rac1prv ls -l /etc/hosts.equiv
-rw-------    1 root     root           49 Oct 19 13:18 /etc/hosts.equiv
$ rsh rac2prv ls -l /etc/hosts.equiv
-rw-------    1 root     root           49 Oct 19 14:39 /etc/hosts.equiv
$ rsh rac3prv ls -l /etc/hosts.equiv
-rw-------    1 root     root           49 Oct 19 14:42 /etc/hosts.equiv
$

Checking Packages (RPMs)

Some packages will be missing if you selected the Installation Type "Advanced Server" during the Red Hat Advanced Server installation.

The following additional RPMs are required; to check whether they are already installed, run:
rpm -q gcc cpp compat-libstdc++ glibc-devel kernel-headers binutils \
       pdksh ncurses4
To install these RPMs, run:
su - root
rpm -Uvh cpp-2.96-108.1.i386.rpm \
         glibc-devel-2.2.4-26.i386.rpm \
         kernel-headers-2.4.9-e.3.i386.rpm \
         gcc-2.96-108.1.i386.rpm \
         binutils-2.11.90.0.8-12.i386.rpm \
         pdksh-5.2.14-13.i386.rpm \
         ncurses4-5.0-5.i386.rpm
I recommend using the latest RPM versions.

Adjusting Network Settings

Starting with Oracle 9.2.0.1, Oracle uses UDP as the default protocol on Linux for interprocess communication, such as Cache Fusion buffer transfers between the instances.
It is strongly suggested to adjust the default and maximum send buffer size (SO_SNDBUF socket option) to 256 KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256 KB. The receive buffers are used by TCP and UDP to hold received data until it is read by the application. A TCP receive buffer cannot overflow because the peer is not allowed to send data beyond the window size. UDP, however, has no such flow control: datagrams that don't fit into the socket receive buffer are discarded, which means a sender can overwhelm the receiver.

The default and maximum buffer sizes can be changed in the proc file system without a reboot:
su - root
sysctl -w net.core.rmem_default=262144  # Default setting in bytes of the socket receive buffer
sysctl -w net.core.wmem_default=262144  # Default setting in bytes of the socket send buffer
sysctl -w net.core.rmem_max=262144      # Maximum socket receive buffer size which may be set by using the SO_RCVBUF socket option
sysctl -w net.core.wmem_max=262144      # Maximum socket send buffer size which may be set by using the SO_SNDBUF socket option
To make the change permanent, add the following lines to the /etc/sysctl.conf file, which is used during the boot process:
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=262144
net.core.wmem_max=262144
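To apply the new /etc/sysctl.conf settings immediately and verify them (a quick check, not strictly required), you can run:
su - root
sysctl -p                   # re-reads /etc/sysctl.conf
sysctl net.core.rmem_max    # displays the current value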

Sizing Swap Space

It is important to follow the steps as outlined in Sizing Swap Space.

Setting Shared Memory

It is important to follow the steps as outlined in Setting Shared Memory.

Checking /tmp Space

It is important to follow the steps as outlined in Checking /tmp Space.

Setting Semaphores

It is recommended to follow the steps as outlined in Setting Semaphores.

Setting File Handles

It is recommended to follow the steps as outlined in Setting File Handles.


Setting Up Oracle 9i Cluster Manager

General

At this point the pre-installation steps for all RAC nodes should be completed.

Checking OCFS and rsh

Before you continue, make sure the OCFS filesystems are mounted on all RAC nodes and that rsh really works for the oracle account on all RAC nodes. Here is the output of the df command on my RAC test system:
su - oracle
rsh rac1prv df | grep oradata
/dev/sda1             51205216     45152  51160064   1% /var/opt/oracle/oradata/orcl
rsh rac2prv df | grep oradata
/dev/sda1             51205216     45152  51160064   1% /var/opt/oracle/oradata/orcl
rsh rac3prv df | grep oradata
/dev/sda1             51205216     45152  51160064   1% /var/opt/oracle/oradata/orcl

rsh rac1pub df | grep oradata
/dev/sda1             51205216     45152  51160064   1% /var/opt/oracle/oradata/orcl
rsh rac2pub df | grep oradata
/dev/sda1             51205216     45152  51160064   1% /var/opt/oracle/oradata/orcl
rsh rac3pub df | grep oradata
/dev/sda1             51205216     45152  51160064   1% /var/opt/oracle/oradata/orcl

Downloading and Extracting Oracle Patch Set

It is wise to first patch the Oracle 9iR2 software before creating the database. To patch Oracle9iR2 (Cluster Manager, etc.), download the Oracle 9i Release 2 Patch Set 3 Version 9.2.0.4.0 for Linux x86 (patch number 3095277) from
metalink.oracle.com.

Copy the downloaded "p3095277_9204_LINUX.zip" file to e.g. /tmp and run the following commands:
su - oracle
$ unzip p3095277_9204_LINUX.zip
Archive:  p3095277_9204_LINUX.zip
  inflating: 9204_lnx32_release.cpio
  inflating: README.html
  inflating: patchnote.css
$
$ cpio -idmv < 9204_lnx32_release.cpio
Disk1/stage/locks
Disk1/stage/Patches/oracle.apache.isqlplus/9.2.0.4.0/1/DataFiles/bin.1.1.jar
Disk1/stage/Patches/oracle.apache.isqlplus/9.2.0.4.0/1/DataFiles/lib.1.1.jar
...

Installing Oracle 9i Cluster Manager

In order to install the Cluster Manager (Node Monitor) oracm on all RAC nodes, runInstaller has to be launched on only one RAC node, e.g. rac1prv. For more information on how to start runInstaller, see Starting runInstaller.

Creating the Cluster Manager Quorum File

The Cluster Manager and Node Monitor oracm accepts registrations of Oracle instances to the cluster, and it sends ping messages to the Cluster Managers (Node Monitors) on the other RAC nodes. If this heartbeat fails, oracm uses a quorum file or a quorum partition on the shared disk to distinguish between a node failure and a network failure. So if a node stops sending ping messages, but continues writing to the quorum file or partition, then the other Cluster Managers can recognize it as a network failure. The Cluster Manager (CM) now uses UDP instead of TCP for communication.

Once the Oracle Cluster Manager is running on all RAC nodes, OUI will automatically recognize all nodes of the cluster. When you run the Installer, you will see the "Cluster Node Selection" screen if the oracm process is running on the RAC nodes. This also means that you can launch runInstaller on one RAC node and have the Oracle software automatically installed on all other RAC nodes as well.

  • For OCFS Filesystems:
  • In the following "Oracle Cluster Manager" installation I will use a quorum file on the OCFS file system called /var/opt/oracle/oradata/orcl/CMQuorumFile which will be accessible by all RAC nodes. It is important that all RAC nodes are accessing the same quorum file! To create the quorum file, run the following command on one RAC node:
    su - oracle
    touch  /var/opt/oracle/oradata/orcl/CMQuorumFile
    
    NOTE Regarding Cluster Manager Quorum File on OCFS on FireWire Drives:

    If you have the quorum file on an OCFS (1.0.10-2 OCFS) file system on the FireWire drive, then the Cluster Manager oracm will only come up on one RAC node. If you start oracm on a second RAC node, it will crash. Until this bug is resolved, a raw device needs to be used for FireWire drives.
  • For Raw Devices:
  • In Creating Partitions for Raw Devices I created a 2 MB partition (raw device) for the quorum file on my external FireWire drive. The name of my quorum partition on the FireWire drive is /dev/sda2, which is bound to /dev/raw/raw1.
    Optionally, you can create a soft link to this raw device. If you haven't done it yet as shown in Creating Partitions for Raw Devices, then do it now by running the following command on all RAC nodes:
    su - oracle
    ln -s /dev/raw/raw1 /var/opt/oracle/oradata/orcl/CMQuorumFile
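
    In either case, it is a good idea to confirm before the installation that every RAC node sees the same quorum file or soft link. A quick cross-node check from one node, using the rsh setup from Setting up RAC Nodes for Remote Access, might look like this:
    su - oracle
    rsh rac1pub ls -l /var/opt/oracle/oradata/orcl/CMQuorumFile
    rsh rac2pub ls -l /var/opt/oracle/oradata/orcl/CMQuorumFile
    rsh rac3pub ls -l /var/opt/oracle/oradata/orcl/CMQuorumFile
    All three commands should show the same file or link.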
    

    Installing Oracle9i Cluster Manager 9.2.0.1.0

    To install the Oracle Cluster Manager, insert the Oracle 9i R2 Disk 1 and launch /mnt/cdrom/runInstaller. These steps only need to be performed on one RAC node, the node you are installing from.

    Mount the disk in one terminal:
    su - root
    mount /mnt/cdrom
    And in another terminal, run the following commands.
    su - oracle
    $ /mnt/cdrom/runInstaller
    
     - Welcome Screen:          Click Next
     - Inventory Location:      Click OK
     - Unix Group Name:         Use "oinstall".
     - Root Script Window:      Open another window, login as root, and run /tmp/orainstRoot.sh
                                 on the node where you are running this installation (runInstaller). 
                                 After you run the script, click Continue.
     - File Locations:          Check the defaults. I used the default values and clicked Next.
     - Available Products:      Select "Oracle Cluster Manager 9.2.0.1.0"
     - Public Node Information:
         Public Node 1:         rac1pub
         Public Node 2:         rac2pub
         Public Node 3:         rac3pub
                                Click Next.
     - Private Node Information:
         Private Node 1:        rac1prv
         Private Node 2:        rac2prv
         Private Node 3:        rac3prv
                                Click Next.
     - WatchDog Parameter:      Accept the default value and click Next. We won't use the Watchdog.
     - Quorum Disk Information: /var/opt/oracle/oradata/orcl/CMQuorumFile
                                Click Next.
     - Summary:                 Click Install
     - When installation has completed, click Exit.
    

    Applying Oracle9i Cluster Manager 9.2.0.4.0 Patch Set

    To patch the Oracle Cluster Manager, launch the installer either from /mnt/cdrom/runInstaller or from $ORACLE_HOME/bin/runInstaller
    su - oracle
    $ $ORACLE_HOME/bin/runInstaller
     - Welcome Screen:         Click Next
     - Inventory Location:     Click Next
     - File Locations:         - Click "Browse for the Source"
                                - Navigate to the stage directory where the patch set is located.
                                  On my system it is: "/tmp/Disk1/stage"
                                - Select the "products.jar" file.
                                - Click OK
                                - Click Next on the File Location screen
     - Available Products:     Select "Oracle9iR2 Cluster Manager 9.2.0.4.0"
     - Public Node Information:
         Public Node 1:        rac1pub
         Public Node 2:        rac2pub
         Public Node 3:        rac3pub
     - Private Node Information:
         Private Node 1:       rac1prv
         Private Node 2:       rac2prv
         Private Node 3:       rac3prv
     - Summary:                Click Install
     - When installation has completed, click Exit.
    

    Configuring Oracle 9i Cluster Manager

    The following changes have to be done on ALL RAC nodes.

    It is not necessary to run the watchdogd daemon with the Oracle Cluster Manager 9.2.0.2 or with any newer version. Since watchdogd has been replaced with the
    hangcheck-timer kernel module, some files need to be updated.

    The following changes need to be done if the Oracle9i Cluster Manager 9.2.0.1.0 has been patched to e.g. version 9.2.0.4.0 as described under Applying Oracle9i Cluster Manager 9.2.0.4.0 Patch Set.


    REMOVE or comment out the following line(s) from the $ORACLE_HOME/oracm/admin/ocmargs.ora file:
    watchdogd
    REMOVE the following line(s) from the $ORACLE_HOME/oracm/admin/cmcfg.ora file:
    WatchdogSafetyMargin=5000
    WatchdogTimerMargin=60000
    ADD the following line to the $ORACLE_HOME/oracm/admin/cmcfg.ora file:
    KernelModuleName=hangcheck-timer
    ADJUST the value of the MissCount parameter in the $ORACLE_HOME/oracm/admin/cmcfg.ora file based on the sum of the hangcheck_tick and hangcheck_margin values. The MissCount parameter must be set to at least 60 and it must be greater than the sum of hangcheck_tick + hangcheck_margin. In my example, hangcheck_tick + hangcheck_margin is 210. Therefore I set MissCount in $ORACLE_HOME/oracm/admin/cmcfg.ora to 215.
    MissCount=215
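    A minimal sanity check of this rule, assuming hangcheck_tick=30 and hangcheck_margin=180 as configured for the hangcheck-timer module (adjust the two values to your own settings), could look like this:
    su - oracle
    TICK=30; MARGIN=180    # assumed hangcheck-timer values; adjust to your configuration
    MISSCOUNT=`grep '^MissCount' $ORACLE_HOME/oracm/admin/cmcfg.ora | cut -d= -f2`
    if [ "$MISSCOUNT" -ge 60 -a "$MISSCOUNT" -gt `expr $TICK + $MARGIN` ]; then
        echo "MissCount=$MISSCOUNT is large enough"
    else
        echo "MissCount=$MISSCOUNT is too small"
    fi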
    My $ORACLE_HOME/oracm/admin/cmcfg.ora file looks as follows:
    HeartBeat=15000
    ClusterName=Oracle Cluster Manager, version 9i
    PollInterval=1000
    MissCount=215
    PrivateNodeNames=rac1prv rac2prv rac3prv
    PublicNodeNames=rac1pub rac2pub rac3pub
    ServicePort=9998
    CmDiskFile=/var/opt/oracle/oradata/orcl/CMQuorumFile
    HostName=rac1prv
    KernelModuleName=hangcheck-timer
    
    
    MODIFY the $ORACLE_HOME/oracm/bin/ocmstart.sh file and comment out the following lines:
    # watchdogd's default log file
    #WATCHDOGD_LOG_FILE=$ORACLE_HOME/oracm/log/wdd.log
    
    # watchdogd's default backup file
    #WATCHDOGD_BAK_FILE=$ORACLE_HOME/oracm/log/wdd.log.bak
    
    # Get arguments
    #watchdogd_args=`grep '^watchdogd' $OCMARGS_FILE |\
    #  sed -e 's+^watchdogd *++'`
    
    # Check watchdogd's existance
    #if watchdogd status | grep 'Watchdog daemon active' >/dev/null
    #then
    #  echo 'ocmstart.sh: Error: watchdogd is already running'
    #  exit 1
    #fi
    
    # Backup the old watchdogd log
    #if test -r $WATCHDOGD_LOG_FILE
    #then
    #  mv $WATCHDOGD_LOG_FILE $WATCHDOGD_BAK_FILE
    #fi
    
    # Startup watchdogd
    #echo watchdogd $watchdogd_args
    #watchdogd $watchdogd_args

    Starting and Stopping Oracle 9i Cluster Manager

    To start the Cluster Manager (CM) and Node Monitor oracm, run the following commands on all RAC nodes:
    su - root
    # . ~oracle/.bash_profile      # Set Oracle environment
    # $ORACLE_HOME/oracm/bin/ocmstart.sh
    oracm </dev/null 2>&1 >/opt/oracle/product/9.2.0/oracm/log/cm.out &
    #
    # ps -ef |grep oracm
    root     15249     1  0 Nov08 pts/2    00:00:00 oracm
    root     15251 15249  0 Nov08 pts/2    00:00:00 oracm
    root     15252 15251  0 Nov08 pts/2    00:00:00 oracm
    root     15253 15251  0 Nov08 pts/2    00:00:00 oracm
    root     15254 15251  0 Nov08 pts/2    00:00:04 oracm
    root     15255 15251  0 Nov08 pts/2    00:00:00 oracm
    root     15256 15251  0 Nov08 pts/2    00:00:00 oracm
    root     15257 15251  0 Nov08 pts/2    00:00:00 oracm
    root     15258 15251  0 Nov08 pts/2    00:00:00 oracm
    root     15298 15251  0 Nov08 pts/2    00:00:00 oracm
    root     15322 15251  0 Nov08 pts/2    00:00:00 oracm
    root     15540 15251  0 Nov08 pts/2    00:00:00 oracm
    #
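    To confirm that oracm is really up on every node before launching the Installer, you can check the process table on all nodes from a single node (run this as the user for which you set up rsh equivalence, e.g. oracle):
    rsh rac1pub ps -ef | grep oracm
    rsh rac2pub ps -ef | grep oracm
    rsh rac3pub ps -ef | grep oracm
    Each node should show oracm processes similar to the output above.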
    
    NOTE:

    Once the Cluster Manager is upgraded, oracm won't come up any more if you use OCFS. It will always die after a few seconds. To fix this, run the following command on only one RAC node:
    su - oracle
    dd if=/dev/zero of=/var/opt/oracle/oradata/orcl/CMQuorumFile bs=4096 count=96
    After that, restart the Cluster Manager.
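    The dd command writes, and truncates the quorum file to, 96 blocks of 4 KB, i.e. 96 * 4096 = 393216 bytes. You can verify the result before restarting the Cluster Manager:
    ls -l /var/opt/oracle/oradata/orcl/CMQuorumFile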

    To stop oracm, you have to kill it:
    su - root
    pkill oracm
    For more information on Oracle cluster administration, see
    Oracle9i Real Application Clusters Administration.


    Note about Using the procps RPM:

    If the procps RPM was installed, you will only see one oracm process in the process table when you run the ps command. That's because the ps command that comes with the procps RPM does not show a thread as a separate process in the ps output.
    rpm -qf /bin/ps
    procps-2.0.7-11

    Installing Oracle 9i Real Application Cluster

    General

    Before you install the Oracle9i Real Application Cluster 9.2.0.1.0 software (RAC software + database software), you have to make sure that the pdksh and ncurses4 RPMs are installed on all RAC nodes! If these RPMs are not installed, you will get the following error message when you run $ORACLE_HOME/root.sh on each RAC node during the software installation:
    ...
    error: failed dependencies:
            libncurses.so.4 is needed by orclclnt-nw_lssv.Build.71-1
    error: failed dependencies:
            orclclnt = nw_lssv.Build.71-1 is needed by orcldrvr-nw_lssv.Build.71-1
    error: failed dependencies:
            orclclnt = nw_lssv.Build.71-1 is needed by orclnode-nw_lssv.Build.71-1
            orcldrvr = nw_lssv.Build.71-1 is needed by orclnode-nw_lssv.Build.71-1
            libscsi.so is needed by orclnode-nw_lssv.Build.71-1
            libsji.so is needed by orclnode-nw_lssv.Build.71-1
    error: failed dependencies:
            orclclnt = nw_lssv.Build.71-1 is needed by orclserv-nw_lssv.Build.71-1
            orclnode = nw_lssv.Build.71-1 is needed by orclserv-nw_lssv.Build.71-1
            /bin/ksh is needed by orclserv-nw_lssv.Build.71-1
    package orclman-nw_lssv.Build.71-1 is already installed
    
    **      Installation of LSSV did not succeed.  Please refer
    **      to the Installation Guide at http://www.legato.com/LSSV
    **      and contact Oracle customer support if necessary.
    

    To check for these RPMs, run the following command:
    rpm -q pdksh ncurses4
    
    To install these RPMs, run:
    su - root
    rpm -Uvh pdksh-5.2.14-13.i386.rpm ncurses4-5.0-5.i386.rpm
    

    Installing Oracle 9i Database Software with Real Application Cluster

    Creating the Shared Configuration File for srvctl

    A shared configuration file is needed for the srvctl utility which is used to manage Real Application Clusters instances and listeners.

  • For OCFS Filesystems:
  • To create the shared configuration file for srvctl, run the following command on one RAC node:
    su - oracle
    touch  /var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile
    
  • For Raw Devices:
  • In Creating Partitions for Raw Devices I created a 20 MB partition (raw device) for the shared configuration file. The device name of the shared configuration file is /dev/sda1, which is bound to /dev/raw/raw2.
    Optionally, you can create a soft link to this raw device. If you haven't done it yet as shown under Creating Partitions for Raw Devices, do it now by running the following command on all RAC nodes:
    su - oracle
    ln -s /dev/raw/raw2 /var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile
    

    Installing Oracle9i 9.2.0.1.0 Database Software with Oracle9i Real Application Cluster


    To install the Oracle9i Real Application Cluster 9.2.0.1.0 software, insert the Oracle9iR2 Disk 1 and launch runInstaller. These steps only need to be performed on one node, the node you are installing from.

    Mount the disk in one terminal:
    su - root
    mount /mnt/cdrom
    And in another terminal, run the following commands. Do not change directory to /mnt/cdrom before running the runInstaller script, or you will be unable to unmount Disk1 and mount Disk2 and Disk3.
    su - oracle
    $ /mnt/cdrom/runInstaller
     - Welcome Screen:         Click Next
     - Cluster Node Selection: Select/Highlight all RAC nodes using the shift key and the left mouse button.
                               Click Next
                               Note: If not all RAC nodes are showing up, or if the Node Selection Screen
                               does not appear, then the Oracle Cluster Manager (Node Monitor) oracm is probably not
                               running on all RAC nodes. See Starting and Stopping Oracle 9i Cluster Manager for more information.
     - File Locations:         Click Next
     - Available Products:     Select "Oracle9i Database 9.2.0.1.0" and click Next
     - Installation Types:     Select "Enterprise Edition" and click Next
     - Database Configuration: Select "Software Only" and click Next
     - Shared Configuration File Name:
                               Enter the name of the OCFS shared configuration file or the name of
                               the raw device.
                               Select "/var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile" and click Next
     - Summary:                Click Install.
                               When the window for running "root.sh" comes up, run it on ALL RAC servers before clicking "OK". See below.
     - When installation has completed, click Exit.
    
       When the Oracle Enterprise Manager Console comes up, don't install a database yet.
    
    
    When the Install window displays "Performing remote operations (99%)", you will see a command like this one running on the RAC nodes:
      bash -c /bin/sh -c cd /; cpio  -idmuc
    If this command is running, it shows that the Oracle software is currently being installed on the RAC node(s).

    NOTE: There are still bugs that sometimes prevent the Installer from installing the Oracle software on all RAC nodes. If the Installer hangs at "Performing remote operations (99%)" and the above bash command is no longer running on the RAC nodes, you need to abort the installation. One time I kept the Installer running the whole night without success, and another time I had to run the Installer five times because it kept hanging at "Performing remote operations (99%)". A workaround is to run runInstaller on each RAC node and install the software on every node separately.
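
    If you are unsure whether the remote copy is still making progress, one way to check is to look for the cpio process on the other RAC nodes from the install node (host names as used in my setup), for example:
    su - oracle
    rsh rac2pub ps -ef | grep cpio
    rsh rac3pub ps -ef | grep cpio
    As long as cpio shows up in the output, the remote installation is still running.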


    Applying Oracle9i 9.2.0.4.0 Patch Set


    Before any other Oracle patches are applied, you first need to patch runInstaller in $ORACLE_HOME/bin.

    But before runInstaller is started, the following commands must be run to prevent runInstaller from crashing:
    su - oracle
    cd $ORACLE_BASE/oui/bin/linux
    ln -s libclntsh.so.9.0 libclntsh.so
    If you don't create the link, runInstaller will crash with an error message similar to this one:
    An unexpected exception has been detected in native code outside the VM.
    Unexpected Signal : 11 occurred at PC=0x40008e4a
    Function name=_dl_lookup_symbol
    Library=/lib/ld-linux.so.2
    
    Current Java thread:
            at java.lang.ClassLoader$NativeLibrary.find(Native Method)
            at java.lang.ClassLoader.findNative(ClassLoader.java:1441)
            at ssOiGenClassux22.linkExists(Native Method)
            at sscreateLinkux.createLink(sscreateLinkux.java:256)
            at sscreateLinkux.installAction(sscreateLinkux.java:83)
            at oracle.sysman.oii.oiis.OiisCompActions.doActionWithException(OiisCompActions.java:1357)
            at oracle.sysman.oii.oiis.OiisCompActions.doActionImpl(OiisCompActions.java:1157)
            at oracle.sysman.oii.oiis.OiisCompActions.doAction(OiisCompActions.java:604)
            at Patches.oracle.cartridges.context.v9_2_0_4_0.CompActions.doAction(Unknown Source)
            at Patches.oracle.cartridges.context.v9_2_0_4_0.CompInstallPhase1.doActionP1createLink11(Unknown Source)
            at Patches.oracle.cartridges.context.v9_2_0_4_0.CompInstallPhase1.stateChangeActions(Unknown Source)
            at Patches.oracle.cartridges.context.v9_2_0_4_0.CompActions.stateChangeActions(Unknown Source)
            at oracle.sysman.oii.oiic.OiicInstallActionsPhase$OiilActionThr.run(OiicInstallActionsPhase.java:604)
            at oracle.sysman.oii.oiic.OiicInstallActionsPhase.executeProductPhases(OiicInstallActionsPhase.java:2199)
            at oracle.sysman.oii.oiic.OiicInstallActionsPhase.doInstalls(OiicInstallActionsPhase.java:2052)
            at oracle.sysman.oii.oiic.OiicInstallActionsPhase$OiInstRun.run(OiicInstallActionsPhase.java:2945)
            at java.lang.Thread.run(Thread.java:484)
    
    Dynamic libraries:
    08048000-0804c000 r-xp 00000000 03:05 1044862    /opt/oracle/jre/1.3.1/bin/i386/native_threads/java
    0804c000-0804d000 rw-p 00003000 03:05 1044862    /opt/oracle/jre/1.3.1/bin/i386/native_threads/java
    40000000-40016000 r-xp 00000000 03:05 767065     /lib/ld-2.2.4.so
    40016000-40017000 rw-p 00015000 03:05 767065     /lib/ld-2.2.4.so
    40018000-40025000 r-xp 00000000 03:05 767061     /lib/i686/libpthread-0.9.so
    40025000-40029000 rw-p 0000c000 03:05 767061     /lib/i686/libpthread-0.9.so
    ...
    4bb27000-4bb2e000 r-xp 00000000 03:05 2415686    /opt/oracle/oui/bin/linux/libsrvm.so
    4bb2e000-4bb2f000 rw-p 00006000 03:05 2415686    /opt/oracle/oui/bin/linux/libsrvm.so
    4bf27000-4c823000 r-xp 00000000 03:05 1518276    /opt/oracle/product/9.2.0/lib/libclntsh.so.9.0
    4c823000-4c8e5000 rw-p 008fb000 03:05 1518276    /opt/oracle/product/9.2.0/lib/libclntsh.so.9.0
    
    Local Time = Sun Oct 19 22:17:55 2003
    Elapsed Time = 163
    #
    # The exception above was detected in native code outside the VM
    #
    # Java VM: Java HotSpot(TM) Client VM (1.3.1_02-b02 mixed mode)
    #
    # An error report file has been saved as hs_err_pid6574.log.
    # Please refer to the file for further information.
    #
    #
    
    
    To patch the runInstaller in $ORACLE_HOME/bin, run the following commands:
    su - oracle
    cd $ORACLE_HOME/bin
    $ ./runInstaller
     - Welcome Screen:         Click Next
     - Inventory Location:     Click Next
     - File Locations:         - Click "Browse for the Source"
                               - Navigate to the stage directory where the patch set is located.
                                 On my system it is: "/tmp/Disk1/stage"
                               - Select the "products.jar" file.
                               - Click OK
     - Available Products:     Select "Oracle Universal Installer 2.2.0.18.0"
     - Components Locations:   Click Next
     - Summary:                Click Install
     - When installation has completed, click Exit.
    

    To patch the Oracle Software to Oracle9iR2 Patch Set 3 9.2.0.4, launch the Installer. These steps only need to be performed on one node, the node you are installing from.
    su - oracle
    cd $ORACLE_HOME/bin
    $ ./runInstaller
     - Welcome Screen:         Click Next
     - Inventory Location:     Click Next
     - Cluster Node Selection: Select/Highlight all RAC nodes using the shift key and the left mouse button.
                               Note: If not all RAC nodes are showing up, or if the Node Selection Screen
                               doesn't appear, then the Oracle Cluster Manager (Node Monitor) oracm is probably not
                               running on all RAC nodes. See Starting and Stopping Oracle 9i Cluster Manager for more information.
     - File Locations:         - Click "Browse for the Source"
                               - Navigate to the stage directory where the patch set is located.
                                 On my system it is: "/tmp/Disk1/stage"
                               - Select the "products.jar" file.
                               - Click OK
     - Available Products:     Select "Oracle9iR2 Patch Set 3 9.2.0.4"
     - Summary:                Click Install
     - When installation has completed, click Exit.
    
    
    Run the root.sh script on all RAC nodes when prompted.


    Creating an Oracle 9i Real Application Cluster (RAC) Database


    Starting Oracle Global Services

    To check if the Oracle Global Services daemon gsd is running, execute the following command:
    su - oracle
    $ gsdctl stat
    GSD is not running on the local node
    $
    To initialize the shared configuration file, run the following command only on one RAC node:
    su - oracle
    $ srvconfig -init

    If you get a PRKR-1064 error, check whether the /var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile file is accessible by all RAC nodes:
    $ ls -l /var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile
    -rw-r--r--    1 oracle   oinstall 10565120 Nov 10 17:06 /var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile
    If you use raw devices, then your raw device for the shared configuration file is probably too small. Increase the size and try again.
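
    One way to check how big the partition behind a raw device really is (a quick sketch, assuming the raw bindings from Creating Partitions for Raw Devices) is to query the binding and look the partition up in /proc/partitions:
    su - root
    raw -qa                # shows the major/minor numbers each /dev/raw/rawN is bound to
    cat /proc/partitions   # lists the size of each partition in 1 KB blocks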


    To start the Global Services daemon gsd on all RAC nodes, run the following command on all RAC nodes:
    su - oracle
    $ gsdctl start
    Successfully started GSD on local node
    $

    Creating the Database

    Using OCFS for Database Files and Other Files

    The Database Configuration Assistant will create the Oracle database files automatically on the OCFS filesystem if "dbca -datafileDestination /var/opt/oracle/oradata" is executed.

    Using Raw Devices for Database Files and Other Files

    Optionally, you can create soft links to the raw devices. In my setup I created soft links in /var/opt/oracle/oradata/orcl.
    If you haven't created the soft links yet as shown in
    Creating Partitions for Raw Devices, now run the following commands on all RAC nodes, since OCFS is not being used here:
    su - oracle
    
    ln -s /dev/raw/raw3 /var/opt/oracle/oradata/orcl/spfileorcl.ora
    
    ln -s /dev/raw/raw4  /var/opt/oracle/oradata/orcl/control01.ctl
    ln -s /dev/raw/raw5  /var/opt/oracle/oradata/orcl/control02.ctl
    ln -s /dev/raw/raw6  /var/opt/oracle/oradata/orcl/indx01.dbf
    ln -s /dev/raw/raw7  /var/opt/oracle/oradata/orcl/system01.dbf
    ln -s /dev/raw/raw8  /var/opt/oracle/oradata/orcl/temp01.dbf
    ln -s /dev/raw/raw9  /var/opt/oracle/oradata/orcl/tools01.dbf
    ln -s /dev/raw/raw10 /var/opt/oracle/oradata/orcl/undotbs01.dbf
    ln -s /dev/raw/raw11 /var/opt/oracle/oradata/orcl/undotbs02.dbf
    ln -s /dev/raw/raw12 /var/opt/oracle/oradata/orcl/undotbs03.dbf
    ln -s /dev/raw/raw13 /var/opt/oracle/oradata/orcl/users01.dbf
    
    ln -s /dev/raw/raw14 /var/opt/oracle/oradata/orcl/redo01.log
    ln -s /dev/raw/raw15 /var/opt/oracle/oradata/orcl/redo02.log
    ln -s /dev/raw/raw16 /var/opt/oracle/oradata/orcl/redo03.log
    ln -s /dev/raw/raw17 /var/opt/oracle/oradata/orcl/orcl_redo2_2.log
    ln -s /dev/raw/raw18 /var/opt/oracle/oradata/orcl/orcl_redo3_1.log
    ln -s /dev/raw/raw19 /var/opt/oracle/oradata/orcl/orcl_redo3_2.log
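
    Before running dbca, it can help to verify that each link points to the intended raw device and that the oracle user can access the underlying devices (assuming the raw device ownership was set as part of the earlier raw device configuration):
    ls -l /var/opt/oracle/oradata/orcl/    # each entry should point to its /dev/raw/rawN device
    ls -l /dev/raw/                        # the raw devices should be readable and writable by the oracle user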
    

    Running Oracle Database Configuration Assistant

    Launch the Database Configuration Assistant with the following option whether you use OCFS or raw devices (assuming you created soft links to the raw devices):
    su - oracle
    dbca -datafileDestination /var/opt/oracle/oradata
    
     - Type of database:        Select "Oracle Cluster Database"
     - Operation:               Select "Create a Database"
     - Nodes:                   Click "Select All" (rac1pub, rac2pub, rac3pub) and click Next.
                                Note: If not all RAC nodes are showing up, then the Oracle Cluster Manager (Node Monitor) 
                                oracm is probably not running correctly on all RAC nodes. 
                                See Starting and Stopping Oracle 9i Cluster Manager for more information.
     - Database Templates:      Select "New Database"
     - Database Identification: Global Database Name: orcl
                                SID Prefix:           orcl
     - Database Features:       - Clear all boxes and confirm deletion of tablespaces.
                                - Click "Standard database features"
                                  - Clear all boxes and confirm deletion of tablespaces.
                                  - Click OK
                                - Click Next
     - Database Connection Options:
                                Select "Dedication Server Mode"
     - Initialization Parameters: 
                                Click Next
     - Database Storage:        - Select Controlfile and delete control03.ctl (highlight it and press Backspace)
                                - Click Next
     - Creation Options:        Click Finish
     - Summary:                 Click OK
    
     When prompted to perform another operation, click No.
    
    
    Your RAC cluster should now be up and running:
    su - oracle
    $ srvctl status database -d orcl
    Instance orcl1 is running on node rac1pub
    Instance orcl2 is running on node rac2pub
    Instance orcl3 is running on node rac3pub
    $
    

    Transparent Application Failover (TAF)

    Introduction

    Processes external to the Oracle9i RAC cluster control Transparent Application Failover (TAF). This means that the failover types and methods can be unique for each Oracle Net client. The reconnection happens automatically within the OCI library, which means that you do not need to change the client application to use TAF. Note that a Java thin client won't be reconnected automatically, since it doesn't read the tnsnames.ora file.

    Testing Transparent Application Failover (TAF) on the Newly Installed RAC Cluster

    Setup

    To test TAF on the newly installed RAC cluster, configure the tnsnames.ora file for TAF on a non-RAC server where you have either the Oracle database software or the Oracle client software installed.

    Here is what my /opt/oracle/product/9.2.0/network/admin/tnsnames.ora file looks like:
    ORCL =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac1pub)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac2pub)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac3pub)(PORT = 1521))
          (LOAD_BALANCE = on)
          (FAILOVER = on)
        )
        (CONNECT_DATA =
          (SERVICE_NAME = orcl)
          (FAILOVER_MODE =
            (TYPE = session)
            (METHOD = basic)
          )
        )
      )
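
    Before testing the failover itself, you can check that the client resolves the ORCL alias and reaches one of the listeners with tnsping:
    su - oracle
    $ tnsping orcl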
    
    
    The following SQL statement can be used to check the session's failover type, failover method, and whether a failover has occurred:
    select instance_name, host_name,
           NULL AS failover_type,
           NULL AS failover_method,
           NULL AS failed_over
        FROM v$instance
      UNION
      SELECT NULL, NULL, failover_type, failover_method, failed_over
        FROM v$session
        WHERE username = 'SYSTEM';
    

    Example of a Transparent Application Failover

    Here is an example of a Transparent Application Failover:
    su - oracle
    $ sqlplus system@orcl
    
    SQL> select instance_name, host_name,
      2         NULL AS failover_type,
      3         NULL AS failover_method,
      4         NULL AS failed_over
      5    FROM v$instance
      6  UNION
      7  SELECT NULL, NULL, failover_type, failover_method, failed_over
      8    FROM v$session
      9    WHERE username = 'SYSTEM';
    
    INSTANCE_NAME    HOST_NAME  FAILOVER_TYPE FAILOVER_M FAI
    ---------------- ---------- ------------- ---------- ---
    orcl1            rac1prv
                                SESSION       BASIC      NO
    
    SQL>
    
    
    Now do a "shutdown abort" on "rac1prv" for instance "orcl1". You can use the srvctl utility to do this:
    su - oracle
    $ srvctl status database -d orcl
    Instance orcl1 is running on node rac1pub
    Instance orcl2 is running on node rac2pub
    Instance orcl3 is running on node rac3pub
    $
    $ srvctl stop instance -d orcl -i orcl1 -o abort
    $
    $ srvctl status database -d orcl
    Instance orcl1 is not running on node rac1pub
    Instance orcl2 is running on node rac2pub
    Instance orcl3 is running on node rac3pub
    $
    
    
    Now rerun the SQL statement:
    SQL> select instance_name, host_name,
      2         NULL AS failover_type,
      3         NULL AS failover_method,
      4         NULL AS failed_over
      5    FROM v$instance
      6  UNION
      7  SELECT NULL, NULL, failover_type, failover_method, failed_over
      8    FROM v$session
      9    WHERE username = 'SYSTEM';
    
    
    INSTANCE_NAME    HOST_NAME  FAILOVER_TYPE FAILOVER_M FAI
    ---------------- ---------- ------------- ---------- ---
    orcl2            rac2prv
                                SESSION       BASIC      YES
    
    SQL>
    
    
    The SQL statement shows that the session has now been failed over to instance "orcl2".
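
    To return the cluster to its normal state after this test, you can start the aborted instance again with srvctl:
    su - oracle
    $ srvctl start instance -d orcl -i orcl1
    $ srvctl status database -d orcl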


    Appendix

    Oracle 9i RAC Problems and Errors

    This section describes problems and errors pertaining to installing Oracle9i RAC on Red Hat Advanced Server.

    There could be many reasons for this problem. If you use raw devices, check if you rebooted the server after you created new raw devices.

    If the Database Configuration Assistant hangs at 91% and no Oracle background processes are running on the other RAC nodes, then stop the installation, delete the database, and kill all processes that were started by the Database Configuration Assistant on all RAC nodes. If you are not sure, stop all Oracle processes including the gsd daemons. After that, restart gsd on all RAC nodes and run dbca again.

    Check if the Oracle Cluster Manager oracm is running on all RAC nodes. Check also if ORACLE_HOME is set.

    Create the following link before restarting runInstaller:
    su - oracle
    cd $ORACLE_BASE/oui/bin/linux
    ln -s libclntsh.so.9.0 libclntsh.so
    

    The Oracle Cluster Manager oracm is not running.

    References

    Oracle's Linux Center
    Tips and Techniques: Install and Configure Oracle9i on Red Hat Linux Advanced Server
    Oracle9i Real Application Clusters Quick Installation Guide for Linux x86
    Project Documentation: OCFS
    Step-By-Step Installation of 9.2.0.4 RAC on Linux - See Metalink


    Copyright © 2007 PUSCHITZ.COM

    The information provided on this website comes without warranty of any kind and is distributed AS IS. Every effort has been made to make the information as accurate as possible, but no warranty of fitness is implied. The information may be incomplete, may contain errors, or may have become out of date. The use of the information described herein is your responsibility; if you use it in your own environments, you do so at your own risk.