Friday, September 11, 2009

Setup RAC on a single Linux machine

This document explains how to install an Oracle9i Real Application Clusters (RAC) database on a single Linux machine. No shared storage or special hardware is required. Raw devices are used to store the Oracle database files.

In contrast to what some people say, it is possible to set up an Oracle cluster configuration without having special hardware, i.e. shared storage, at your disposal. Such a configuration can be set up by creating two instances on one single machine. The only requirement is that raw devices are used for the Oracle database files.

In this document we consider the case of a Linux machine. Of course, such a configuration is only intended for testing purposes and for becoming familiar with Oracle clustering; it will give you no advantage in performance or high availability.

Prepare Raw Devices

Prepare your kernel to support raw devices. To determine whether your kernel source tree is patched for raw devices, check if
/usr/src/linux/drivers/char/raw.c
exists. In my case I used SuSE Linux version 7.2, which is patched for raw devices by default. If your version isn't, check the documentation on how to get this set right.
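
A one-line check (a sketch, assuming the kernel source lives in the standard location):

test -f /usr/src/linux/drivers/char/raw.c \
  && echo "raw.c present: kernel tree is patched" \
  || echo "raw.c missing: check your distribution's documentation"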

Every database file, controlfile, and redolog file must be placed on a raw device. For a minimal system you need at least:


Node            | Object            | Raw device | Partition | MB
----------------+-------------------+------------+-----------+----
node monitor    | sync partition    | /dev/raw1  | hda8      | 15
fenris1,fenris2 | a controlfile     | /dev/raw2  | hda9      | 15
fenris1,fenris2 | a 2nd controlfile | /dev/raw3  | hda10     | 15
fenris1         | redolog file 1    | /dev/raw4  | hda11     | 100
fenris1         | redolog file 2    | /dev/raw5  | hda12     | 100
fenris2         | redolog file 3    | /dev/raw6  | hda13     | 100
fenris2         | redolog file 4    | /dev/raw7  | hda14     | 100
fenris1,fenris2 | 1 'system.dbf'    | /dev/raw8  | hda15     | 256
fenris1,fenris2 | 1 'rollback.dbf'  | /dev/raw9  | hda16     | 50

Use the fdisk command to create the partitions on your hard drive. As you may know, the number of primary partitions on a disk is limited to four, so you will need to create these partitions as logical drives in an extended partition (in my example /dev/hda2 is the extended partition). This is my partition table (the partitions I use for Oracle start at hda8):

Disk /dev/hda: 255 heads, 63 sectors, 2491 cylinders
Units = cylinders of 16065 * 512 bytes

Device Boot Start End Blocks Id System
/dev/hda1 1 637 5116671 7 HPFS/NTFS
/dev/hda2 638 2491 14892255 f Win95 Ext'd (LBA)
/dev/hda5 638 641 32098+ 83 Linux
/dev/hda6 642 1534 7172991 83 Linux
/dev/hda7 1535 1630 771088+ 82 Linux swap
/dev/hda8 1631 1632 16033+ 83 Linux
/dev/hda9 1633 1634 16033+ 83 Linux
/dev/hda10 1635 1636 16033+ 83 Linux
/dev/hda11 1637 1649 104391 83 Linux
/dev/hda12 1650 1662 104391 83 Linux
/dev/hda13 1663 1675 104391 83 Linux
/dev/hda14 1676 1688 104391 83 Linux
/dev/hda15 1689 1720 257008+ 83 Linux
/dev/hda16 1721 1727 56196 83 Linux
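
If you still need to create such logical partitions, an interactive fdisk session goes roughly like this (a sketch; the keystrokes are shown as comments, and the size is an example to adapt to the table above):

fdisk /dev/hda
# n        - new partition (created as a logical drive inside the
#            extended partition once the four primary slots are used)
# <Enter>  - accept the default first cylinder
# +16M     - make the partition about 16 MB
# w        - write the partition table and exit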

Check if the necessary device files exist (/dev/hdaXX); otherwise create additional ones with mknod (block device, major number 3 for /dev/hda, minor number equal to the partition number):
mknod /dev/hda17 b 3 17

I made each partition a bit bigger than the size of the file I will create on it. E.g. each redolog partition (hda11 .. hda14) is 101 MB, while the redolog files will be 100 MB. This is because of the file header, which takes some extra blocks on top of the file size you specify.

Now you have to bind these partitions to their respective raw devices. This binding is volatile and must be restored after every reboot (I have added the following to my /etc/init.d/boot.local):

/usr/sbin/raw /dev/raw1 /dev/hda8
/usr/sbin/raw /dev/raw2 /dev/hda9
/usr/sbin/raw /dev/raw3 /dev/hda10
/usr/sbin/raw /dev/raw4 /dev/hda11
/usr/sbin/raw /dev/raw5 /dev/hda12
/usr/sbin/raw /dev/raw6 /dev/hda13
/usr/sbin/raw /dev/raw7 /dev/hda14
/usr/sbin/raw /dev/raw8 /dev/hda15
/usr/sbin/raw /dev/raw9 /dev/hda16

/bin/chmod 600 /dev/raw1
/bin/chmod 600 /dev/raw2
/bin/chmod 600 /dev/raw3
/bin/chmod 600 /dev/raw4
/bin/chmod 600 /dev/raw5
/bin/chmod 600 /dev/raw6
/bin/chmod 600 /dev/raw7
/bin/chmod 600 /dev/raw8
/bin/chmod 600 /dev/raw9

/bin/chown oracle /dev/raw1
/bin/chown oracle /dev/raw2
/bin/chown oracle /dev/raw3
/bin/chown oracle /dev/raw4
/bin/chown oracle /dev/raw5
/bin/chown oracle /dev/raw6
/bin/chown oracle /dev/raw7
/bin/chown oracle /dev/raw8
/bin/chown oracle /dev/raw9

insmod softdog soft_margin=60
(The insmod command is discussed in the next section.)
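
The nine raw bindings and the matching permission changes above can also be written as a compact shell loop (a sketch; the partition list assumes the layout from the table earlier):

i=1
for part in hda8 hda9 hda10 hda11 hda12 hda13 hda14 hda15 hda16; do
    /usr/sbin/raw /dev/raw$i /dev/$part    # bind partition to raw device
    /bin/chmod 600 /dev/raw$i              # restrict access
    /bin/chown oracle /dev/raw$i           # hand the device to oracle
    i=`expr $i + 1`
done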


Prepare your kernel for the watchdog device

The watchdog device allows applications to make use of a timer facility.

This is how it works. First the application registers an action routine with the watchdog device and starts the watchdog running. This can be compared to a counter that counts downwards. From then on, the application must call a watchdog-reset function at regular intervals, which brings the counter back to its starting value. If the application fails to call the watchdog-reset function, the counter reaches zero and the action routine is invoked.

In normal operation the watchdog timer never reaches zero. Only when there has been a serious fault in either the hardware or the software does the counter expire, at which point the action routine should perform an appropriate reset operation.

The main reason for having a watchdog timer is to take action when certain processes are not functioning anymore.

The watchdog device can be implemented by software or by interfacing to a hardware watchdog device.
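
On Linux the interface to this facility is the /dev/watchdog device: opening it starts the countdown, and any write resets it. A minimal sketch of the protocol (not part of the Oracle setup; run as root with the softdog module loaded):

exec 3> /dev/watchdog    # opening the device starts the countdown
while true; do
    echo -n "." >&3      # any write resets the timer
    sleep 30             # must stay well under soft_margin (60s below)
done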

The watchdogd daemon offers watchdog services to Node Monitor, Cluster Manager, and the LMON (Lock Monitor) process. If watchdogd finds a process that did not notify it during the defined interval, it crashes the entire system. On Linux, a watchdog device is implemented at the kernel level.

You have to verify that your kernel supports the watchdog device.

cd /usr/src/linux
cp configs/kernel-2.2.14-i686.config .config
make xconfig

Go to section "Character devices", then to section "Watchdog cards", and set:
'Software Watchdog' = 'm'
'Disable watchdog shutdown on close' = 'Y'

Then rebuild your kernel.

As the watchdog device is now implemented as a module, you have to load this module after every reboot. You can do this manually or add the following line to your /etc/init.d/boot.local file:

insmod softdog soft_margin=60

This means that when a process has registered with the watchdog device and doesn't send a notification within 60 seconds, the watchdog device will take action (e.g. reboot the system).

You can check that the softdog module is loaded correctly using:
lsmod
You should then see a line like:
softdog 1472 0 (unused)


Install Oracle software

Install the Oracle software as usual. Make sure to choose a custom installation and check the 'Real Application Clusters' option in the list of products.

On the page "Cluster Nodes selection" the installer states: "The local node will always be selected. Please enter additional nodes." Leave all fields empty, i.e. specify no additional nodes, as we run both instances on the same node.

Enter the JDK Home.
I filled in: /usr/local/jdk118_v3
(which I downloaded from www.blackdown.org, file jdk118_v3-glibc-2.1.3.tar.bz2, size 13M)

Install any available patches (e.g. I installed the 9.0.1.2 patch) to obtain the latest Oracle release.


Configuration

Make sure the $ORACLE_HOME/oracm/admin/nmcfg.ora file contains:

DefinedNodes=fenris
CmDiskFile=/dev/raw1
CmHostName=fenris

/dev/raw1 refers to the first raw device (corresponding to /dev/hda8 in my case); it will contain the cluster quorum.

Make sure the $ORACLE_HOME/oracm/admin/ocmargs.ora file contains:

# Sample configuration file $ORACLE_HOME/oracm/admin/ocmargs.ora
watchdogd -g dba -l 0 -d /dev/null
oranm /c /v
oracm /a:0 /d /v
norestart 1800

With this configuration, which differs from the Oracle default, your server will not reboot when one instance stops notifying the watchdog (e.g. after a shutdown immediate of one instance). When both instances run on the same machine, this is quite useful :) However, Oracle says the reboot is intended to prevent corruptions, so I hope this configuration will never corrupt my database. Remember, after all, this is only intended for testing purposes. (Thanks to Fernando Soares for this.)

Remove any previous logfiles in $ORACLE_HOME/oracm/log.
Make sure the watchdog is not running (ps -ef | grep watchdogd).

Start the watchdog daemon as user root. I use the following script:
export ORACLE_HOME=/u01/oracle/Ora901
rm $ORACLE_HOME/oracm/log/*
$ORACLE_HOME/oracm/bin/ocmstart.sh

Check that the "watchdogd" process is running; you should also see a number of "oracm" and "oranm" processes.
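
For example, to check all of them at once (a sketch; the exact process list depends on your configuration):

ps -ef | egrep 'watchdogd|oracm|oranm' | grep -v grep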

Log in as user oracle again.

As user oracle, start the Global Services Daemon:
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
gsd
--> Successfully started the daemon on the local node.

In my case, when I now try to start dbca, it asks whether I want to create a single or a cluster database. When I select cluster, it crashes with a segmentation violation... After reinstalling all Oracle software it doesn't crash anymore: I can go through the database creation wizard, but at the end I get the error "problem in creating directories on the nodes".

So fortunately we can do this manually as well:

Add the database with the name "fenris", specify ORACLE_HOME as well:
srvctl add db -p fenris -o /u01/oracle/Ora901
--> Successful addition of cluster database: fenris

Now add one instance for this database:
oracle@fenris > srvctl add instance -p fenris -i fenris1 -n fenris
--> Instance successfully added to node: fenris

And finally add the other instance:
oracle@fenris > srvctl add instance -p fenris -i fenris2 -n fenris
--> Instance successfully added to node: fenris
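
You can check what has been registered (a sketch; I am inferring from the 9.0.1 srvctl syntax used above, so verify against your release):

srvctl config                # should list the cluster database: fenris
srvctl config -p fenris      # should list both instances on node fenris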

Modify the /etc/oratab file to reflect the database we are going to create:
fenris1:/u01/oracle/Ora901:N
fenris2:/u01/oracle/Ora901:N

Make the parameter file. Normally each instance has its own parameter file, but it is possible to work with one shared file by prefixing instance-specific parameters with the instance name:
instance.parameter = value

This is my parameter file. Note the location of the controlfile: it is placed on a raw device:

db_name=fenris
db_files = 80
db_file_multiblock_read_count = 8
db_block_buffers = 1000
shared_pool_size = 13500000
log_checkpoint_interval = 10000
processes = 50
parallel_max_servers = 5
log_buffer = 32768
max_dump_file_size = 10240
global_names = TRUE
control_files = (/dev/raw2)

parallel_server = true
parallel_server_instances=2

fenris1.instance_name=fenris1
fenris1.instance_number=1
fenris1.thread=1
fenris2.instance_name=fenris2
fenris2.instance_number=2
fenris2.thread=2

fenris1.rollback_segments = (rbs1_1,rbs1_2)
fenris2.rollback_segments = (rbs2_1,rbs2_2)
fenris1.local_listener = fenris1
fenris2.local_listener = fenris2

fenris1.mts_dispatchers="(protocol=tcp)(listener=fenris2)"
fenris2.mts_dispatchers="(protocol=tcp)(listener=fenris1)"



I called this file $ORACLE_HOME/dbs/initfenris.ora. Then I created a symbolic link to it for each of my two instances:

ln -s $ORACLE_HOME/dbs/initfenris.ora $ORACLE_HOME/dbs/initfenris1.ora
ln -s $ORACLE_HOME/dbs/initfenris.ora $ORACLE_HOME/dbs/initfenris2.ora

In this way each instance will find its parameter file (initfenris1.ora resp. initfenris2.ora), but I have to maintain only one file.
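
A quick check that both links point to the shared file (a sketch):

ls -l $ORACLE_HOME/dbs/initfenris*.ora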


Now, finally, it is time to create our database.
Set your ORACLE_SID to fenris1:
. oraenv
ORACLE_SID = [dummy] ? fenris1

sqlplus "sys/change_on_install as sysdba"
startup nomount

Run the following script:

=================Create RAC db script=====================
spool createdb.log

set echo on
connect sys/fenris1 as sysdba
startup nomount pfile="/u01/oracle/Ora901/dbs/initfenris.ora"
CREATE DATABASE fenris
CONTROLFILE REUSE
MAXLOGMEMBERS 5
MAXLOGHISTORY 100
MAXDATAFILES 254
MAXINSTANCES 32
MAXLOGFILES 64
DATAFILE '/dev/raw8' SIZE 245M REUSE
LOGFILE GROUP 1 ('/dev/raw4') size 100m REUSE,
GROUP 2 ('/dev/raw5') size 100m REUSE
CHARACTER SET WE8ISO8859P1
NATIONAL CHARACTER SET AL16UTF16;

REM **** Create the rollback structures ***************
CREATE TABLESPACE RBS DATAFILE '/dev/raw9'
SIZE 45M REUSE MINIMUM EXTENT 512K;
create rollback segment rbs1_1 storage(initial 200K next 200K)
tablespace RBS;
alter rollback segment rbs1_1 online;
create rollback segment rbs1_2 storage(initial 200K next 200K)
tablespace RBS;
alter rollback segment rbs1_2 online;
create rollback segment rbs2_1 storage(initial 200K next 200K)
tablespace RBS;
alter rollback segment rbs2_1 online;
create rollback segment rbs2_2 storage(initial 200K next 200K)
tablespace RBS;
alter rollback segment rbs2_2 online;

REM **** Various SQL packages ***************
@$ORACLE_HOME/rdbms/admin/catalog.sql
@$ORACLE_HOME/rdbms/admin/catexp7.sql
@$ORACLE_HOME/rdbms/admin/catproc.sql
@$ORACLE_HOME/rdbms/admin/caths.sql
connect system/manager
@$ORACLE_HOME/sqlplus/admin/pupbld.sql
REM **** End various SQL packages ***************

pause Press RET for adding redolog files instance 2...
REM **** Redo logfiles for the second instance ***************
connect sys/fenris1 as sysdba
alter database add logfile thread 2
group 3 '/dev/raw6' size 100M reuse,
group 4 '/dev/raw7' size 100M reuse;

pause Press RET for enabling thread 2...
REM **** Enable the new logfile for thread 2
alter database enable public thread 2;

pause Press RET for running catclust.sql
REM **** Cluster Database SQL support ***************
@$ORACLE_HOME/rdbms/admin/catclust.sql

=================End of create RAC db script====================

Voila, that's it: your cluster database is now ready.

Now, every time you reboot the machine, you have to make sure the watchdogd and cluster processes are running before you try to start Oracle. Remember that you have to do this as user root with the ocmstart.sh script.
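
Putting it all together, a post-reboot startup sequence could look like this (a sketch; paths and SIDs assume the layout used throughout this document):

# As root: start watchdogd and the cluster processes
/u01/oracle/Ora901/oracm/bin/ocmstart.sh

# As oracle: start the Global Services Daemon and both instances
export ORACLE_HOME=/u01/oracle/Ora901
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
gsd
ORACLE_SID=fenris1 sqlplus "/ as sysdba" <<EOF
startup
EOF
ORACLE_SID=fenris2 sqlplus "/ as sysdba" <<EOF
startup
EOF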
