This document shows, step by step, how to upgrade a 3-node 11gR1 RAC to 11gR2 RAC. I chose the upgrade path below, which upgrades the 11gR1 Clusterware and ASM to the 11gR2 Grid Infrastructure. I prefer to perform this activity in a scheduled outage window, even though a rolling upgrade of ASM and CRS is possible from 11gR1. Upgrading the 11gR1 RAC database requires an outage; the total downtime can be further minimized, or avoided, by using a standby database in the upgrade process (not covered here).
| | Existing 11gR1 RAC Setup (Before Upgrade) | Target 11gR2 RAC Setup (After Upgrade) |
|---|---|---|
| Clusterware | Oracle 11gR1 Clusterware 11.1.0.6 | Oracle 11gR2 Grid Infrastructure 11.2.0.1 |
| ASM Binaries | Oracle 11gR1 RAC 11.1.0.6 | Oracle 11gR2 Grid Infrastructure 11.2.0.1 |
| Cluster Name | lab | lab |
| Cluster Nodes | node1, node2, node3 | node1, node2, node3 |
| Clusterware Home | /u01/app/oracle/crs (CRS_HOME) | /u01/app/grid11201 (GRID_HOME) |
| Clusterware Owner | oracle:(oinstall, dba) | oracle:(oinstall, dba) |
| VIPs | node1-vip, node2-vip, node3-vip | node1-vip, node2-vip, node3-vip |
| SCAN | N/A | lab-scan.hingu.net |
| SCAN Listener Host/Port | N/A | SCAN VIPs, endpoint (TCP:1525) |
| OCR and Voting Disks Storage Type | Raw Devices | Raw Devices |
| OCR Disks | /dev/raw/raw1, /dev/raw/raw2 | /dev/raw/raw1, /dev/raw/raw2 |
| Voting Disks | /dev/raw/raw3, /dev/raw/raw4, /dev/raw/raw5 | /dev/raw/raw3, /dev/raw/raw4, /dev/raw/raw5 |
| ASM_HOME | /u01/app/oracle/asm11gr1 | /u01/app/grid11201 |
| ASM_HOME Owner | oracle:(oinstall, dba) | oracle:(oinstall, dba) |
| ASMLib user:group | oracle:oinstall | oracle:oinstall |
| ASM Listener | LISTENER (TCP:1521) | LISTENER (TCP:1521) |
| DB Binaries | Oracle 11gR1 RAC (11.1.0.6) | Oracle 11gR2 RAC (11.2.0.1) |
| DB_HOME | /u01/app/oracle/db11gr1 | /u01/app/oracle/db11201 |
| DB_HOME Owner | oracle:(oinstall, dba) | oracle:(oinstall, dba) |
| DB Listener | LAB_LISTENER | LAB_LISTENER |
| DB Listener Host/Port | node1-vip, node2-vip, node3-vip (port 1530) | node1-vip, node2-vip, node3-vip (port 1530) |
| DB Storage Type, File Management | ASM with OMFs | ASM with OMFs |
| ASM Diskgroups for DB and FRA | DATA, FRA | DATA, FRA |
| OS Platform | Oracle Enterprise Linux 5.5 (32-bit) | Oracle Enterprise Linux 5.5 (32-bit) |
NOTE: The Grid Infrastructure owner must be the same as the 11gR1 CRS owner; role separation is not possible in an upgrade.
HERE is the existing 11gR1 RAC setup in detail.
The upgrade process is composed of the below 5 stages:
· Pre-Upgrade Tasks
Minimum Required RPMs for 11gR2 RAC on OEL 5.5 (All the 3 RAC Nodes):
The below command verifies whether the specified RPMs are installed. Any missing RPMs can be installed from the OEL media pack.
For 11gR2:
rpm -q binutils compat-libstdc++-33 elfutils-libelf elfutils-libelf-devel elfutils-libelf-devel-static \
gcc gcc-c++ glibc glibc-common glibc-devel glibc-headers kernel-headers ksh libaio libaio-devel \
libgcc libgomp libstdc++ libstdc++-devel make numactl-devel sysstat unixODBC unixODBC-devel
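To turn that check into a short list of only the missing packages, the `rpm -q` output can be filtered; a minimal sketch (the package list is shortened here for illustration — in practice pipe the full `rpm -q` command above into the filter):

```shell
#!/bin/sh
# Filter "rpm -q" output down to just the missing package names.
# For a missing package, "rpm -q pkg" prints: package <name> is not installed
missing_packages() {
  sed -n 's/^package \(.*\) is not installed$/\1/p'
}

# Simulated rpm output for demonstration; the real invocation would be:
#   rpm -q binutils gcc ... | missing_packages
printf 'binutils-2.17.50\npackage numactl-devel is not installed\n' | missing_packages
# prints: numactl-devel
```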
Combining both releases' requirements, I had to install the below RPM.
numactl-devel → located on the 3rd CD of the OEL 5.5 media pack.
[root@node1 ~]# rpm -ivh numactl-devel-0.9.8-11.el5.i386.rpm
warning: numactl-devel-0.9.8-11.el5.i386.rpm: Header V3 DSA signature: NOKEY, key ID 1e5e0159
Preparing... ########################################### [100%]
1:numactl-devel ########################################### [100%]
[root@node1 ~]#
I had to upgrade the cvuqdisk RPM by removing it and installing the higher version. This step is also taken care of by the rootupgrade.sh script.
cvuqdisk → available on the Grid Infrastructure media (under the rpm folder)
rpm -e cvuqdisk
export CVUQDISK_GRP=oinstall
echo $CVUQDISK_GRP
rpm -ivh cvuqdisk-1.0.7-1.rpm
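The same reinstall has to happen on every node. A sketch that prints the per-node command sequence (dry run — it only prints; it assumes the rpm has already been staged to a hypothetical /tmp path on each node and that password-less ssh is configured):

```shell
#!/bin/sh
# Print the cvuqdisk reinstall sequence for each RAC node (dry run).
# /tmp/cvuqdisk-1.0.7-1.rpm is a hypothetical staging location.
gen_cvuqdisk_cmds() {
  for n in node1 node2 node3; do
    echo "ssh $n rpm -e cvuqdisk"
    echo "ssh $n 'CVUQDISK_GRP=oinstall rpm -ivh /tmp/cvuqdisk-1.0.7-1.rpm'"
  done
}
gen_cvuqdisk_cmds
```

Review the printed commands, then run them (or drop the `echo`s) once the staging path is confirmed.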
SCAN VIPs to configure in DNS, resolving to lab-scan.hingu.net:
192.168.2.151
192.168.2.152
192.168.2.153
HERE is the existing DNS setup. In that setup, the below two files were modified with the SCAN VIP entries shown below to add these SCAN VIPs into DNS.
/var/named/chroot/var/named/hingu.net.zone
/var/named/chroot/var/named/2.168.192.in-addr.arpa.zone
/var/named/chroot/var/named/hingu.net.zone
$TTL 1d
hingu.net. IN SOA lab-dns.hingu.net. root.hingu.net. (
100 ; se = serial number
8h ; ref = refresh
5m ; ret = update retry
3w ; ex = expiry
3h ; min = minimum
)
IN NS lab-dns.hingu.net.
; DNS server
lab-dns IN A 192.168.2.200
; RAC Nodes Public name
node1 IN A 192.168.2.1
node2 IN A 192.168.2.2
node3 IN A 192.168.2.3
; RAC Nodes Public VIPs
node1-vip IN A 192.168.2.51
node2-vip IN A 192.168.2.52
node3-vip IN A 192.168.2.53
; 3 SCAN VIPs
lab-scan IN A 192.168.2.151
lab-scan IN A 192.168.2.152
lab-scan IN A 192.168.2.153
; Storage Network
nas-server IN A 192.168.1.101
node1-nas IN A 192.168.1.1
node2-nas IN A 192.168.1.2
node3-nas IN A 192.168.1.3
/var/named/chroot/var/named/2.168.192.in-addr.arpa.zone
$TTL 1d
@ IN SOA lab-dns.hingu.net. root.hingu.net. (
100 ; se = serial number
8h ; ref = refresh
5m ; ret = update retry
3w ; ex = expiry
3h ; min = minimum
)
IN NS lab-dns.hingu.net.
; DNS machine name in reverse
200 IN PTR lab-dns.hingu.net.
; RAC Nodes Public Name in Reverse
1 IN PTR node1.hingu.net.
2 IN PTR node2.hingu.net.
3 IN PTR node3.hingu.net.
; RAC Nodes Public VIPs in Reverse
51 IN PTR node1-vip.hingu.net.
52 IN PTR node2-vip.hingu.net.
53 IN PTR node3-vip.hingu.net.
; RAC Nodes SCAN VIPs in Reverse
151 IN PTR lab-scan.hingu.net.
152 IN PTR lab-scan.hingu.net.
153 IN PTR lab-scan.hingu.net.
Restart the DNS Service (named):
service named restart
NOTE: nslookup for lab-scan should return names in random order every time.
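One way to verify the round-robin setup is to confirm that a single lookup returns all three SCAN VIPs. A sketch that counts the distinct addresses in nslookup-style output (run here against a hypothetical captured sample, since only the Address lines matter; in a live check you would pipe `nslookup lab-scan.hingu.net` into the filter):

```shell
#!/bin/sh
# Count the distinct A records returned for the SCAN name.
# The server's own "Address: ...#53" line is excluded via the '#'.
scan_vip_count() {
  grep '^Address: ' | grep -v '#' | sort -u | wc -l
}

# Hypothetical captured nslookup output for lab-scan.hingu.net
sample='Server: 192.168.2.200
Address: 192.168.2.200#53

Name: lab-scan.hingu.net
Address: 192.168.2.151
Name: lab-scan.hingu.net
Address: 192.168.2.152
Name: lab-scan.hingu.net
Address: 192.168.2.153'

printf '%s\n' "$sample" | scan_vip_count
```

A result of 3 means all three SCAN VIPs are being served for the name.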
Oracle's Cluster Time Synchronization Service (ctss) is used here instead of the Linux-provided ntpd. So, ntpd needs to be deactivated and deinstalled to avoid any possible conflict with ctss.
# /sbin/service ntpd stop
# chkconfig ntpd off
# mv /etc/ntp.conf /etc/ntp.conf.org
Also remove the following file:
/var/run/ntpd.pid
The Name Service Cache Daemon (nscd) was started on all the RAC nodes.
service nscd start
Steps I followed to back up the ORACLE_HOMEs before the upgrade (applicable to both 11gR1 and 10g homes):
On node1:
mkdir backup
cd backup
dd if=/dev/raw/raw1 of=ocr_disk_11gr1.bkp
dd if=/dev/raw/raw3 of=voting_disk_11gr1.bkp
tar cvf node1_crs_11gr1.tar /u01/app/oracle/crs/*
tar cvf node1_asm_11gr1.tar /u01/app/oracle/asm11gr1/*
tar cvf node1_db_11gr1.tar /u01/app/oracle/db11gr1/*
tar cvf node1_etc_oracle.tar /etc/oracle/*
cp /etc/inittab etc_inittab
mkdir etc_init_d
cd etc_init_d
cp /etc/init.d/init* .
On node2:
mkdir backup
cd backup
tar cvf node2_crs_11gr1.tar /u01/app/oracle/crs/*
tar cvf node2_asm_11gr1.tar /u01/app/oracle/asm11gr1/*
tar cvf node2_db_11gr1.tar /u01/app/oracle/db11gr1/*
tar cvf node2_etc_oracle.tar /etc/oracle/*
cp /etc/inittab etc_inittab
mkdir etc_init_d
cd etc_init_d
cp /etc/init.d/init* .
On node3:
mkdir backup
cd backup
tar cvf node3_crs_11gr1.tar /u01/app/oracle/crs/*
tar cvf node3_asm_11gr1.tar /u01/app/oracle/asm11gr1/*
tar cvf node3_db_11gr1.tar /u01/app/oracle/db11gr1/*
tar cvf node3_etc_oracle.tar /etc/oracle/*
cp /etc/inittab etc_inittab
mkdir etc_init_d
cd etc_init_d
cp /etc/init.d/init* .
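The three per-node tar sequences are identical apart from the node name, so they can be generated in a loop. A dry-run sketch that prints the commands rather than executing them (file names are simplified relative to the listings above, and the home paths follow the table at the top):

```shell
#!/bin/sh
# Generate the per-node ORACLE_HOME backup commands (dry run: print only).
gen_backup_cmds() {
  node=$1
  for home in crs asm11gr1 db11gr1; do
    echo "tar cvf ${node}_${home}.tar /u01/app/oracle/${home}/*"
  done
  echo "tar cvf ${node}_etc_oracle.tar /etc/oracle/*"
}

for n in node1 node2 node3; do
  gen_backup_cmds "$n"
done
```

Each node would then run its own four tar commands from a local backup directory.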
An RMAN full database backup was also taken.
With this, the pre-upgrade steps are complete and we are ready to upgrade to the 11gR2 Grid Infrastructure next.
Step By Step: Upgrade Clusterware, ASM and Database from 11.1.0.6 to 11.2.0.1.
The Oracle documentation recommends leaving all the RAC instances up and running during the upgrade process; the rootupgrade.sh script brings down the CRS stack on each node as it runs. I prefer to at least shut down the database cleanly before starting the upgrade.
· Stop the labdb database.
· Start the runInstaller from the 11gR2 Grid Infrastructure software stage.
Grid Infrastructure Upgrade process:
Installation Option:
Upgrade Grid Infrastructure
Product Language:
English
Node Selection:
Select all the nodes
SCAN information:
SCAN name: lab-scan.hingu.net
SCAN port: 1525
ASM Monitor Password
Password entered
Prerequisite Checks:
Verify all the minimum prerequisites are satisfied successfully
Privileged Operating System Groups:
ASM Database Administrator (OSDBA) Group: dba
ASM Instance Administrator Operator (OSOPER) Group: dba
ASM Instance Administrator (OSASM) Group: oinstall
Installation Location:
Oracle Base: /u01/app/oracle
Software Location: /u01/app/grid11201
Summary Screen:
Verified the information here and pressed “Finish” to start installation.
At the end of the installation, the rootupgrade.sh script needs to be executed as the root user on all the nodes, one by one.
/u01/app/grid11201/rootupgrade.sh
The rootupgrade.sh script failed on the last node (node3) with the below error; it appeared that CRS died after the successful upgrade of the OCR.
The alertnode3.log showed the below errors. Both OCR disks (raw devices) seemed to become inaccessible after the successful upgrade of the OCR. I tried this upgrade 2-3 times and it errored out at exactly the same place every time. Because the OCR itself was upgraded successfully, I decided to reboot all the nodes at this stage to see whether the HA stack would come back up after the reboots. I also wanted to verify the OCR integrity via ocrcheck to confirm there was no logical corruption at the block level.
/u01/app/grid11201/log/node3/alertnode3.log:
[ctssd(22505)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2011-10-13 15:30:58.341
[ohasd(21071)]CRS-2765:Resource 'ora.crsd' has failed on server 'node3'.
2011-10-13 15:30:58.830
[client(25091)]CRS-1006:The OCR location /dev/raw/raw2 is inaccessible. Details in /u01/app/grid11201/log/node3/client/ocrconfig_25091.log.
2011-10-13 15:30:58.845
[client(25091)]CRS-1006:The OCR location /dev/raw/raw1 is inaccessible. Details in /u01/app/grid11201/log/node3/client/ocrconfig_25091.log.
2011-10-13 15:33:44.000
[crsd(25138)]CRS-1012:The OCR service started on node node3.
2011-10-13 15:36:45.355
/u01/app/grid11201/log/node3/client/ocrconfig_25091.log:
Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production Copyright 1996, 2009 Oracle. All rights reserved.
2011-10-13 15:27:59.695: [ OCRCONF][3047016128]ocrconfig starts...
2011-10-13 15:27:59.722: [ OCRCONF][3047016128]Exporting OCR data to [/u01/app/grid11201/cdata/lab/ocr11.2.0.1.0_upg_node3.ocr]
2011-10-13 15:30:58.830: [ OCRRAW][3047016128]proprior: Header check from OCR device 1 offset 0 failed (26).
2011-10-13 15:30:58.845: [ OCRRAW][3047016128]proprior: Header check from OCR device 0 offset 0 failed (22).
2011-10-13 15:30:58.845: [ OCRRAW][3047016128]ibctx: Failed to read the whole bootblock. Assumes invalid format.
2011-10-13 15:30:58.845: [ OCRRAW][3047016128]rtnode:2: Problem [26] reading the tnode 553. Returning [123]
2011-10-13 15:30:58.846: [ OCRRAW][3047016128]prgval: problem reading the tnode
2011-10-13 15:30:58.846: [ OCRCONF][3047016128]Error[104]: Failed to get key value for key CRS.CUR.ora!node2!ons.USR_ORA_PRECONNECT
2011-10-13 15:30:58.847: [ OCRCONF][3047016128]Exiting [status=failed]...
I rebooted all the RAC nodes at this point. After the reboot, the HA stack came back up successfully on the new 11gR2 Grid Infrastructure, but not all the resources were configured: I had to manually configure the SCAN, SCAN listener, OC4J, and ACFS resources as shown below. The database, DB services, and GSD were down when the 11gR2 CRS came back up on all the nodes. GSD remaining down was expected, as it is disabled by default in 11gR2. I also noticed that srvctl no longer worked to start the DB service oltp; I had to use crs_start from the Grid home instead, which worked fine.
Manual Tasks that were performed to complete the 11gR2 Grid Infrastructure configuration:
As oracle:
/u01/app/grid11201/bin/srvctl enable nodeapps -g
/u01/app/grid11201/bin/srvctl start nodeapps -n node1
/u01/app/grid11201/bin/srvctl start nodeapps -n node2
/u01/app/grid11201/bin/srvctl start nodeapps -n node3
/u01/app/oracle/db11gr1/bin/srvctl start database -d labdb
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb1.srv
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb2.srv
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb3.srv
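The three per-instance service starts follow a single naming pattern (ora.<db>.<service>.<instance>.srv, as seen above), so they can be looped. A sketch that runs in dry-run mode by default, printing each command (set DRY_RUN=0 to actually execute on a live cluster):

```shell
#!/bin/sh
# Start the per-instance oltp service resources via crs_start.
# DRY_RUN=1 (default) prints the commands instead of running them.
GRID_HOME=/u01/app/grid11201
DRY_RUN=${DRY_RUN:-1}
for i in 1 2 3; do
  cmd="$GRID_HOME/bin/crs_start ora.labdb.oltp.labdb${i}.srv"
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "$cmd"
  else
    $cmd
  fi
done
```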
As root:
/u01/app/grid11201/bin/srvctl add scan -n lab-scan.hingu.net
/u01/app/grid11201/bin/crsctl add type ora.registry.acfs.type -basetype ora.local_resource.type -file /u01/app/grid11201/crs/template/registry.acfs.type
/u01/app/grid11201/bin/crsctl add resource ora.registry.acfs -type ora.registry.acfs.type
As oracle:
/u01/app/grid11201/bin/srvctl add scan_listener -l listener -s -p TCP:1525
/u01/app/grid11201/bin/srvctl start scan
/u01/app/grid11201/bin/srvctl start scan_listener
/u01/app/grid11201/bin/srvctl add oc4j
/u01/app/grid11201/bin/srvctl start oc4j
/u01/app/grid11201/bin/crs_start ora.registry.acfs
As root (verify OCR integrity and check for logical corruption after the upgrade):
/u01/app/grid11201/bin/ocrcheck
/u01/app/grid11201/bin/crsctl query css votedisk
After configuring the CRS resources manually, the final CRS stack looked like below:
The OCR integrity check and logical corruption check were performed; both disks looked fine.
· Stopped the labdb database.
· Invoked asmca from the 11gR2 Grid Infrastructure home (/u01/app/grid11201).
· Moved the listener LISTENER from the 11gR1 ASM_HOME to the 11gR2 Grid Infrastructure home.
· Started the labdb database using the 11gR1 srvctl.
· Started the DB service oltp using /u01/app/grid11201/bin/crs_start.
/u01/app/oracle/db11gr1/bin/srvctl stop database -d labdb
/u01/app/grid11201/bin/asmca
Move the Listener “LISTENER” from 11gR1 ASM Home to 11gR2 Grid Infrastructure:
/u01/app/oracle/db11gr1/bin/srvctl stop listener -l LISTENER_NODE1 -n node1
/u01/app/oracle/db11gr1/bin/srvctl stop listener -l LISTENER_NODE2 -n node2
/u01/app/oracle/db11gr1/bin/srvctl stop listener -l LISTENER_NODE3 -n node3
/u01/app/oracle/db11gr1/bin/srvctl remove listener -l LISTENER_NODE1 -n node1
/u01/app/oracle/db11gr1/bin/srvctl remove listener -l LISTENER_NODE2 -n node2
/u01/app/oracle/db11gr1/bin/srvctl remove listener -l LISTENER_NODE3 -n node3
Add the listener “LISTENER” using netca from 11gR2 Grid Infrastructure Home (TCP:1521)
/u01/app/grid11201/bin/netca
/u01/app/oracle/db11gr1/bin/srvctl start database -d labdb
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb1.srv
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb2.srv
/u01/app/grid11201/bin/crs_start ora.labdb.oltp.labdb3.srv
Start the runInstaller from 11g R2 Real Application Cluster (RAC) Software Location:
/home/oracle/db11201/database/runInstaller
Real Application Cluster installation process:
Configure Security Updates:
Email: bhavin@oracledba.org
Ignore the “Connection Failed” alert.
Installation Option:
Install database software only
Node Selection:
Select All the Nodes (node1,node2 and node3)
Product Language:
English
Database Edition:
Enterprise Edition
Installation Location:
Oracle Base: /u01/app/oracle
Software Location: /u01/app/oracle/db11201
Operating System Groups:
Database Administrator (OSDBA) Group: dba
Database Operator (OSOPER) Group: oinstall
Summary Screen:
Verified the information here and pressed “Finish” to start installation.
At the end of the installation, the below script needs to be executed on all the nodes as the root user.
/u01/app/oracle/db11201/root.sh
Upgrade the Database labdb using dbua:
· Invoked the dbua from the 11gR2 RAC HOME (/u01/app/oracle/db11201).
· Fixed any Critical Warnings returned from pre-Upgrade Utility by DBUA.
· After the successful upgrade of the database to 11.2.0.1, moved the listener LAB_LISTENER to the 11gR2 home.
· Updated the REMOTE_LISTENER parameter to lab-scan.hingu.net:1525.
· Stopped the database labdb.
· Rebooted all the nodes and verified that ASM, the database, the listeners, and the other resources came back up without any issue.
/u01/app/oracle/db11201/bin/dbua
The upgrade of the 11gR1 RAC database labdb finished without any error; here is the upgrade result.
Move the Listener “LAB_LISTENER” from 11gR1 RAC DB Home to 11gR2 RAC database Home:
Move the tnsnames.ora from the old 11gR1 home to the 11gR2 home.
ssh node3 cp /u01/app/oracle/db11gr1/network/admin/tnsnames.ora /u01/app/oracle/db11201/network/admin/
ssh node2 cp /u01/app/oracle/db11gr1/network/admin/tnsnames.ora /u01/app/oracle/db11201/network/admin/
ssh node1 cp /u01/app/oracle/db11gr1/network/admin/tnsnames.ora /u01/app/oracle/db11201/network/admin/
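The three per-node copies above can be collapsed into a loop. A dry-run sketch that prints each command (it assumes password-less ssh between the nodes, as used elsewhere in this setup):

```shell
#!/bin/sh
# Print the tnsnames.ora copy command for each node (dry run).
gen_tns_copy_cmds() {
  SRC=/u01/app/oracle/db11gr1/network/admin/tnsnames.ora
  DST=/u01/app/oracle/db11201/network/admin/
  for n in node1 node2 node3; do
    echo "ssh $n cp $SRC $DST"
  done
}
gen_tns_copy_cmds
```

Drop the `echo` (or pipe the output to `sh`) once the printed commands look right.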
Invoke netca from the 11gR1 home to remove the listener LAB_LISTENER.
/u01/app/oracle/db11gr1/bin/netca
Invoke netca from the 11gR2 home to add the listener LAB_LISTENER on the same port, 1530.
/u01/app/oracle/db11201/bin/netca
Select the same end point TCP:1530.
Modified the REMOTE_LISTENER parameter:
alter system set remote_listener='lab-scan.hingu.net:1525' scope=both sid='*';
Restarted the database to verify that the database instances are appropriately registered with their respective listeners.
srvctl stop database -d labdb
srvctl start database -d labdb
Rebooted all the 3 RAC nodes and verified that all the resources came up without any issues or errors.
reboot