HAIP is up on some nodes but not on all RAC nodes

[racserver1:oracle] /usr/oracle $ crsctl stat res -t -init
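--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------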
ora.asm
1 ONLINE ONLINE racserver1 Started
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crf
1 ONLINE ONLINE racserver1
ora.crsd
1 ONLINE ONLINE racserver1
ora.cssd
1 ONLINE ONLINE racserver1
ora.cssdmonitor
1 ONLINE ONLINE racserver1
ora.ctssd
1 ONLINE ONLINE racserver1 OBSERVER
ora.diskmon
1 ONLINE ONLINE racserver1
ora.drivers.acfs
1 ONLINE ONLINE racserver1
ora.evmd
1 ONLINE ONLINE racserver1
ora.gipcd
1 ONLINE ONLINE racserver1
ora.gpnpd
1 ONLINE ONLINE racserver1
ora.mdnsd
1 ONLINE ONLINE racserver1

[racserver1:oracle] /usr/oracle $ crsctl stat res ora.cluster_interconnect.haip -init
NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
TARGET=ONLINE
STATE=OFFLINE

[racserver1:oracle] /usr/oracle $

[racserver2:oracle] /usr/grid_application/grid/diag/asm/+asm/+ASM2/trace $ crsctl stat res -t -init
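--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------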
ora.asm
1 ONLINE OFFLINE
ora.cluster_interconnect.haip
1 ONLINE ONLINE racserver2
ora.crf
1 ONLINE ONLINE racserver2
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE racserver2
ora.cssdmonitor
1 ONLINE ONLINE racserver2
ora.ctssd
1 ONLINE ONLINE racserver2 OBSERVER
ora.diskmon
1 ONLINE ONLINE racserver2
ora.drivers.acfs
1 ONLINE ONLINE racserver2
ora.evmd
1 ONLINE INTERMEDIATE racserver2
ora.gipcd
1 ONLINE ONLINE racserver2
ora.gpnpd
1 ONLINE ONLINE racserver2
ora.mdnsd
1 ONLINE ONLINE racserver2
[racserver2:oracle] /usr/grid_application/grid/diag/asm/+asm/+ASM2/trace $ crsctl stat res ora.cluster_interconnect.haip -init
NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
TARGET=ONLINE
STATE=ONLINE on racserver2

[racserver2:oracle] /usr/grid_application/grid/diag/asm/+asm/+ASM2/trace $

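The ASM alert logs from the two nodes show that cluster communication was configured on different interfaces. One instance uses the private network address directly, while the other uses a 169.254.x.x HAIP address: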

Cluster communication is configured to use the following interface(s) for this instance
10.250.48.28
cluster interconnect IPC version:Oracle UDP/IP (generic)

Cluster communication is configured to use the following interface(s) for this instance
169.254.89.38
cluster interconnect IPC version:Oracle UDP/IP (generic)
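The comparison above can be scripted. The following is a minimal sketch, assuming the alert log uses the same two-line layout shown here (the marker line followed by one or more address lines). The function names interconnect_addrs and classify_addr are hypothetical helpers, not Oracle tools. HAIP addresses come from the link-local range 169.254.0.0/16, so anything outside that range is reported as non-HAIP.

```shell
#!/bin/sh
# Hypothetical helper: read an ASM/DB alert log on stdin and print every
# address line that directly follows the "Cluster communication is
# configured ..." marker line.
interconnect_addrs() {
    awk '
        /Cluster communication is configured to use the following interface/ { grab = 1; next }
        grab && /^[ \t]*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ { print $1; next }
        { grab = 0 }
    '
}

# Hypothetical helper: label an IPv4 address as HAIP (link-local
# 169.254.0.0/16) or non-HAIP (anything else).
classify_addr() {
    case "$1" in
        169.254.*) echo "HAIP" ;;
        *)         echo "non-HAIP" ;;
    esac
}
```

To use it, feed each instance's alert log to interconnect_addrs and pass every printed address to classify_addr. If the nodes do not all report the same label, the instances are in the mixed state described in this post.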

Per My Oracle Support note Doc ID 1383737.1:

Case3: HAIP is up on some nodes but not on all
Symptoms:
• alert_.log for some instances
Cluster communication is configured to use the following interface(s) for this instance
10.1.0.1
• alert_.log for other instances
Cluster communication is configured to use the following interface(s) for this instance
169.254.201.65

Note: some instances are using HAIP while others are not, so they cannot talk to each other
Solution:

The solution is to bring up HAIP on all nodes.

To find out HAIP status, execute the following on all nodes:
$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init

If it’s offline, try to bring it up as root:
$GRID_HOME/bin/crsctl start res ora.cluster_interconnect.haip -init
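The check-then-start step can be sketched as a small script. This is an illustration only, assuming the key=value output format shown earlier in this post (STATE=OFFLINE, or STATE=ONLINE on a named node). The function names haip_state and haip_action are hypothetical; only the crsctl commands quoted above come from the note.

```shell
#!/bin/sh
# Hypothetical helper: read the output of
#   crsctl stat res ora.cluster_interconnect.haip -init
# on stdin and print only the state word.
#   "STATE=ONLINE on racserver2" -> ONLINE
#   "STATE=OFFLINE"              -> OFFLINE
haip_state() {
    awk -F= '$1 == "STATE" { split($2, w, " "); print w[1]; exit }'
}

# Hypothetical helper: map a state word to the next step for that node.
haip_action() {
    case "$1" in
        ONLINE)  echo "ok" ;;           # nothing to do
        OFFLINE) echo "start" ;;        # run the crsctl start command as root
        *)       echo "investigate" ;;  # e.g. INTERMEDIATE or empty output
    esac
}
```

On each node, pipe the crsctl stat res command above into haip_state and hand the result to haip_action. Run the crsctl start res command as root only when the answer is "start".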

Node 1 (HAIP was offline, so the resource was started directly):

[root@racserver1 ~]# /usr/grid_application/oracle/product/11.2/grid/bin/crsctl start res ora.cluster_interconnect.haip -init
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racserver1'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'racserver1' succeeded
[root@racserver1 ~]# /usr/grid_application/oracle/product/11.2/grid/bin/crsctl stat res -t -init
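--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------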
ora.asm
1 ONLINE ONLINE racserver1 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE racserver1

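Node 2 (HAIP was already online, but ora.asm and ora.crsd were offline; a plain stop failed because CRSD was not running, so the stack was stopped with -f and then restarted):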

[root@racserver2 ~]# /usr/grid_application/oracle/product/11.2/grid/bin/crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
[root@racserver2 ~]# /usr/grid_application/oracle/product/11.2/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'racserver2'
CRS-2673: Attempting to stop 'ora.crf' on 'racserver2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'racserver2'
CRS-2673: Attempting to stop 'ora.evmd' on 'racserver2'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'racserver2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'racserver2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'racserver2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'racserver2' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'racserver2' succeeded
CRS-2677: Stop of 'ora.crf' on 'racserver2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'racserver2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'racserver2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'racserver2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'racserver2'
CRS-2677: Stop of 'ora.cssd' on 'racserver2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'racserver2'
CRS-2673: Attempting to stop 'ora.diskmon' on 'racserver2'
CRS-2677: Stop of 'ora.diskmon' on 'racserver2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'racserver2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'racserver2'
CRS-2677: Stop of 'ora.gpnpd' on 'racserver2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racserver2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@racserver2 ~]# /usr/grid_application/oracle/product/11.2/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@racserver2 ~]#

[racserver2:oracle] /usr/oracle $ crsctl stat res -t -init
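--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------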
ora.asm
1 ONLINE ONLINE racserver2 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE racserver2

Sample CRS and ASM logs from the problematic node 2, taken before the restart:

CRS alert log:

2014-06-07 15:27:31.183
[/usr/grid_application/oracle/product/11.2/grid/bin/oraagent.bin(11139)]CRS-5019:All OCR locations are on ASM disk groups [application_GI_DG1], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/usr/grid_application/oracle/product/11.2/grid/log/applicationdb2/agent/ohasd/oraagent_oracle/oraagent_oracle.log".
2014-06-07 15:27:31.183
[/usr/grid_application/oracle/product/11.2/grid/bin/oraagent.bin(11139)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/usr/grid_application/oracle/product/11.2/grid/log/applicationdb2/agent/ohasd/oraagent_oracle/oraagent_oracle.log"

ASM alert log:

Sat Jun 07 15:27:30 2014
PMON (ospid: 25046): terminating the instance due to error 481
Sat Jun 07 15:27:30 2014
System state dump requested by (instance=2, osid=25046 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /usr/grid_application/grid/diag/asm/+asm/+ASM2/trace/+ASM2_diag_25096.trc
Dumping diagnostic data in directory=[cdmp_20140607152730], requested by (instance=2, osid=25046 (PMON)), summary=[abnormal instance termination].
