HACMP failover service IP issue (AIX Operating System forums, Unix Operating Systems category)
We had an unplanned (ha!) HACMP event yesterday. For whatever reason, the secondary node couldn't activate the resource group (read: I believe the admin who configured HACMP didn't set it up right). At any rate, I restarted the primary node, restarted cluster services, and expected it to swap the service IP onto the boot adapter. Unfortunately, it didn't. As this is a mission-critical (man, I hate that term) app, I ifconfig'd the interface back to the service IP so the users could get back in. Of course, errpt is showing 'local adapter misconfiguration.'

This morning, I took the secondary node out of the cluster, manually shut down all of the applications on the primary node, then rebooted the primary node, thinking that HACMP would do the boot->svc IP swap on reboot. Not so. I tried synchronizing cluster resources, activating the resource group, etc. (with and without the secondary node in the cluster), all to no avail. I wound up ifconfig'ing the interface again so the users could do their work.

Any thoughts on why HACMP would not automagically swap the boot and svc IPs? I'm definitely not HACMP certified, but from what I've read it should acquire the resource and set up the topology. Am I missing something?

"A. Gordon Lyph" <a_g_lyph@hotmail.com> wrote in newsgroup post news:f8479612.0406290331.239b535c@posting.google.com...
> We had an unplanned (ha!) HACMP event yesterday. For whatever reason,
> the secondary node couldn't activate the resource group (read: I
> believe the admin who configured HACMP didn't set it up right). At
> any rate, I restarted the primary node, restarted Cluster services and
> expected it to swap the service IP onto the boot adapter.
> Unfortunately, it didn't. As this is a mission critical (man, I hate
> that term) app, I ifconfig'd the interface back to the service IP so
> the users could get back in. Of course, errpt is showing 'local
> adapter misconfiguration.'
>
> This morning, I took the secondary node out of the cluster, manually
> shutdown all of the applications on the primary node, then rebooted
> the primary node, thinking that HACMP would do the boot->svc IP swap
> on reboot. Not so. I tried synchronizing cluster resources,
> activating resource group, etc. (with and without the secondary node
> in the cluster) all to no avail. I wound up ifconfig'ing the
> interface again so the users could do their work.
>
> Any thoughts on why HACMP would not automagically swap the boot and
> svc IPs? I'm definitely not HACMP certified, but from what I've read
> it should acquire the resource and setup the topology. Am I missing
> something?

Post the output of cllsif.

Hi,

Please also tell us the versions of HACMP and AIX (oslevel -r), as well as the box / model type on which you are running HACMP.

Here are the steps you should start with:

1. Check the cluster log files, i.e. hacmp.out, cluster.log, clstrmgr.debug
2. Check the cluster processes, i.e. lssrc -g cluster
3. Check the services, i.e. lssrc -g grpsvcs and lssrc -g emsvcs

Also, as described earlier, we need the output of these two commands:

# /usr/es/sbin/cluster/utilities/cllsif
# /usr/es/sbin/cluster/utilities/cllsnode

Riz.

"Andreas Schulze" <b79xan@gmx.de> wrote in message news:2kd6o1Fv9a6U1@uni-berlin.de...
> "A. Gordon Lyph" <a_g_lyph@hotmail.com> wrote in newsgroup post
> news:f8479612.0406290331.239b535c@posting.google.com...
> > We had an unplanned (ha!) HACMP event yesterday. For whatever reason,
> > the secondary node couldn't activate the resource group (read: I
> > believe the admin who configured HACMP didn't set it up right). At
> > any rate, I restarted the primary node, restarted Cluster services and
> > expected it to swap the service IP onto the boot adapter.
> > Unfortunately, it didn't. As this is a mission critical (man, I hate
> > that term) app, I ifconfig'd the interface back to the service IP so
> > the users could get back in. Of course, errpt is showing 'local
> > adapter misconfiguration.'
> >
> > This morning, I took the secondary node out of the cluster, manually
> > shutdown all of the applications on the primary node, then rebooted
> > the primary node, thinking that HACMP would do the boot->svc IP swap
> > on reboot. Not so. I tried synchronizing cluster resources,
> > activating resource group, etc. (with and without the secondary node
> > in the cluster) all to no avail. I wound up ifconfig'ing the
> > interface again so the users could do their work.
> >
> > Any thoughts on why HACMP would not automagically swap the boot and
> > svc IPs? I'm definitely not HACMP certified, but from what I've read
> > it should acquire the resource and setup the topology. Am I missing
> > something?
>
> Post the output of cllsif.

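The checks requested above can be gathered in one pass. The following is a minimal sketch, not a supported tool: the function name `collect_hacmp_diag`, the default output directory, and the file naming are made up for illustration, while the commands themselves are the ones named in this thread. A command that is not present on the host is recorded as missing instead of being treated as an error.

```shell
#!/bin/sh
# Sketch: save the output of the diagnostics requested in this thread.
# collect_hacmp_diag and the output file names are hypothetical; only
# the commands listed in the here-document come from the thread.
collect_hacmp_diag() {
    outdir=${1:-/tmp/hacmp_diag}
    mkdir -p "$outdir" || return 1
    i=0
    # One command per line; the here-document feeds the while loop.
    while IFS= read -r cmd; do
        i=$((i + 1))
        bin=${cmd%% *}                          # first word = executable
        out="$outdir/$i.$(basename "$bin").txt"
        if command -v "$bin" >/dev/null 2>&1; then
            # $cmd is deliberately unquoted so its arguments are split.
            # stdin is /dev/null so the command cannot eat the list.
            { echo "# $cmd"; $cmd </dev/null 2>&1; } >"$out"
        else
            echo "# $cmd: command not found on this host" >"$out"
        fi
    done <<EOF
oslevel -r
lssrc -g cluster
lssrc -g grpsvcs
lssrc -g emsvcs
/usr/es/sbin/cluster/utilities/cllsif
/usr/es/sbin/cluster/utilities/cllsnode
EOF
    echo "$outdir"
}
```

Calling `collect_hacmp_diag /tmp/diag` writes one numbered text file per command into the given directory and prints the directory name, so the files can be pasted into a reply.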
"Rizwan Abbasi" <abbasi2@attglobal.net> wrote in message news:<40e25458_3@news2.prserv.net>...
> Please also tell us the version of HACMP and AIX, oslevel -r, as well as
> which the box / model type on which you are running the HACMP.

Any help is greatly appreciated. I'm not terribly confident in this box, given the goings-on. Here we go...

oslevel -r:

    5100-03

lslpp -l | grep -i hacmp:

    cluster.msg.en_US.cspoc    4.5.0.2   COMMITTED  HACMP CSPOC Messages - U.S.
    rsct.basic.hacmp           2.2.1.30  COMMITTED  RSCT Basic Function (HACMP/ES
    rsct.compat.basic.hacmp    2.2.1.30  COMMITTED  RSCT Event Management Basic Function (HACMP/ES Support)
    rsct.compat.clients.hacmp  Function (HACMP/ES Support)

> 1. Check the cluster log file i.e. hacmp.out, cluster.log, clstrmgr.debug

hacmp.out is empty, not sure what I'm supposed to be looking for in cluster.log. clstrmgr.debug has been overwritten and rotated such that the log containing the outage on Monday is gone.

> 2. Check the cluster processes i.e. lssrc -g cluster

    Subsystem    Group    PID    Status
    clstrmgrES   cluster  20148  active
    clsmuxpdES   cluster  20666  active
    clinfoES     cluster  21160  active

> 3. check the services i.e. lssrc -g grpsvcs and lssrc -g emsvcs

lssrc -g grpsvcs:

    Subsystem    Group    PID    Status
    grpsvcs      grpsvcs  18112  active
    grpglsm      grpsvcs         inoperative

(hrmm, grpglsm inop?)

lssrc -g emsvcs:

    Subsystem    Group    PID    Status
    emsvcs       emsvcs   19872  active
    emaixos      emsvcs   15918  active

> Also as describe earlier we need the output of these two command
> #/usr/es/sbin/cluster/utilities/cllsif

(forgive the formatting, please)

    Adapter      Type     Network  Net Type  Attribute  Node    IP Address  Hardware Address  Interface Name  Global Name  Netmask
    lawpriboot   boot     ether1   ether     public     lawpri  10.5.53.19                    en2                          255.255.0.0
    lawprisvc    service  ether1   ether     public     lawpri  10.5.31.14                                                 255.255.0.0
    lawpristby   standby  ether1   ether     public     lawpri  10.9.53.19                    en1                          255.255.0.0
    lawpri-tty1  service  rs232a   rs232     serial     lawpri  /dev/tty1
    lawsecsvc    service  ether1   ether     public     lawsec  10.5.31.15                    en2                          255.255.0.0
    lawsecstby   standby  ether1   ether     public     lawsec  10.9.53.20                    en1                          255.255.0.0
    lawsec-tty1  service  rs232a   rs232     serial     lawsec  /dev/tty1

> #/usr/es/sbin/cluster/utilities/cllsnode

    NODE lawpri:
        Interfaces to network ether1
            boot Interface: Name lawpriboot, Attribute public, IP address 10.5.53.19
            service Interface: Name lawprisvc, Attribute public, IP address 10.5.31.14
            standby Interface: Name lawpristby, Attribute public, IP address 10.9.53.19
        Interfaces to network rs232a
            service Interface: Name lawpri-tty1, Attribute serial, IP address /dev/tty1

    NODE lawsec:
        Interfaces to network ether1
            service Interface: Name lawsecsvc, Attribute public, IP address 10.5.31.15
            standby Interface: Name lawsecstby, Attribute public, IP address 10.9.53.20
        Interfaces to network rs232a
            service Interface: Name lawsec-tty1, Attribute serial, IP address /dev/tty1

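A listing like the cllsif output above is easier to cross-check when the adapter labels are counted per node and type. This is only a sketch: `adapter_types_by_node` is a hypothetical helper, and it assumes whitespace-separated columns with the adapter type in field 2 and the node name in field 6, as in the listing posted here.

```shell
#!/bin/sh
# Sketch: count adapter labels per node and type from cllsif-style text
# read on stdin. Assumes the type (boot/service/standby) is field 2 and
# the node name is field 6; header lines do not match and are skipped.
adapter_types_by_node() {
    awk '$2 ~ /^(boot|service|standby)$/ { n[$6 " " $2]++ }
         END { for (k in n) print k, n[k] }' | sort
}
```

On the cluster it could be run as `/usr/es/sbin/cluster/utilities/cllsif | adapter_types_by_node`. Fed the seven rows posted above, it prints `lawpri boot 1`, `lawpri service 2`, `lawpri standby 1`, `lawsec service 2`, and `lawsec standby 1`, one per line; the serial tty labels are counted as service because that is how cllsif lists them.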
A. Gordon Lyph wrote:
> hacmp.out is empty, not sure what I'm supposed to be looking for in
> cluster.log. clstrmgr.debug has been overwritten and rotated such
> that the log containing the outage on Monday is gone.

What the guy meant was the hacmp.out with content for that day. If you run "ls -alt /tmp/hacmp.out*" you will see that there are /tmp/hacmp.out.[1-7].

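To find the rotated copy that covers the day of the outage, the event lines can be pulled out of hacmp.out and its numbered copies together. A minimal sketch follows; `hacmp_events` is a hypothetical helper, and the default pattern `EVENT (START|FAILED)` is an assumption about what the event markers in these logs look like, so adjust it to whatever your files actually contain.

```shell
#!/bin/sh
# Sketch: print matching lines from hacmp.out and its rotated copies,
# each prefixed with the file it came from.
# hacmp_events is hypothetical; the default pattern is an assumption
# about the log format, not something confirmed in this thread.
hacmp_events() {
    logdir=${1:-/tmp}
    pattern=${2:-'EVENT (START|FAILED)'}
    for f in "$logdir"/hacmp.out*; do
        [ -f "$f" ] || continue             # skip if nothing matched the glob
        # -E selects extended regular expressions for the alternation.
        grep -E "$pattern" "$f" | sed "s|^|$(basename "$f"): |"
    done
}
```

For example, `hacmp_events /tmp` scans /tmp/hacmp.out and /tmp/hacmp.out.[1-7], and `hacmp_events /tmp 'Jun 28'` (a made-up date string) would instead show every line carrying that text.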
a_g_lyph@hotmail.com (A. Gordon Lyph) wrote in message news:<f8479612.0406290331.239b535c@posting.google.com>...
> We had an unplanned (ha!) HACMP event yesterday. For whatever reason,
> the secondary node couldn't activate the resource group (read: I
> believe the admin who configured HACMP didn't set it up right). At
> any rate, I restarted the primary node, restarted Cluster services and
> expected it to swap the service IP onto the boot adapter.
> Unfortunately, it didn't. As this is a mission critical (man, I hate
> that term) app, I ifconfig'd the interface back to the service IP so
> the users could get back in. Of course, errpt is showing 'local
> adapter misconfiguration.'
>
> This morning, I took the secondary node out of the cluster, manually
> shutdown all of the applications on the primary node, then rebooted
> the primary node, thinking that HACMP would do the boot->svc IP swap
> on reboot. Not so. I tried synchronizing cluster resources,
> activating resource group, etc. (with and without the secondary node
> in the cluster) all to no avail. I wound up ifconfig'ing the
> interface again so the users could do their work.
>
> Any thoughts on why HACMP would not automagically swap the boot and
> svc IPs? I'm definitely not HACMP certified, but from what I've read
> it should acquire the resource and setup the topology. Am I missing
> something?

I will try to help if I can. First I need to make sure I understand the steps you took. You say you "took the secondary node out of the cluster". What does that mean?

If you rebooted both nodes, all interfaces should have the "boot" address active (i.e. the one AIX activates). Now if you start cluster services on one node, that node should acquire the resource group, and if the service IP address is a member of the resource group, it should replace the boot address (or be added as an alias if that option was used when defining the network to HACMP). When you start cluster services on the other node, it may or may not take over the resource group, depending on the takeover policy.

Verify the service label is part of the resource group. If you rebooted the primary node and the resource group was acquired on the other node, it will not move to the primary node unless it has a cascading policy and the node you rebooted appears first in the resource group node list.

- Mike
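The boot-to-service behaviour described above can be checked from the interface listing before and after cluster services start. The sketch below assumes `ifconfig -a` style text on stdin, where each address follows the word `inet`; `active_addrs` is a hypothetical helper, not an HACMP utility.

```shell
#!/bin/sh
# Sketch: report whether each address given as an argument appears as
# an "inet" address in ifconfig-style text read from stdin.
# active_addrs is a hypothetical helper, not part of HACMP or AIX.
active_addrs() {
    text=$(cat)
    for addr in "$@"; do
        # Compare whole fields, so 10.5.31.14 never matches 10.5.31.140.
        if printf '%s\n' "$text" | awk -v a="$addr" '
               { for (i = 1; i < NF; i++)
                     if ($i == "inet" && $(i + 1) == a) found = 1 }
               END { exit !found }'; then
            echo "$addr active"
        else
            echo "$addr not active"
        fi
    done
}
```

On the primary node from this thread it could be used as `ifconfig -a | active_addrs 10.5.53.19 10.5.31.14`. After a plain reboot only the boot address would be expected to show as active; once the resource group has been acquired with address replacement, the service address should be the active one instead, and with aliasing both would be.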