From: Miroslav Zubcic on 15 Oct 2007 18:47

James Carlson wrote:

> scshutdown? Sorry; I didn't see where this was Sun Cluster. That may
> well make a difference.
>
> OK. It's unclear to me where the problem you're seeing might be. I
> saw nothing in your configuration that would suggest that it would
> work only "sometimes."

Beats me ...

Oct 12 19:51:48 blue in.mpathd[147]: [ID 215189 daemon.error] The link
has gone down on bge1
Oct 12 19:51:48 blue in.mpathd[147]: [ID 594170 daemon.error] NIC
failure detected on bge1 of group ipmp1

And that's it. There was no failover! But after scshutdown and boot,
there is an additional action:

Oct 12 21:35:15 blue in.mpathd[149]: [ID 832587 daemon.error] Successfully
failed over from NIC bge1 to NIC bge3

>> Good question! Please tell me, for Solaris 10, where can I file a bug?
>
> Even though it's Solaris 10, you can still use bugs.opensolaris.org.
> However, your *first* step should be to contact support.

For some bugs I will have support soon (waiting for the bureaucracy),
but some bugs affect only desktop and development machines.

> It sounds like you need help from support. If you can post details on
> those bugs -- either how to reproduce or substantial information about
> the failure mode and the software in use -- then it's possible that
> someone here might be able to diagnose the problem or even file a bug
> for you.

OK James, here are the details:

- 3 Sun Fire v240 servers
- Solaris 10 u4 + 'pca -d; pca -i' last Friday
- Sun Cluster 3.2
- Official patches
- 1 public interface on ipmp0 (bge0 + bge2)
- 1 quorum subnet (bge1 + bge3)
- 2 interconnect VLANs, one tagged on top of bge1, the second on bge3

Here is the quorum info:

{1}(root!blue:~)# scstat -q

-- Quorum Summary --

  Quorum votes possible:      9
  Quorum votes needed:        5
  Quorum votes present:       9

-- Quorum Votes by Node --

                   Node Name            Present  Possible  Status
                   ---------            -------  --------  ------
  Node votes:      red                  1        1         Online
  Node votes:      green                1        1         Online
  Node votes:      blue                 1        1         Online

-- Quorum Votes by Device --

                   Device Name          Present  Possible  Status
                   -----------          -------  --------  ------
  Device votes:    green-quorum-srv     2        2         Online
  Device votes:    blue-quorum-srv      2        2         Online
  Device votes:    red-quorum-srv       2        2         Online

My test is this, on the Cisco switch:

kisko1(config)# int range gi 0/5 - 8
kisko1(config-if-range)# shut

in.mpathd logs that the ipmp0 and ipmp1 groups have failed and that all
NICs are failed (OK, that's reasonable).

The syslog on the intentionally screwed cluster node is too wide to
paste here, but here is the summary:

- NOTICE: CMM: Reconfiguration delaying for 14 seconds to allow larger partitions to win race for quorum devices.
- NOTICE: clcomm: Path blue:bge780003 - red:bge780003 being drained
- NOTICE: clcomm: Path blue:bge780003 - green:bge780003 being drained
- WARNING: CMM: Connection to quorum server red-quorum-srv failed with error 148.
- WARNING: CMM: Connection to quorum server green-quorum-srv failed with error 148.
- Notifying cluster that this node is panicking

... and then blue panics. This is expected behaviour, nothing terrible.

This is the log on red, one of the other two (operational) nodes:

- NOTICE: clcomm: Path red:bge780001 - blue:bge780001 being drained
- NOTICE: clcomm: Path red:bge780003 - blue:bge780003 being drained
- WARNING: CMM: Connection to quorum server blue-quorum-srv failed with error 145.
- NOTICE: CMM: Node blue (nodeid = 1) is down.
- NOTICE: CMM: Cluster members: green red.
- NOTICE: CMM: node reconfiguration #6 completed.
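A note on the adapter names in these logs: bge780001 and bge780003 follow
the Solaris tagged-VLAN naming rule (PPA = VLAN-ID x 1000 + instance), so
they appear to be VLAN 780 on bge1 and bge3, i.e. the two interconnect
VLANs listed above. The cluster framework plumbs these transport adapters
itself; purely as a sketch of the naming convention, with a made-up
address, such an interface would be brought up by hand like this:

  ifconfig bge780001 plumb                                # VLAN 780 tagged on bge1 (780*1000 + 1)
  ifconfig bge780001 192.168.78.1 netmask 255.255.255.0 up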
Then blue is booted again:

- NOTICE: CMM: Node blue (nodeid = 1) with votecount = 1 added.
- NOTICE: CMM: Node green (nodeid = 2) with votecount = 1 added.
- NOTICE: CMM: Node red (nodeid = 3) with votecount = 1 added.
- NOTICE: CMM: Quorum device 1 (red-quorum-srv) added; votecount = 2, bitmask of nodes with configured paths = 0x7.
- NOTICE: CMM: Quorum device 2 (green-quorum-srv) added; votecount = 2, bitmask of nodes with configured paths = 0x7.
- NOTICE: CMM: Quorum device 3 (blue-quorum-srv) added; votecount = 2, bitmask of nodes with configured paths = 0x7.
- NOTICE: clcomm: Adapter bge780001 constructed
- NOTICE: clcomm: Adapter bge780003 constructed
- NOTICE: CMM: Node blue: attempting to join cluster.
- CMM: Node red (nodeid: 3, incarnation #: 1192289537) has become reachable.
- CMM: Node green (nodeid: 2, incarnation #: 1192289528) has become reachable.
- NOTICE: CMM: Quorum device blue-quorum-srv: owner set to node 1.
- WARNING: CMM: Our partition has been preempted.
- NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.

At the moment blue logs that its partition has been preempted, red and
green go mad because they think quorum is lost. I don't know why. There
are 9 possible votes in the cluster, with 5 the minimum for quorum.
Without blue, red and green have 6 votes (every node has 1, every quorum
server has 2: 1+2 + 1+2 = 6). When blue boots it joins the cluster, its
quorum server is up, and the cluster should have all 9 votes again and be
fully operational. But instead of that behaviour, I get this from red and
green (red in this particular log summary):

- NOTICE: CMM: Node synthesis (nodeid: 1, incarnation #: 1192324379) has become reachable.
- NOTICE: clcomm: Path red:bge780001 - blue:bge780001 online
- WARNING: CMM: Our partition has been preempted.
- Notifying cluster that this node is panicking - red
- panic[cpu0]/thread=300046bc080:
- CMM: Cluster lost operational quorum; aborting.

After that, red and green reboot after the panic, while blue is waiting
for quorum. When red and green are booted again, quorum is established
again and the cluster is operational, but in the meantime there is no
service.

Notice 1: this quorum bug does not occur if I panic a node manually from
the ok prompt with sync, or if I reboot it with 'init 6'; it happens only
if I cut all Ethernet links to one node (any one node, "blue" in this
example).

Notice 2: a lousy workaround after the Ethernet-loss test: if I boot the
failed node in non-cluster mode (boot -x), clear the quorum server data
with clquorumserver(1CL), and reboot in cluster mode, the other two nodes
do not panic, but I have to remove this node's quorum server from the
configuration repository and add it again (with the clquorum(1CL)
command). The commands involved are sketched below.

Notice 3: this behaviour is always reproducible. I suspect that the
quorum server from the failed node, or the other two quorum servers, have
corrupted or invalid data (as if the other machine has double or more
votes when it boots, not only 1+2), but "scstat -q" and "clquorum status"
show me correct data while one node is being rebooted, that is, outside
the cluster: the cluster has 6 votes, 1 more than needed when 1 node is
down.

> But that's no substitute for paid support. If you're interested in
> community-supported code, I'd recommend using Solaris Express instead
> of Solaris 10.

For these 3 machines, Solaris 10, and Sun Cluster 3.2, our customer will
have paid support, just as it has for its other Sun products. I have just
filed the text for the support case (see above) and I'm waiting for a
contact ...
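For reference, the workaround from Notice 2 boils down to something like
the following (a sketch only; the cluster name, cluster ID and qshost
address are placeholders, the port assumes the default quorum server
instance on 9000, so check clquorumserver(1CL) and clquorum(1CL) before
relying on the exact options):

  ok boot -x                      # boot the failed node in non-cluster mode

  # on the host running that node's quorum server, wipe its stale cluster data
  clquorumserver clear -c <clustername> -I <clusterID> 9000

  init 6                          # reboot the node back into cluster mode

  # then re-register the quorum server device in the cluster configuration
  clquorum remove blue-quorum-srv
  clquorum add -t quorum_server -p qshost=<quorum-server-IP> -p port=9000 blue-quorum-srv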
In the meantime, I'm going to register on bugs.opensolaris.org.

--
Miroslav
From: James Carlson on 16 Oct 2007 08:18

Miroslav Zubcic <news(a)big-other.com> writes:
> James Carlson wrote:
>
> > scshutdown? Sorry; I didn't see where this was Sun Cluster. That may
> > well make a difference.
> >
> > OK. It's unclear to me where the problem you're seeing might be. I
> > saw nothing in your configuration that would suggest that it would
> > work only "sometimes."
>
> Beats me ...
>
> Oct 12 19:51:48 blue in.mpathd[147]: [ID 215189 daemon.error] The link
> has gone down on bge1

"The link has gone down" means a loss of physical link state. I'll
assume that's intentional (based on the Cisco "shutdown" command
you're using).

> Oct 12 19:51:48 blue in.mpathd[147]: [ID 594170 daemon.error] NIC
> failure detected on bge1 of group ipmp1
>
> And that's it. There was no failover!

What was the "ifconfig -a" output after this? What state are the
interfaces in?

This sounds a bit like CR 6458158. It's a latent problem that was
exposed by patch ID 125040-01 / 125041-01. (Another possibility is CR
6454429.) Either way, you need to be working with support to get to
the bottom of this.

> But after scshutdown and boot, there is an additional action:
>
> Oct 12 21:35:15 blue in.mpathd[149]: [ID 832587 daemon.error] Successfully
> failed over from NIC bge1 to NIC bge3

Yes, that looks like a normal fail-over.

> - 1 public interface on ipmp0 (bge0 + bge2)
> - 1 quorum subnet (bge1 + bge3)
> - 2 interconnect VLANs, one tagged on top of bge1, the second on bge3

So, you have only one IP data address (non-deprecated logical
interface) on each ipmp group, right?

> At the moment blue logs that its partition has been preempted, red and
> green go mad because they think quorum is lost. I don't know why. There
> are 9 possible votes in the cluster, with 5 the minimum for quorum.

I'm no Sun Cluster expert. I could file a bug for you, but I think
you'd be *much* better off working with support on the Sun Cluster
issues.

--
James Carlson, Solaris Networking              <james.d.carlson(a)sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
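For anyone following along, the interface state and the installed
revisions of the patches mentioned here can be checked with something
along these lines (a sketch; the patch IDs are simply the ones cited
above):

  ifconfig -a | grep flags=            # look for FAILED / INACTIVE / STANDBY on the IPMP members
  showrev -p | egrep '125040|125041'   # which of the suspect patch revisions are installed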
From: Miroslav Zubcic on 16 Oct 2007 09:07
James Carlson wrote:
> Miroslav Zubcic <news(a)big-other.com> writes:
>
>> Oct 12 19:51:48 blue in.mpathd[147]: [ID 215189 daemon.error] The link
>> has gone down on bge1
>
> "The link has gone down" means a loss of physical link state. I'll
> assume that's intentional (based on the Cisco "shutdown" command
> you're using).

Of course. Whenever I set up a cluster (Red Hat RHCS, and now Sun Cluster
for the first time), I run my own list of stress tests against the whole
system and the cluster, and only then are we ready to present the
machines to the customer. I'm just now running SunVTS on all 3 nodes ...

>> Oct 12 19:51:48 blue in.mpathd[147]: [ID 594170 daemon.error] NIC
>> failure detected on bge1 of group ipmp1
>>
>> And that's it. There was no failover!
>
> What was the "ifconfig -a" output after this? What state are the
> interfaces in?

The primary (defunct) interface had the FAILED flag, but the second
interface was not changed in any way; no failover logical interface was
plumbed on it. After turning the link back on, I had to manually remove
the FAILED flag from the primary interface with ifconfig(1M), because
in.mpathd only logged that the interface link was available again, but
there was no traffic in or out.

> This sounds a bit like CR 6458158. It's a latent problem that was
> exposed by patch ID 125040-01 / 125041-01. (Another possibility is CR
> 6454429.) Either way, you need to be working with support to get to
> the bottom of this.

Patch 125040-01 was obsoleted by 120011-14 in update 4. This is a fresh
installation, BTW. But yes, CR 6458158 sounds like this. I didn't
experience the failback bug described in CR 6454429.

>> Oct 12 21:35:15 blue in.mpathd[149]: [ID 832587 daemon.error] Successfully
>> failed over from NIC bge1 to NIC bge3
>
> Yes, that looks like a normal fail-over.

After that, I didn't manage to reproduce the (possible) bug anymore. I
tested it again last night and everything is OK now on all 3 nodes.
Strange.

>> - 1 public interface on ipmp0 (bge0 + bge2)
>> - 1 quorum subnet (bge1 + bge3)
>> - 2 interconnect VLANs, one tagged on top of bge1, the second on bge3
>
> So, you have only one IP data address (non-deprecated logical
> interface) on each ipmp group, right?

Right. This is what I call the basic IP address of the host (for ssh,
SNMP, ping, etc.). Service addresses are plumbed and unplumbed by the
cluster resource manager as logical interfaces on the active interface.
(An illustrative sketch of such a configuration follows at the end of
this message.)

>> At the moment blue logs that its partition has been preempted, red and
>> green go mad because they think quorum is lost. I don't know why. There
>> are 9 possible votes in the cluster, with 5 the minimum for quorum.
>
> I'm no Sun Cluster expert. I could file a bug for you, but I think
> you'd be *much* better off working with support on the Sun Cluster
> issues.

I filed the cluster bug myself last night, and this morning I got a
semi-generated mail from someone at Sun saying that the case has been
received as CR 6617066, but thank you for your kind offer anyway. Maybe
you can draw the cluster team's attention to it eventually.

--
Miroslav
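As a sketch of what "one data address plus deprecated test addresses per
IPMP group" can look like on Solaris 10 (the actual files were not
posted; the host name, test addresses and netmasks below are made up for
illustration, and the standby keyword would be dropped for an
active-active pair), the public ipmp0 group could be configured roughly
like this:

  /etc/hostname.bge0:
  blue netmask + broadcast + group ipmp0 up \
      addif 192.168.10.21 deprecated -failover netmask + broadcast + up

  /etc/hostname.bge2:
  192.168.10.22 netmask + broadcast + deprecated -failover group ipmp0 standby up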