From: Miroslav Zubcic on
James Carlson wrote:

> scshutdown? Sorry; I didn't see where this was Sun Cluster. That may
> well make a difference.

> OK. It's unclear to me where the problem you're seeing might be. I
> saw nothing in your configuration that would suggest that it would
> work only "sometimes."

Beats me ...

Oct 12 19:51:48 blue in.mpathd[147]: [ID 215189 daemon.error] The link
has gone down on bge1
Oct 12 19:51:48 blue in.mpathd[147]: [ID 594170 daemon.error] NIC
failure detected on bge1 of group ipmp1

And that's it. There was no failover!

But after scshutdown and a boot, there is an additional action:

Oct 12 21:35:15 blue in.mpathd[149]: [ID 832587 daemon.error] Successfully
failed over from NIC bge1 to NIC bge3

>> Good question! Please tell me, for Solaris 10, where can I file bugs;

> Even though it's Solaris 10, you can still use bugs.opensolaris.org.
> However, your *first* step should be to contact support.

For some bugs I will have support soon (waiting on the bureaucracy), but some
bugs affect only desktop and development machines.

> It sounds like you need help from support. If you can post details on
> those bugs -- either how to reproduce or substantial information about
> the failure mode and the software in use -- then it's possible that
> someone here might be able to diagnose the problem or even file a bug
> for you.

Ok James, here are the details:

- 3 Sun Fire v240 servers.
- Solaris 10 u4 + 'pca -d; pca -i' last Friday
- SunCluster 3.2
- Official patches

- 1 public interface on ipmp0 (bge0 + bge2)
- 1 quorum subnet on ipmp1 (bge1 + bge3) - see the /etc/hostname.* sketch below
- 2 interconnect VLANs - one tagged on top of bge1, the second on bge3
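Roughly, the IPMP side of that lives in the usual /etc/hostname.* files. The
addresses and test-address names here are simplified placeholders rather than a
verbatim paste, so treat it only as a sketch of the layout:

/etc/hostname.bge0 (public data address, group ipmp0):
blue netmask + broadcast + group ipmp0 up \
     addif blue-test-bge0 deprecated -failover netmask + broadcast + up

/etc/hostname.bge2 (second half of ipmp0, test address only):
blue-test-bge2 netmask + broadcast + deprecated -failover group ipmp0 up

/etc/hostname.bge1 (quorum subnet data address, group ipmp1):
blue-quorum netmask + broadcast + group ipmp1 up \
     addif blue-test-bge1 deprecated -failover netmask + broadcast + up

/etc/hostname.bge3 (second half of ipmp1, test address only):
blue-test-bge3 netmask + broadcast + deprecated -failover group ipmp1 up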

Here is quorum info:

{1}(root!blue:~)# scstat -q

-- Quorum Summary --

Quorum votes possible: 9
Quorum votes needed: 5
Quorum votes present: 9


-- Quorum Votes by Node --

                  Node Name           Present  Possible  Status
                  ---------           -------  --------  ------
  Node votes:     red                 1        1         Online
  Node votes:     green               1        1         Online
  Node votes:     blue                1        1         Online


-- Quorum Votes by Device --

                  Device Name         Present  Possible  Status
                  -----------         -------  --------  ------
  Device votes:   green-quorum-srv    2        2         Online
  Device votes:   blue-quorum-srv     2        2         Online
  Device votes:   red-quorum-srv      2        2         Online


My test is this:

Cisco switch:
kisko1(config)# int range gi 0/5 - 8
kisko1(config-if-range)#shut
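
(For later: to end the test I just bring the same port range back up:)

kisko1(config)# int range gi 0/5 - 8
kisko1(config-if-range)# no shut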

After the shutdown, in.mpathd logs that the ipmp0 and ipmp1 groups have failed
and that all NICs have failed (OK, that's reasonable).

Syslog on the intentionally broken cluster node is too wide to paste here,
but here is a summary:

- NOTICE: CMM: Reconfiguration delaying for 14 seconds to allow larger
partitions to win race for quorum devices.
- NOTICE: clcomm: Path blue:bge780003 - red:bge780003 being drained
- NOTICE: clcomm: Path blue:bge780003 - green:bge780003 being drained
- WARNING: CMM: Connection to quorum server red-quorum-srv failed with
error 148.
- WARNING: CMM: Connection to quorum server green-quorum-srv failed with
error 148.
- Notifying cluster that this node is panicking

.... and then blue panics - this is expected behaviour, nothing terrible.

This is the log on red - one of the other two (operational) nodes:

- NOTICE: clcomm: Path red:bge780001 - blue:bge780001 being drained
- NOTICE: clcomm: Path red:bge780003 - blue:bge780003 being drained
- WARNING: CMM: Connection to quorum server blue-quorum-srv failed with
error 145.
- NOTICE: CMM: Node blue (nodeid = 1) is down.
- NOTICE: CMM: Cluster members: green red.
- NOTICE: CMM: node reconfiguration #6 completed.

Then blue is booted again:

- NOTICE: CMM: Node blue (nodeid = 1) with votecount = 1 added.
- NOTICE: CMM: Node green (nodeid = 2) with votecount = 1 added.
- NOTICE: CMM: Node red (nodeid = 3) with votecount = 1 added.

- NOTICE: CMM: Quorum device 1 (red-quorum-srv) added; votecount = 2,
bitmask of nodes with configured paths = 0x7.
- NOTICE: CMM: Quorum device 2 (green-quorum-srv) added; votecount = 2,
bitmask of nodes with configured paths = 0x7.
- NOTICE: CMM: Quorum device 3 (blue-quorum-srv) added; votecount = 2,
bitmask of nodes with configured paths = 0x7.

- NOTICE: clcomm: Adapter bge780001 constructed
- NOTICE: clcomm: Adapter bge780003 constructed

- NOTICE: CMM: Node blue: attempting to join cluster.
- CMM: Node red (nodeid: 3, incarnation #: 1192289537) has become reachable.
- CMM: Node green (nodeid: 2, incarnation #: 1192289528) has become
reachable.
- NOTICE: CMM: Quorum device blue-quorum-srv: owner set to node 1.
- WARNING: CMM: Our partition has been preempted.
- NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for
quorum.

At the moment when blue logs that the partition has been preempted, red and
green go mad because they think that quorum is lost. I don't
know why. There are 9 possible votes in the cluster, 5 minimum for quorum.
Without blue, red and green have 6 votes (every node has 1, every quorum
server has 2 --- 1+2 + 1+2 = 6). When blue boots, it joins the
cluster, its quorum server comes up, and the cluster should have all 9 votes
again and be fully operational. But instead of that behaviour, I
get this from red and green:

red and green (red in this particular log summary):

- NOTICE: CMM: Node synthesis (nodeid: 1, incarnation #: 1192324379) has
become reachable.
- NOTICE: clcomm: Path red:bge780001 - blue:bge780001 online
- WARNING: CMM: Our partition has been preempted.
- Notifying cluster that this node is panicking
- red ^Mpanic[cpu0]/thread=300046bc080:
- CMM: Cluster lost operational quorum; aborting.

After that, red and green reboot after the panic, while blue waits for
quorum. When red and green are booted again, quorum is re-established and the
cluster is operational, but in the meantime there is no service.

Notice 1: this quorum bug does not occur if I panic a node manually
from the ok prompt with sync, or if I reboot it with 'init 6'; it occurs only
if I cut all Ethernet to one node (any one node, "blue" in this example).

Notice 2: a lousy workaround after the Ethernet-loss test: if I boot the failed
node in non-cluster mode (boot -x), clear the quorum server data with
clquorumserver(1CL) and reboot in cluster mode, the other two nodes do not
panic, but I have to remove this node's quorum server from the configuration
repository and add it again (with the clquorum(1CL) command). A rough command
sketch follows Notice 3 below.

Notice 3: this behaviour is always reproducible.
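
Roughly, the workaround from Notice 2 looks like this; the clquorumserver and
clquorum arguments are from memory and the names are only examples, so check
the man pages before copying anything:

ok boot -x                          (failed node, non-cluster mode)
blue# clquorumserver clear ...      (clear this cluster's data; options per clquorumserver(1CL))
blue# init 6                        (back into cluster mode)

and then, from one of the surviving nodes, something like:

red# clquorum remove blue-quorum-srv
red# clquorum add -t quorum_server -p qshost=blue -p port=9000 blue-quorum-srv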

I suspect that the quorum server from the failed node, or the other two quorum
servers, have corrupted or invalid data (as if the other machine has double or
more votes when it boots, not only 1+2), but "scstat -q" and "clquorum status"
show me correct data while one node is being rebooted, that is, while it is
outside the cluster. The cluster has 6 votes - 1 more than needed when 1 node
is down.
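
To spell out the vote arithmetic:

all 3 nodes up:  3 nodes x 1 vote + 3 quorum servers x 2 votes = 9  (5 needed)
blue down:       2 nodes x 1 vote + 2 quorum servers x 2 votes = 6  (still >= 5)

so the surviving partition should never drop below quorum just because blue is
down or rejoining.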

> But that's no substitute for paid support. If you're interested in
> community-supported code, I'd recommend using Solaris Express instead
> of Solaris 10.

For these 3 machines, Solaris 10 and SunCluster 3.2, our customer will have
paid support, just like they have for their other Sun products. I have just
written up the text for support (see above) and I'm waiting for a
contact ...

In the meantime, I'm going to register on bugs.opensolaris.org.


--
Miroslav
From: James Carlson on
Miroslav Zubcic <news(a)big-other.com> writes:
> James Carlson wrote:
>
> > scshutdown? Sorry; I didn't see where this was Sun Cluster. That may
> > well make a difference.
>
> > OK. It's unclear to me where the problem you're seeing might be. I
> > saw nothing in your configuration that would suggest that it would
> > work only "sometimes."
>
> Beats me ...
>
> Oct 12 19:51:48 blue in.mpathd[147]: [ID 215189 daemon.error] The link
> has gone down on bge1

"The link has gone down" means a loss of physical link state. I'll
assume that's intentional (based on the Cisco "shutdown" command
you're using).

> Oct 12 19:51:48 blue in.mpathd[147]: [ID 594170 daemon.error] NIC
> failure detected on bge1 of group ipmp1
>
> And that's it. There was no failover!

What was the "ifconfig -a" output after this? What state are the
interfaces in?

This sounds a bit like CR 6458158. It's a latent problem that was
exposed by patch ID 125040-01 / 125041-01. (Another possibility is CR
6454429.) Either way, you need to be working with support to get to
the bottom of this.
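
(A quick way to check whether either of those patch IDs - or something that
obsoletes them - is on the system is something like:

  # showrev -p | egrep '125040|125041'

since showrev -p also lists patch IDs in its "Obsoletes" fields.)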

> But after scshutdown and a boot, there is an additional action:
>
> Oct 12 21:35:15 blue in.mpathd[149]: [ID 832587 daemon.error] Successfully
> failed over from NIC bge1 to NIC bge3

Yes, that looks like a normal fail-over.

> - 1 public interface on ipmp0 (bge0 + bge2)
> - 1 quorum subnet (bge1 + bge3)
> - 2 interconnect VLANs - one tagged on top of bge1, the second on bge3

So, you have only one IP data address (non-deprecated logical
interface) on each ipmp group, right?

> At the moment when blue logs that the partition has been preempted, red and
> green go mad because they think that quorum is lost. I don't
> know why. There are 9 possible votes in the cluster, 5 minimum for quorum.

I'm no Sun Cluster expert. I could file a bug for you, but I think
you'd be *much* better off working with support on the Sun Cluster
issues.

--
James Carlson, Solaris Networking <james.d.carlson(a)sun.com>
Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
From: Miroslav Zubcic on
James Carlson wrote:

> Miroslav Zubcic <news(a)big-other.com> writes:

>> Oct 12 19:51:48 blue in.mpathd[147]: [ID 215189 daemon.error] The link
>> has gone down on bge1

> "The link has gone down" means a loss of physical link state. I'll
> assume that's intentional (based on the Cisco "shutdown" command
> you're using).

Of course. Whenever I set up a cluster - Red Hat RHCS, and now for the
first time SunCluster - I run my own list of stress tests against the whole
system and cluster before we present the machines to the customer. I'm
running SunVTS on all 3 nodes right now ...

>> Oct 12 19:51:48 blue in.mpathd[147]: [ID 594170 daemon.error] NIC
>> failure detected on bge1 of group ipmp1
>>
>> And that's it. There was no failover!

> What was the "ifconfig -a" output after this? What state are the
> interfaces in?

The primary (defunct) interface had the FAILED flag, but the second interface
was not changed in any way. The failed-over address was not plumbed on it as a
logical interface.

After turning the link back on, I had to manually remove the FAILED flag from
the primary interface with ifconfig(1M), because in.mpathd just logged that
the interface link was available again, but there was no traffic in or out.

> This sounds a bit like CR 6458158. It's a latent problem that was
> exposed by patch ID 125040-01 / 125041-01. (Another possibility is CR
> 6454429.) Either way, you need to be working with support to get to
> the bottom of this.

Patch 125040-01 was obsoleted by 120011-14 in update 4. This is a fresh
installation, BTW. But yes, CR 6458158 sounds like this.

I didn't experience the failback bug described in CR 6454429.

>> Oct 12 21:35:15 blue in.mpathd[149]: [ID 832587 daemon.error] Successfully
>> failed over from NIC bge1 to NIC bge3

> Yes, that looks like a normal fail-over.

After that, I didn't manage to reproduce the (possible) bug anymore. I
tested it once again last night and everything is OK now on all 3 nodes.
Strange.

>> - 1 public interface on ipmp0 (bge0 + bge2)
>> - 1 quorum subnet (bge1 + bge3)
>> - 2 interconnect VLANs - one tagged on top of bge1, the second on bge3

> So, you have only one IP data address (non-deprecated logical
> interface) on each ipmp group, right?

Right. This is what I call the basic IP address of the host (for ssh, snmp,
ping, etc.). Service addresses are plumbed/unplumbed by the cluster
resource manager as logical interfaces on the active interface.
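
(As an illustration only - resource and group names made up - a service
address typically ends up as a SUNW.LogicalHostname resource created along
these lines:

red# clreslogicalhostname create -g web-rg -h web-addr web-addr-rs

and the cluster then plumbs/unplumbs web-addr on the active interface of ipmp0
as the resource group comes and goes.)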

>> At the moment when blue logs that the partition has been preempted, red and
>> green go mad because they think that quorum is lost. I don't
>> know why. There are 9 possible votes in the cluster, 5 minimum for quorum.

> I'm no Sun Cluster expert. I could file a bug for you, but I think
> you'd be *much* better off working with support on the Sun Cluster
> issues.

I filed the cluster bug myself last night, and this morning I got a
semi-generated mail from someone at Sun saying the case was received as CR
6617066, but thank you for your kind offer anyway. Maybe you can eventually
bring it to the cluster team's attention.


--
Miroslav
