From: James Carlson on 3 Oct 2007 09:08

"eeb4u(a)hotmail.com" <eeb4u(a)hotmail.com> writes:
> On Sep 28, 1:19 pm, James Carlson <james.d.carl...(a)sun.com> wrote:
> > To disable probe-based failure detection, just omit the configuration
> > of test (-failover) addresses.
[...]
> /etc/hostname.ce0
> dev1-test netmask + broadcast + group ipmp deprecated -failover up
> addif prodb netmask + broadcast + failover up
>
> /etc/hostname.ce1
> dev1-ce1 netmask + broadcast + group ipmp deprecated -failover standby up
>
> What should I change in my configuration to correct the ping flood I am
> seeing in my firewall logs?

As noted above, remove the test addresses.  That would be something like
this (noting that "netmask + broadcast + up" is the default, and that
"failover" is implicit):

/etc/hostname.ce0:
	prodb group ipmp

/etc/hostname.ce1:
	0 group ipmp standby

You're better off giving ce1 a non-zero data address -- regardless of
whether you configure probing or not -- and removing the "standby"
keyword.

If there's a failure in this configuration, all traffic will take a hit
until the switch-over occurs.  If you set up two (or more) data
addresses, the system will attempt to use all of the addresses on all
interfaces in the group, and thus only a fraction will be affected by a
switch-over.  In addition, using multiple data addresses allows for load
sharing between the interfaces.  If you have only one active interface,
the system can't do that.

The configuration you're using is called "active-standby," and the one
I'd recommend is "active-active."

(I'd also recommend using test addresses rather than relying on
link-based failure detection alone, but I guess that's another story.
Note that there's no requirement that test addresses used for probing be
allocated within the same subnet as the data addresses, or that they
even be IPv4.  There's no reason to "waste" addresses for probing.)

--
James Carlson, Solaris Networking          <james.d.carlson(a)sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
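To make the active-active layout described above concrete, a minimal
sketch follows.  The second data hostname "prodb2" is hypothetical (it
does not appear in the thread); the test hostnames "dev1-test" and
"dev1-ce1" are reused from the quoted configuration and, per the note
above, need not share the data addresses' subnet:

/etc/hostname.ce0:
	prodb netmask + broadcast + group ipmp up
	addif dev1-test deprecated -failover netmask + broadcast + up

/etc/hostname.ce1:
	prodb2 netmask + broadcast + group ipmp up
	addif dev1-ce1 deprecated -failover netmask + broadcast + up

With both data addresses in use, outbound connections are spread across
ce0 and ce1, and only the addresses on the interface that fails need to
move when a failure is detected.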
From: Miroslav Zubcic on 12 Oct 2007 17:32

James Carlson wrote:
> Note that there's no requirement that test addresses used for probing
> be allocated within the same subnet as the data addresses, or that
> they even be IPv4.  There's no reason to "waste" addresses for
> probing.)

Working on a new cluster for a customer lately, I have had a _different_
experience.  My experience: if the test addresses are on a different
subnet, in.mpathd in most (60-70 %) cases *forgets* to take failover
action from the failed to the second interface in my stress test.  It
only logs that the link is down and does nothing else.  When I put the
test addresses on the same subnet as the data address, everything is OK.

BTW, I'm replying to your older article because it gave me the idea of a
separate network for the IPMP test addresses.

Environment: 3 x Sun Fire V240 with 4 bge(7d) NICs, two IPMP groups -
ipmp0 (bge0, bge2) and ipmp1 (bge1, bge3).  All up, running and pinging
each other.  Solaris 10 u4, latest patches applied.  Tried active-active,
standby, various combinations.  In the end, only the combination from
this RTFM
http://docs.sun.com/app/docs/doc/819-3000/emybr?l=en&q=ipmp&a=view
is working for me.

--
Miroslav
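When probe-based detection seems to stall like this, two things worth
checking before blaming the subnet layout are whether in.mpathd can
actually reach probe targets through the test addresses (it normally
picks targets from routes on the test subnet, falling back to all-hosts
multicast discovery) and the tunables in /etc/default/mpathd.  A sketch
of that file, with what should be the shipped Solaris 10 defaults
(comments are mine, not the stock file's):

# /etc/default/mpathd -- in.mpathd tunables (defaults shown)

# Time allowed for detecting an interface failure, in milliseconds.
FAILURE_DETECTION_TIME=10000

# Move addresses back to a repaired interface automatically.
FAILBACK=yes

# Only monitor interfaces that are members of an IPMP group.
TRACK_INTERFACES_ONLY_WITH_GROUPS=yes

If the file is edited, in.mpathd should re-read it on a SIGHUP
(pkill -HUP in.mpathd).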
From: James Carlson on 13 Oct 2007 15:28

Miroslav Zubcic <news(a)big-other.com> writes:
> My experience: if the test addresses are on a different subnet,
> in.mpathd in most (60-70 %) cases *forgets* to take failover action
> from the failed to the second interface in my stress test. It only
> logs that the link is down and does nothing else. When I put the test
> addresses on the same subnet as the data address, everything is OK.

I've never seen that happen.  Have you filed a bug?

--
James Carlson, Solaris Networking          <james.d.carlson(a)sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677
From: Miroslav Zubcic on 14 Oct 2007 09:27

James Carlson wrote:
> Miroslav Zubcic <news(a)big-other.com> writes:
>> My experience: if the test addresses are on a different subnet,
>> in.mpathd in most (60-70 %) cases *forgets* to take failover action
>> from the failed to the second interface in my stress test. It only
>> logs that the link is down and does nothing else. When I put the test
>> addresses on the same subnet as the data address, everything is OK.

> I've never seen that happen.

I have halted all 3 nodes with scshutdown.  Powered off from ALOM and
powered on.  Now IPMP is working with different subnets for the test
addresses.  Tested for 2 hours with sh / no sh on the Cisco switch ports
(STP is off on the ports, of course).

This is my configuration (the same on all 3 nodes):

file:/etc/networks

external        192.168.1
mgmt            192.168.100
interconn1      192.168.110.16
interconn2      192.168.110.32
quorum          192.168.111
ipmp0           192.168.120
ipmp1           192.168.121

file:/etc/netmasks

192.168.1.0     255.255.255.0
192.168.111.0   255.255.255.0
192.168.120.0   255.255.255.0
192.168.121.0   255.255.255.0
192.168.100.0   255.255.255.0

file:/etc/hosts

127.0.0.1       localhost.localdomain localhost loghost

192.168.1.80    red.external.link red
192.168.120.90  red0.ipmp0
192.168.120.91  red1.ipmp0
192.168.120.92  red2.ipmp0
192.168.111.80  red.quorum
192.168.121.21  red0.ipmp1
192.168.121.22  red1.ipmp1
192.168.121.23  red2.ipmp1
192.168.100.80  red-alom.mgmt red-alom

192.168.1.81    green.external.link green
192.168.120.93  green0.ipmp0
192.168.120.94  green1.ipmp0
192.168.120.95  green2.ipmp0
192.168.111.81  green.quorum
192.168.121.24  green0.ipmp1
192.168.121.25  green1.ipmp1
192.168.121.26  green2.ipmp1
192.168.100.81  green-alom.mgmt green-alom

192.168.1.82    blue.external.link blue
192.168.120.96  blue0.ipmp0
192.168.120.97  blue1.ipmp0
192.168.120.98  blue2.ipmp0
192.168.111.82  blue.quorum
192.168.121.27  blue0.ipmp1
192.168.121.28  blue1.ipmp1
192.168.121.29  blue2.ipmp1
192.168.100.82  blue-alom.mgmt blue-alom

192.168.1.100   ns1.external.link
192.168.1.101   ns2.external.link
192.168.1.102   ns3.external.link
192.168.1.1     GW

hostname files on all 3 nodes:

Node: red:
==========

/etc/hostname.bge0:
red.external.link netmask + broadcast + group ipmp0 up
addif red0.ipmp0 deprecated -failover netmask + broadcast + up

/etc/hostname.bge1:
red.quorum netmask + broadcast + group ipmp1 up
addif red0.ipmp1 deprecated -failover netmask + broadcast + up

/etc/hostname.bge2:
red1.ipmp0 netmask + broadcast + group ipmp0 up
addif red2.ipmp0 deprecated -failover netmask + broadcast + up

/etc/hostname.bge3:
red1.ipmp1 netmask + broadcast + group ipmp1 up
addif red2.ipmp1 deprecated -failover netmask + broadcast + up

Node: green:
============

/etc/hostname.bge0:
green.external.link netmask + broadcast + group ipmp0 up
addif green0.ipmp0 deprecated -failover netmask + broadcast + up

/etc/hostname.bge1:
green.quorum netmask + broadcast + group ipmp1 up
addif green0.ipmp1 deprecated -failover netmask + broadcast + up

/etc/hostname.bge2:
green1.ipmp0 netmask + broadcast + group ipmp0 up
addif green2.ipmp0 deprecated -failover netmask + broadcast + up

/etc/hostname.bge3:
green1.ipmp1 netmask + broadcast + group ipmp1 up
addif green2.ipmp1 deprecated -failover netmask + broadcast + up

Node: blue:
===========

/etc/hostname.bge0:
blue.external.link netmask + broadcast + group ipmp0 up
addif blue0.ipmp0 deprecated -failover netmask + broadcast + up

/etc/hostname.bge1:
blue.quorum netmask + broadcast + group ipmp1 up
addif blue0.ipmp1 deprecated -failover netmask + broadcast + up

/etc/hostname.bge2:
blue1.ipmp0 netmask + broadcast + group ipmp0 up
addif blue2.ipmp0 deprecated -failover netmask + broadcast + up

/etc/hostname.bge3:
blue1.ipmp1 netmask + broadcast + group ipmp1 up
addif blue2.ipmp1 deprecated -failover netmask + broadcast + up

> Have you filed a bug?

Good question!  Please tell me, for Solaris 10, where can I file bugs;
bugs that are not reproducible on systems without a service plan (test
and development machines, desktops ...)?  I mean - I'm encountering many
bugs in the system, but I don't know where to send this info (apart from
this USENET group).  For example: broken or impossible circular SUNW
package dependencies, drivers that stop working after patching, a 3-node
Sun Cluster 3.2 that panics the other two nodes when the first node
boots after a network loss on all IPMP groups, strange IPMP behaviour,
dbx/mdb traces of core dumps, ipf/ipnat incompatibilities with some
other routers and their TCP/IP stacks where the SMTP server is not able
to send the final dot when sending mail to circa 3% of the internet, and
other such small annoyances?

--
Man is something that shall be overcome.
		-- Friedrich Nietzsche
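Alongside the switch-port "sh / no sh" exercise described above, the
same failover path can be driven and watched from the host side.  A
sketch, assuming the interface and group names from this configuration:

# Detach bge0 from its group; ipmp0's data addresses should move to bge2.
if_mpadm -d bge0

# FAILED / OFFLINE / INACTIVE flags and the "groupname" lines show state.
ifconfig -a

# in.mpathd reports its failover and repair decisions via syslog.
tail -f /var/adm/messages

# Reattach bge0; with FAILBACK=yes the addresses should move back.
if_mpadm -r bge0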
From: James Carlson on 15 Oct 2007 09:55
Miroslav Zubcic <news(a)big-other.com> writes:
> James Carlson wrote:
> > Miroslav Zubcic <news(a)big-other.com> writes:
> >> My experience: if the test addresses are on a different subnet,
> >> in.mpathd in most (60-70 %) cases *forgets* to take failover action
> >> from the failed to the second interface in my stress test. It only
> >> logs that the link is down and does nothing else. When I put the
> >> test addresses on the same subnet as the data address, everything
> >> is OK.
>
> > I've never seen that happen.
>
> I have halted all 3 nodes with scshutdown.

scshutdown?  Sorry; I didn't see where this was Sun Cluster.  That may
well make a difference.

> Powered off from ALOM and powered on. Now IPMP is working with
> different subnets for the test addresses. Tested for 2 hours with
> sh / no sh on the Cisco switch ports (STP is off on the ports, of
> course).

OK.  It's unclear to me where the problem you're seeing might be.  I saw
nothing in your configuration that would suggest that it would work only
"sometimes."

> > Have you filed a bug?
>
> Good question! Please tell me, for Solaris 10, where can I file bugs;

Even though it's Solaris 10, you can still use bugs.opensolaris.org.
However, your *first* step should be to contact support.

> bugs that are not reproducible on systems without a service plan (test
> and development machines, desktops ...)? I mean - I'm encountering many
> bugs in the system, but I don't know where to send this info (apart
> from this USENET group). For example: broken or impossible circular
> SUNW package dependencies, drivers that stop working after patching, a
> 3-node Sun Cluster 3.2 that panics the other two nodes when the first
> node boots after a network loss on all IPMP groups, strange IPMP
> behaviour, dbx/mdb traces of core dumps, ipf/ipnat incompatibilities
> with some other routers and their TCP/IP stacks where the SMTP server
> is not able to send the final dot when sending mail to circa 3% of the
> internet, and other such small annoyances?

It sounds like you need help from support.

If you can post details on those bugs -- either how to reproduce or
substantial information about the failure mode and the software in use
-- then it's possible that someone here might be able to diagnose the
problem or even file a bug for you.  But that's no substitute for paid
support.

If you're interested in community-supported code, I'd recommend using
Solaris Express instead of Solaris 10.

--
James Carlson, Solaris Networking          <james.d.carlson(a)sun.com>
Sun Microsystems / 35 Network Drive        71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N Fax +1 781 442 1677