From: Chandler Bing on 28 Jun 2010 14:48

Greetings all,

My company is attempting to perform replication from one HP EVA SAN array to another HP EVA SAN array across the WAN. We have a metro Ethernet connection between the two with one gigabit of shared bandwidth. We share the bandwidth with our other business units, with no QoS in place, but we have been told that the pipe has never been completely saturated, and we're not rate limited.

The SAN arrays are on 4Gbps fiber channel Brocade switches. There are two devices called MPX110s that send the data from fiber channel to Ethernet. Each MPX has redundancy groups it performs replication for, and although each has two Ethernet and two fiber channel ports, we only use one of each. Each MPX110 has a path it replicates over to its counterpart on the other side. It is my understanding that they negotiate a tunnel between them, Fiber Channel over IP. They're each on their own 6509, which have uplinks to a 3750, and that goes across the metro Ethernet to a 3560 on the other side, then up to a 3560 acting as the core and out to two 3560s with an MPX on each one.

Now the problem: although we have one gigabit of bandwidth, they'll only use about 13Mbps of it each. We've verified this with iperf. Each connection will only take 13Mbps of bandwidth, and parallel tests show each connection gets 13Mbps of bandwidth. The HP engineer told us that at >5Mbps we get approximately 1.3Mbps of actual data, which means that FCIP has 80% overhead? Can that be right?

The big huge problem is that after running for several hours they'll eventually just die and have to be rebooted to start replicating again. They're already on the latest firmware (2.4.4.1). The only error we get from the statistics screen of the MPXs says they're getting TCP timeouts.

I've performed captures on both sides' MPXs, and the errors I see in a 60 sec sample are FCP malformed packets (~4300), duplicate ACKs (~41), previous segment lost (~3), fast retransmission (~3). When HP was questioned about the FCP malformed packets they stated that they use a proprietary protocol and that wireshark wouldn't be able to decode it. I've since searched for this protocol but can find no references to it anywhere. The other errors seem so minor and few that it would be hard to believe they're impacting the data stream that much, if at all.

I'll include a small sample of the captures, if it lets me.

Thanks in advance for your assistance.

Chandler Bing
From: alexd on 28 Jun 2010 16:03

Meanwhile, at the comp.dcom.sys.cisco Job Justification Hearings, Chandler Bing chose the tried and tested strategy of:

> My company is attempting to perform replication from one HP EVA SAN
> array to another HP EVA SAN array across the WAN. We have a metro
> Ethernet connection between the two with one Gigabit of shared
> bandwidth.

My first instinct would be to simulate the 1G WAN by bringing the two units together and linking them with a simple gigabit link [or even 100M to simulate a worst case scenario], and working upwards in complexity from there. Easier said than done, of course. When you get it working you can have a look at a packet capture to give you a rough idea of what it should look like.

> but we have been told that the pipe has never been completely saturated,
> and we're not rate limited.

Rest assured that the second point will be addressed as soon as you address the first one :-)

> When HP was questioned about the FCP malformed packets they stated that
> they use a proprietary protocol and that wireshark wouldn't be able to
> decode it.

Vendor support have been known to be correct on occasion so I'll reserve judgement for now.

--
<http://ale.cx/> (AIM:troffasky) (UnSoEsNpEaTm(a)ale.cx)
20:49:47 up 2 days, 8:18, 5 users, load average: 0.01, 0.02, 0.23
Qua illic est accuso, illic est a vindicatum
From: bod43 on 28 Jun 2010 22:34

On 28 June, 21:03, alexd <troffa...(a)hotmail.com> wrote:
> Meanwhile, at the comp.dcom.sys.cisco Job Justification Hearings, Chandler
> Bing chose the tried and tested strategy of:
>
> > My company is attempting to perform replication from one HP EVA SAN
> > array to another HP EVA SAN array across the WAN. We have a metro
> > Ethernet connection between the two with one Gigabit of shared
> > bandwidth.
>
> My first instinct would be to simulate the 1G WAN by bringing the two units
> together and linking them with a simple gigabit link [or even 100M to
> simulate a worst case scenario], and working upwards in complexity from
> there. Easier said than done, of course. When you get it working you can
> have a look at a packet capture to give you a rough idea of what it should
> look like.
>
> > but we have been told that the pipe has never been completely saturated,
> > and we're not rate limited.
>
> Rest assured that the second point will be addressed as soon as you address
> the first one :-)
>
> > When HP was questioned about the FCP malformed packets they stated that
> > they use a proprietary protocol and that wireshark wouldn't be able to
> > decode it.
>
> Vendor support have been known to be correct on occasion so I'll reserve
> judgement for now.

How far apart are the two units being replicated? Minimum ping rtt is the best distance metric.

Reply from 208.69.34.231: bytes=32 time=49ms TTL=56
Reply from 208.69.34.231: bytes=32 time=51ms TTL=56
Reply from 208.69.34.231: bytes=32 time=48ms TTL=56
Reply from 208.69.34.231: bytes=32 time=48ms TTL=56

So in this case the min rtt is 48ms.

This may be relevant:-
http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1277089848897+28353475&threadId=1212310

Look up "bandwidth delay product".
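For anyone following the "bandwidth delay product" pointer, here is a rough sketch of the arithmetic in Python. It is an illustration only: the 64 KiB window is an assumption (a TCP stack without window scaling), nothing in the thread says what window the MPX110 actually uses, and the 48 ms RTT is the example ping sample above, not a measurement of the OP's metro Ethernet link. The function names are made up for this sketch.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def required_window_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_seconds / 8


WINDOW_BYTES = 65535       # assumption: 64 KiB window, no window scaling
RTT_SECONDS = 0.048        # assumption: the 48 ms example RTT, not the OP's link
LINK_BPS = 1_000_000_000   # the one gigabit metro Ethernet from the original post

ceiling_mbps = max_tcp_throughput_bps(WINDOW_BYTES, RTT_SECONDS) / 1e6
needed_bytes = required_window_bytes(LINK_BPS, RTT_SECONDS)

print(f"Per-connection ceiling: {ceiling_mbps:.1f} Mbps")
print(f"Window needed to fill the link: {needed_bytes / 2**20:.2f} MiB")
```

If the real link has an RTT in the tens of milliseconds and the window is around 64 KiB, a ceiling in the region of 10 to 13 Mbps per connection is about what this formula predicts, which would also fit parallel streams each getting the same rate. Measuring the actual minimum RTT between the two MPX110s is the first thing to plug in.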
From: Stephen on 29 Jun 2010 17:30

On Mon, 28 Jun 2010 11:48:59 -0700 (PDT), Chandler Bing <mel.chandler(a)gmail.com> wrote:

>Greetings all,
>
>My company is attempting to perform replication from one HP EVA SAN
>array to another HP EVA SAN array across the WAN. We have a metro
>Ethernet connection between the two with one Gigabit of shared
>bandwidth. We share the bandwidth with our other business units, with
>no QoS in place, but we have been told that the pipe has never been
>completely saturated, and we're not rate limited. The SAN arrays are
>on 4Gbps fiber channel brocade switches. There are two devices called
>MPX110's that send the data from fiber channel to Ethernet. Each MPX
>has redundancy groups they perform replication for, and although they
>have two Ethernet and two fiber channel ports on each, we only use one
>on each. Each MPX110 has a path they perform replication for to their
>counter parts on the other side. It is my understanding they
>negotiate a tunnel between them, Fiber Channel over IP. They're each
>on their own 6509 which have a uplinks to a 3750 and that goes across
>the metro Ethernet to a 3560 on the other side, then up to a 3560
>acting as the core and out to two 3560's with an MPX on each one.
>
>Now the problem, although we have one gigabit of bandwidth, they'll
>only use about 13Mbps of it each, we've verified this with iperf.
>Each connection we'll only take 13Mbps of bandwidth, parallel tests
>show each connection gets 13Mbps of bandwidth. The HP engineer told
>us that at >5Mbps we get approximately 1.3Mbps of actually data, which
>means that FCIP has 80% over head? Can that be right? The big huge
>problem is that after running for several hours they'll eventually
>just die and have to rebooted to start replicating again. They're
>already on the latest firmware (2.4.4.1). The only error we get from
>the statistic screen of the MPX's says they're getting TCP timeouts.
>
>I've performed captures on both sides' MPXs' and the errors I see in a
>60 sec sample are FCP malformed packets (~4300), duplicate ACK's
>(~41), previous segment lost (~3), fast retransmission (~3). When HP
>was questioned about the FCP malformed packets they stated that they
>use a proprietary protocol and that wireshark wouldn't be able to
>decode it. I've since searched for this protocol but can find no
>references to it anywhere. The other errors seem so minor and few it
>would be hard to believe that they're impacting the data stream that
>much if at all.

FWIW FCIP is a standard protocol - if HP have written something non standard then they should have it documented....

Note a sniffer would normally refuse to decode something it doesn't understand, unless whoever wrote the protocol didn't follow whatever escape clauses are built in to allow non standard formats inside the standard wrapper.

If you were being paranoid you would find a HP analyser and see if that shows errors.

A Cisco doc about designing with FCIP:
http://cisco.biz/en/US/docs/solutions/Enterprise/Data_Center/HA_Clusters/HA_FCI_4.html

Note the comments about applications being "synchronous" or not.

FC seems to use a guaranteed buffer scheme, where there needs to be enough buffering to cope with the path delay to get wire speed throughput.

I have run into issues with buffer credits in FC switches, where you need enough to cope with the speed / delay.

As other posters have commented - check the timing. The GigE link may not follow a direct route, so you may have more delay than you or the protocols expect.

>
>I'll include a small sample of the captures, if it lets me.
>
>Thanks in advance for your assistance.
>
>Chandler Bing

--
Regards

stephen_hope(a)xyzworld.com - replace xyz with ntl
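To put a rough number on the "enough buffering to cope with the path delay" point, a back-of-the-envelope sketch in Python follows. It only estimates how many full-size frames are in flight during one round trip. The 2148-byte frame size (a maximum-size Fibre Channel frame), the 1 Gbps path rate, and the sample RTT values are assumptions for illustration, the function name is hypothetical, and this says nothing about how the MPX110 or any particular switch actually allocates credits.

```python
import math


def frames_in_flight(line_rate_bps: float, rtt_seconds: float,
                     frame_bytes: int = 2148) -> int:
    """Frames that fit into one round trip at the given rate.

    A sender that must wait for an acknowledgement (or a returned buffer
    credit) per frame needs roughly this many outstanding frames to keep
    the path busy. frame_bytes defaults to an assumed full-size FC frame.
    """
    bits_per_round_trip = line_rate_bps * rtt_seconds
    return math.ceil(bits_per_round_trip / (frame_bytes * 8))


PATH_BPS = 1_000_000_000  # assumption: the full gigabit is available

for rtt_ms in (1, 10, 48):  # assumed sample RTTs, not measured values
    needed = frames_in_flight(PATH_BPS, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>2} ms: about {needed} full-size frames outstanding")
```

The count grows linearly with delay, so a path that takes a longer route than expected needs proportionally more buffering before it can run at wire speed. Smaller average frames push the number higher still.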
From: Chandler Bing on 14 Jul 2010 15:04

On Jun 28, 1:03 pm, alexd <troffa...(a)hotmail.com> wrote:
> Meanwhile, at the comp.dcom.sys.cisco Job Justification Hearings, Chandler
> Bing chose the tried and tested strategy of:
>
> > My company is attempting to perform replication from one HP EVA SAN
> > array to another HP EVA SAN array across the WAN. We have a metro
> > Ethernet connection between the two with one Gigabit of shared
> > bandwidth.
>
> My first instinct would be to simulate the 1G WAN by bringing the two units
> together and linking them with a simple gigabit link [or even 100M to
> simulate a worst case scenario], and working upwards in complexity from
> there. Easier said than done, of course. When you get it working you can
> have a look at a packet capture to give you a rough idea of what it should
> look like.
>
> > but we have been told that the pipe has never been completely saturated,
> > and we're not rate limited.
>
> Rest assured that the second point will be addressed as soon as you address
> the first one :-)
>
> > When HP was questioned about the FCP malformed packets they stated that
> > they use a proprietary protocol and that wireshark wouldn't be able to
> > decode it.
>
> Vendor support have been known to be correct on occasion so I'll reserve
> judgement for now.
>
> --
> <http://ale.cx/> (AIM:troffasky) (UnSoEsNpE...(a)ale.cx)
> 20:49:47 up 2 days, 8:18, 5 users, load average: 0.01, 0.02, 0.23
> Qua illic est accuso, illic est a vindicatum

HP did something similar in a lab: they set up the MPXs on a single switch (no WAN) and had them replicating at breakneck speeds. When we pointed out that there was no WAN delay, bandwidth limitation, or other devices in the mix, they simply shrugged at us.

We have implemented QoS, which seems to have given it additional bandwidth (not sure why). We've seen it climb to 45Mbps, but then fail after 12 hours.