From: Chandler Bing on
On Jun 28, 7:34 pm, bod43 <Bo...(a)hotmail.co.uk> wrote:
> On 28 June, 21:03, alexd <troffa...(a)hotmail.com> wrote:
>
> > Meanwhile, at the comp.dcom.sys.cisco Job Justification Hearings, Chandler
> > Bing chose the tried and tested strategy of:
>
> > > My company is attempting to perform replication from one HP EVA SAN
> > > array to another HP EVA SAN array across the WAN.  We have a metro
> > > Ethernet connection between the two with one Gigabit of shared
> > > bandwidth.  
>
> > My first instinct would be to simulate the 1G WAN by bringing the two units
> > together and linking them with a simple gigabit link [or even 100M to
> > simulate a worst case scenario], and working upwards in complexity from
> > there.  Easier said than done, of course. When you get it working you can
> > have a look at a packet capture to give you a rough idea of what it should
> > look like.
>
> > > but we have been told that the pipe has never been completely saturated,
> > > and we’re not rate limited.
>
> > Rest assured that the second point will be addressed as soon as you address
> > the first one :-)
>
> > > When HP was questioned about the FCP malformed packets they stated that
> > > they use a proprietary protocol and that wireshark wouldn’t be able to
> > > decode it.
>
> > Vendor support have been known to be correct on occasion so I'll reserve
> > judgement for now.
>
> How far apart are the two units being replicated?
> Minimum ping rtt is the best distance metric.
>
> Reply from 208.69.34.231: bytes=32 time=49ms TTL=56
> Reply from 208.69.34.231: bytes=32 time=51ms TTL=56
> Reply from 208.69.34.231: bytes=32 time=48ms TTL=56
> Reply from 208.69.34.231: bytes=32 time=48ms TTL=56
>
> So in this case the min rtt is 48ms.
>
> This may be relevant:-
>
> http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=10....
>
> Look up "bandwidth delay product".
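As a rough check on the "minimum ping RTT as a distance metric" suggestion quoted above, the minimum can be pulled out of Windows-style ping output and turned into an upper bound on fibre path length. This is a sketch: the ~200 km per millisecond figure for light in fibre is an approximation, and switching and queuing delay mean the real path is shorter than the bound.

```python
import re

# Sample "Reply from ..." lines as pasted in the quoted message above.
PING_OUTPUT = """\
Reply from 208.69.34.231: bytes=32 time=49ms TTL=56
Reply from 208.69.34.231: bytes=32 time=51ms TTL=56
Reply from 208.69.34.231: bytes=32 time=48ms TTL=56
Reply from 208.69.34.231: bytes=32 time=48ms TTL=56
"""

# Assumption: light in optical fibre covers roughly 200 km per millisecond
# (about two thirds of c). Equipment and queuing delay are ignored.
FIBRE_KM_PER_MS = 200.0
KM_PER_MILE = 1.609344


def min_rtt_ms(ping_text: str) -> float:
    """Return the smallest 'time=NNms' (or 'time<NNms') value in ping output."""
    times = [float(t) for t in re.findall(r"time[=<](\d+(?:\.\d+)?)ms", ping_text)]
    if not times:
        raise ValueError("no RTT samples found")
    return min(times)


def max_path_miles(rtt_ms: float) -> float:
    """Upper bound on the one-way fibre path length implied by a round trip."""
    one_way_ms = rtt_ms / 2.0
    return one_way_ms * FIBRE_KM_PER_MS / KM_PER_MILE


if __name__ == "__main__":
    print(f"min RTT in sample: {min_rtt_ms(PING_OUTPUT):.0f} ms")
    # The reply below reports ~36 ms between the two sites.
    print(f"36 ms RTT -> at most ~{max_path_miles(36.0):.0f} miles of fibre one way")
```

At the ~36 ms reported in the reply below, the bound works out to roughly 2,240 miles one way, comfortably above the 1,300 to 1,500 mile site separation, so that RTT is plausible for the distance with room for a somewhat indirect route or equipment delay.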

The two FC gateways and SANs are 1,300 to 1,500 miles apart; RTT ping
times are ~36 ms with very little jitter.

Alas, the article you linked has no solution. I found several articles
like this one, which we pointed out to HP.
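One plausible reading of the figures in this thread, sketched here under stated assumptions rather than as a diagnosis: if each TCP connection is capped by its window rather than by the link, single-stream throughput is roughly window ÷ RTT, and parallel streams each hit the same cap, which is what the iperf tests showed. The 65,535-byte window below is an assumption (the classic maximum without TCP window scaling); nothing in the thread says what window the MPX110 or the iperf hosts actually use. Note that iperf itself can be window-limited by host defaults; its `-w` option sets the window size for a test.

```python
# Figures taken from this thread (approximate): ~36 ms RTT between sites,
# a 1 Gbit/s metro Ethernet link, and ~13 Mbit/s per TCP connection.
RTT_S = 0.036
LINK_BPS = 1_000_000_000
OBSERVED_BPS = 13_000_000

# Assumption: a classic 65,535-byte TCP window with no window scaling.
# Nothing in the thread confirms what window the MPX110 actually uses.
CLASSIC_WINDOW_BYTES = 65_535


def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Single-stream TCP ceiling: at most one window delivered per round trip."""
    return window_bytes * 8 / rtt_s


def implied_window_bytes(throughput_bps: float, rtt_s: float) -> float:
    """Window size that would explain an observed single-stream throughput."""
    return throughput_bps * rtt_s / 8


if __name__ == "__main__":
    print(f"BDP to fill 1 Gbit/s at 36 ms: {bdp_bytes(LINK_BPS, RTT_S):,.0f} bytes")
    print(f"64 KB window at 36 ms caps a stream at "
          f"{window_limited_bps(CLASSIC_WINDOW_BYTES, RTT_S) / 1e6:.1f} Mbit/s")
    print(f"13 Mbit/s at 36 ms implies a window of about "
          f"{implied_window_bytes(OBSERVED_BPS, RTT_S):,.0f} bytes")
```

Filling the 1 Gbit/s link at 36 ms needs about 4.5 MB in flight, while a 64 KB window tops out near 14.6 Mbit/s per stream, and the observed 13 Mbit/s corresponds to a window of about 58.5 KB. The closeness of those last two numbers is suggestive, not proof.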
From: Chandler Bing on
On Jun 29, 2:30 pm, Stephen <stephen_h...(a)xyzworld.com> wrote:
> On Mon, 28 Jun 2010 11:48:59 -0700 (PDT), Chandler Bing
> <mel.chand...(a)gmail.com> wrote:
> >Greetings all,
>
> >My company is attempting to perform replication from one HP EVA SAN
> >array to another HP EVA SAN array across the WAN.  We have a metro
> >Ethernet connection between the two with one gigabit of shared
> >bandwidth.  We share the bandwidth with our other business units, with
> >no QoS in place, but we have been told that the pipe has never been
> >completely saturated, and we’re not rate limited.  The SAN arrays are
> >on 4Gbps Fibre Channel Brocade switches.  There are two devices called
> >MPX110s that carry the data from Fibre Channel to Ethernet.  Each MPX
> >handles replication for its own redundancy groups, and although each
> >has two Ethernet and two Fibre Channel ports, we only use one of
> >each.  Each MPX110 has a replication path to its counterpart on the
> >other side.  It is my understanding they negotiate a tunnel between
> >them, Fibre Channel over IP.  They’re each on their own 6509, which
> >uplinks to a 3750 that goes across the metro Ethernet to a 3560 on
> >the other side, then up to a 3560 acting as the core and out to two
> >3560s with an MPX on each one.
>
> >Now the problem: although we have one gigabit of bandwidth, they’ll
> >only use about 13Mbps of it each; we’ve verified this with iperf.
> >Each connection will only take 13Mbps of bandwidth, and parallel tests
> >show each connection gets 13Mbps.  The HP engineer told us that at
> >>5Mbps we get approximately 1.3Mbps of actual data, which would mean
> >that FCIP has 80% overhead?  Can that be right?  The big problem is
> >that after running for several hours they’ll eventually just die and
> >have to be rebooted to start replicating again.  They’re already on
> >the latest firmware (2.4.4.1).  The only error we get from the
> >statistics screen of the MPXs says they’re getting TCP timeouts.
>
> >I’ve performed captures on both sides’ MPXs, and the errors I see in a
> >60-second sample are FCP malformed packets (~4300), duplicate ACKs
> >(~41), previous segment lost (~3), and fast retransmissions (~3).  When
> >HP was questioned about the FCP malformed packets they stated that they
> >use a proprietary protocol and that Wireshark wouldn’t be able to
> >decode it.  I’ve since searched for this protocol but can find no
> >references to it anywhere.  The other errors are so minor and few that
> >it would be hard to believe they’re impacting the data stream much, if
> >at all.
>
> FWIW FCIP is a standard protocol - if HP have written something
> non-standard then they should have it documented....
>
> Note a sniffer would normally refuse to decode something it doesn't
> understand, unless whoever wrote the protocol didn't follow whatever
> escape clauses are built in to allow non-standard formats inside the
> standard wrapper.
>
> If you were being paranoid you would find an HP analyser and see if
> that shows errors.
>
> A Cisco doc about designing with FCIP:
> http://cisco.biz/en/US/docs/solutions/Enterprise/Data_Center/HA_Clust...
> Note the comments about applications being "synchronous" or not.
>
> FC seems to use a guaranteed buffer scheme, where there needs to be
> enough buffering to cope with the path delay to get wire speed
> throughput.
>
> I have run into issues with buffer credits in FC switches, where you
> need enough to cope with the speed/delay.
>
> As other posters have commented, check the timing. The GigE link may
> not follow a direct route, so you may have more delay than you or the
> protocols expect.
>
> >I’ll include a small sample of the captures, if it lets me.
>
> >Thanks in advance for your assistance.
>
> >Chandler Bing
>
> --
> Regards
>
> stephen_h...(a)xyzworld.com - replace xyz with ntl
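On the HP engineer's figure of roughly 80% FCIP overhead quoted above, a back-of-envelope estimate of pure header cost may help frame the question. This is a sketch under loudly stated assumptions, none of which come from the thread: full-size FC frames, FCIP framing as described in RFC 3821 and RFC 3643 (a 28-byte encapsulation header plus SOF and EOF words), no compression, and TCP/IPv4 without options over a 1500-byte MTU.

```python
# Assumptions (not from the thread): full-size FC frames carrying the maximum
# 2112-byte data field, FCIP framing per RFC 3821 / RFC 3643 (28-byte
# encapsulation header plus 4-byte SOF and EOF words), no compression, and
# TCP/IPv4 without options over a standard 1500-byte Ethernet MTU.
FC_DATA = 2112            # FC frame data field, bytes
FC_HEADER = 24            # FC frame header, bytes
FC_CRC = 4
SOF_EOF = 4 + 4           # start/end-of-frame words carried in the tunnel
FCIP_ENCAP_HEADER = 28

MTU = 1500
IP_TCP_HEADERS = 20 + 20
# Per-frame Ethernet cost on the wire: header 14 + FCS 4 + preamble/SFD 8 + gap 12.
ETH_WIRE_OVERHEAD = 14 + 4 + 8 + 12


def fcip_efficiency() -> float:
    """Fraction of wire bytes that are FC data in a steady, full-size FCIP stream."""
    tcp_bytes_per_fc_frame = FCIP_ENCAP_HEADER + SOF_EOF + FC_HEADER + FC_DATA + FC_CRC
    fc_share = FC_DATA / tcp_bytes_per_fc_frame      # FCIP + FC framing cost
    mss = MTU - IP_TCP_HEADERS
    tcp_share = mss / (MTU + ETH_WIRE_OVERHEAD)       # TCP/IP + Ethernet cost
    return fc_share * tcp_share


if __name__ == "__main__":
    eff = fcip_efficiency()
    print(f"efficiency {eff:.1%}, overhead {1 - eff:.1%}")
```

Under these assumptions header overhead comes to roughly 8%. Small FC frames would raise that figure, but a gap as large as 80% between wire rate and delivered data would more likely have some cause other than encapsulation headers.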

I have since gotten Wireshark to decode it properly; I had to disable
FCP decoding.

I suspect that's at least part of our problem, as I believe the SAN
replication traffic over FCIP is synchronous. The QoS policy we
implemented helped, but replication still fails with a high "TCP timer
expired" error count. I have nothing in my Wireshark captures to
correlate this with, though.

If the FC switches have a buffer configuration, I'm not familiar with
it, and I'm relying heavily on HP and our SAN engineer to configure
those pieces. It is interesting that we have little to no visibility
into the FC switch and any errors occurring on that side. The focus
seems to be on the Ethernet and network side.
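Stephen's point about buffer credits can be made concrete: a Fibre Channel port needs roughly as many buffer-to-buffer credits as there are frames in flight during one round trip on its link. The sketch below assumes 4GFC at 4.25 Gbaud with 8b/10b encoding, full-size 2148-byte frames and about 5 µs per km of fibre; those numbers are assumptions, not values from the thread. Because BB credit flow control is link-level, in an FCIP setup it normally only has to cover the short FC hop between the Brocade switch and the MPX110, while across the WAN it is the TCP window that has to cover the ~36 ms round trip.

```python
import math

# Assumptions: 4GFC signalling at 4.25 Gbaud with 8b/10b encoding (10 line bits
# per byte), full-size 2148-byte FC frames, and roughly 5 microseconds per km
# of fibre one way. Idle words between frames are ignored.
LINE_RATE_BAUD = 4.25e9
LINE_BITS_PER_BYTE = 10
FRAME_BYTES = 2148
US_PER_KM_ONE_WAY = 5.0


def frame_time_us(frame_bytes: int = FRAME_BYTES) -> float:
    """Time to serialise one FC frame onto the link, in microseconds."""
    return frame_bytes * LINE_BITS_PER_BYTE / LINE_RATE_BAUD * 1e6


def credits_needed(rtt_us: float) -> int:
    """BB credits needed to keep sending for a full round trip without stalling."""
    return math.ceil(rtt_us / frame_time_us())


def credits_for_distance_km(km: float) -> int:
    """Credits for a direct FC link of the given length."""
    return credits_needed(2 * km * US_PER_KM_ONE_WAY)


if __name__ == "__main__":
    print(f"10 km FC link: {credits_for_distance_km(10)} credits")
    print(f"36 ms round trip: {credits_needed(36_000)} credits")
```

A 10 km link works out to about 20 credits (the familiar "about 2 per km at 4G" rule of thumb), whereas a 36 ms round trip would need over 7,000, which illustrates why the long-haul leg relies on TCP buffering rather than FC credits.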
From: Chandler Bing on
On Jun 28, 11:48 am, Chandler Bing <mel.chand...(a)gmail.com> wrote:
> [snip]

Now the update:

I discovered that if I disable FCP decoding, Wireshark decodes the
traffic correctly as FCIP.

We applied a QoS config to mark SAN replication traffic as DSCP EF, and
have since seen consistent ping times of ~36 ms between sites and
bandwidth climbing as high as 45Mbps on the 1Gbps link. The MPXs still
fail after replicating for a few hours; last time we watched them
replicate for 12 hours and then fail. The TCP timer exceed counter
seems to indicate that is the problem, but I have nothing significant
in the Wireshark captures to support this.

HP has decided that the MPX110 on the far side needs to be replaced.
I'll post an update after that's done.