From: Dave Onex on 27 Nov 2009 15:48

>>> Hi Ace;
>>>
>>> I really don't think it's an ISA issue. I just re-built the ISA server
>>> from scratch and the log-in problem occurred well before ISA was
>>> installed. I did a clean O/S install and then went to Service Pack 4 and
>>> then joined the domain. As soon as I joined the domain - blammo - the
>>> long log-in times started happening. So, I was able to eliminate ISA
>>> from the loop right off the bat.
>>>
>>> I have been pulling my hair out. It took a long time to re-build the ISA
>>> machine (many, many long log-ins occurred after each re-start!).
>>>
>>> At this point I don't know what it is. It's a weird thing but it's true.
>>> One thing I just found out is that one of my secondary DNS servers was
>>> not able to pull a copy of a zone from the primary. The event viewer for
>>> the secondary DNS server complains that the primary did not send the
>>> zone and the logs on the primary report that it did send the zone.
>>>
>>> There's something weird going on here....
>>>
>>
>> After many hours of messing around, staring at DNS entries across 4
>> servers, doing a ground-up re-build of the firewall - I finally figured
>> out what had happened.
>> Are you ready? .................
>>
>> The switch packed it in. More specifically, certain limited aspects of
>> the switch packed it in....
>>
>> I'm sitting here at my wits' end. Everything worked perfectly before I
>> added the extra NIC to the firewall. I configured multi-link trunks on
>> the appropriate switch ports. Somehow or another, the switch either
>> failed at that moment or the software that runs the switch got corrupted.
>>
>> That's why weird things were happening. For instance, the zone transfer
>> from the primary DNS to the secondary. The transfer would begin and then
>> the secondary would report that the transfer failed, telling me to go
>> look for clues at the primary. Looking at the primary showed that the
>> transfer succeeded. So where did the data go? Into the ether, I guess.
>>
>> In that case the primary DC did do the transfer. It was the secondary
>> that didn't get all the data because of... the switch. I turned on
>> logging on the secondary DNS server and it showed that the transfer was
>> taking place but that it had not completed. After waiting for some time
>> it would then cough up the error message.
>>
>> I happened to have another identical switch here so I tried a last-ditch
>> effort and changed it out. Bingo - the speed increase across the network
>> was instant. Everything is responding instantly again. Log-ons are
>> instant.
>>
>> So, the reason the domain controllers had zero issues must have been that
>> the ports on the switch they were connected to were OK. The reason the
>> Proxy and the Mail server were having issues was that there was some
>> form of corruption in the switch for those ports. Data flowing through
>> those SPECIFIC ports was being lost or corrupted, causing all sorts of
>> problems with log-ins and with DNS transfers between servers.
>>
>> Go figure.
>>
>> I knew my DNS was perfect - the network was instant prior to installing
>> the extra NIC. Same with ISA - it's been in place for about 4 years now
>> and it's rock-solid. Thing is, because I couldn't figure out what was
>> causing this weird behavior I went looking all over the place only to
>> find out the switch was corrupted.
>>
>> The network performance was so poor it was as if the entire network was
>> infected with a virus. It was slow, 'jerky' and annoying. In retrospect
>> it was probably packets being shed intermittently in the switch itself.
>>
>> The interesting thing is that small zone transfers would work. It was the
>> larger zone (my active directory zone) that would not transfer. So it's
>> almost as if small stuff would get through and larger transfers wouldn't.
>> This meant that pings worked perfectly, nslookups worked perfectly but
>> any sizable transfers (such as probably occur when logging on) would
>> fail. That's why I could read that little group policy file from each of
>> the affected computers - it was small enough.
>>
>> I don't know how a switch works (inside) but I know this much - the one I
>> had in place selectively failed on specific ports affecting those two
>> servers and the type of failure meant small traffic got through and large
>> traffic would not. That's why all my ICMP diagnostic traffic succeeded
>> and that's why I was pulling out my hair - there was no apparent reason
>> for the problems I was having to occur. If you can ping all the machines
>> and do forward and reverse lookups to them - it should work!
>>
>> Hahaha - anyway, I just thought I would let you know what it ended up
>> being in the end. That's what I get for trying to _increase_ network
>> performance by adding another NIC to the proxy!
>>
>> Best & Thanks!
>> Dave
>>
>
> Wow. And I've seen this before with switches and teaming, but I just
> didn't think of it. Some switches, by default, will do that when you
> connect the two NICs on the same switch, even if teamed. It just can't
> handle it without reconfiguring the switch to allow it, or simply throwing
> it out. :-)
>
> What brand name and models are the switches?
>
> Glad you figured it out!
>
> Cheers!
>
> Ace
>

Yeah, that's the weird thing. The switch actually supports up to 6
multi-link trunks and will even do them across different switches. So it
was well within the feature set of the switch. The odd thing is that it
didn't just discard packets - it only discarded certain traffic and only
on certain ports. That's why ping tests and nslookups all worked but
things like a zone transfer or a log-on wouldn't pass through (properly).

It would have been way better if it dropped all packets on the affected
ports instead of doing a 'soft-fail'.

I think somehow the switch software got corrupted. Anyway, it's done now.
The switch is an older Nortel/Bay Networks 420-24. It's getting time to
change it out in favor of a gig Ethernet unit.

Thanks for your help through all these different issues - it's been great
to have someone else in the picture (other than just myself!)

Best;
Dave
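
The symptom described above (pings and nslookups succeed, zone transfers and log-ons fail) suggests checking a path with payloads of increasing size instead of relying on small diagnostic packets alone. The following is a minimal Python sketch of that idea, not what was actually run in this thread. The names `start_sink_server`, `probe` and `probe_sizes` are hypothetical, and the sizes are arbitrary. In practice the sink would run on one of the affected servers and the probe on another machine; here both run on loopback so the script is self-contained.

```python
import socket
import threading


def start_sink_server(host="127.0.0.1"):
    """Start a throwaway TCP sink on a free port and return (host, port).

    The sink reads until the client closes its sending side, then replies
    with the number of bytes it received (as ASCII digits).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0 lets the OS pick a free port
    srv.listen()

    def handle(conn):
        with conn:
            total = 0
            while True:
                chunk = conn.recv(65536)
                if not chunk:
                    break
                total += len(chunk)
            conn.sendall(str(total).encode())

    def serve():
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()


def probe(host, port, size, timeout=5.0):
    """Return True if `size` bytes are delivered and acknowledged in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"\x00" * size)
            s.shutdown(socket.SHUT_WR)  # signal end of payload to the sink
            reply = b""
            while True:
                chunk = s.recv(64)
                if not chunk:
                    break
                reply += chunk
            return reply == str(size).encode()
    except OSError:  # covers timeouts, resets and refused connections
        return False


def probe_sizes(host, port, sizes, timeout=5.0):
    """Probe each payload size once and map size -> success flag."""
    return {size: probe(host, port, size, timeout) for size in sizes}


if __name__ == "__main__":
    host, port = start_sink_server()
    for size, ok in probe_sizes(host, port, [64, 1500, 64_000, 1_000_000]).items():
        print(f"{size:>9} bytes: {'ok' if ok else 'FAILED'}")
```

On a healthy path every size should succeed. A path that passes the small sizes but times out on the large ones would match the 'soft-fail' pattern described above, and would point at the link or switch port and away from DNS or the domain itself.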
From: Ace Fekay [MCT] on 27 Nov 2009 20:37

"Dave Onex" <dave(a)microsoft.com> wrote in message news:%23n2$%23L6bKHA.2188(a)TK2MSFTNGP04.phx.gbl...
> I think somehow the switch software got corrupted. Anyway, it's done now.
> The switch is an older Nortel/Bay Networks 420-24. It's getting time to
> change it out in favor of a gig Ethernet unit.

Looks like you did well by yourself! :-)

I like the Cisco Catalysts. Nice switches, no problems.

Cheers!

Ace
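
One way to see why "small stuff would get through and larger transfers wouldn't" is simple arithmetic on frame loss. The sketch below is an illustration under loud assumptions, not a measurement from this thread: it assumes every frame is dropped independently with the same probability, and it ignores TCP retransmission (which in reality turns lost frames into stalls and timeouts instead of an immediate failure). The function name and the 2% loss figure are hypothetical. The fault could equally have depended on frame size instead of frame count.

```python
def chance_all_frames_arrive(loss_rate, frames):
    """Probability that `frames` frames all arrive on the first attempt.

    Assumes each frame is lost independently with probability `loss_rate`.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if frames < 0:
        raise ValueError("frames must be non-negative")
    return (1.0 - loss_rate) ** frames


if __name__ == "__main__":
    loss = 0.02  # hypothetical per-frame loss rate on a bad port
    for frames in (1, 10, 100, 700):  # 700 frames is roughly 1 MB at 1500 bytes
        p = chance_all_frames_arrive(loss, frames)
        print(f"{frames:>4} frames: {p:.6f}")
```

Under these assumptions a single ping almost always gets through, while a transfer of a few hundred frames almost never completes without at least one loss, which is consistent with ICMP tests looking clean while zone transfers and log-ons struggled.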