Argghhh... 30 Minute Log-in's :-( [Active Directory]


From: Dave Onex on 27 Nov 2009 15:48

>>>
>>> Hi Ace;
>>>
>>> I really don't think it's an ISA issue. I just re-built the ISA server
>>> from scratch and the log-in problem occurred well before ISA was
>>> installed. I did a clean O/S install and then went to Service Pack 4 and
>>> then joined the domain. As soon as I joined the domain - blammo - the
>>> long log-in times started happening. So, I was able to eliminate ISA
>>> from the loop right off the bat.
>>>
>>> I have been pulling my hair out. It took a long time to re-build the ISA
>>> machine (many, many long log-ins occurred after each re-start!).
>>>
>>> At this point I don't know what it is. It's a weird thing but it's true.
>>> One thing I just found out is that one of my secondary DNS servers was
>>> not able to pull a copy of a zone from the primary. The event viewer for
>>> the secondary DNS server complains that the primary did not send the
>>> zone and the logs on the primary report that it did send the zone.
>>>
>>> There's something weird going on here....
>>>
>>
>> After many hours of messing around, staring at DNS entries across 4
>> servers, doing a ground-up re-build of the firewall - I finally figured
>> out what had happened.
>> Are you ready? .................
>>
>> The switch packed it in. More specifically, certain limited aspects of
>> the switch packed it in....
>>
>> I'm sitting here at my wits' end. Everything worked perfectly before I
>> added the extra NIC to the firewall. I configured multi-link trunks on
>> the appropriate switch ports. Somehow or another, the switch either
>> failed at that moment or the software that runs the switch got corrupted.
>>
>> That's why weird things were happening. For instance, the zone transfer
>> from the primary DNS to the secondary. The transfer would begin and then
>> the secondary would report that the transfer failed telling me to go look
>> for clues at the primary. Looking at the primary showed that the transfer
>> succeeded. So where did the data go? Into the ether, I guess.
>>
>> In that case the primary DC did do the transfer. It was the secondary
>> that didn't get all the data because of... the switch. I turned on
>> logging on the secondary DNS server and it showed that the transfer was
>> taking place but that it had not completed. After waiting for some time
>> it would then cough up the error message.
>>
>> I happened to have another identical switch here so I tried a last ditch
>> effort and changed it out. Bingo - the speed increase across the network
>> was instant. Everything is responding instantly again. Log-ons are
>> instant.
>>
>> So, the reason the domain controllers had zero issues must have been that
>> the ports on the switch they were connected to were OK. The Proxy and
>> the Mail server were having issues because there was some form of
>> corruption in the switch for those ports. Data flowing through those
>> SPECIFIC ports was being lost or corrupted, causing all sorts of problems
>> with log-ins and DNS transfers between servers.
>>
>> Go figure.
>>
>> I knew my DNS was perfect - the network was instant prior to installing
>> the extra NIC. Same with ISA - it's been in place for about 4 years now
>> and it's rock-solid. Thing is, because I couldn't figure out what was
>> causing this weird behavior I went looking all over the place only to
>> find out the switch was corrupted.
>>
>> The network performance was so poor it was as if the entire network was
>> infected with a virus. It was slow, 'jerky' and annoying. In retrospect
>> it was probably packets being shed intermittently in the switch itself.
>>
>> The interesting thing is that small zone transfers would work. It was the
>> larger zone (my active directory zone) that would not transfer. So it's
>> almost as if small stuff would get through and larger transfers wouldn't.
>> This meant that pings worked perfectly, nslookups worked perfectly but
>> any sizable transfers (such as probably occur when logging on) would
>> fail. That's why I could read that little group policy file from each of
>> the affected computers - it was small enough.
>>
>> I don't know how a switch works (inside) but I know this much - the one I
>> had in place selectively failed on specific ports affecting those two
>> servers and the type of failure meant small traffic got through and large
>> traffic would not. That's why all my ICMP diagnostic traffic succeeded
>> and that's why I was pulling out my hair - there was no apparent reason
>> for the problems I was having to occur. If you can ping all the machines
>> and do forward and reverse lookups to them - it should work!
>>
>> Hahaha - anyway, I just thought I would let you know what it ended up
>> being in the end. That's what I get for trying to _increase_ network
>> performance by adding another NIC to the proxy!
>>
>> Best & Thanks!
>> Dave
>>
>
>
> Wow. And I've seen this before with switches and teaming, but I just
> didn't think of it. Some switches, by default, will do that when you
> connect the two NICs to the same switch, even if teamed. They just can't
> handle it unless you reconfigure the switch to allow it, or simply throw
> it out. :-)
>
> What brand name and models are the switches?
>
> Glad you figured it out!
>
> Cheers!
>
> Ace
>

Yeah, that's the weird thing. The switch actually supports up to 6
multi-link trunks and will even do them across different switches. So it was
well within the feature set of the switch. The odd thing is that it didn't
just discard packets - it only discarded certain traffic and only on certain
ports. That's why ping-tests and nslookups all worked but things like a zone
transfer or a log-on wouldn't pass through (properly).

It would have been way better if it dropped all packets on the affected
ports instead of doing a 'soft-fail'.
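
For anyone chasing a similar "small traffic passes, large traffic fails" symptom, here is a minimal sketch of a size-dependent transfer test, since ping and nslookup only exercise small packets. It is not something used in this thread: the helper names (`DigestHandler`, `start_digest_server`, `probe_payload_sizes`), the port, and the payload sizes are all hypothetical. The idea is to run the small server on a machine behind the suspect switch port and push payloads of increasing size at it from another machine.

```python
import hashlib
import os
import socket
import socketserver
import threading


class DigestHandler(socketserver.BaseRequestHandler):
    """Hypothetical helper: read until the client closes its sending side,
    then reply with the SHA-256 hex digest of everything received."""

    def handle(self):
        digest = hashlib.sha256()
        while True:
            chunk = self.request.recv(65536)
            if not chunk:
                break
            digest.update(chunk)
        self.request.sendall(digest.hexdigest().encode("ascii"))


def start_digest_server(host="0.0.0.0", port=0):
    """Start the digest server in a background thread and return it.
    Port 0 lets the OS pick a free port (see server.server_address)."""
    server = socketserver.ThreadingTCPServer((host, port), DigestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe_payload_sizes(host, port, sizes, timeout=5.0):
    """Send one random payload per size and check the digest that comes back.
    Returns {size: True/False}. A timeout or connection error counts as a
    failure, because TCP retransmits lost segments and a lossy path tends to
    show up as a stall rather than as corrupted data."""
    results = {}
    for size in sizes:
        payload = os.urandom(size)
        expected = hashlib.sha256(payload).hexdigest().encode("ascii")
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(payload)
                sock.shutdown(socket.SHUT_WR)  # tell the server we are done
                reply = b""
                while len(reply) < len(expected):
                    chunk = sock.recv(1024)
                    if not chunk:
                        break
                    reply += chunk
            results[size] = reply == expected
        except OSError:
            results[size] = False
    return results
```

Usage would be something like `start_digest_server(port=5001)` on the affected server, then `probe_payload_sizes("mailserver", 5001, [64, 1500, 65536, 1048576])` from a machine on a known-good port (host name and port are placeholders). If the small sizes come back `True` and the large ones `False`, the path is probably shedding packets under sustained traffic, which ICMP tests would not show.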

I think somehow the switch software got corrupted. Anyway, it's done now.
The switch is an older Nortel/Bay Networks 420-24. It's getting time to
change it out in favor of a gig Ethernet unit.

Thanks for your help through all these different issues - it's been great to
have someone else in the picture (other than just myself!)

Best;
Dave

From: Ace Fekay [MCT] on 27 Nov 2009 20:37

Looks like you did well by yourself! :-)

I like the Cisco Catalysts. Nice switches, no problems.

Cheers!

Ace
