From: Peter Duniho on 12 Dec 2009 20:07 Michael Ober wrote: > The code below appears to work, but it eventually freezes and I have to > reboot the server. Killing and restarting the program itself doesn't > work as the listener socket is still bound. For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, you'll be able to restart the server. While it's possible to do so, you don't want to bypass the CLOSE_WAIT, because there is theoretically the possibility of network messages still in transit that, if received after you've restarted the server, could corrupt the state of your server. > I have put debugging > statements in to verify that the ThreadPool threads are being released > and the most thread pool threads that this code used was 3, even under > heavy load. The problem appears to be when there are no clients for an > extended period and then a client attempts to connect. At that point, > the application freezes. When you break in the debugger, where are all the threads? What are they doing? > I have left out the BankInfo object which > contains the bank name, address, phone number, and routing number as > well as Library Routines such as WriteLog and SMTPMail.<method>. The > library routines have been in use in other applications for several > years now and are thoroughly debugged. Unfortunately, your code example is far from concise, nor is it complete (in particular, lacking the entire other half of the network connection). Without any way for anyone else to run your program and see it fail, there's no way to know for sure what's wrong. That said, here are some highlights of what appear to me to be problem areas. Some may even be related to the problem you're seeing: -- Use of "On Error Resume Next". Don't do this. You are basically just ignoring exceptions. Bad. Even in VB.NET, it's bad. -- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld. It's bad enough that, contrary to the documentation's instructions, you are using the property to control program flow. But to do it in a loop? -- Unsynchronized access to the _ReceiveResults and _SendResults members. You should at a minimum be using Thread.VolatileRead() and ..VolatileWrite(). Even better would be to just use SyncLock or similar. -- That said, the mere presence of _ReceiveResults and _SendResults is a red flag to me. It's possible that with a purely transactional connection, this could work, and maybe that's what you're dealing with. But it's bad form in any case, and if you have _any_ possibility of overlapping send or receive operations, you've got big trouble. This design is a maintenance problem, and potentially a serious one. -- The code potentially raises a ConnectionClosed event for a connection for which a NewConnection event was never raised. If nothing else, this is IMHO poor design. If you fix the blank-check error handling, it will likely be a real problem. There may be other problems I didn't notice. Between the fact that I hardly ever use VB.NET and the fact that the code example is not a concise-but-complete one, I admit there may be things I overlooked. But, at the very least, there are some things in the code that I consider to be serious problems, and it's possible that fixing those aspects of the code will make the problem go away (I'm particularly concerned about the lack of synchronized access to fields that are used to coordinate between threads and operations). Pete
From: Michael Ober on 12 Dec 2009 22:07 "Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message news:OnE9wC5eKHA.1596(a)TK2MSFTNGP06.phx.gbl... > Michael Ober wrote: >> The code below appears to work, but it eventually freezes and I have to >> reboot the server. Killing and restarting the program itself doesn't >> work as the listener socket is still bound. > > For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, you'll > be able to restart the server. While it's possible to do so, you don't > want to bypass the CLOSE_WAIT, because there is theoretically the > possibility of network messages still in transit that, if received after > you've restarted the server, could corrupt the state of your server. The IPServer class is actually used in two different server applications. In this particular instance, which runs on Windows Server 2003 R2, I can indeed wait for the TCP CLOSE_WAIT - it's faster to reboot. In the other application, which runs on Windows XP, it appears that XP never closes the listening socket. I suspect the XP behavior has more to do with another server that must run on that particular box for which I have no source code than it does with XP itself. > >> I have put debugging statements in to verify that the ThreadPool threads >> are being released and the most thread pool threads that this code used >> was 3, even under heavy load. The problem appears to be when there are >> no clients for an extended period and then a client attempts to connect. >> At that point, the application freezes. > > When you break in the debugger, where are all the threads? What are they > doing? > Monday I'll modify our network to use my workstation as the server for this particular application and I'll start a VS 2008 session. On Tuesday I'll be able to see what happens. In the meantime I'm hoping for feedback on the code to try to correct the problems it has. What's really strange is that this application had been running fine for several months until I had to rebuild the server after a disk crash. >> I have left out the BankInfo object which contains the bank name, >> address, phone number, and routing number as well as Library Routines >> such as WriteLog and SMTPMail.<method>. The library routines have been >> in use in other applications for several years now and are thoroughly >> debugged. > > Unfortunately, your code example is far from concise, nor is it complete > (in particular, lacking the entire other half of the network connection). > Without any way for anyone else to run your program and see it fail, > there's no way to know for sure what's wrong. The other end is a currently a standard TCP client. It sends a request and waits for the response. The client applications do have to be coded so that if the server timeout expires they detect the dropped connection and reconnect if they need to send again. > > That said, here are some highlights of what appear to me to be problem > areas. Some may even be related to the problem you're seeing: > > -- Use of "On Error Resume Next". Don't do this. You are basically > just ignoring exceptions. Bad. Even in VB.NET, it's bad. > I only use this where I don't care about the error but do want each instruction to attempt execution. Is there a clean way of handling this with try ... catch ... end try? In this particular case, removing the On Error Resume Next statement was easily handled by checking for Nothing (null) and collection.Contains() before doing the work. > -- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld. > It's bad enough that, contrary to the documentation's instructions, you > are using the property to control program flow. But to do it in a loop? After static analisys of the code, a finally clause on each of the two try blocks was sufficient to handle this. Since OnReceive would be called in different threads and not be reentrant, the loops can safely be removed. The purpose of this was to ensure I didn't have a ReaderWriterLockSlim causing a connection to stall. > > -- Unsynchronized access to the _ReceiveResults and _SendResults > members. You should at a minimum be using Thread.VolatileRead() and > .VolatileWrite(). Even better would be to just use SyncLock or similar. > > -- That said, the mere presence of _ReceiveResults and _SendResults is > a red flag to me. It's possible that with a purely transactional > connection, this could work, and maybe that's what you're dealing with. > But it's bad form in any case, and if you have _any_ possibility of > overlapping send or receive operations, you've got big trouble. This > design is a maintenance problem, and potentially a serious one. > Do you have a suggestion for a better design? I'm trying to keep this as asynchronous as possible. The design concept was that if the routine MustOverride routine has to do a lot of work and send back large numbers of results, it can call the underlying send method multiple times before returning, although looking at the code, I don't provide that particular routine the ability to access the Send method. Like I said in my original post, this program seems to lock up when there are long periods (measured in hours) of no clients. If access to these two variables was a problem, I would expect problems to show up during heavy usage, not during idle periods. This server class is also used in another application running on an XP SP3 system that is under significantly heavier load with more multithreading going on than this particular application. > -- The code potentially raises a ConnectionClosed event for a > connection for which a NewConnection event was never raised. If nothing > else, this is IMHO poor design. If you fix the blank-check error > handling, it will likely be a real problem. > In my logs, I have never seen the exception handler for the OnAccept(ar as IAsyncResults) subroutine called. I understand your comment, however. The RaiseEvent ConnectionClosed event was actually added as part of the troubleshooting so removing it isn't an issue. > There may be other problems I didn't notice. Between the fact that I > hardly ever use VB.NET and the fact that the code example is not a > concise-but-complete one, I admit there may be things I overlooked. But, > at the very least, there are some things in the code that I consider to be > serious problems, and it's possible that fixing those aspects of the code > will make the problem go away (I'm particularly concerned about the lack > of synchronized access to fields that are used to coordinate between > threads and operations). > > Pete Pete, I definitely thank you for the feedback. Mike.
From: Tom Shelton on 12 Dec 2009 22:20 On 2009-12-13, Michael Ober <obermd> wrote: > > "Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message > news:OnE9wC5eKHA.1596(a)TK2MSFTNGP06.phx.gbl... >> Michael Ober wrote: >>> The code below appears to work, but it eventually freezes and I have to >>> reboot the server. Killing and restarting the program itself doesn't >>> work as the listener socket is still bound. >> >> For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, you'll >> be able to restart the server. While it's possible to do so, you don't >> want to bypass the CLOSE_WAIT, because there is theoretically the >> possibility of network messages still in transit that, if received after >> you've restarted the server, could corrupt the state of your server. > > The IPServer class is actually used in two different server applications. > In this particular instance, which runs on Windows Server 2003 R2, I can > indeed wait for the TCP CLOSE_WAIT - it's faster to reboot. In the other > application, which runs on Windows XP, it appears that XP never closes the > listening socket. I suspect the XP behavior has more to do with another > server that must run on that particular box for which I have no source code > than it does with XP itself. > >> >>> I have put debugging statements in to verify that the ThreadPool threads >>> are being released and the most thread pool threads that this code used >>> was 3, even under heavy load. The problem appears to be when there are >>> no clients for an extended period and then a client attempts to connect. >>> At that point, the application freezes. >> >> When you break in the debugger, where are all the threads? What are they >> doing? >> > > Monday I'll modify our network to use my workstation as the server for this > particular application and I'll start a VS 2008 session. On Tuesday I'll be > able to see what happens. In the meantime I'm hoping for feedback on the > code to try to correct the problems it has. What's really strange is that > this application had been running fine for several months until I had to > rebuild the server after a disk crash. > >>> I have left out the BankInfo object which contains the bank name, >>> address, phone number, and routing number as well as Library Routines >>> such as WriteLog and SMTPMail.<method>. The library routines have been >>> in use in other applications for several years now and are thoroughly >>> debugged. >> >> Unfortunately, your code example is far from concise, nor is it complete >> (in particular, lacking the entire other half of the network connection). >> Without any way for anyone else to run your program and see it fail, >> there's no way to know for sure what's wrong. > > The other end is a currently a standard TCP client. It sends a request and > waits for the response. The client applications do have to be coded so > that if the server timeout expires they detect the dropped connection and > reconnect if they need to send again. > >> >> That said, here are some highlights of what appear to me to be problem >> areas. Some may even be related to the problem you're seeing: >> >> -- Use of "On Error Resume Next". Don't do this. You are basically >> just ignoring exceptions. Bad. Even in VB.NET, it's bad. >> > > I only use this where I don't care about the error but do want each > instruction to attempt execution. Is there a clean way of handling this > with try ... catch ... end try? In this particular case, removing the On > Error Resume Next statement was easily handled by checking for Nothing > (null) and collection.Contains() before doing the work. > >> -- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld. >> It's bad enough that, contrary to the documentation's instructions, you >> are using the property to control program flow. But to do it in a loop? > > After static analisys of the code, a finally clause on each of the two try > blocks was sufficient to handle this. Since OnReceive would be called in > different threads and not be reentrant, the loops can safely be removed. > The purpose of this was to ensure I didn't have a ReaderWriterLockSlim > causing a connection to stall. > >> >> -- Unsynchronized access to the _ReceiveResults and _SendResults >> members. You should at a minimum be using Thread.VolatileRead() and >> .VolatileWrite(). Even better would be to just use SyncLock or similar. >> >> -- That said, the mere presence of _ReceiveResults and _SendResults is >> a red flag to me. It's possible that with a purely transactional >> connection, this could work, and maybe that's what you're dealing with. >> But it's bad form in any case, and if you have _any_ possibility of >> overlapping send or receive operations, you've got big trouble. This >> design is a maintenance problem, and potentially a serious one. >> > > Do you have a suggestion for a better design? I'm trying to keep this as > asynchronous as possible. The design concept was that if the routine > MustOverride routine has to do a lot of work and send back large numbers of > results, it can call the underlying send method multiple times before > returning, although looking at the code, I don't provide that particular > routine the ability to access the Send method. > > Like I said in my original post, this program seems to lock up when there > are long periods (measured in hours) of no clients. If access to these two > variables was a problem, I would expect problems to show up during heavy > usage, not during idle periods. This server class is also used in another > application running on an XP SP3 system that is under significantly heavier > load with more multithreading going on than this particular application. > >> -- The code potentially raises a ConnectionClosed event for a >> connection for which a NewConnection event was never raised. If nothing >> else, this is IMHO poor design. If you fix the blank-check error >> handling, it will likely be a real problem. >> > > In my logs, I have never seen the exception handler for the OnAccept(ar as > IAsyncResults) subroutine called. I understand your comment, however. The > RaiseEvent ConnectionClosed event was actually added as part of the > troubleshooting so removing it isn't an issue. > >> There may be other problems I didn't notice. Between the fact that I >> hardly ever use VB.NET and the fact that the code example is not a >> concise-but-complete one, I admit there may be things I overlooked. But, >> at the very least, there are some things in the code that I consider to be >> serious problems, and it's possible that fixing those aspects of the code >> will make the problem go away (I'm particularly concerned about the lack >> of synchronized access to fields that are used to coordinate between >> threads and operations). >> >> Pete > > Pete, I definitely thank you for the feedback. > > Mike. > Mike... You mention that it seems to happen after long periods of inactivity. I remember at a former employer having a similar issue with a server - it had to do with the network card drivers power settings. Basically, the card was being shutdown and not waking up properly when a connection was made.... Just a thought :) -- Tom Shelton
From: Michael Ober on 12 Dec 2009 23:22 "Tom Shelton" <tom_shelton(a)comcastXXXXXXX.net> wrote in message news:%23pu7DN6eKHA.4112(a)TK2MSFTNGP06.phx.gbl... > On 2009-12-13, Michael Ober <obermd> wrote: >> >> "Peter Duniho" <no.peted.spam(a)no.nwlink.spam.com> wrote in message >> news:OnE9wC5eKHA.1596(a)TK2MSFTNGP06.phx.gbl... >>> Michael Ober wrote: >>>> The code below appears to work, but it eventually freezes and I have to >>>> reboot the server. Killing and restarting the program itself doesn't >>>> work as the listener socket is still bound. >>> >>> For what it's worth, if you wait for the TCP CLOSE_WAIT to finish, >>> you'll >>> be able to restart the server. While it's possible to do so, you don't >>> want to bypass the CLOSE_WAIT, because there is theoretically the >>> possibility of network messages still in transit that, if received after >>> you've restarted the server, could corrupt the state of your server. >> >> The IPServer class is actually used in two different server applications. >> In this particular instance, which runs on Windows Server 2003 R2, I can >> indeed wait for the TCP CLOSE_WAIT - it's faster to reboot. In the other >> application, which runs on Windows XP, it appears that XP never closes >> the >> listening socket. I suspect the XP behavior has more to do with another >> server that must run on that particular box for which I have no source >> code >> than it does with XP itself. >> >>> >>>> I have put debugging statements in to verify that the ThreadPool >>>> threads >>>> are being released and the most thread pool threads that this code used >>>> was 3, even under heavy load. The problem appears to be when there are >>>> no clients for an extended period and then a client attempts to >>>> connect. >>>> At that point, the application freezes. >>> >>> When you break in the debugger, where are all the threads? What are >>> they >>> doing? >>> >> >> Monday I'll modify our network to use my workstation as the server for >> this >> particular application and I'll start a VS 2008 session. On Tuesday I'll >> be >> able to see what happens. In the meantime I'm hoping for feedback on the >> code to try to correct the problems it has. What's really strange is >> that >> this application had been running fine for several months until I had to >> rebuild the server after a disk crash. >> >>>> I have left out the BankInfo object which contains the bank name, >>>> address, phone number, and routing number as well as Library Routines >>>> such as WriteLog and SMTPMail.<method>. The library routines have been >>>> in use in other applications for several years now and are thoroughly >>>> debugged. >>> >>> Unfortunately, your code example is far from concise, nor is it complete >>> (in particular, lacking the entire other half of the network >>> connection). >>> Without any way for anyone else to run your program and see it fail, >>> there's no way to know for sure what's wrong. >> >> The other end is a currently a standard TCP client. It sends a request >> and >> waits for the response. The client applications do have to be coded so >> that if the server timeout expires they detect the dropped connection and >> reconnect if they need to send again. >> >>> >>> That said, here are some highlights of what appear to me to be problem >>> areas. Some may even be related to the problem you're seeing: >>> >>> -- Use of "On Error Resume Next". Don't do this. You are basically >>> just ignoring exceptions. Bad. Even in VB.NET, it's bad. >>> >> >> I only use this where I don't care about the error but do want each >> instruction to attempt execution. Is there a clean way of handling this >> with try ... catch ... end try? In this particular case, removing the On >> Error Resume Next statement was easily handled by checking for Nothing >> (null) and collection.Contains() before doing the work. >> >>> -- Use of ReaderWriterLockSlim.IsReadLockHeld and .IsWriteLockHeld. >>> It's bad enough that, contrary to the documentation's instructions, you >>> are using the property to control program flow. But to do it in a loop? >> >> After static analisys of the code, a finally clause on each of the two >> try >> blocks was sufficient to handle this. Since OnReceive would be called in >> different threads and not be reentrant, the loops can safely be removed. >> The purpose of this was to ensure I didn't have a ReaderWriterLockSlim >> causing a connection to stall. >> >>> >>> -- Unsynchronized access to the _ReceiveResults and _SendResults >>> members. You should at a minimum be using Thread.VolatileRead() and >>> .VolatileWrite(). Even better would be to just use SyncLock or similar. >>> >>> -- That said, the mere presence of _ReceiveResults and _SendResults >>> is >>> a red flag to me. It's possible that with a purely transactional >>> connection, this could work, and maybe that's what you're dealing with. >>> But it's bad form in any case, and if you have _any_ possibility of >>> overlapping send or receive operations, you've got big trouble. This >>> design is a maintenance problem, and potentially a serious one. >>> >> >> Do you have a suggestion for a better design? I'm trying to keep this as >> asynchronous as possible. The design concept was that if the routine >> MustOverride routine has to do a lot of work and send back large numbers >> of >> results, it can call the underlying send method multiple times before >> returning, although looking at the code, I don't provide that particular >> routine the ability to access the Send method. >> >> Like I said in my original post, this program seems to lock up when there >> are long periods (measured in hours) of no clients. If access to these >> two >> variables was a problem, I would expect problems to show up during heavy >> usage, not during idle periods. This server class is also used in >> another >> application running on an XP SP3 system that is under significantly >> heavier >> load with more multithreading going on than this particular application. >> >>> -- The code potentially raises a ConnectionClosed event for a >>> connection for which a NewConnection event was never raised. If nothing >>> else, this is IMHO poor design. If you fix the blank-check error >>> handling, it will likely be a real problem. >>> >> >> In my logs, I have never seen the exception handler for the OnAccept(ar >> as >> IAsyncResults) subroutine called. I understand your comment, however. >> The >> RaiseEvent ConnectionClosed event was actually added as part of the >> troubleshooting so removing it isn't an issue. >> >>> There may be other problems I didn't notice. Between the fact that I >>> hardly ever use VB.NET and the fact that the code example is not a >>> concise-but-complete one, I admit there may be things I overlooked. But, >>> at the very least, there are some things in the code that I consider to >>> be >>> serious problems, and it's possible that fixing those aspects of the >>> code >>> will make the problem go away (I'm particularly concerned about the lack >>> of synchronized access to fields that are used to coordinate between >>> threads and operations). >>> >>> Pete >> >> Pete, I definitely thank you for the feedback. >> >> Mike. >> > > Mike... You mention that it seems to happen after long periods of > inactivity. I remember at a former employer having a similar issue with a > server - it had to do with the network card drivers power settings. > Basically, the card was being shutdown and not waking up properly when a > connection was made.... Just a thought :) > > -- > Tom Shelton It's a VMWare guest. I just checked the guest OS from home and I'll check the host on Monday. Thanks for the idea. Mike.
From: Michael Ober on 13 Dec 2009 00:20
Let me clarify - in one post I said it was a disk crash and here I say it's a VMWare guest. Both statements are true. We had a series of power hits close enough together that our UPS systems couldn't get clean shutdowns on the final power hit. The result was that the host file system was corrupted which led to the VMWare files that comprise the guest OS filesystem being damaged. The end result was that we lost two of our virtual servers to what were essentially disk crashes. I had to rebuild and then restore services from tape both servers. Fortunately the actual disks weren't physically damaged - just the logical file system on the disk was damaged. Needless to say it wasn't a fun weekend to recover. The application I posted the code from has been running since last winter with no problems. I recompiled and released and have been having problems since then. > > It's a VMWare guest. I just checked the guest OS from home and I'll check > the host on Monday. Thanks for the idea. > > Mike. |