From: David Schwartz on 15 Jan 2008 11:03

On Jan 15, 6:50 am, Arkadiy <vertl...(a)gmail.com> wrote:

> Why do I have to use a 'VERSION' command to determine that the
> connection is dead rather than determine this based on the actual
> request?

Because you've already sent the request, so it's too late for it to trigger a RST. Your side needs to be in the process of attempting to send data in order to generate a packet that could trigger a RST.

> > It seems like if the server is unreachable, nothing you try will
> > succeed anyway, so it doesn't matter particularly much what you do.
>
> Does this mean
>
> 1) nothing sent will produce a response, so anything can be sent to
> detect the condition, or
> 2) there is nothing we can do in the code to handle the situation?

Why does it matter what you do in this case? The server is unreachable. Nothing will work. It doesn't matter what you do or try to do to the connection. There will be no response, so you will have no response to give. You can give up waiting whenever you want, but with respect to the connection, what you send, and what you detect, it doesn't matter.

DS

From: Arkadiy on 15 Jan 2008 11:59

On Jan 15, 11:03 am, David Schwartz <dav...(a)webmaster.com> wrote:

> Why does it matter what you do in this case? The server is
> unreachable. Nothing will work. It doesn't matter what you do or try
> to do to the connection. There will be no response, so you will have
> no response to give. You can give up waiting whenever you want, but
> with respect to the connection, what you send, and what you detect, it
> doesn't matter.

It does for me. I want to start attempts to reconnect, in a different thread, at a configurable interval. Once the server stops being unreachable, the connect will succeed, and I will have connections in my pool, ready to use. Until then, all my requests immediately return failure since there are no available connections. But, once the server stops being unreachable, the requests start succeeding.

I think this model allows me to achieve the goal of "fast result or no result". The notion of "fast" is configurable by the timeout value.

Let's say I set up the whole facility on the LAN, and the server's average response time is 0.1 msec. If I set the timeout to, for example, 1 msec, everything will go smoothly most of the time. When I get a timeout, this may mean one of a few things:

1) The server became unreachable. I want to close the socket and start attempts to reconnect;

2) Accidentally long response. Still nothing wrong with reconnecting, since this happens rarely (how rarely -- can be controlled by the value of the timeout);

3) Server is congested. This is the worst case. But, IMO, this is the case where nothing can be done. Except adding another server instance to split the load.

4) Network is congested. Again, I don't see what can be done in this case.

So it seems to me that reconnect works OK for both cases where anything can be done -- I just need to set up a reasonably large timeout -- sufficiently larger than the average response time under normal conditions. For the two other cases, yes, it seems I am adding a bit to an already existing mess. But does this really matter?

Am I missing something?

Regards,
Arkadiy

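The pool model described in this post (requests fail immediately while no connection is available, and a separate thread retries the connect at a configurable interval) could be sketched roughly as follows. This is only an illustration under assumptions: `FailFastPool`, `connect`, and `retry_interval` are hypothetical names that do not come from the thread, and `connect` is assumed to be any callable that returns a connected object or raises `OSError`.

```python
import threading


class FailFastPool:
    """Hypothetical pool: acquire() never waits for the network."""

    def __init__(self, connect, retry_interval=1.0):
        self._connect = connect              # assumed: returns a connection or raises OSError
        self._retry_interval = retry_interval
        self._idle = []
        self._lock = threading.Lock()
        self._need_conn = threading.Event()  # set when a (re)connect is wanted
        self._stopped = threading.Event()
        self._need_conn.set()                # start out wanting one connection
        threading.Thread(target=self._reconnect_loop, daemon=True).start()

    def _reconnect_loop(self):
        while not self._stopped.is_set():
            self._need_conn.wait()
            if self._stopped.is_set():
                break
            # Clear before connecting so a discard() during connect is not lost.
            self._need_conn.clear()
            try:
                conn = self._connect()
            except OSError:
                # Still unreachable: ask again after the configured interval.
                self._need_conn.set()
                self._stopped.wait(self._retry_interval)
                continue
            with self._lock:
                self._idle.append(conn)

    def acquire(self):
        """Return an idle connection, or None right away if there is none."""
        with self._lock:
            return self._idle.pop() if self._idle else None

    def release(self, conn):
        """Return a healthy connection to the pool."""
        with self._lock:
            self._idle.append(conn)

    def discard(self, conn):
        """After a timeout: close the connection and request a replacement."""
        try:
            conn.close()
        finally:
            self._need_conn.set()

    def stop(self):
        self._stopped.set()
        self._need_conn.set()                # wake the thread so it can exit
```

A caller would treat `acquire()` returning `None` as "no result", and call `discard()` when a request exceeds its timeout. The fixed retry interval here mirrors the post; the replies that follow argue for backing off instead.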
From: David Schwartz on 15 Jan 2008 12:10

On Jan 15, 8:59 am, Arkadiy <vertl...(a)gmail.com> wrote:

> Let's say I set up the whole facility on the LAN, and the server's
> average response time is 0.1 msec. If I set the timeout to, for
> example, 1 msec, everything will go smoothly most of the time. When I
> get a timeout, this may mean one of a few things:
>
> 1) The server became unreachable. I want to close the socket and
> start attempts to reconnect;
>
> 2) Accidentally long response. Still nothing wrong with
> reconnecting, since this happens rarely (how rarely -- can be
> controlled by the value of the timeout);
>
> 3) Server is congested. This is the worst case. But, IMO, this is
> the case where nothing can be done. Except adding another server
> instance to split the load.

You can avoid adding to the server load so that it has a hope of catching up.

> 4) Network is congested. Again, I don't see what can be done in this
> case.

You can avoid adding to network congestion so that it has a hope of abating.

> So it seems to me that reconnect works OK for both cases where
> anything can be done -- I just need to set up a reasonably large timeout
> -- sufficiently larger than the average response time under normal
> conditions. For the two other cases, yes, it seems I am adding a bit to
> an already existing mess. But does this really matter?

It won't matter too much as long as you keep the rate under control and the number of connections under control. You won't cause much trouble because TCP has its own rate-limiting to protect the network. You may cause the server some pain because of the rate of connection establishment and teardown, but it shouldn't be horribly bad.

You really want to back off and retry though. And I'm not sure you want to tear down the connection at the first sign of trouble.

You can also use connection establishment to verify that the server is operational. If you can set up a new connection to it, it's not dead. However, sending a 'version' command will have (approximately) the same effect. Make sure your write aggregation/buffering is sufficient to ensure that Nagle doesn't bite you.

DS

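One way to read the write-aggregation advice: Nagle's algorithm can hold back a small segment while earlier data is still unacknowledged, so a request written in several small pieces may be delayed. A minimal sketch, assuming the request can be assembled in memory first; `send_aggregated` and `disable_nagle` are hypothetical helper names, and the `b"version\r\n"` wire format in the usage note is an assumption, not something the thread specifies.

```python
import socket


def send_aggregated(sock: socket.socket, *parts: bytes) -> int:
    """Join all parts of one request and pass them to the kernel in one call.

    A single sendall() avoids the write-write-read pattern in which Nagle's
    algorithm may delay the second small write.
    """
    payload = b"".join(parts)
    sock.sendall(payload)
    return len(payload)


def disable_nagle(sock: socket.socket) -> None:
    """Alternative for TCP sockets only: switch Nagle off with TCP_NODELAY."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

For example, `send_aggregated(sock, b"version", b"\r\n")` hands the whole probe to the kernel at once. Aggregating writes is usually the gentler choice; `TCP_NODELAY` trades that for more small packets on the wire.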
From: Rainer Weikusat on 15 Jan 2008 12:47

Arkadiy <vertleyb(a)gmail.com> writes:

> On Jan 15, 11:03 am, David Schwartz <dav...(a)webmaster.com> wrote:

[...]

> I think this model allows me to achieve the goal of "fast result or no
> result". The notion of "fast" is configurable by the timeout value.
>
> Let's say I set up the whole facility on the LAN, and the server's
> average response time is 0.1 msec. If I set the timeout to, for
> example, 1 msec, everything will go smoothly most of the time. When I
> get a timeout, this may mean one of a few things:
>
> 1) The server became unreachable. I want to close the socket and
> start attempts to reconnect;

This will result in a FIN being transmitted to the server, after which the kernel waits for a FIN-ACK coming from there and sends an ACK. If the server is unreachable, this will not work, and if it is just responding slowly, your close request will be processed after all other requests you have sent and the connection will be closed after you have processed all replies to these requests (this is the default behaviour, hackarounds would be possible).

> 2) Accidentally long response. Still nothing wrong with
> reconnecting, since this happens rarely (how rarely -- can be
> controlled by the value of the timeout);

This is actually situation 3: The server cannot reply fast enough.

> 3) Server is congested. This is the worst case. But, IMO, this is
> the case where nothing can be done. Except adding another server
> instance to split the load.

The easy thing which could be done is to not increase the load on the server by torturing it with completely useless connection teardown and re-establishment requests.

> 4) Network is congested. Again, I don't see what can be done in this
> case.

Same as above. Avoid injecting more useless packets.

To repeat this again: TCP is a reliable bytestream protocol based on persistent virtual circuits. Either you want a reliable bytestream protocol, then you should just be using it, or you don't want a reliable bytestream protocol and then don't use it.

For practical purposes, the daemon you are talking to is closely similar to a network file system server and the way you want to use it would lend itself to the 'traditional' way NFS worked: Assume the server is stateless (gets around all possible issues with crashes etc), send a request using UDP when you want to request some data, possibly retransmit that a couple of times using a 'standard' exponential backoff algorithm, and process eventual replies when and if they arrive, dropping whatever you don't want anymore. This is exactly the same procedure one would sensibly use for TCP, except that it is not necessary to deal with connections anymore.

[...]

> Am I missing something?

An introduction to internetworking protocols, maybe?

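The UDP approach outlined above (send a request, retransmit a few times with exponentially growing timeouts, give up otherwise) might look roughly like this. It is a sketch under assumptions: `udp_request` and its parameters are hypothetical, the request is assumed to be idempotent so that a late reply to an earlier transmission is still acceptable, and each reply is assumed to fit in one datagram.

```python
import socket
import time


def udp_request(addr, payload, first_timeout=0.1, retries=3, bufsize=2048):
    """Send payload over UDP and wait for one reply datagram.

    Retransmits up to `retries` times, doubling the timeout after each
    failed attempt. Returns the reply bytes, or None if all attempts fail.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)  # only datagrams from this peer are delivered
        timeout = first_timeout
        for _ in range(retries + 1):
            sock.settimeout(timeout)
            try:
                sock.send(payload)
                return sock.recv(bufsize)
            except socket.timeout:
                pass  # no reply within this attempt's timeout
            except (ConnectionRefusedError, ConnectionResetError):
                # An ICMP error says nothing is listening right now;
                # wait out the interval instead of retransmitting at once.
                time.sleep(timeout)
            timeout *= 2  # exponential backoff
    return None
```

With `first_timeout=0.1` and `retries=3`, the waits are 0.1, 0.2, 0.4 and 0.8 seconds, so the caller gets an answer or `None` within about 1.5 seconds without any connection setup or teardown.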
From: Arkadiy on 15 Jan 2008 13:15

On Jan 15, 12:10 pm, David Schwartz <dav...(a)webmaster.com> wrote:

> You really want to back off and retry though. And I'm not sure you want
> to tear down the connection at the first sign of trouble.

My problem is I can't understand the purpose of this "retry". If my timeout is 1 sec, and the first request timed out, and I retry it, why not set the timeout to 2 sec in the first place?

Regards,
Arkadiy