From: victor Yankee on 15 Apr 2010 02:01

Would someone be able to point me in the right direction?

We are looking at implementing a socket server using TCP/IP socket
streams on Solaris 10. One of the requirements is that we need to
only send TCP ACK messages once our application has flushed the
payload to non-volatile memory.

We were looking at using the MSG_PEEK flag to read the data from the
socket; however, this call does not stop the TCP stack from acking more
packets until our RCVBUF is full. This means that if our application
dies we could be losing data that is in our RCVBUF.

Do you know if there is any other method that we could use to do this,
or do we need to use RAW sockets and/or modify the kernel TCP/IP
stack?

cheers,
Victor
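For readers following along, here is a minimal sketch of the MSG_PEEK behaviour described above, written in Python over loopback TCP rather than on Solaris 10. The function name `peek_then_read` is made up for illustration. It only shows that a peek copies bytes out while leaving them in the kernel receive buffer; it cannot observe the ACKs themselves, which the kernel sends on its own schedule regardless of what the application has read.

```python
import socket


def peek_then_read(payload: bytes):
    """Send payload over loopback TCP; return (peeked, consumed) on the receiver."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    try:
        # sendall() returns once the data is handed to the local stack;
        # the application on the other side has not read anything yet.
        sender.sendall(payload)

        # MSG_PEEK copies data out but leaves it in the receive buffer.
        peeked = receiver.recv(len(payload), socket.MSG_PEEK)

        # A normal recv() returns the same bytes again and removes them.
        consumed = receiver.recv(len(peeked))
        return peeked, consumed
    finally:
        sender.close()
        receiver.close()
        listener.close()


if __name__ == "__main__":
    peeked, consumed = peek_then_read(b"payload not yet on disk")
    print(peeked == consumed)
```

Peeking therefore gives the application a look at the data before consuming it, but it is not a hook into when the stack acknowledges a segment.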
From: Ian Collins on 15 Apr 2010 02:09

On 04/15/10 06:01 PM, victor Yankee wrote:
> Would someone be able to point me in the right direction ?
>
> We are looking at implementing a socket server using TCP/IP socket
> streams on Solaris 10. One of the requirements is that we need to
> only send TCP ACK messages once our application has flushed the
> payload to non-volatile memory.

That sounds out of the scope of TCP and you should be using a higher
layer protocol to send an acknowledgement.

What happens if more data is to be sent than the advertised size of the
receive window?

> We were looking at using the MSG_PEEK flag to read the data of the
> socket however this call does not stop the TCP stack from acking more
> packets until our RCVBUF is full. This means that if our application
> dies we could be losing data that is in our RCVBUF.
>
> Do you know if there is any other method that we could use to do this
> or do we need to use RAW sockets and/or modify the kernel TCP/IP
> stack ?

Your problem is similar to NFS and the solution is a protocol on top of
TCP!

--
Ian Collins
From: victor Yankee on 15 Apr 2010 02:58

On Apr 15, 4:09 pm, Ian Collins <ian-n...(a)hotmail.com> wrote:
> On 04/15/10 06:01 PM, victor Yankee wrote:
>
> > Would someone be able to point me in the right direction ?
>
> > We are looking at implementing a socket server using TCP/IP socket
> > streams on Solaris 10. One of the requirements is that we need to
> > only send TCP ACK messages once our application has flushed the
> > payload to non-volatile memory.
>
> That sounds out of the scope of TCP and you should be using a higher
> layer protocol to send an acknowledgement.
>
> What happens if more data is to be sent than the advertised size of the
> receive window?
>
> > We were looking at using the MSG_PEEK flag to read the data of the
> > socket however this call does not stop the TCP stack from acking more
> > packets until our RCVBUF is full. This means that if our application
> > dies we could be losing data that is in our RCVBUF.
>
> > Do you know if there is any other method that we could use to do this
> > or do we need to use RAW sockets and/or modify the kernel TCP/IP
> > stack ?
>
> Your problem is similar to NFS and the solution is a protocol on top of
> TCP!
>
> --
> Ian Collins

Hi Ian,

Unfortunately we are not able to use a higher level protocol to ack the
messages, since the client application is out of our control. The client
just connects and streams the data to the socket. We are expected to
only ack the packets once they have been cached to non-volatile memory.
The only method for telling the client that we have cached the payload
is through the TCP ack message.

cheers,
vic
From: Casper H.S. Dik on 15 Apr 2010 03:21

victor Yankee <vyankee1(a)gmail.com> writes:

>Would someone be able to point me in the right direction ?

>We are looking at implementing a socket server using TCP/IP socket
>streams on Solaris 10. One of the requirements is that we need to
>only send TCP ACK messages once our application has flushed the
>payload to non-volatile memory.

Not possible in the Solaris 10 implementation; I would be *very*
surprised if any TCP implementation allows this.

>We were looking at using the MSG_PEEK flag to read the data of the
>socket however this call does not stop the TCP stack from acking more
>packets until our RCVBUF is full. This means that if our application
>dies we could be losing data that is in our RCVBUF.

Indeed. Using ACKs is the wrong mechanism. You will probably notice
that with ACKs delayed the way you want, the protocol will run 10-100
times slower.

>Do you know if there is any other method that we could use to do this
>or do we need to use RAW sockets and/or modify the kernel TCP/IP
>stack ?

You can't make this work with TCP/IP; you will need to change your
protocol and send a message when the data is written. I.e., don't
use just TCP but add a protocol on top of it.

If you change your TCP/IP implementation, it won't be TCP/IP, and other
users of that (non) TCP/IP stack will likely break.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
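A rough sketch of what "send a message when the data is written" could look like, for the case where the client can be changed to read such a message. Everything here is an assumption made for illustration: the function name `persist_and_ack`, and the wire format (an 8-byte big-endian count of bytes made durable so far) are invented, not taken from any existing protocol.

```python
import os
import socket
import struct

# Hypothetical ack format: unsigned 64-bit count of bytes persisted so far.
ACK_FORMAT = "!Q"


def persist_and_ack(conn, out_file, chunk_size=4096):
    """Read from conn until EOF; ack each chunk only after it is fsync'ed."""
    total = 0
    while True:
        chunk = conn.recv(chunk_size)
        if not chunk:  # peer closed its sending side
            break
        out_file.write(chunk)
        out_file.flush()              # push Python's buffer to the OS
        os.fsync(out_file.fileno())   # ask the OS to push it to stable storage
        total += len(chunk)
        # Application-level acknowledgement: sent only after the fsync above.
        conn.sendall(struct.pack(ACK_FORMAT, total))
    return total


if __name__ == "__main__":
    client, server = socket.socketpair()
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)
    with open("payload.bin", "wb") as f:  # hypothetical output file
        persist_and_ack(server, f)
    print(struct.unpack(ACK_FORMAT, client.recv(8))[0])
```

The sender treats a byte as safe only once the returned counter covers it, so TCP's own ACKs stay purely a transport matter.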
From: Ersek, Laszlo on 15 Apr 2010 06:16

On Wed, 14 Apr 2010, victor Yankee wrote:

> On Apr 15, 4:09 pm, Ian Collins <ian-n...(a)hotmail.com> wrote:
>> On 04/15/10 06:01 PM, victor Yankee wrote:
>>
>>> We are looking at implementing a socket server using TCP/IP socket
>>> streams on Solaris 10. One of the requirements is that we need to
>>> only send TCP ACK messages once our application has flushed the
>>> payload to non-volatile memory.
>>
>> That sounds out of the scope of TCP and you should be using a higher
>> layer protocol to send an acknowledge.
>>
>> What happens if more data is to be sent than the advertised size of the
>> receive window?
>>
>
> Unfortunately we are not able to use a higher level protocol to ack the
> messages since the client application is out of our control. The client
> just connects and streams the data to the socket. We are expected to
> only ack the packets once they have been cached to non-volatile memory.
> The only method for telling the client that we have cached the payload
> is through the TCP ack message.

Some wild guessing:

- Write a TUN/TAP driver (for Solaris?) so you can not just capture stuff
  on the ethernet / IP level, but you can make its propagation dependent
  on saving it first somewhere. I'm not sure if you'd need to try to
  reconstruct, on the fly, the exact byte stream seen by the server; it
  might suffice if you write a replay program for the dump format (which
  does the reassembly, reordering etc).

- If Solaris 10 supports STREAMS based TCP/IP, try to push a module of
  your own creation between, well, TCP and IP; then see the previous
  paragraph.

- Add a netfilter rule (if Solaris 10 supports anything like that) which
  queues the IP packet to a userspace program for approval only following
  storage.

- Insert a Linux box in front of the Solaris box, and do the previous
  paragraph (iptables / NFQUEUE): make sure any TCP segment leaves for
  the Solaris box only after you've saved it somewhere.
  See <http://netfilter.org/projects/libnetfilter_queue/index.html>:

----v----
Main Features

* receiving queued packets from the kernel nfnetlink_queue subsystem
* issuing verdicts and/or reinjecting altered packets to the kernel
  nfnetlink_queue subsystem
----^----

  Issue an ACCEPT verdict only after saving the packet.

Cheers,
lacos
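A sketch of the "ACCEPT only after saving" idea on the Linux box, under several assumptions. It uses the third-party Python `netfilterqueue` binding to libnetfilter_queue (not part of the standard library, assumed installed, and needing root to run). The journal path, queue number, port in the example rule, and the length-prefixed record format are all hypothetical. The callback itself only relies on the packet object offering `get_payload()` and `accept()`.

```python
import os
import struct

JOURNAL_PATH = "packets.journal"  # hypothetical location on stable storage


def make_callback(journal):
    """Build a queue callback that persists each packet before accepting it."""
    def save_then_accept(pkt):
        payload = pkt.get_payload()  # raw IP packet bytes
        # Hypothetical record format: 4-byte big-endian length, then the packet.
        journal.write(struct.pack("!I", len(payload)) + payload)
        journal.flush()
        os.fsync(journal.fileno())   # only now is the packet considered saved
        pkt.accept()                 # ACCEPT verdict: the kernel forwards it
    return save_then_accept


def run_queue(queue_num=0, journal_path=JOURNAL_PATH):
    """Attach to an NFQUEUE; assumes a rule such as (port is made up):
    iptables -A FORWARD -p tcp --dport 5000 -j NFQUEUE --queue-num 0
    """
    from netfilterqueue import NetfilterQueue  # third-party, assumed installed
    with open(journal_path, "ab") as journal:
        nfq = NetfilterQueue()
        nfq.bind(queue_num, make_callback(journal))
        try:
            nfq.run()  # blocks, invoking the callback per queued packet
        finally:
            nfq.unbind()
```

Since the Solaris box never sees a segment before it is journaled, it cannot ACK one that has not been saved. Expect the per-packet fsync to cost a great deal of throughput.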