From: Roedy Green on 31 Mar 2010 17:23

Everyone has seen that sending one big file works much more
efficiently than sending many small files. The effect is quite
astounding: many orders of magnitude. It just occurred to me that I
don't think I can account for that huge difference. Where is all the
time going?

It then occurred to me that any technique which reduced the
difference could have a huge effect on the Internet as a whole.
--
Roedy Green Canadian Mind Products
http://mindprod.com
If you tell a computer the same fact in more than one place, unless
you have an automated mechanism to ensure they stay in sync, the
versions of the fact will eventually get out of sync.
From: Arne Vajhøj on 31 Mar 2010 18:38

On 31-03-2010 17:23, Roedy Green wrote:
> Everyone has seen that sending one big file works much more
> efficiently than many small files. The effect is quite astounding:
> many orders of magnitude. It just occurred to me that I don't think
> I can account for that huge difference. Where is all the time going?

Given the lack of context, one can only guess:
- file open and file creation are rather expensive operations, so
  many small files carry huge overhead
- per-file protocol overhead
- really small files cannot be compressed as efficiently as larger
  files

Arne
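[The per-file overhead Arne describes is easy to measure locally. The sketch below (file count and sizes are arbitrary choices, not from the thread) writes the same total number of bytes once as 1000 small files and once as a single large file, and times both. On most systems the many-files case is dramatically slower, even with no network involved.]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SmallFilesOverhead {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("overhead");
        byte[] chunk = new byte[1024];            // 1 KB per small file
        int n = 1000;

        // Many small files: one open/create/close per kilobyte.
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Files.write(dir.resolve("small-" + i), chunk);
        }
        long smallMs = (System.nanoTime() - t0) / 1_000_000;

        // One big file: same total bytes, one open/create/close.
        byte[] big = new byte[n * chunk.length];
        long t1 = System.nanoTime();
        Files.write(dir.resolve("big"), big);
        long bigMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println(n + " small files: " + smallMs + " ms");
        System.out.println("one big file:  " + bigMs + " ms");
    }
}
```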
From: Tom Anderson on 31 Mar 2010 18:51

On Wed, 31 Mar 2010, Roedy Green wrote:

> Everyone has seen that sending one big file works much more
> efficiently than many small files. The effect is quite astounding:
> many orders of magnitude. It just occurred to me that I don't think
> I can account for that huge difference. Where is all the time going?

The TCP handshake, TCP slow start (look that one up if you don't know
it), and round trips for control packets at the start of the
connection. Losing a little time can have a huge impact on
throughput: it's all about the bandwidth-delay product, which on
today's long, fat networks is huge.

> It then occurred to me that any technique which reduced the
> difference could have a huge effect on the Internet as a whole.

Yes. It's called pipelining, and it's been in HTTP since 1999.

It's not that widely used by browsers, though, because of worries
about compatibility with servers, which seems a bit of a waste.

tom
--
I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it
as though it had an underlying truth. -- Umberto Eco
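[The bandwidth-delay product Tom mentions is just link bandwidth times round-trip time: the number of bytes that must be "in flight" to keep the pipe full. A worked example (the 100 Mbit/s and 100 ms figures are illustrative assumptions, not from the thread):]

```java
public class BandwidthDelay {
    public static void main(String[] args) {
        double bandwidthBitsPerSec = 100e6;  // assumed 100 Mbit/s link
        double rttSeconds = 0.1;             // assumed 100 ms round trip

        // Bytes that must be in flight to keep the pipe full.
        double bdpBytes = bandwidthBitsPerSec * rttSeconds / 8;
        System.out.printf("bandwidth-delay product: %.0f bytes (~%.2f MB)%n",
                          bdpBytes, bdpBytes / 1e6);
        // prints: bandwidth-delay product: 1250000 bytes (~1.25 MB)

        // A small file can never fill that pipe: most of each round
        // trip is spent waiting, not transmitting.
        double fileBytes = 4096;
        System.out.printf("a 4 KB file fills %.3f%% of the pipe%n",
                          100 * fileBytes / bdpBytes);
        // prints: a 4 KB file fills 0.328% of the pipe
    }
}
```

[So a transfer that pays handshake plus slow-start round trips per small file wastes almost the entire pipe on every file, which is where the orders of magnitude go.]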
From: bugbear on 1 Apr 2010 03:46

Tom Anderson wrote:
> On Wed, 31 Mar 2010, Roedy Green wrote:
>
>> Everyone has seen that sending one big file works much more
>> efficiently than many small files. The effect is quite astounding:
>> many orders of magnitude. It just occurred to me that I don't think
>> I can account for that huge difference. Where is all the time going?
>
> TCP handshake, TCP slow start (look that one up if you don't know
> it), roundtrips for control packets at the start of the connection.
> Losing a bit of time can have a huge impact on throughput - it's all
> about the bandwidth-delay product, which on today's long, fat
> networks is huge.

"Back in the day", some serial protocols had asynchronous ACKs and
per-packet resend.

BugBear
From: Kevin McMurtrie on 1 Apr 2010 03:47

In article <alpine.DEB.1.10.1003312344430.13579(a)urchin.earth.li>,
Tom Anderson <twic(a)urchin.earth.li> wrote:

> On Wed, 31 Mar 2010, Roedy Green wrote:
>
>> Everyone has seen that sending one big file works much more
>> efficiently than many small files. The effect is quite astounding:
>> many orders of magnitude. It just occurred to me that I don't think
>> I can account for that huge difference. Where is all the time going?
>
> TCP handshake, TCP slow start (look that one up if you don't know
> it), roundtrips for control packets at the start of the connection.
> Losing a bit of time can have a huge impact on throughput - it's all
> about the bandwidth-delay product, which on today's long, fat
> networks is huge.
>
>> It then occurred to me, that any sort of technique to reduce the
>> difference could have a huge effect on the Internet as a whole.
>
> Yes. It's called pipelining, and it's been in HTTP since 1999.
>
> Although it's not that widely used by browsers, because of worries
> about compatibility with servers, which seems a bit of a waste.
>
> tom

Browsers don't support pipelining because the
multiplexer/demultiplexer is too complicated for the average software
engineer. Out-of-order response processing requires buffering
preceding responses in the pipeline in memory. That's tricky, but not
too bad. Now do that, and rebuild the pipeline when the connection
closes or drops. Ugly!

At least in the old Innovation HTTPClient, that results in multiple
lock grabs on components of a linked list, and it's prone to failure.
The code is convoluted, and it's looking like a total rewrite might
be easier.

Last I heard, the Microsoft and Apache clients can't pipeline; WebKit
can, but it's an experimental feature.
--
I won't see Google Groups replies because I must filter them as spam
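[The basic mechanics of pipelining are simple; it's the failure handling Kevin describes (dropped connections mid-pipeline, response re-association) that gets ugly. A minimal loopback sketch of the happy path (class names and the toy one-line "HTTP" server are illustrative, not a real implementation): the client writes two requests back-to-back on one connection, then reads both responses, which HTTP/1.1 requires to come back in request order.]

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class PipelineDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            new Thread(() -> serve(server)).start();
            try (Socket s = new Socket("127.0.0.1", server.getLocalPort())) {
                OutputStream out = s.getOutputStream();
                // Pipelined: both requests leave before any response arrives.
                out.write(("GET /a HTTP/1.1\r\nHost: x\r\n\r\n"
                         + "GET /b HTTP/1.1\r\nHost: x\r\n\r\n")
                        .getBytes(StandardCharsets.US_ASCII));
                out.flush();
                BufferedReader in = new BufferedReader(new InputStreamReader(
                        s.getInputStream(), StandardCharsets.US_ASCII));
                // Responses must be read back in request order.
                System.out.println(readBody(in));   // prints "hello /a"
                System.out.println(readBody(in));   // prints "hello /b"
            }
        }
    }

    // Toy server: answers each GET with a tiny fixed-length body.
    static void serve(ServerSocket server) {
        try (Socket c = server.accept()) {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    c.getInputStream(), StandardCharsets.US_ASCII));
            OutputStream out = c.getOutputStream();
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.startsWith("GET ")) continue;
                String body = "hello " + line.split(" ")[1];
                out.write(("HTTP/1.1 200 OK\r\nContent-Length: "
                         + body.length() + "\r\n\r\n" + body)
                        .getBytes(StandardCharsets.US_ASCII));
                out.flush();
            }
        } catch (IOException ignored) { }
    }

    // Reads one response's headers, then exactly Content-Length body chars.
    static String readBody(BufferedReader in) throws IOException {
        int len = 0;
        String line;
        while (!(line = in.readLine()).isEmpty()) {
            if (line.startsWith("Content-Length:"))
                len = Integer.parseInt(line.substring(15).trim());
        }
        char[] body = new char[len];
        int read = 0;
        while (read < len) read += in.read(body, read, len - read);
        return new String(body);
    }
}
```

[Everything hard is absent from this sketch: if the connection drops after the first response, the client must decide which unanswered requests are safe to replay on a new connection, which is exactly the rebuild problem Kevin points at.]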