Prev: [HACKERS] About tapes
Next: beta3 & the open items list
From: Robert Haas on 18 Jun 2010 15:00

On Fri, Jun 18, 2010 at 2:36 PM, mac_man2005(a)hotmail.it <mac_man2005(a)hotmail.it> wrote:
> Please take a look at the initial comment in the logtape.c file:
> http://doxygen.postgresql.org/logtape_8c-source.html
>
> Near the beginning of that file, it says that implementing tapes
> on disk (quote: by creating a separate file for each "tape") will require
> more space than implementing the merge on tapes themselves.
> Now, taking into account that tuplesort.c and logtape.c actually DO implement
> tapes on disk, in which case would it require between 2x and 4x the input
> space?

Did you read the rest of the comment? It explains how the code avoids this...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: mac_man2005 on 19 Jun 2010 04:57

Tom, Robert,
thank you.

Now it is clearer how space on tapes is recycled.

I tried to follow Robert's example, but storing one tape per separate file.
Read in the first block of each run (hosted by separate tapes and so by
separate files) and output them into extra storage, wherever this extra
storage is.
Again, those first input blocks are now garbage: is that correct?
In this case, what happens when trying to recycle those garbage blocks to
hold the result of merging the second block of each run?

On 18/06/2010 23:29, Robert Haas wrote:
> On Fri, Jun 18, 2010 at 3:46 PM, mac_man2005(a)hotmail.it
> <mac_man2005(a)hotmail.it> wrote:
>
>> What is the difference between having more than one tape in a file and
>> having one tape per file?
>>
> It makes it easier to recycle space a little at a time. Suppose
> you're merging two runs of 100 blocks each. You read in a block from
> each run and write out two output blocks. Now that you've done that,
> the first block of each of the input runs is garbage and can be
> recycled - but if the input runs and the output run are in three
> separate files, there's no easy way to do that. You can truncate a
> file (and throw away the end) but there's no easy way to throw away
> the BEGINNING of a file. So you'll probably have to hold on to the
> entirety of both inputs until you've written the entirety of the
> output.
>
> On the other hand, suppose you have all the blocks in one big file.
> The first input run is in blocks 1-100; the second is in blocks
> 101-200. You can read blocks 1 and 101, say, and write the results to
> blocks 201 and 202, using extra storage, but only a little bit. When
> you then read blocks 2 and 102, you write the results to blocks 1 and
> 101, which are no longer needed, because you've already merged them.
> When you get done with that, blocks 2 and 102 are now no longer needed
> and can be used to write the next part of the output. Of course, you
> have to keep track of which order to reread the blocks in when the
> sort is done: 201, 202, 1, 101, ... but that's a manageable problem.
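To make the block numbering in Robert's single-file example concrete, here is a minimal sketch of the bookkeeping. It is only an illustration under simplifying assumptions, not the logtape.c implementation: the function name merge_in_one_file is hypothetical, every step is assumed to consume exactly one block from each of the two runs, and an input block is assumed to become reusable only after the output for that step has been written.

```python
from collections import deque

def merge_in_one_file(run_len):
    """Simulate where output blocks land when two runs share one file.

    Assumption (hypothetical model): run 1 occupies blocks 1..run_len and
    run 2 occupies blocks run_len+1..2*run_len, numbered as in the example.
    Returns the order in which output blocks must be reread, and the
    final size of the file in blocks.
    """
    file_size = 2 * run_len      # the two input runs fill the file at first
    free = deque()               # garbage blocks available for reuse (FIFO)
    output_order = []            # block numbers holding the merged output

    for i in range(1, run_len + 1):
        inputs = (i, run_len + i)        # read one block from each run
        for _ in inputs:                 # write two output blocks
            if free:
                block = free.popleft()   # recycle an already-merged block
            else:
                file_size += 1           # no garbage yet: extend the file
                block = file_size
            output_order.append(block)
        free.extend(inputs)              # these inputs are now garbage

    return output_order, file_size

order, size = merge_in_one_file(100)
print(order[:6])   # first blocks to reread after the merge
print(size)        # total blocks the file ever needed
```

Under these assumptions the file grows by only two blocks beyond the input, whereas three separate files would have to hold both inputs and the whole output at once.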
From: Robert Haas on 20 Jun 2010 17:20

On Sat, Jun 19, 2010 at 4:57 AM, mac_man2005(a)hotmail.it <mac_man2005(a)hotmail.it> wrote:
> Tom, Robert,
> thank you.
>
> Now it is clearer how space on tapes is recycled.
>
> I tried to follow Robert's example, but storing one tape per separate file.
> Read in the first block of each run (hosted by separate tapes and so by
> separate files) and output them into extra storage, wherever this extra
> storage is.
> Again, those first input blocks are now garbage: is that correct?

Yes.

> In this case, what happens when trying to recycle those garbage blocks to
> hold the result of merging the second block of each run?

You just overwrite them with the new data.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
From: mac_man2005 on 20 Jun 2010 20:39

Robert, so in my example:
- tapes are stored in different files (one tape per file)
- you confirm those first blocks are garbage
- you confirm they could be rewritten with new data

This means that we can recycle space using one tape per file. Correct?
So, in this case, why do we need to use logical tapesets?
In other words, why did Tom say it was impossible to recycle space when
implementing one tape per file?

On 20/06/2010 23:20, Robert Haas wrote:
> On Sat, Jun 19, 2010 at 4:57 AM, mac_man2005(a)hotmail.it
> <mac_man2005(a)hotmail.it> wrote:
>
>> Tom, Robert,
>> thank you.
>>
>> Now it is clearer how space on tapes is recycled.
>>
>> I tried to follow Robert's example, but storing one tape per separate file.
>> Read in the first block of each run (hosted by separate tapes and so by
>> separate files) and output them into extra storage, wherever this extra
>> storage is.
>> Again, those first input blocks are now garbage: is that correct?
>>
> Yes.
>
>> In this case, what happens when trying to recycle those garbage blocks to
>> hold the result of merging the second block of each run?
>>
> You just overwrite them with the new data.
From: Tom Lane on 20 Jun 2010 22:25

"mac_man2005(a)hotmail.it" <mac_man2005(a)hotmail.it> writes:
> Robert, so in my example:
> - tapes are stored in different files (one tape per file)
> - you confirm those first blocks are garbage
> - you confirm they could be rewritten with new data
> This means that we can do recycle space using one tape per file. Correct?

No. You could do that if the rate at which you need to write data to
the file is <= the rate at which you extract it. But for what we
are doing, namely merging runs from several tapes into one output run,
it's pretty much guaranteed that you need new space faster than you are
consuming data from any one input tape. It balances out as long as you
keep *all* the tapes in one operating-system file; otherwise not.

regards, tom lane
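Tom's rate argument can be illustrated with a rough count of disk blocks. The sketch below is an assumption-laden model, not PostgreSQL code: the function peak_blocks and its parameters are hypothetical, each merge round is assumed to consume exactly one block from every input tape and to produce as many output blocks, and in the one-tape-per-file case the output is assumed to go to its own file, so consumed input blocks cannot be handed over to it.

```python
def peak_blocks(num_tapes, run_len, shared_file):
    """Peak disk footprint, in blocks, while merging num_tapes runs.

    Assumptions (hypothetical model): every input tape holds one run of
    run_len blocks, and each round merges one block from every tape into
    num_tapes output blocks.
    """
    footprint = num_tapes * run_len   # all input runs are on disk at the start
    free = 0                          # garbage blocks the output may reuse
    peak = footprint

    for _ in range(run_len):
        need = num_tapes              # output blocks produced this round
        reused = min(free, need)
        free -= reused
        footprint += need - reused    # grow storage for whatever was not reused
        peak = max(peak, footprint)
        if shared_file:
            # Consumed blocks from *all* tapes return to one common pool,
            # so supply (num_tapes per round) matches demand.
            free += num_tapes
        # With separate files, consumed blocks sit at the head of each
        # input file and cannot be given to the output file.

    return peak

print(peak_blocks(6, 100, shared_file=True))
print(peak_blocks(6, 100, shared_file=False))
```

In this model the shared file needs only num_tapes blocks beyond the input, while separate files need about twice the input. Reusing the freed blocks of a single input file for the output would not help either, since that file frees one block per round while the output needs num_tapes.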