From: Fujii Masao on 16 Feb 2010 04:56

On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander <magnus(a)hagander.net> wrote:
> With the libpq fixes, I get further (more on that fix later, btw), but
> now I get stuck in this. When I do something on the master that
> generates WAL, such as insert a record, and then try to query this on
> the slave, the walreceiver process crashes with:
>
> PANIC: XX000: could not write to log file 0, segment 9 at offset 0, length 160:
> Invalid argument
> LOCATION: XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487
>
> I'll keep digging at the details, but if somebody has a good idea here.. ;)

Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too.
Though I've not identified the cause yet, I guess that it derives from using the
wrong types for local variables in XLogWalRcvWrite(). I'll continue investigating
it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Magnus Hagander on 16 Feb 2010 05:20

2010/2/16 Fujii Masao <masao.fujii(a)gmail.com>:
> On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander <magnus(a)hagander.net> wrote:
>> With the libpq fixes, I get further (more on that fix later, btw), but
>> now I get stuck in this. When I do something on the master that
>> generates WAL, such as insert a record, and then try to query this on
>> the slave, the walreceiver process crashes with:
>>
>> PANIC: XX000: could not write to log file 0, segment 9 at offset 0, length 160:
>> Invalid argument
>> LOCATION: XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487
>>
>> I'll keep digging at the details, but if somebody has a good idea here.. ;)
>
> Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too.
> Though I've not identified the cause yet, I guess that it derives from using the
> wrong types for local variables in XLogWalRcvWrite(). I'll continue investigating
> it.

Thanks!

I will be only spottily available over the next two days due to
on-site work with clients.

Let me know if it would help to have some details on how to get a
(somewhat faster) EC2 image up and running with MSVC to test on :-)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
From: Fujii Masao on 17 Feb 2010 00:55

On Wed, Feb 17, 2010 at 6:28 AM, Magnus Hagander <magnus(a)hagander.net> wrote:
> If you send me your amazon id, I can get you permissions on my private
> image. I plan to clean it up and make it public, just haven't gotten
> around to it yet...

Thanks for your concern! I'll send the ID when I complete the preparation.

And, fortunately(?), when I set wal_sync_method to open_sync, the problem was
reproduced on Linux, too. The cause is that the data written by
walreceiver is not aligned, even though O_DIRECT is in use. On win32, O_DIRECT is
used by default, so the problem always happened on win32.

I propose two solution ideas:

1. O_DIRECT is somewhat harmful on the standby, since the data written by
   walreceiver is read by the startup process immediately. So, how about
   making only walreceiver not use O_DIRECT?

2. Straightforwardly observe the alignment rule. Since the received WAL
   data might start in the middle of a WAL block, walreceiver needs to keep
   the last half-written WAL block for alignment. OTOH, since the received
   data might end in the middle of a WAL block, walreceiver needs zero-padding.
   As a result, walreceiver writes the last WAL block, the received
   data, and the zero-padding together.

Which is better? Or do you have another, better idea?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
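
[Editor's sketch, not part of the thread] The alignment rule in idea 2 comes down to a little offset arithmetic. The C fragment below is only an illustration under stated assumptions: SKETCH_BLCKSZ, AlignedSpan, align_span() and build_aligned_image() are hypothetical names invented here, and the 8192-byte block size is assumed to match PostgreSQL's default XLOG_BLCKSZ. It is not the walreceiver code and not the fix that was eventually committed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed WAL block size for this sketch (hypothetical constant). */
#define SKETCH_BLCKSZ 8192

/* Describes the block-aligned write that covers one received chunk. */
typedef struct AlignedSpan
{
    size_t start;   /* block-aligned file offset where the write begins */
    size_t length;  /* total bytes to write, a multiple of the block size */
    size_t head;    /* bytes of the saved partial block that precede the data */
    size_t tail;    /* zero-padding bytes that follow the data */
} AlignedSpan;

/* Round the range [offset, offset + nbytes) outward to block boundaries. */
AlignedSpan
align_span(size_t offset, size_t nbytes)
{
    AlignedSpan s;
    size_t      end = offset + nbytes;
    size_t      aligned_end =
        (end + SKETCH_BLCKSZ - 1) / SKETCH_BLCKSZ * SKETCH_BLCKSZ;

    s.start = offset - offset % SKETCH_BLCKSZ;
    s.length = aligned_end - s.start;
    s.head = offset - s.start;
    s.tail = aligned_end - end;
    return s;
}

/*
 * Assemble the image to write: the saved head of the last block, then the
 * received data, then zero padding. "out" must hold align_span().length
 * bytes; "last_block" must hold at least "head" bytes of earlier data.
 */
void
build_aligned_image(char *out, const char *last_block,
                    const char *data, size_t offset, size_t nbytes)
{
    AlignedSpan s = align_span(offset, nbytes);

    assert(s.head + nbytes + s.tail == s.length);
    memcpy(out, last_block, s.head);
    memcpy(out + s.head, data, nbytes);
    memset(out + s.head + nbytes, 0, s.tail);
}
```

For the failing write in the PANIC message (offset 0, length 160), align_span() yields a single 8192-byte block with no head bytes and 8032 bytes of zero padding. On Linux, O_DIRECT generally also expects the memory buffer itself to be suitably aligned, which this sketch does not address.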
From: Magnus Hagander on 17 Feb 2010 01:03

On Wed, Feb 17, 2010 at 06:55, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Wed, Feb 17, 2010 at 6:28 AM, Magnus Hagander <magnus(a)hagander.net> wrote:
>> If you send me your amazon id, I can get you permissions on my private
>> image. I plan to clean it up and make it public, just haven't gotten
>> around to it yet...
>
> Thanks for your concern! I'll send the ID when I complete the preparation.

ok.

> And, fortunately(?), when I set wal_sync_method to open_sync, the problem was
> reproduced on Linux, too. The cause is that the data written by

Ah, that's good. It always helps if it's a cross-platform issue -
particularly in that it's not one of the funky win32 specific things
we did :)

> walreceiver is not aligned, even though O_DIRECT is in use. On win32, O_DIRECT is
> used by default, so the problem always happened on win32.

Ahh. I see.

> I propose two solution ideas:
>
> 1. O_DIRECT is somewhat harmful on the standby, since the data written by
>    walreceiver is read by the startup process immediately. So, how about
>    making only walreceiver not use O_DIRECT?

In that case, O_DIRECT would be counterproductive, no? It maps to
FILE_FLAG_NO_BUFFERING, which makes sure it doesn't go into the cache.
So the read in the startup proc is actually guaranteed to require a
physical read - of something we just wrote, so it'll almost certainly
end up waiting for a rotation, no?

Seems like getting rid of O_DIRECT here is the right thing to do,
regardless of this.

> 2. Straightforwardly observe the alignment rule. Since the received WAL
>    data might start in the middle of a WAL block, walreceiver needs to keep
>    the last half-written WAL block for alignment. OTOH, since the received
>    data might end in the middle of a WAL block, walreceiver needs zero-padding.
>    As a result, walreceiver writes the last WAL block, the received
>    data, and the zero-padding together.

Might there be other reasons to do this as well?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
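
[Editor's sketch, not part of the thread] Idea 1 can be pictured as masking the direct-I/O bit out of the open flags for the walreceiver only, so that its writes stay in the OS cache for the startup process to read. The helper wal_open_flags() and its is_walreceiver parameter are hypothetical names made up for this illustration; how PostgreSQL actually selects its open flags is not shown in the thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc exposes O_DIRECT only with this set */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT on this platform: mask is a no-op */
#endif

/*
 * Hypothetical helper: given the flags chosen for WAL files in general,
 * return the flags a particular process should pass to open().
 */
int
wal_open_flags(int base_flags, bool is_walreceiver)
{
    if (is_walreceiver)
        return base_flags & ~O_DIRECT;  /* keep writes in the OS cache */
    return base_flags;
}

/* Small self-check showing the intended behaviour of the helper. */
void
wal_open_flags_selfcheck(void)
{
    /* The walreceiver loses the direct-I/O bit; other flags survive. */
    assert(wal_open_flags(O_WRONLY | O_DIRECT, true) == O_WRONLY);
    /* Every other process keeps the flags unchanged. */
    assert(wal_open_flags(O_WRONLY | O_DIRECT, false) == (O_WRONLY | O_DIRECT));
}
```

With buffered writes the alignment restriction no longer applies to the walreceiver, which is why this route would also sidestep the PANIC without any padding logic.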
From: Fujii Masao on 16 Feb 2010 07:40

On Tue, Feb 16, 2010 at 7:20 PM, Magnus Hagander <magnus(a)hagander.net> wrote:
> 2010/2/16 Fujii Masao <masao.fujii(a)gmail.com>:
>> On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander <magnus(a)hagander.net> wrote:
>>> With the libpq fixes, I get further (more on that fix later, btw), but
>>> now I get stuck in this. When I do something on the master that
>>> generates WAL, such as insert a record, and then try to query this on
>>> the slave, the walreceiver process crashes with:
>>>
>>> PANIC: XX000: could not write to log file 0, segment 9 at offset 0, length 160:
>>> Invalid argument
>>> LOCATION: XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487
>>>
>>> I'll keep digging at the details, but if somebody has a good idea here... ;)
>>
>> Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too.
>> Though I've not identified the cause yet, I guess that it derives from using the
>> wrong types for local variables in XLogWalRcvWrite(). I'll continue investigating
>> it.
>
> Thanks!
>
> I will be only spottily available over the next two days due to
> on-site work with clients.
>
> Let me know if it would help to have some details on how to get a
> (somewhat faster) EC2 image up and running with MSVC to test on :-)

Thanks! I can probably use the EC2 image by reading your great blog post.
http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html

But it might take some time to get my sysadmin to open the port for rdesktop,
for some reason...

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center