Prev: Query results differ depending on operating system (using GIN)
Next: review: psql: edit function, show function commands patch
From: Marc Cousin on 20 Jul 2010 07:47 Hi, I've been reviewing this patch for the last few days. Here it is : * Submission review * Is the patch in context diff format? Yes * Does it apply cleanly to the current CVS HEAD? Yes * Does it include reasonable tests, necessary doc patches, etc? Doc patches are there. There are no regression tests. Should there be ? * Usability review * Does the patch actually implement that? Yes * Do we want that? I think so. At least I'd like to have this feature :) * Do we already have it? No * Does it follow SQL spec, or the community-agreed behavior? I didn't see a clear conclusion from the -hackers thread on this (GUC vs SQL statement extension) * Does it include pg_dump support (if applicable)? Not applicable. Or should pg_dump and/or pg_restore put lock_timeout to 0 ? * Are there dangers? As it is a guc, it could be set globally, is that a danger ? * Have all the bases been covered? I honestly don't know. It touches alarm signal handling. * Apply the patch, compile it and test: * Does the feature work as advertised? I only tested it with Linux. The code is very OS-dependent. It works as advertised with Linux. I found only one corner case (see below) * Are there corner cases the author has failed to consider? The feature almost works as advertised : it fails when lock_timeout = deadlock_timeout. Then the lock_timeout isn't detected. I think this case isn't considered in handle_sig_alarm(). * Are there any assertion failures or crashes? No * Performance review * Does the patch slow down simple tests? No * If it claims to improve performance, does it? Not applicable * Does it slow down other things? No. Maybe alarm signal handling and enabling will be slower, as there is more work done there (for instance, a GetCurrentTimestamp, that was only done when log_lock_waits was activated until now. But I couldn't measure a slowdown. * Read the changes to the code in detail and consider: * Does it follow the project coding guidelines? I think so * Are there portability issues? It seems to have already been adressed, from the previous discussion in the thread. * Will it work on Windows/BSD etc? It should. I couldn't test it though. Infrastructure is there. * Are the comments sufficient and accurate? Yes * Does it do what it says, correctly? Yes * Does it produce compiler warnings? No * Can you make it crash? No * Consider the changes to the code in the context of the project as a whole: * Is everything done in a way that fits together coherently with other features/modules? I have a feeling that enable_sig_alarm/enable_sig_alarm_for_lock_timeout tries to solve a very specific problem, and it gets complicated because there is no infrastructure in the code to handle several timeouts at the same time with sigalarm, so each timeout has its own dedicated and intertwined code. But I'm still discovering this part of the code. * Are there interdependencies that can cause problems? I don't think so. -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: zb on 23 Jul 2010 02:32 Hi, first, thanks for the review. > Hi, I've been reviewing this patch for the last few days. Here it is : > > * Submission review > * Is the patch in context diff format? > Yes > > * Does it apply cleanly to the current CVS HEAD? > Yes > > * Does it include reasonable tests, necessary doc patches, etc? > Doc patches are there. > There are no regression tests. Should there be ? IIRC, there was a discussion/patch about parallel psql that can hold more than one connections open. With that feature, a regression test can be added. Reading the 9.0beta3 docs, it's not there and this patch is not on the current commitfest either. Is there anyone who knows the status of this feature? > * Usability review > * Does the patch actually implement that? > Yes > > * Do we want that? > I think so. At least I'd like to have this feature :) :-) > * Do we already have it? > No > > * Does it follow SQL spec, or the community-agreed behavior? > I didn't see a clear conclusion from the -hackers thread on this (GUC > vs SQL statement extension) > > * Does it include pg_dump support (if applicable)? > Not applicable. Or should pg_dump and/or pg_restore put lock_timeout to 0 > ? > > * Are there dangers? > As it is a guc, it could be set globally, is that a danger ? It could be set globally, but it will exhibit a new global behaviour. Which is not unexpected. The problem may come from [auto]vacuum processes can get stopped. Maybe others, too. A previous version contained a checking function that refused to set it from postgresql.conf for this reason, but it was frowned upon by Tom. :-) The proper fix would be that every such processes should set this GUC to zero for them locally. > * Have all the bases been covered? > I honestly don't know. It touches alarm signal handling. > * Apply the patch, compile it and test: > * Does the feature work as advertised? > I only tested it with Linux. The code is very OS-dependent. It works > as advertised with Linux. I found only one corner case (see below) The setitimer() function is implemented in backend/port/win32/timer.c, so it's abstracted away. With that in mind, I think there's no OS-dependent in this patch. > * Are there corner cases the author has failed to consider? > The feature almost works as advertised : it fails when lock_timeout = > deadlock_timeout. Then the lock_timeout isn't detected. I think this > case isn't considered in handle_sig_alarm(). I will look into this, thanks for spotting it. > * Are there any assertion failures or crashes? > No > > > * Performance review > * Does the patch slow down simple tests? > No > > * If it claims to improve performance, does it? > Not applicable > > * Does it slow down other things? > No. Maybe alarm signal handling and enabling will be slower, as there > is more work done there (for instance, a GetCurrentTimestamp, that > was only done when log_lock_waits was activated until now. But I > couldn't measure a slowdown. > > * Read the changes to the code in detail and consider: > * Does it follow the project coding guidelines? > I think so > > * Are there portability issues? > It seems to have already been adressed, from the previous discussion > in the thread. > > * Will it work on Windows/BSD etc? > It should. I couldn't test it though. Infrastructure is there. > > * Are the comments sufficient and accurate? > Yes > > * Does it do what it says, correctly? > Yes > > * Does it produce compiler warnings? > No > > * Can you make it crash? > No > > * Consider the changes to the code in the context of the project as a > whole: > * Is everything done in a way that fits together coherently with > other features/modules? > I have a feeling that > enable_sig_alarm/enable_sig_alarm_for_lock_timeout tries to solve a > very specific problem, and it gets complicated because there is no > infrastructure in the code to handle several timeouts at the same time > with sigalarm, so each timeout has its own dedicated and intertwined > code. But I'm still discovering this part of the code. There is a problem with setitimer(): only one timer can be alive at one time with the same timer id (ITIMER_REAL). > * Are there interdependencies that can cause problems? > I don't think so. Thanks for the review, I will post a new patch sometime next week. Best regards, Zoltán Böszörményi -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Marc Cousin on 30 Jul 2010 10:35 The Thursday 29 July 2010 13:55:38, Boszormenyi Zoltan wrote : > I fixed this by adding CheckLockTimeout() function that works like > CheckStatementTimeout() and ensuring that the same start time is > used for both deadlock_timeout and lock_timeout if both are active. > The preference of errors if their timeout values are equal is: > statement_timeout > lock_timeout > deadlock_timeout As soon as lock_timeout is bigger than deadlock_timeout, it doesn't work, with this new version. Keeping the deadlock_timeout to 1s, when lock_timeout >= 1001, lock_timeout doesn't trigger anymore. > > > * Consider the changes to the code in the context of the project as a whole: > > * Is everything done in a way that fits together coherently with > > other features/modules? > > I have a feeling that > > enable_sig_alarm/enable_sig_alarm_for_lock_timeout tries to solve a > > very specific problem, and it gets complicated because there is no > > infrastructure in the code to handle several timeouts at the same time > > with sigalarm, so each timeout has its own dedicated and intertwined > > code. But I'm still discovering this part of the code. > > > > This WIP patch is also attached for reference, too. I would prefer > this way, but I don't have more time to work on it and there are some > interdependencies in the signal handler when e.g. disable_sig_alarm(true); > means to disable ALL timers not just the statement_timeout. > The specifically coded lock_timeout patch goes with the flow and doesn't > change the semantics and works. If someone wants to pick up the timer > framework patch and can make it work, fine. But I am not explicitely > submitting it for the commitfest. The original patch with the fixes works > and needs only a little more review. Ok, understood. But I like the principle of this framework much more (the rest of the code seems simpler to me as a consequence of this framework). But it goes far beyond the initial intent of the patch. -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Alvaro Herrera on 30 Jul 2010 19:06 Excerpts from Boszormenyi Zoltan's message of jue jul 29 07:55:38 -0400 2010: > Hi, > > Marc Cousin írta: > > Hi, I've been reviewing this patch for the last few days. Here it is : > > > ... > > * Are there dangers? > > As it is a guc, it could be set globally, is that a danger ? > > > > I haven't added any new code covering supplemental (e.g. autovacuum) > processes, > the interactions are yet to be discovered. Setting it globally is not > recommended. FWIW there is some code in autovacuum and other auxiliary processes that forcibly sets statement_timeout to 0. I think this patch should do likewise. -- Álvaro Herrera <alvherre(a)commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Boszormenyi Zoltan on 2 Aug 2010 07:25
Marc Cousin írta: > The Thursday 29 July 2010 13:55:38, Boszormenyi Zoltan wrote : > >> I fixed this by adding CheckLockTimeout() function that works like >> CheckStatementTimeout() and ensuring that the same start time is >> used for both deadlock_timeout and lock_timeout if both are active. >> The preference of errors if their timeout values are equal is: >> statement_timeout > lock_timeout > deadlock_timeout >> > > As soon as lock_timeout is bigger than deadlock_timeout, it doesn't > work, with this new version. > > Keeping the deadlock_timeout to 1s, when lock_timeout >= 1001, > lock_timeout doesn't trigger anymore. > I missed one case when the lock_timeout_active should have been set but the timer must not have been re-set, this caused the problem. I blame the hot weather and having no air conditioning. The second is now fixed. :-) I also added one line in autovacuum.c to disable lock_timeout, in case it's globally set in postgresq.conf as per Alvaro's comment. Also, I made sure that only one or two timeout causes (one of deadlock_timeout and lock_timeout in the first case or statement_timeout plus one of the other two) can be active at a time. Previously I was able to trigger a segfault with the default 1sec deadlock_timeout and lock_timeout = 999 or 1001. Effectively, the system's clock resolution makes the lock_timeout and deadlock_timeout equal and RemoveFromWaitQueue() was called twice. This way it's a lot more robust. Best regards, Zoltán Böszörményi |