Prev: Review: Row-level Locks & SERIALIZABLE transactions,postgres vs. Oracle
Next: crash-recovery replay of CREATE TABLESPACE is brokenin HEAD
From: Bruce Momjian on 18 Jul 2010 01:22 Tom Lane wrote: > I managed to crash the executor in the tablespace.sql test while working > on a 9.1 patch, and discovered that the postmaster fails to recover > from that. The end of postmaster.log looks like > > LOG: all server processes terminated; reinitializing > LOG: database system was interrupted; last known up at 2010-07-11 19:30:07 EDT > LOG: database system was not properly shut down; automatic recovery in progress > LOG: consistent recovery state reached at 0/EE26F30 > LOG: redo starts at 0/EE26F30 > FATAL: directory "/home/postgres/pgsql/src/test/regress/testtablespace/PG_9.1_201004261" already in use as a tablespace > CONTEXT: xlog redo create ts: 127158 "/home/postgres/pgsql/src/test/regress/testtablespace" > LOG: startup process (PID 13914) exited with exit code 1 > LOG: aborting startup due to startup process failure > > It looks to me like those well-intentioned recent changes in this area > broke the crash-recovery case. Not good. Sorry for the delay. I didn't realize this was my code that was broken until Heikki told me via IM. The bug is that we can't replay mkdir()/symlink() and assume those will always succeed. I looked at the createdb redo code and it basically drops the directory before creating it. The tablespace directory/symlink setup is more complex, so I just wrote the attached patch to trigger a redo-'delete' tablespace operation before the create tablespace redo operation. Ignoring mkdir/symlink creation failure is not an option because the symlink might point to some wrong location or something. -- Bruce Momjian <bruce(a)momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
From: Bruce Momjian on 18 Jul 2010 11:54 Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes: > > Maybe you should check that it points to the right location? Or drop and > > recreate the symlink, and ignore failure at mkdir. > > More specifically, ignore EEXIST failure when replaying mkdir. Anything > else is still a problem. Thanks for the help. I tried to find somewhere else in our recovery code that was similar but didn't find anything. The attached patch does as suggested. I added the recovery code to the create tablespace function so I didn't have to duplicate all the code that computes the path names. Attached. -- Bruce Momjian <bruce(a)momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
From: Bruce Momjian on 18 Jul 2010 12:05 Bruce Momjian wrote: > Tom Lane wrote: > > Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes: > > > Maybe you should check that it points to the right location? Or drop and > > > recreate the symlink, and ignore failure at mkdir. > > > > More specifically, ignore EEXIST failure when replaying mkdir. Anything > > else is still a problem. > > Thanks for the help. I tried to find somewhere else in our recovery > code that was similar but didn't find anything. > > The attached patch does as suggested. I added the recovery code to the > create tablespace function so I didn't have to duplicate all the code > that computes the path names. > > Attached. Uh, another question. Looking at the createdb recovery, I see: /* * Our theory for replaying a CREATE is to forcibly drop the target * subdirectory if present, then re-copy the source data. This may be * more work than needed, but it is simple to implement. */ if (stat(dst_path, &st) == 0 && S_ISDIR(st.st_mode)) { if (!rmtree(dst_path, true)) ereport(WARNING, (errmsg("some useless files may be left behind in old database directory \"%s\"", dst_path))); } Should I be using rmtree() on the mkdir target? Also, the original tablespace recovery code did not drop the symlink first. I assume that was not a bug only because we don't support moving tablespaces: - /* Create the symlink if not already present */ - linkloc = (char *) palloc(OIDCHARS + OIDCHARS + 1); - sprintf(linkloc, "pg_tblspc/%u", xlrec->ts_id); - - if (symlink(location, linkloc) < 0) - { - if (errno != EEXIST) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not create symbolic link \"%s\": %m", - linkloc))); - } Still, it seems logical to unlink it before creating it. -- Bruce Momjian <bruce(a)momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. + -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Bruce Momjian on 19 Jul 2010 01:02 Bruce Momjian wrote: > > The attached patch does as suggested. I added the recovery code to the > > create tablespace function so I didn't have to duplicate all the code > > that computes the path names. > > > > Attached. > > Uh, another question. Looking at the createdb recovery, I see: > > /* > * Our theory for replaying a CREATE is to forcibly drop the target > * subdirectory if present, then re-copy the source data. This may be > * more work than needed, but it is simple to implement. > */ > if (stat(dst_path, &st) == 0 && S_ISDIR(st.st_mode)) > { > if (!rmtree(dst_path, true)) > ereport(WARNING, > (errmsg("some useless files may be left behind in old database directory \"%s\"", > dst_path))); > } > > Should I be using rmtree() on the mkdir target? > > Also, the original tablespace recovery code did not drop the symlink > first. I assume that was not a bug only because we don't support moving > tablespaces: For consistency with CREATE DATABASE recovery and for reliablity, I coded the rmtree() call instead. Patch attached. -- Bruce Momjian <bruce(a)momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
From: Bruce Momjian on 20 Jul 2010 14:21
Bruce Momjian wrote: > Bruce Momjian wrote: > > > The attached patch does as suggested. I added the recovery code to the > > > create tablespace function so I didn't have to duplicate all the code > > > that computes the path names. > > > > > > Attached. > > > > Uh, another question. Looking at the createdb recovery, I see: > > > > /* > > * Our theory for replaying a CREATE is to forcibly drop the target > > * subdirectory if present, then re-copy the source data. This may be > > * more work than needed, but it is simple to implement. > > */ > > if (stat(dst_path, &st) == 0 && S_ISDIR(st.st_mode)) > > { > > if (!rmtree(dst_path, true)) > > ereport(WARNING, > > (errmsg("some useless files may be left behind in old database directory \"%s\"", > > dst_path))); > > } > > > > Should I be using rmtree() on the mkdir target? > > > > Also, the original tablespace recovery code did not drop the symlink > > first. I assume that was not a bug only because we don't support moving > > tablespaces: > > For consistency with CREATE DATABASE recovery and for reliablity, I > coded the rmtree() call instead. Patch attached. Attached patch applied to HEAD and 9.0. 9.0 open item moved to completed. -- Bruce Momjian <bruce(a)momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. + |