From: "Kevin Grittner" on 19 Jan 2010 11:03

Magnus Hagander <magnus(a)hagander.net> wrote:
> Kevin Grittner <Kevin.Grittner(a)wicourts.gov> wrote:
>> Magnus Hagander wrote:
>>
>>>>>> the Git repository is missing parts of two non-recent
>>>>>> commits.
>>
>>> We've seen this happen before.
>>
>> That seems like kind of a blasé attitude toward something upon
>> which some people rely.
>
> For the record, I am one of those people. I use it for *all* my
> postgresql development. And this is a serious pain.

It appears I took your comment the wrong way. Apologies.

>> When we (at Wisconsin State Courts) were using CVS and had
>> scripts to automatically merge changes from one branch to
>> another, we saw this sort of thing unless people were very
>> careful to grab a timestamp in the past for their ranges and use
>> it throughout the script. Perhaps the script is just not careful
>> enough? (Said in total ignorance of what the PostgreSQL process
>> here actually is....)
>
> That would be one way. However, AFAIK the tool we use (fromcvs)
> doesn't support this. If somebody were to extend the tool with
> that, it would be much appreciated. It's a Ruby tool though, so
> there's not a thing I can do about it myself... And it's basically
> undocumented.
>
> But yes, if we do that and set the timestamp far enough back in
> time, that should make it "reasonably safe". Given how long some
> operations can take ((C) year change, release tagging IIRC, stuff
> like that), this has to be a fairly large number, which means the
> git mirror will lag even further behind. But if that's what we
> have to pay to make it safe, I guess we should... The time would
> have to be long enough to cover any cvs commit including potential
> network slowness during it etc.

My Ruby skills are minimal, but we've got some Ruby gurus around
here -- maybe between my rough skills and a few impositions on the
others I could wrangle something. Is there any particular version I
should be looking at? The last official version I can find is
0.0.0.132 from May 3, 2009. Although, if there's not some
reasonably obvious fix (like subtracting some fixed amount of time
from a timestamp they're already grabbing), perhaps we should just
plan on limping along until we can convert to git.

Oh, and what sort of delay do you feel would be "long enough to
cover any cvs commit including potential network slowness during it
etc."?

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 19 Jan 2010 11:47

"Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
> Oh, and what sort of delay do you feel would be "long enough to
> cover any cvs commit including potential network slowness during it
> etc."?

Why should the script make any assumptions about delay at all?
It seems to me that the problem comes from failing to check for
changed files, no more and no less. It would be much less of an
issue if a non-atomic CVS commit showed up as two separate GIT
commits with similar log messages.

regards, tom lane
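One way to picture the behaviour described in the message above is an importer that remembers, per file, the last revision it mirrored and picks up anything newer on every run. The Python sketch below is only an illustration under stated assumptions: it is not how fromcvs works, `import_changes` and the data layout are hypothetical, revisions are simplified to integers, and the author, file names, and log message are made up.

```python
from collections import defaultdict


def import_changes(imported, repo_files):
    """Return new commits for every file revision not mirrored yet.

    imported:   dict path -> highest revision already mirrored (hypothetical)
    repo_files: dict path -> list of (revision, author, log), oldest first
    """
    groups = defaultdict(list)
    for path, revisions in repo_files.items():
        for rev, author, log in revisions:
            if rev > imported.get(path, 0):
                # Group new file revisions by author and log message.
                groups[(author, log)].append((path, rev))
                imported[path] = rev
    return [
        {"author": author, "log": log, "files": sorted(files)}
        for (author, log), files in groups.items()
    ]


imported = {}

# First run: the non-atomic CVS commit has only reached a.c so far.
repo = {"a.c": [(1, "dev", "Fix planner bug")], "b.c": []}
first = import_changes(imported, repo)

# Second run: the rest of the same CVS commit has now landed in b.c.
repo["b.c"].append((1, "dev", "Fix planner bug"))
second = import_changes(imported, repo)

print(first)
print(second)
```

In this sketch the late file is not lost: it turns up in the second run as its own commit carrying the same log message, at the cost of one CVS commit being split in two on the Git side.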
From: "Kevin Grittner" on 19 Jan 2010 12:02

Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
>> Oh, and what sort of delay do you feel would be "long enough to
>> cover any cvs commit including potential network slowness during
>> it etc."?
>
> Why should the script make any assumptions about delay at all?
> It seems to me that the problem comes from failing to check for
> changed files, no more and no less. It would be much less of an
> issue if a non-atomic CVS commit showed up as two separate GIT
> commits with similar log messages.

I was trying to be accommodating; if Magnus's take on this isn't a
consensus, I'll put forward in a little more detail what I had in
mind.

What we did with our scripts was to grab the current time *from the
CVS server* (since not all clocks are necessarily set accurately)
and use that as the end of a time range. The end of the previous
time range was recorded on successful completion; we would use that
as the start of a time range. Done carefully, that allows no
commits to be missed. The only way something could be done twice
would be for the process to die after it had pushed through some
changes and before it reached completion and saved the time.

Now, I haven't looked at the fromcvs code yet to know how easy or
hard it would be to use this logic within that package, so this is
still pretty hand-wavy.

-Kevin
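The time-range scheme described in the message above can be sketched roughly as follows. This is a minimal Python illustration, not the Wisconsin scripts and not fromcvs: `sync_once`, `fetch_commits`, the `state` dictionary, and the in-memory `HISTORY` are all hypothetical stand-ins, and the server clock is simulated with a lambda.

```python
from datetime import datetime, timedelta, timezone


def sync_once(state, server_now, fetch_commits, apply_commit):
    """Import commits with start < time <= end, then advance the checkpoint."""
    start = state["last_end"]
    end = server_now()  # the CVS server's clock, not the local one
    for commit in fetch_commits(start, end):
        apply_commit(commit)
    # Record the new boundary only after the whole window succeeded. A crash
    # before this line repeats the window next run instead of skipping it.
    state["last_end"] = end
    return end


# Hypothetical in-memory stand-ins for the repository and the mirror.
T0 = datetime(2010, 1, 19, 11, 0, tzinfo=timezone.utc)
HISTORY = [
    (T0 + timedelta(minutes=1), "fix typo"),
    (T0 + timedelta(minutes=5), "update copyright year"),
    (T0 + timedelta(minutes=9), "tag release"),
]


def fetch_commits(start, end):
    # Half-open window (start, end]: a commit exactly on a boundary is
    # imported by exactly one run.
    return [c for c in HISTORY if start < c[0] <= end]


state = {"last_end": T0}
mirror = []

# Two consecutive runs; the second starts where the first ended.
sync_once(state, lambda: T0 + timedelta(minutes=5), fetch_commits, mirror.append)
sync_once(state, lambda: T0 + timedelta(minutes=10), fetch_commits, mirror.append)

print([msg for _, msg in mirror])
```

Saving the checkpoint last is what gives the "never missed, possibly repeated after a crash" property. The sketch does not handle a CVS commit that is still in progress when `end` is sampled, which is the case the delay discussed earlier in the thread is meant to cover.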
From: Aidan Van Dyk on 19 Jan 2010 12:06

* Tom Lane <tgl(a)sss.pgh.pa.us> [100119 11:47]:
> "Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
> > Oh, and what sort of delay do you feel would be "long enough to
> > cover any cvs commit including potential network slowness during it
> > etc."?
>
> Why should the script make any assumptions about delay at all?
> It seems to me that the problem comes from failing to check for
> changed files, no more and no less. It would be much less of an
> issue if a non-atomic CVS commit showed up as two separate GIT
> commits with similar log messages.

Well, I guess you could say:
  "fromcvs should go back and recheck all the previous work it's
  done, and double check and make sure no new files have changed
  for the timestamp/log message pair it's already done, because
  CVS isn't atomic"

But, I think that path leads to craziness... I mean, how far back?
CVS is "non-atomic" enough that 2 (well, $N) people can commit
separate stuff, all with overlapping time stamps, and they can even
commit stuff in the "past" if they really want...

But, all I have to say is it's not perfect, pretty good, just deal
with the things as they come, after all, it's "CVS" ;-)

If you want better than "pretty good", drop CVS, do a one-time
conversion (a la parsecvs/cvs2git) and get on with life... As long
as CVS is the tool of choice, pretty good is really good...

--
Aidan Van Dyk                                    Create like a god,
aidan(a)highrise.ca                            command like a king,
http://www.highrise.ca/                          work like a slave.
From: "Kevin Grittner" on 19 Jan 2010 13:28

"Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> wrote:
> I haven't looked at the fromcvs code yet to know how easy or
> hard it would be to use this logic within that package

Well, now I have looked. It's about 2,000 lines of pretty dense
Ruby code (not as many comments as one would hope, especially since
there appears to be *no* other documentation of any sort). On a
quick scan, they seem to be *trying* to do what I suggested, which
means that some sort of fix could probably be worked out, but that
the issue could be subtle enough that it could be hard to find.
Perhaps it is as simple, though, as using the client's time instead
of the CVS server's time -- that's one of the things I've seen cause
problems for this sort of thing using CVS before. I haven't spotted
where they're getting the time. Is there anyone fluent in Ruby who
wants to look at this and see how they're getting it?

http://ww2.fs.ei.tum.de/~corecode/hg/fromcvs/log/132

By the way, is anyone working on fixing up the current problem?
I've been talking about trying to prevent recurrences, but that's
not gonna help get the current problem solved....

-Kevin