From: "Kevin Grittner" on
Magnus Hagander <magnus(a)hagander.net> wrote:
> Kevin Grittner <Kevin.Grittner(a)wicourts.gov> wrote:
>> Magnus Hagander wrote:
>>
>>>>>> the Git repository is missing parts of two non-recent
>>>>>> commits.
>>
>>> We've seen this happen before.
>>
>> That seems like kind of a blasé attitude toward something upon
>> which some people rely.
>
> For the record, I am one of those people. I use it for *all* my
> postgresql development. And this is a serious pain.

It appears I took your comment the wrong way. Apologies.

>> When we (at Wisconsin State Courts) were using CVS and had
>> scripts to automatically merge changes from one branch to
>> another, we saw this sort of thing unless people were very
>> careful to grab a timestamp in the past for their ranges and use
>> it throughout the script. Perhaps the script is just not careful
>> enough? (Said in total ignorance of what the PostgreSQL process
>> here actually is....)
>
> That would be one way. However, AFAIK the tool we use (fromcvs)
> doesn't support this. If somebody were to extend the tool with
> that, it would be much appreciated. It's a Ruby tool though, so
> there's not a thing I can do about it myself... And it's basically
> undocumented.
>
> But yes, if we do that and set the timestamp far enough back in
> time, that should make it "reasonably safe". Given how long some
> operations can take ((C) year change, release tagging IIRC, stuff
> like that), this has to be a fairly large number, which means the
> git mirror will lag even further behind. But if that's what we
> have to pay to make it safe, I guess we should... The time would
> have to be long enough to cover any cvs commit including potential
> network slowness during it etc.

My Ruby skills are minimal, but we've got some Ruby gurus around
here -- maybe between my rough skills and a few impositions on the
others I could wrangle something. Is there any particular version I
should be looking at? The last official version I can find is
0.0.0.132 from May 3, 2009.

Although, if there's not some reasonably obvious fix (like
subtracting some fixed amount of time from a timestamp they're
already grabbing), perhaps we should just plan on limping along
until we can convert to git.

Oh, and what sort of delay do you feel would be "long enough to
cover any cvs commit including potential network slowness during it
etc."?

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
"Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
> Oh, and what sort of delay do you feel would be "long enough to
> cover any cvs commit including potential network slowness during it
> etc."?

Why should the script make any assumptions about delay at all?
It seems to me that the problem comes from failing to check for
changed files, no more and no less. It would be much less of an
issue if a non-atomic CVS commit showed up as two separate GIT
commits with similar log messages.
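
A minimal sketch of that idea, in Python rather than Ruby and not based on how fromcvs actually works: remember which file revisions have already been exported, and on every pass export whatever has not been seen yet, grouped by author and log message. The names `FileRev`, `new_changesets` and `seen` are hypothetical. If a CVS commit is only half-visible during one pass, its remaining files simply turn up on the next pass as a second changeset with the same log message.

```python
from collections import defaultdict
from typing import NamedTuple


class FileRev(NamedTuple):
    """One file revision as read from CVS (hypothetical record layout)."""
    path: str
    rev: str
    author: str
    log: str


def new_changesets(all_revs, seen):
    """Group not-yet-exported revisions by (author, log message).

    `seen` is a set of (path, rev) pairs exported on earlier passes; it is
    updated in place. Nothing here depends on timestamps or delays.
    """
    groups = defaultdict(list)
    for r in all_revs:
        key = (r.path, r.rev)
        if key in seen:
            continue  # already mirrored on an earlier pass
        seen.add(key)
        groups[(r.author, r.log)].append(r.path)
    return dict(groups)
```

Each returned group would become one commit on the mirror side; persisting `seen` between runs is left out of the sketch.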

regards, tom lane


From: "Kevin Grittner" on
Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
>> Oh, and what sort of delay do you feel would be "long enough to
>> cover any cvs commit including potential network slowness during
>> it etc."?
>
> Why should the script make any assumptions about delay at all?
> It seems to me that the problem comes from failing to check for
> changed files, no more and no less. It would be much less of an
> issue if a non-atomic CVS commit showed up as two separate GIT
> commits with similar log messages.

I was trying to be accommodating; if Magnus's take on this isn't a
consensus, I'll put forward in a little more detail what I had in
mind.

What we did with our scripts was to grab the current time *from the
CVS server* (since not all clocks are necessarily set accurately)
and use that as the end of a time range. The end of the previous
time range was recorded on successful completion; we would use that
as the start of a time range. Done carefully, that allows no
commits to be missed. The only way something could be done twice
would be for the process to die after it had pushed through some
changes and before it reached completion and saved the time.
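
Roughly, the shape of that logic is sketched below. This is Python written for illustration, not our actual scripts and not fromcvs code; `sync_once`, `load_checkpoint`, `server_now` and `process_range` are hypothetical names, and storing the checkpoint as an ISO timestamp in a small file is an assumption.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def load_checkpoint(path: Path) -> Optional[datetime]:
    """Return the end of the last completed range, or None on the first run."""
    if not path.exists():
        return None
    return datetime.fromisoformat(path.read_text().strip())


def sync_once(
    state_file: Path,
    server_now: Callable[[], datetime],
    process_range: Callable[[Optional[datetime], datetime], None],
):
    """Process one time range and record its end only on success."""
    start = load_checkpoint(state_file)
    # Assumption: server_now() asks the CVS server for its clock, so that
    # client clock skew cannot open a gap between consecutive ranges.
    end = server_now()
    # The same `end` value is used for everything done in this pass.
    # process_range should treat the range consistently (for example
    # start exclusive, end inclusive) so boundary commits are handled once.
    process_range(start, end)
    # Reached only if process_range did not raise; a crash before this
    # line means the same range is redone next time, never skipped.
    state_file.write_text(end.isoformat())
    return start, end
```

Because the checkpoint is written last, the failure mode is repeating a range rather than losing one.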

Now, I haven't looked at the fromcvs code yet to know how easy or
hard it would be to use this logic within that package, so this is
still pretty hand-wavy.

-Kevin


From: Aidan Van Dyk on
* Tom Lane <tgl(a)sss.pgh.pa.us> [100119 11:47]:
> "Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
> > Oh, and what sort of delay do you feel would be "long enough to
> > cover any cvs commit including potential network slowness during it
> > etc."?
>
> Why should the script make any assumptions about delay at all?
> It seems to me that the problem comes from failing to check for
> changed files, no more and no less. It would be much less of an
> issue if a non-atomic CVS commit showed up as two separate GIT
> commits with similar log messages.

Well, I guess you could say:

"fromcvs should go back and recheck all the previous work it's done,
and double check and make sure no new files have changed for the
timestamp/log message pair it's already done, because CVS isn't atomic"

But, I think that path leads to craziness... I mean, how far back? CVS
is "non-atomic" enough that 2 (well, $N) people can commit separate
stuff, all with overlapping time stamps, and they can even commit stuff
in the "past" if they really want...

But all I have to say is: it's not perfect, but it's pretty good. Just
deal with things as they come; after all, it's "CVS"

;-)

If you want better than "pretty good", drop CVS, do a one-time
conversion (a la parsecvs/cvs2git) and get on with life... As long as
CVS is the tool of choice, pretty good is really good...

--
Aidan Van Dyk Create like a god,
aidan(a)highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.

From: "Kevin Grittner" on
"Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> wrote:

> I haven't looked at the fromcvs code yet to know how easy or
> hard it would be to use this logic within that package

Well, now I have looked. It's about 2,000 lines of pretty dense
Ruby code (not as many comments as one would hope, especially since
there appears to be *no* other documentation of any sort). On a
quick scan, they seem to be *trying* to do what I suggested, which
means that some sort of fix could probably be worked out, but that
the issue could be subtle enough that it could be hard to find.

Perhaps it is as simple, though, as using the client's time instead
of the CVS server's time -- that's one of the things I've seen cause
problems for this sort of thing using CVS before. I haven't spotted
where they're getting the time.

Is there anyone fluent in Ruby who wants to look at this and see how
they're getting it?

http://ww2.fs.ei.tum.de/~corecode/hg/fromcvs/log/132

By the way, is anyone working on fixing up the current problem?
I've been talking about trying to prevent recurrences, but that's
not gonna help get the current problem solved....

-Kevin
