From: GZ on 5 Jun 2010 22:35 On Jun 5, 8:42 pm, Ben Finney <ben+pyt...(a)benfinney.id.au> wrote: > GZ <zyzhu2...(a)gmail.com> writes: > > Let me think of a better way to express what I mean by a "smaller > >diff." After Idiffthe two strings, I will have something like this: > > > AAA > > - BBB > > + CCC > > + DDD > > - EEE > > > It means the first line does not change, the second line is replaced > > by the third line, the forth line is new, and the fifth line is > > deleted. > > Are you drawing a distinction between: > > * line FOO is replaced by line BAR > * line FOO is deleted, line BAR is added > > Your wording seems to make that distinction, but I don't see how it's > useful or meaningful in a discussion aboutdiff. Are they not exactly > the same? > > -- > \ Injustice is relatively easy to bear; what stings is justice. | > `\ Henry L. Mencken | > _o__) | > Ben Finney I should distinguish between modifications and additions. In my above example, one line is modified/replaced, one line is added and one line is deleted. There are a total of 3 edits. I am looking for an alternative python library other than difflib that minimizes this number (edit distance).
From: Bryan on 6 Jun 2010 00:16
GZ wrote: > I should distinguish between modifications and additions. In my above > example, one line is modified/replaced, one line is added and one line > is deleted. There are a total of 3 edits. I am looking for an > alternative python library other than difflib that minimizes this > number (edit distance). Since you know the lingo, "edit distance", I expect you've already Googled up the general results. Depending on what kinds of edits we count, the problem of finding a minimal diff can be low-order poly- time, NP-hard, or incomputable. Gonzalo Navarro's 2001 survey, "A Guided Tour to Approximate String Matching", is now available on-line for free: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.7225&rep=rep1&type=pdf The paper notes, "issues have been put aside to keep a reasonable scope," but presents more than most of us ever wanted to know about the subject. I don't know of a module that does what GZ asks. There are scores of line-oriented diff implementations, and there are minimal-edit- distance diffs, but the combination is rare at best. Problem domains that call for a true minimal diff tend not to work in units of lines of text. -- --Bryan |