Prev: FAQ 9.19 How do I return the user's mail address?
Next: FAQ 9.21 How do I use MIME to make an attachment to a mail message?
From: Dilbert on 12 Apr 2010 16:24 I have a question about the Perl Module Algorithm::Diff. I want to use Algorithm::Diff to calculate the diff of the two sequences "ab" and "a1bab", the "LCS" (the "Longest Common Sequence") is a length of 2. Theoretically there are 3 solutions with LCS = 2: Solution 1: ---ab a1bab Solution 2: a-b-- a1bab Solution 3: a---b a1bab I understand that any of those 3 solutions could be returned by Algorithm::Diff, but I would argue that solution 1 is "better" than solution 2 or 3, because solution 1 changes only once between '-' and [ab], whereas solution 2 and 3 change more than once between '-' and [ab]. This is my Perl program: use strict; use warnings; use 5.010; use Algorithm::Diff; use Data::Dumper; my @old = split //, 'ab'; my @new = split //, 'a1bab'; my $d = Algorithm::Diff::sdiff(\@old, \@new); my $line = Dumper($d); $line =~ s{\s}''xmsg; say $line; The output is $VAR1=[['u','a','a'],['+','','1'],['+','','b'],['+','','a'], ['u','b','b']]; This is in fact Solution 3: a---b a1bab How can I teach Algorithm::Diff to choose Solution 1 (the best of the 3 possibilities) ?
From: Kyle T. Jones on 13 Apr 2010 03:01 Dilbert wrote: > I have a question about the Perl Module Algorithm::Diff. > > I want to use Algorithm::Diff to calculate the diff of the two > sequences "ab" and "a1bab", the "LCS" (the "Longest Common Sequence") > is a length of 2. > > Theoretically there are 3 solutions with LCS = 2: > > Solution 1: > ---ab > a1bab > > Solution 2: > a-b-- > a1bab > > Solution 3: > a---b > a1bab > There is only one LCS solution - 'ab'. You're making too much of it and you don't get to say there are 3 solutions with LCS = 2 (as if you were assigning 2, and LCS=3 or LCS=1 were possible with the data you provide), there is just *the* LCS solution, which, in this case, happens to be 'ab'. > I understand that any of those 3 solutions could be returned by > Algorithm::Diff, but I would argue that solution 1 is "better" than > solution 2 or 3, because solution 1 changes only once between '-' and > [ab], whereas solution 2 and 3 change more than once between '-' and > [ab]. > > This is my Perl program: > > use strict; > use warnings; > use 5.010; > use Algorithm::Diff; > use Data::Dumper; > my @old = split //, 'ab'; > my @new = split //, 'a1bab'; > my $d = Algorithm::Diff::sdiff(\@old, \@new); > my $line = Dumper($d); > $line =~ s{\s}''xmsg; > say $line; > > The output is > $VAR1=[['u','a','a'],['+','','1'],['+','','b'],['+','','a'], > ['u','b','b']]; > > This is in fact Solution 3: > a---b > a1bab > No, this is the smallest set of instructions for turning the first sequence into the second sequence (same as diff), with what some find to be an easier to read "side by side" format: (unchanged, was 'a', stays 'a') (add, was nothing(''), adds '1') (add, was nothing(''), adds 'b') (add, was nothing(''), adds 'a') (unchanged, was 'b', stays 'b') The same set of instructions in diff would just be: [ ['+', 1, '1'], ['+', 2, 'b'], ['+', 3, 'a'], ] > How can I teach Algorithm::Diff to choose Solution 1 (the best of the > 3 possibilities) ? That question doesn't make a lot of sense (to me). http://search.cpan.org/dist/Algorithm-Diff/lib/Algorithm/Diff.pm Cheers.
From: Kyle T. Jones on 13 Apr 2010 03:17 Dilbert wrote: I may have misunderstood what you were asking a second ago... let me try again (including diff for each of your three solutions): > I have a question about the Perl Module Algorithm::Diff. > > I want to use Algorithm::Diff to calculate the diff of the two > sequences "ab" and "a1bab", the "LCS" (the "Longest Common Sequence") > is a length of 2. > > Theoretically there are 3 solutions with LCS = 2: > > Solution 1: > ---ab > a1bab > [ ['+', 0, 'a'], ['+', 1, '1'], ['+', 2, 'b'], ] > Solution 2: > a-b-- > a1bab > [ ['+', 1, '1'], ['+', 3, 'a'], ['+', 4, 'b'], ] > Solution 3: > a---b > a1bab > [ ['+', 1, '1'], ['+', 2, 'b'], ['+', 3, 'a'], ] > > How can I teach Algorithm::Diff to choose Solution 1 (the best of the > 3 possibilities) ? Why are any of the three better? Any way you crack this nut, the smallest sequence of operations to turn ab into a1bab involves three adds... so I'm confused as to why you pick one as "best"? Cheers.
From: Dilbert on 13 Apr 2010 05:58 On 13 avr, 09:17, "Kyle T. Jones" <KBf...(a)realdomain.net> wrote: > Dilbert wrote: > > How can I teach Algorithm::Diff to choose Solution 1 (the best of the > > 3 possibilities) ? > > Why are any of the three better? Any way you crack this nut, the > smallest sequence of operations to turn ab into a1bab involves three > adds... so I'm confused as to why you pick one as "best"? Let me try a more realistic example: use strict; use warnings; use 5.010; use Algorithm::Diff; use Data::Dumper; my @old = split //, 'ab'; my @new = split //, 'Show manual contribution: ab :This word can be displayed'; my $d = Algorithm::Diff::sdiff(\@old, \@new); my $line = Dumper($d); $line =~ s{\s}''xmsg; say $line; The output alignment (#al1) is: -a-----------b--------------------------- manual contribution: ab :can be displayed But what I want is (#al2): ---------------------ab------------------ manual contribution: ab :can be displayed Why ? Because, in case I have more than one optimal solution (and this is the case here), I want to consider not only the value of a character ('a' eq 'a' and 'b' eq 'b'), but also the context each character is in. the 'a' in @old has context: {left => 'start', right => 'b'} the 'b' in @old has context: {left => 'a', right => 'end'} the 'a' in @new (#al1) context: {left => 'm', right => 'n'} the 'b' in @new (#al1) context: {left => 'i', right => 'u'} the 'a' in @new (#al2) context: {left => ' ', right => 'b'} the 'b' in @new (#al2) context: {left => 'a', right => ' '} It can easily be seen that @old and @new(#al1) have no context in common Whereas @old and @new(#al2) have two context matches, that is @old('a')->right eq @new(#al2)'a'->right @old('b')->left eq @new(#al2)'a'->left That's why I prefer @new (#al2) over @new (#al1)
From: Dilbert on 13 Apr 2010 06:08
On 13 avr, 11:58, Dilbert <dilbert1...(a)gmail.com> wrote: > my @new = split //, 'Show manual contribution: ab :This word can be > displayed'; That should be... my @new = split //, 'manual contribution: ab :can be displayed'; > Whereas @old and @new(#al2) have two context matches, that is > @old('a')->right eq @new(#al2)'a'->right > @old('b')->left eq @new(#al2)'a'->left in the last line, replace "...@new(#al2)'a'->left..." by "...@new(#al2)'b'->left..." |