package VCS::Lite; use strict; use warnings; our $VERSION = '0.08'; =head1 NAME VCS::Lite - Minimal version control system =head1 SYNOPSIS use VCS::Lite; # diff my $lit = VCS::Lite->new('/home/me/foo1.txt'); my $lit2 = VCS::Lite->new('/home/me/foo2.txt'); my $difftxt = $lit->delta($lit2)->diff; print OUTFILE $difftxt; # patch my $delt = VCS::Lite::Delta->new('/home/me/patch.diff'); my $lit3 = $lit->patch($delt); print OUTFILE $lit3->text; # merge my $lit4 = $lit->merge($lit->delta($lit2),$lit->delta($lit3)); print OUTFILE $lit4->text; =head1 DESCRIPTION This module provides the functions normally associated with a version control system, but without needing or implementing a version control system. Applications include wikis, document management systems and configuration management. It makes use of the module Algorithm::Diff. It provides the facility for basic diffing, patching and merging. =head2 new The underlying storage concept of VCS::Lite is an array. The members of the array can be anything that a scalar can represent (including references to structures and objects). The default is for the object to hold an array of scalars as strings corresponding to lines of text. The basic form of the constructor is as follows: my $lite = VCS::Lite->new( '/my/file'); which slurps the file to make an object. The full form is as follows: my $lite = VCS::Lite->new( $object_id, $separation, $source, ...); =over 4 =item C<$object_id> This is a string to identify what is being diffed, patched or merged, in the application's environment. If there is no $source, this is used as a filename from which to read the content. =item C<$separation> This is an optional parameter, which can be used via $/ to split the input file into tokens. The default is for lines of text. If you pass in a string to be tokenized, this will use $sep as a regular expression $separation can be a scalar or scalar ref, where this is used to break up the input stream. All values permitted for $/ are allowed (see L). $separation can also be a hashref, to give a finer level of control. For example: { in => '\n', out => '\n', chomp => 1 } 'in' is the input record separator to use (the same as you would pass as $sep). Note that all values allowed for $/, and indeed the value of $/ passed in is what is used as a default. 'in' can be a string or a regexp. 'out' is the character used on joining the members to output the results (text method in scalar context). This is the output record separator $\. Note that 'out' defaults differently depening on the setting of 'chomp': if 'chomp' is off, 'out' will default to the empty string, or rather the passed in value of $\. If 'chomp' is on, 'out' will default to 'in' - note that you should specify 'out' explicitly if you are using a regexp for 'in'. If the 'chomp' flag is set, the text matching 'in' is removed from the input lines as they are read. 'chomp' is not on by default, as this is new functionality in release 0.08. =item C<$source> if unspecified causes $object_id to be opened as a file and its entire contents read in. The alternative is to supply $source, which can be one of the following: =over 4 =item C This is a string which is tokenized using $separation =item C Array of tokens =item C or C Contents of file are slurped =item C This is called successively to obtain tokens until received undef. =back =back In the Perl spirit of DWIM, new assumes that given an arrayref, you have already done all the work of making your list of whatevers. Given a string (filename) or a file handle, the file is slurped, reading each line of text into a member of the array. Given a callback, the routine is called successively with arguments $p1, $p2, etc. and is expected to return a scalar which is added (pushed on) to the array. =head2 apply $lite->apply($lite2); $lite->apply($lite3, base => 'original'); This method call corresponds approximately to a version control system's check-in function. This causes $lite to be modified, so that its contents now reflect those of $lite2. $lite does retain the original contents, available via L. However, unlike in a version control system, the object holds only the first original and latest contents. The VCS::Lite object passed in can also have its own original version. If this is the case, merging will be performed to incorporate the change as if it had come from a different branch. To facilitiate the merging process, optionally specify a base version, which can be the string 'original', 'contents' (the default) or a VCS::Lite object whose contents will be used. This corresponds to the "common ancestor" in version control systems. =head2 original This returns a VCS::Lite object for the original version, before changes were applied with apply. =head2 text my $foo = $lite->text; my $bar = $lit2->text('|'); my @baz = $lit3->text; In scalar context, returns the equivalent of the file contents slurped (the optional separation parameter, defaulting to $_, is used to join the strings together). In list context, returns the list of lines or records. =head2 id my $fil = $lite->id Returns the name associated with the VCS::Lite element when it was created by new. This is usually the file name. =head2 delta my $delt = $lit->delta($lit2); Perform the difference between two VCS::Lite objects. This object returns a L object. =head2 diff This is for backward compatibility with early versions. $lite->diff($lite2) is equivalent to $lite->delta($lite2)->diff. =head2 patch my $lit3 = $lit->patch($delt); Applies a patch to a VCS::Lite object. Accepts a file handle or file name string. Reads the file in diff format, and applies it. Returns a VCS::Lite object for the patched source. =head2 merge my $lit4 = $lit->merge($lit1,$lit2,\&confl); Performs the "parallelogram of merging". This applies two different change streams represented by VCS::Lite objects. Returns a VCS::Lite object with both sets of changes merged. The third parameter to the method is a sub which is called whenever a merge conflict occurs. This needs to either resolve the conflict or insert the necessary text to highlight the conflict. =head1 COPYRIGHT Copyright (c) Ivor Williams, 2002-2006 =head1 LICENCE You may use, modify and distribute this module under the same terms as Perl itself. =head1 ACKNOWLEDGEMENTS Colin Robertson for suggesting and providing patches for support of files with unterminated last lines. =head1 SEE ALSO L. =cut use Carp; use Algorithm::Diff qw(traverse_sequences); sub new { my ($class,$id,$sep,$src,@args) = @_; my %proto = (); # Decode $sep as needed if (ref($sep) eq 'HASH') { %proto = %$sep; $sep = $proto{in}; delete $proto{in}; } # DWIM logic, based on $src parameter. # Case 0: $src missing. Use $id as file name, becomes case 3 open $src,$id or croak("failed to open '$id': $!") unless $src; my $atyp = ref $src; $sep ||= $/; local $/ = $sep if $sep; $proto{out} ||= $\ || ''; my $out_sep = $proto{out}; my @contents; # Case 1: $src is string if (!$atyp) { @contents = split /(?=$sep)/,$src; } # Case 2: $src is arrayref elsif ($atyp eq 'ARRAY') { @contents = @$src; } # Case 3: $src is globref (file handle) elsif ($atyp eq 'GLOB') { @contents = <$src>; } # Case 4: $src is coderef - callback elsif ($atyp eq 'CODE') { while (my $item=&$src(@args)) { push @contents,$item; } } # Case otherwise is an error. else { croak "Invalid argument"; } $proto{last_line_short} = 1 if @contents && ($contents[-1] !~ /$sep$/); if ($proto{chomp}) { s/$sep$//s for @contents; $proto{out} ||= $sep; } bless { id => $id, contents => \@contents, separator => $sep, %proto },$class; } sub original { my $self = shift; my $pkg = ref $self; exists($self->{original}) ? bless ({ id => $self->id, contents => $self->{original}, separator => $self->{separator}, out => $self->{out}, chomp => $self->{chomp}, }, $pkg ) : $self; } sub apply { my ($self,$other,%par) = @_; my $pkg = ref $self; my $base = $par{base}; $base ||= 'contents'; $base = $pkg->new( $self->id, $self->{separator}, $self->{$base}) unless ref $base; my $cbase = exists($other->{original}) ? $other->original : $base; my $mrg = $cbase->merge($base,$other); my $mrg2 = $base->merge($self,$mrg); $self->{original} ||= $self->{contents}; $self->{contents} = [$mrg2->text]; } sub text { my ($self,$sep) = @_; $sep ||= $self->{out} || ''; wantarray ? @{$self->{contents}} : join $sep,@{$self->{contents}}; } sub id { my $self = shift; @_ ? ($self->{id} = shift) : $self->{id}; } use VCS::Lite::Delta; sub delta { my $lite1 = shift; my $lite2 = shift; my %par = @_; my @wl1 = $lite1->_window($par{window}); my @wl2 = $lite2->_window($par{window}); my @d = map { [map { [$_->[0] . ($_->[2]{short} ? '/' : ''), $_->[1], $_->[2]{line} ] } @$_ ] } Algorithm::Diff::diff(\@wl1,\@wl2,sub { $_[0]{window}; }) or return undef; VCS::Lite::Delta->new(\@d,$lite1->id,$lite2->id,$lite1->{out}); } sub _window { my $self = shift; my $win = shift || 0; my ($win_from,$win_to) = ref($win) ? (-$win->[0],$win->[1]) : (-$win,$win); my @wintxt; my $max = $#{$self->{contents}}; for (0..$max) { my $win_lb = $_ + $win_from; $win_lb = 0 if $win_lb < 0; my $win_ub = $_ + $win_to; $win_ub = $max if $win_ub > $max; push @wintxt, join $self->{out}, (@{$self->{contents}}[$win_lb .. $win_ub], (($win_ub < $max) || !$self->{last_line_short}) ? '' : ()); } map { {line => $self->{contents}[$_], window => $wintxt[$_], ( $self->{last_line_short} && ($_ == $max)) ? ( short => 1 ) : (), } } (0..$max); } sub diff { my $self = shift; $self->delta(@_)->diff; } sub patch { my $self = shift; my $patch = shift; $patch = VCS::Lite::Delta->new($patch,@_) unless ref $patch eq 'VCS::Lite::Delta'; my @out = @{$self->{contents}}; my $id = $self->id; my $pkg = ref $self; my @pat = $patch->hunks; for (@pat) { for (@$_) { my ($ind,$lin,$txt) = @$_; next unless $ind =~ /^-/; _error($lin,'Patch failed'),return undef if $out[$lin] ne $txt; } } my $line_offset = 0; my $lls = 0; for (@pat) { my @txt1 = grep {$_->[0] =~ /^\-/} @$_; my @txt2 = grep {$_->[0] =~ /^\+/} @$_; my $base_line = @txt2 ? $txt2[0][1] : $txt1[0][1] + $line_offset; splice @out,$base_line,scalar(@txt1),map {$_->[2]} @txt2; $line_offset += @txt2 - @txt1; $lls += grep {$_->[0] eq '+/'} @txt2; } $pkg->new($id,{ in => $self->{separator}, chomp => $self->{chomp}, out => $self->{out}, last_line_short => $lls, },\@out); } # Equality of two array references (contents) sub _equal { my ($a,$b) = @_; return 0 if @$a != @$b; foreach (0..$#$a) { return 0 if $a->[$_] ne $b->[$_]; } 1; } sub merge { my ($self,$d1,$d2) = @_; my $pkg = ref $self; my $orig = [$self->text]; my $chg1 = [$d1->text]; my $chg2 = [$d2->text]; my $out_title = $d1->{id} . '|' . $d2->{id}; my %ins1; my $del1 = ''; traverse_sequences( $self->{contents}, $chg1, { MATCH => sub { $del1 .= ' ' }, DISCARD_A => sub { $del1 .= '-' }, DISCARD_B => sub { push @{$ins1{$_[0]}},$chg1->[$_[1]] }, } ); my %ins2; my $del2 = ''; traverse_sequences( $self->{contents}, $chg2, { MATCH => sub { $del2 .= ' ' }, DISCARD_A => sub { $del2 .= '-' }, DISCARD_B => sub { push @{$ins2{$_[0]}},$chg2->[$_[1]] }, } ); # First pass conflict detection: deletion on file 1 and insertion on file 2 $del1 =~ s(\-+){ my $stlin = length $`; my $numdel = length $&; my @confl = map {exists $ins2{$_} ? ($_) : ()} ($stlin+1..$stlin+$numdel-1); @confl ? '*' x $numdel : $&; }eg; # Now the other way round: deletion on file 2 and insertion on file 1 $del2 =~ s(\-+){ my $stlin = length $`; my $numdel = length $&; my @confl = map {exists $ins1{$_} ? ($_) : ()} ($stlin+1..$stlin+$numdel-1); @confl ? '*' x $numdel : $&; }eg; # Conflict type 1 is insert of 2 into deleted 1, Conflict type 2 is insert of 1 into deleted 2 # @defer is used to hold the 'other half' alternative for the conflict my $conflict = 0; my $conflict_type = 0; my @defer; my @out; for (0..@{$self->{contents}}) { # Get details pertaining to current @f0 input line my $line = $self->{contents}[$_]; my $d1 = substr $del1,$_,1; my $ins1 = $ins1{$_} if exists $ins1{$_}; my $d2 = substr $del2,$_,1; my $ins2 = $ins2{$_} if exists $ins2{$_}; # Insert/insert conflict. This is not a conflict if both inserts are identical. if ($ins1 && $ins2 && !&_equal($ins1,$ins2)) { push @out, ('*'x20)."Start of conflict ".(++$conflict). " Insert to Primary, Insert to Secondary ".('*'x60)."\n"; push @out, @$ins1, ('*'x100)."\n", @$ins2; push @out, ('*'x20)."End of conflict ".$conflict.('*'x80)."\n"; } elsif (!$conflict_type) { #Insert/Delete conflict # Normal insertion - may be from $ins1 or $ins2. Apply the inser and junk both $ins1 and $ins2 $ins1 ||= $ins2; push @out, @$ins1 if defined $ins1; undef $ins1; undef $ins2; } # Detect start of conflict 1 and 2 if (!$conflict_type && $d1 eq '*') { push @out, ('*'x20)."Start of conflict ".(++$conflict). " Delete from Primary, Insert to Secondary ".('*'x60)."\n"; $conflict_type = 1; } if (!$conflict_type && $d2 eq '*') { push @out, ('*'x20)."Start of conflict ".(++$conflict). " Delete from Secondary, Insert to Primary ".('*'x60)."\n"; $conflict_type = 2; } # Handle case where we are in an Insert/Delete conflict block already if ($conflict_type == 1) { if ($d1 eq '*') { # Deletion block continues... push @defer,(@$ins2) if $ins2; push @defer,$line if !$d2; } else { # handle end of block, dump out @defer and clear it push @out, ('*'x100)."\n",@defer; undef @defer; push @out, ('*'x20)."End of conflict ".$conflict.('*'x80)."\n"; $conflict_type = 0; } } if ($conflict_type == 2) { if ($d2 eq '*') { # Deletion block continues... push @defer,(@$ins1) if $ins1; push @defer,$line if !$d1; } else { # handle end of block, dump out @defer and clear it push @out, ('*'x100),"\n", @defer; undef @defer; push @out, ('*'x20)."End of conflict ".$conflict.('*'x80)."\n"; $conflict_type = 0; } } last unless defined $line; # for end of file, don't want to push undef push @out, $line unless ($d1 eq '-' || $d2 eq '-') && !$conflict_type; } $pkg->new($out_title, undef, \@out); } sub _error {}; 1;