What is Revision Control?
As used in software development, a revision control system is a tool for recording, indexing and manipulating the changes (revisions) made to the source code of a software system.
For example, let's suppose that Alice and Bob will be working on the
program sort -- perhaps they intend to fix some bugs and add a new
feature for handling very long input lines. They agree that Alice
will fix the bugs and Bob will add the feature.
Cooperating Programmers Need to Combine their Changes
One straw-man idea -- certainly a poor idea -- is that they make a
single copy of sort, both fire up their editors, and they just make
changes to that single copy.
This idea won't work well for many reasons, but some of the most obvious are: what happens if they want to edit the same file at the same time? What happens if Alice wants to compile and test a bug-fix, but Bob is in the middle of working on the new feature and his changes won't even compile?
Instead, each programmer should work on a separate copy, but then they need some way to combine their efforts.
diff and patch Do the Job, but Crudely
A second idea -- workable, but with problems of its own -- is that
Alice and Bob will use the programs diff and patch to combine their
work. For example, having fixed a bug, Alice will use diff to extract
a description of the changes she made. She can give the output from
diff to Bob, who can then use patch to add those changes to his copy
of the tree.
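As a rough illustration of this exchange, the following Python sketch
plays both roles using the standard difflib module. The helper names
make_patch and apply_patch, the file name sort.c, and the sample lines
are hypothetical, and the edit list is a simplification rather than the
real diff output format. It also assumes that Bob has only appended
lines after the ones Alice touched, so her line numbers still apply to
his copy.

```python
import difflib


def make_patch(old, new):
    """Describe how to turn `old` into `new` as (start, end, new_lines) edits."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(lines, edits):
    """Apply edits produced by make_patch to a copy of `lines`."""
    result = list(lines)
    # Work from the last edit to the first so earlier indices stay valid.
    for start, end, replacement in reversed(edits):
        result[start:end] = replacement
    return result


base = ["read input", "sort lines", "print output"]          # shared starting point
alice = ["read input", "sort lines stably", "print output"]  # Alice's bug fix
bob = base + ["handle long lines"]                           # Bob's feature work

# Human-readable description of Alice's change, in unified diff style.
for line in difflib.unified_diff(base, alice, "sort.c.orig", "sort.c", lineterm=""):
    print(line)

# Bob folds Alice's fix into his own copy.
bob_updated = apply_patch(bob, make_patch(base, alice))
print(bob_updated)
```

The sketch is deliberately fragile: if Bob had inserted lines above
Alice's change, the recorded positions would no longer line up.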
Cooperating on a program by exchanging diff output and using patch
can get the job done. Many (especially small) projects have worked
this way. Even today, many projects that internally use revision
control still use diff and patch when cooperating with "external
contributors".
Still, this approach has its limitations and drawbacks. An example
of a limitation is the problem of renaming files. Suppose that
Alice, while fixing a bug, had decided to reorganize the source a
little bit -- to make it easier to maintain. Perhaps she has created
some new subdirectories and moved files around among them. diff and
patch will not cope with this situation gracefully. Alice cannot
easily (or perhaps at all) generate diff output that Bob will find
useful. An unfortunate consequence is that Alice is therefore likely
not to reorganize the source in the first place -- whatever is gained
in future ease of maintenance is hard to justify given the immediate
difficulties it will create in working with Bob.
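To see why renames are awkward, consider a tool that compares two
trees purely by path name. The Python sketch below is a simplified
model of that situation, not a description of how diff is implemented.
The function compare_trees and the file names are hypothetical.

```python
def compare_trees(old_tree, new_tree):
    """Compare two trees given as {path: contents}, matching files by path only."""
    old_paths, new_paths = set(old_tree), set(new_tree)
    return {
        "removed": sorted(old_paths - new_paths),
        "added": sorted(new_paths - old_paths),
        "modified": sorted(
            path for path in old_paths & new_paths
            if old_tree[path] != new_tree[path]
        ),
    }


# Alice moves both files into new subdirectories without changing their contents.
before = {"sort.c": "/* sorting code */", "util.c": "/* helpers */"}
after = {"src/sort.c": "/* sorting code */", "lib/util.c": "/* helpers */"}

report = compare_trees(before, after)
print(report)
```

Under this model nothing is reported as moved: every relocated file
shows up as one removal plus one unrelated addition, so the fact that
src/sort.c is the old sort.c is lost, and any edits Bob has made to
sort.c have nowhere obvious to go.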
An example of a drawback of the diff/patch approach is the
cherry-picking problem. Let's suppose that Alice has prepared 10
bug fixes. Of these, 5 are known to be very good, but 5 haven't yet
been tested and there is a suspicion that they create new
problems. Meanwhile, Bob would like to test his new feature code in
the context of the known-good bug fixes from Alice. He wants 5 of
her changes, but not the other 5.
For Bob to solve this problem, he'll need to ask Alice to produce
diff output for just the 5 known-good fixes and nothing else. To be
able to do that easily, Alice will have to have done a lot of
bookkeeping in her work -- to have kept around enough information to
go back and extract just the known-good changes. And Bob will have
to do a lot of bookkeeping too -- to keep track of which of Alice's
changes he's taken and which not. With just two programmers, all of
this bookkeeping is bad enough. With more programmers, and with more
instances of this problem than just Bob's immediate need, the amount
of bookkeeping quickly becomes impractical.
As a result of drawbacks like these, projects that use only the
diff/patch approach tend to be either very small, or very out of
control. When out of control, they have difficulty quickly and
reliably producing versions for release, isolating regressions, and
so forth.
So, Alice and Bob need something similar to the diff/patch approach
-- but one that doesn't require quite so much bookkeeping and that
doesn't discourage improvements such as reorganizing source files.
The Solution: Modern Revision Control
Modern revision control systems could be fairly described as
improving and automating the diff/patch approach to programmer
cooperation. Roughly speaking, modern revision control systems:
1. Record and Catalog diff-like Changes: Whenever Alice or Bob does
a "unit of work" -- such as making incremental progress fixing a
bug -- they can ask the revision control system to commit that
work. The commit process involves creating a diff-like
description of the new changes, attaching it to a
programmer-supplied description of the nature and purpose of the
changes, assigning a standard-form name to the changes (similar
to the call number on a library book), and archiving the changes
where they can be retrieved later. Query tools allow programmers
or managers to view the changes that have been recorded and the
descriptions provided of them.
2. Play-back and Combine Changes in Flexible Ways: Given an
organized record of changes, problems such as Bob's cherry-picking
need become easier to solve. Modern revision control tools give users
flexible mechanisms that can be used for tasks such as "Create a tree
that combines Bob's latest feature work with Alice's bug-fixes 1, 3,
5, 6, and 8." Recording Alice's and Bob's changes separately in the
first place is called branching and combining them is called
merging. It's the same problem that can be solved using just diff and
patch -- but modern revision control systems contain features that
automate this rather complex and error-prone process.
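The two capabilities above can be sketched as a toy catalog of
changes. This Python sketch is not modeled on any particular revision
control system: the Archive class, its commit, log, and replay methods,
the "author--patch-N" naming scheme, and the sample edits are all
hypothetical. Each change is reduced to a list of single-line
replacements, which is far cruder than a real diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    name: str      # catalog name assigned at commit time (hypothetical format)
    author: str
    summary: str   # programmer-supplied description of the change
    edits: tuple   # ((path, old_line, new_line), ...)


class Archive:
    """Toy catalog of committed changes."""

    def __init__(self):
        self._changes = {}  # name -> Change, kept in commit order

    def commit(self, author, summary, edits):
        """Record a unit of work and return the name assigned to it."""
        number = 1 + sum(c.author == author for c in self._changes.values())
        name = f"{author}--patch-{number}"
        self._changes[name] = Change(name, author, summary, tuple(edits))
        return name

    def log(self, author=None):
        """Query the catalog: names and summaries, optionally for one author."""
        return [
            (c.name, c.summary)
            for c in self._changes.values()
            if author is None or c.author == author
        ]

    def replay(self, tree, names):
        """Return a copy of `tree` with the named changes applied in order."""
        result = {path: list(lines) for path, lines in tree.items()}
        for name in names:
            for path, old_line, new_line in self._changes[name].edits:
                lines = result[path]
                # Raises ValueError if the expected line is no longer there.
                lines[lines.index(old_line)] = new_line
        return result


base = {"sort.c": ["read input", "compare keys", "write output"]}
archive = Archive()
fix1 = archive.commit("alice", "Handle empty input",
                      [("sort.c", "read input", "read input (empty ok)")])
fix2 = archive.commit("alice", "Untested: faster compare",
                      [("sort.c", "compare keys", "compare keys quickly")])
fix3 = archive.commit("alice", "Flush output on exit",
                      [("sort.c", "write output", "write output and flush")])
feat = archive.commit("bob", "Support very long lines",
                      [("sort.c", "compare keys", "compare keys of any length")])

# Bob's feature combined with only the known-good fixes 1 and 3.
combined = archive.replay(base, [feat, fix1, fix3])
print(archive.log("alice"))
print(combined["sort.c"])
```

Because every change is named and stored separately, choosing a subset
is just a matter of listing names. Replaying fix2 together with Bob's
feature would fail in this sketch, since both rewrite the same line; a
real system has to handle such conflicts far more carefully.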
Postscript: "Hey, that's not CVS!"
To people accustomed to older revision control systems, especially
something like CVS, this definition of modern revision control may
seem unfamiliar and a bit odd. Many people think of revision control
as a tool that "gives access to historic versions" and as a fancy
kind of database where programmers checkin changes that are
combined with the changes of others. Indeed, most older systems (and
some contemporary ones) make it very difficult, at best, to catalog
individual changes (such as one of Alice's bug-fixes) or to combine
changes in flexible ways (such as to solve Bob's cherry-picking
problem).
Modern systems have the basic capabilities of older systems like CVS, and much more besides. Whereas CVS makes branching and merging difficult, modern systems make them a natural way to work. Whereas older systems do not help much with cataloging logical changes that may affect multiple files, modern systems use that capability as a foundation. In essence, the familiar capabilities of older systems ("checkin" and "access to historic versions") are just special cases of the basic functionality of modern systems ("recording and cataloging changes" and "playing-back and combining changes in flexible ways").
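One way to picture the "special case" claim is the following Python
sketch, which assumes a history stored as an ordered list of
single-line replacements. The names checkin and version_at and the
sample lines are hypothetical. Here a checkin is simply recording one
more change, and a historic version is what you get by playing back a
prefix of the record.

```python
def checkin(history, old_line, new_line):
    """Record one more change at the end of the history."""
    history.append((old_line, new_line))


def version_at(base, history, n):
    """Rebuild the file as it stood after the first n recorded changes."""
    lines = list(base)
    for old_line, new_line in history[:n]:
        lines[lines.index(old_line)] = new_line
    return lines


base = ["read input", "sort lines", "print output"]
history = []
checkin(history, "sort lines", "sort lines stably")
checkin(history, "read input", "read input in chunks")

oldest = version_at(base, history, 0)              # the original file
middle = version_at(base, history, 1)              # after the first checkin
latest = version_at(base, history, len(history))   # the current version
print(middle)
```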