Introduction

Darcs is a revision control system, along the lines of CVS or arch. That means that it keeps track of the various revisions and branches of your project and allows changes to propagate from one branch to another. Darcs is intended to be an ``advanced'' revision control system. Two features in particular distinguish it from other revision control systems: 1) each copy of the source is a fully functional branch, and 2) underlying darcs is a consistent and powerful theory of patches.

Every source tree a branch

The primary simplifying notion of darcs is that every copy of your source code is a full repository. This is dramatically different from CVS, in which the normal usage is for there to be one central repository from which source code will be checked out. It is closer to the notion of arch, since the `normal' use of arch is for each developer to create his own repository. However, darcs makes it even easier, since simply checking out the code is all it takes to create a new repository. This has several advantages, since you can harness the full power of darcs in any scratch copy of your code, without committing your possibly destabilizing changes to a central repository.

Theory of patches

The development of a simplified theory of patches is what originally motivated me to create darcs. This patch formalism means that darcs patches have a set of properties, which make possible manipulations that couldn't be done in other revision control systems. First, every patch is invertible. Secondly, sequential patches (i.e. patches that are created in sequence, one after the other) can be reordered, although this reordering can fail, which means the second patch is dependent on the first. Thirdly, patches which are in parallel (i.e. both patches were created by modifying identical trees) can be merged, and the result of a set of merges is independent of the order in which the merges are performed. This last property is critical to darcs' philosophy, as it means that a particular version of a source tree is fully defined by the list of patches that are in it, i.e. there is no issue regarding the order in which merges are performed. For a more thorough discussion of darcs' theory of patches, see Appendix [*].
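
As a rough illustration, the three properties can be sketched in Python with a toy patch type. This is not how darcs represents patches: the `Patch` class (which only replaces the text of a single line) and the `commute`, `merge` and `apply_all` helpers are hypothetical names invented for this example, and a conflict is simply reported as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Patch:
    """Toy patch (hypothetical model): replace the text of one line."""
    line: int
    old: str
    new: str

    def apply(self, tree: List[str]) -> List[str]:
        if tree[self.line] != self.old:
            raise ValueError("patch does not apply to this tree")
        out = list(tree)
        out[self.line] = self.new
        return out

    def invert(self) -> "Patch":
        # Property 1: every patch has an inverse that undoes it.
        return Patch(self.line, self.new, self.old)


def commute(p: Patch, q: Patch) -> Optional[Tuple[Patch, Patch]]:
    """Property 2: reorder the sequence p, q into q', p'.

    Returns None when both patches touch the same line, i.e. q depends on p.
    """
    if p.line == q.line:
        return None
    return q, p


def merge(p: Patch, q: Patch) -> Optional[List[Patch]]:
    """Property 3: merge parallel patches p and q made against the same tree.

    q is carried across p by commuting it with the inverse of p.
    Returns the sequence [p, q'], or None if the two patches conflict.
    """
    swapped = commute(p.invert(), q)
    if swapped is None:
        return None
    q_prime, _ = swapped
    return [p, q_prime]


def apply_all(patches: List[Patch], tree: List[str]) -> List[str]:
    for patch in patches:
        tree = patch.apply(tree)
    return tree


base = ["alpha", "beta", "gamma"]
p = Patch(0, "alpha", "ALPHA")
q = Patch(2, "gamma", "GAMMA")

# Inverting: p followed by its inverse restores the original tree.
assert apply_all([p, p.invert()], base) == base

# Merging: the result does not depend on which side is merged first.
assert apply_all(merge(p, q), base) == apply_all(merge(q, p), base)

# Reordering fails when the second patch depends on the first.
r = Patch(0, "ALPHA", "Alpha")
assert commute(p, r) is None
```

In this toy model commuting never changes a patch, because the line count stays fixed. Patches that insert or delete lines would have their positions adjusted when they are commuted.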

A simple advanced tool

Besides being ``advanced'' as discussed above, darcs is actually also quite simple. Versioning tools can be seen as three layers. At the foundation is the ability to manipulate changes. On top of that must be placed some kind of database system to keep track of the changes. Finally, at the very top is some sort of distribution system for getting changes from one place to another.

Really, only the first of these three layers is of particular interest to me, so the other two are done as simply as possible. At the database layer, darcs just has an ordered list of patches along with the patches themselves, each stored as an individual file. Darcs' distribution system is strongly inspired by that of arch. Like arch, darcs uses a dumb server when pulling patches, typically apache or just a local or network file system. Darcs has built-in support for using ssh to write to a remote file system; a darcs executable is called on the remote system to apply the patches. Arbitrary other transport protocols are supported through an environment variable describing a command that will run darcs on the remote system. See the documentation for DARCS_APPLY_FOO in Chapter [*] for details.

The recommended method is to send patches through gpg-signed email messages, which has the advantage of being mostly asynchronous.

Keeping track of changes rather than versions

In the previous section, I explained revision control systems in terms of three layers. One can also look at them as having two distinct uses. One is to provide a history of previous versions. The other is to keep track of changes that are made to the repository, and to allow these changes to be merged and moved from one repository to another. These two uses are distinct, and almost orthogonal, in the sense that a tool can support one of the two uses optimally while providing no support for the other. Darcs is not intended to maintain a history of versions, although it is possible to kludge together such a revision history, either by making each new patch depend on all previous patches, or by tagging regularly. In a sense, this is what the tag feature is for, but the intention is that tagging will be used only to mark particularly notable versions (e.g. released versions, or perhaps versions that pass a time-consuming test suite).

Other revision control systems are centered upon the job of keeping track of a history of versions, with the ability to merge changes being added as it was seen that this would be desirable. But the fundamental object remained the versions themselves.

In such a system, a patch (I am using patch here to mean an encapsulated set of changes) is uniquely determined by two trees. Merging the changes in two trees consists of finding a common parent tree, computing the diff of each tree against that parent, and then cleverly combining the two diffs and applying the combined diff to the parent tree. At some point in the process, human intervention may be allowed in order to fix up problems in the merge, such as conflicts.

In the world of darcs, the source tree is not the fundamental object, but rather the patch is the fundamental object. Rather than a patch being defined in terms of the difference between two trees, a tree is defined as the result of applying a given set of patches to an empty tree. Moreover, these patches may be reordered (unless there are dependencies between the patches involved) without changing the tree. As a result, there is no need to find a common parent when performing a merge. Or, if you like, their common parent is defined by the set of common patches, and may not correspond to any version in the version history.
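
The idea of a tree as "a set of patches applied to an empty tree" can be sketched in a few lines of Python. Everything here is an assumption made for illustration: a patch is a hypothetical `(name, path, contents)` tuple that creates one file, and `build_tree` and `pull` are invented helpers. Because each toy patch creates a different file, none depends on another, so the sketch sidesteps the commutation that real patches would need.

```python
from typing import Dict, List, Tuple

# Hypothetical toy patch: (patch name, file path, file contents).
# Each patch creates a different file, so none depends on another.
Patch = Tuple[str, str, str]


def build_tree(patches: List[Patch]) -> Dict[str, str]:
    """Apply a sequence of patches, in order, to an empty tree."""
    tree: Dict[str, str] = {}
    for _name, path, contents in patches:
        tree[path] = contents
    return tree


def pull(local: List[Patch], remote: List[Patch]) -> List[Patch]:
    """Append to local every remote patch that local does not have yet."""
    return local + [p for p in remote if p not in local]


init = ("init", "README", "hello\n")
feature = ("feature", "feature.c", "int f;\n")
fix = ("fix", "fix.c", "int g;\n")

alice = [init, feature]
bob = [init, fix]

# The "common parent" is simply the set of patches both sides share.
common = set(alice) & set(bob)

# Whichever side pulls, the patches end up in a different order
# but the resulting tree is the same.
assert pull(alice, bob) != pull(bob, alice)
assert build_tree(pull(alice, bob)) == build_tree(pull(bob, alice))
```

Note that no ancestor tree is ever looked up: the merge is driven entirely by which patches each side already has.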

One useful consequence of darcs' patch-oriented philosophy is that since a patch need not be uniquely defined by a pair of trees (old and new), we can have several ways of representing the same change, which differ only in how they commute and what the result of merging them is. Of course, creating such a patch will require some sort of user input. This is a Good Thing, since the user creating the patch should be the one forced to think about what he really wants to change, rather than the users merging the patch. An example of this is the token replace patch (see Section [*]). This feature makes it possible to create a patch, for example, which changes every instance of the variable ``stupidly_named_var'' to ``better_var_name'', while leaving ``other_stupidly_named_var'' untouched. When this patch is merged with any other patch involving ``stupidly_named_var'', that instance will also be changed to ``better_var_name''. This is in contrast to a more conventional merging method, which would not only fail to change new instances of the variable, but would also produce conflicts when merging with any patch that modifies lines containing the variable. By using additional information about the programmer's intent, darcs is thus able to make the process of changing a variable name the trivial task that it really is: a simple search and replace, modulo tokenizing the code appropriately.
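
The effect of a token replace on text can be approximated with a short Python sketch. This is not darcs' implementation: `replace_token` is a hypothetical helper, and the `token_chars` default is an assumed set of token characters (letters, digits and underscore) written as the body of a regular-expression character class.

```python
import re


def replace_token(text: str, old: str, new: str,
                  token_chars: str = "A-Za-z_0-9") -> str:
    """Replace old with new wherever old appears as a whole token.

    An occurrence counts as a whole token only if it is not directly
    preceded or followed by another token character.
    """
    pattern = re.compile(
        rf"(?<![{token_chars}]){re.escape(old)}(?![{token_chars}])"
    )
    # A function is used as the replacement so that backslashes in
    # `new` are taken literally.
    return pattern.sub(lambda _match: new, text)


source = (
    "stupidly_named_var = 1\n"
    "other_stupidly_named_var = stupidly_named_var + 1\n"
)
renamed = replace_token(source, "stupidly_named_var", "better_var_name")
assert renamed == (
    "better_var_name = 1\n"
    "other_stupidly_named_var = better_var_name + 1\n"
)

# A line written elsewhere against the old name is renamed by the same
# operation, which is the behaviour a merge with a replace patch gives.
late_line = "print(stupidly_named_var)\n"
assert replace_token(late_line, "stupidly_named_var", "better_var_name") == (
    "print(better_var_name)\n"
)
```

The point of the sketch is that the patch records the intent ("rename this token") rather than the particular lines that happened to change, so it applies equally to lines that did not exist when it was created.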

The patch formalism discussed in Appendix [*] is what makes darcs' approach possible. In order for a tree to consist of a set of patches, there must be a deterministic merge of any set of patches, regardless of the order in which they are merged. This requires that one be able to reorder patches. While I don't know that the patches are required to be invertible as well, my implementation certainly requires invertibility. In particular, invertibility is required to make use of Theorem [*], which is used extensively in the manipulation of merges.

Features

Record changes locally

In darcs, the equivalent of a cvs ``commit'' is called record, because it doesn't put the change into any remote or centralized repository. Changes are always recorded locally, meaning no network access is required in order to work on your project and record changes as you make them. Moreover, this means that there is no need for a separate ``disconnected operation'' mode.

Interactive records

You can choose to perform an interactive record, in which case darcs will prompt you for each change you have made and ask if you wish to record it. Of course, you can tell darcs to record all the changes in a given file, or to skip all the changes in a given file, or go back to a previous change, or whatever. There is also an experimental graphical interface, which allows you to view and choose changes even more easily, and in whichever order you like.

Unrecord local changes

As a corollary to the ``local'' nature of the record operation, if a change hasn't yet been published to the world--that is, if the local repository isn't accessible by others--you can safely unrecord a change (even if it wasn't the most recently recorded change) and then re-record it differently, for example if you forgot to add a file, introduced a bug or realized that what you recorded as a single change was really two separate changes.

Interactive everything else

Most darcs commands support an interactive interface. The ``revert'' command, for example, which undoes unrecorded changes, has the same interface as record, so you can easily revert just a single change. Pull, push, send and apply all allow you to view and interactively select which changes you wish to pull, push, send or apply.

Test suites

Darcs has support for integrating a test suite with a repository. If you choose to use this, you can define a test command (e.g. ``make check'') and have darcs run that command on a clean copy of the project either prior to recording a change or prior to applying changes--and to reject changes that cause the test to fail.

Any old server

Darcs does not require a specialized server in order to make a repository available for read access. You can use http, ftp, or even just a plain old ssh server to access your darcs repository.

You decide write permissions

Darcs doesn't try to manage write access. That's your business. Supported push methods include direct ssh access (if you're willing to give direct ssh access away), using sudo to allow users who already have shell access to only apply changes to the repository, or verification of gpg-signed changes sent by email against a list of allowed keys. In addition, there is good support for submission of patches by email that are not automatically applied, but can easily be applied with a shell escape from a mail reader (this is how I deal with contributions to darcs).

Symmetric repositories

Every darcs repository is created equal (well, with the exception of a ``partial'' repository, which doesn't contain a full history...), and every working directory has an associated repository. As a result, there is a symmetry between ``uploading'' and ``downloading'' changes--you can use the same commands (push or pull) for either purpose.

CGI script

Darcs has a CGI script that allows browsing of the repositories.

Portable

Darcs runs on UNIX (or UNIX-like) systems (which includes Mac OS X) as well as on Microsoft Windows.

File and directory moves

Renames or moves of files and directories are, of course, handled properly, so when you rename a file or move it to a different directory, its history is unbroken, and merges with repositories that don't have the file renamed will work as expected.

Token replace

You can use the ``darcs replace'' command to modify all occurrences of a particular token (defined by a configurable set of characters that are allowed in ``tokens'') in a file. This has the advantage that merges with changes that introduce new copies of the old token will have the effect of changing it to the new token--which comes in handy when changing a variable or function name that is used throughout a project.

Configurable defaults

You can easily configure the default flags passed to any command on either a per-repository or a per-user basis or a combination thereof.

Switching from CVS

Darcs is refreshingly different from CVS.

CVS keeps version controlled data in a central repository, and requires that users check out a working directory whenever they wish to access the version-controlled sources. In order to modify the central repository, a user needs to have write access to the central repository; if he doesn't, CVS merely becomes a tool to get the latest sources.

In darcs there is no distinction between working directories and repositories. In order to work on a project, a user makes a local copy of the repository he wants to work in; he may then harness the full power of version control locally. In order to distribute his changes, a user who has write access can push them to the remote repository; one who doesn't can simply send them by e-mail in a format that makes them easy to apply on the remote system.

Darcs commands for CVS users

Because of the different models used by cvs and darcs, it is difficult to provide a complete equivalence between cvs and darcs. A rough correspondence for the everyday commands follows:

cvs command                       darcs equivalent
cvs checkout                      darcs get
cvs update                        darcs pull
cvs -n update                     darcs pull --dry-run (summarize remote changes)
cvs -n update                     darcs whatsnew --summary (summarize local changes)
cvs -n update | grep '?'          darcs whatsnew -ls | grep ^a (list potential files to add)
rm foo.txt; cvs update foo.txt    darcs revert foo.txt (revert to foo.txt from repo)
cvs diff                          darcs whatsnew (if checking local changes)
cvs diff                          darcs diff (if checking recorded changes)
cvs commit                        darcs record (if committing locally)
cvs commit                        darcs tag (if marking a version for later use)
cvs commit                        darcs push or darcs send (if committing remotely)
cvs diff | mail                   darcs send
cvs add                           darcs add
cvs tag -b                        darcs get
cvs tag                           darcs tag

Migrating CVS repositories to darcs

Tools and instructions for migrating CVS repositories to darcs are provided on the darcs community website: http://darcs.net/DarcsWiki/ConvertingFromCvs

Switching from arch

Although arch, like darcs, is a distributed system, and the two systems have many similarities (both require no special server, for example), their essential organization is very different.

Like CVS, arch keeps data in two types of data structures: repositories (called ``archives'') and working directories. In order to modify a repository, one must first check out a corresponding working directory. This requires that users remember a number of different ways of pushing data around -- tla get, update, commit, archive-mirror and so on.

In darcs, on the other hand, there is no distinction between working directories and repositories, and just checking out your sources creates a local copy of a repository. This allows you to harness the full power of version control in any scratch copy of your sources, and also means that there are just two ways to push data around: darcs record, which stores edits into your local repository, and pull, which moves data between repositories (darcs push is merely the opposite of pull; send and apply are just the two halves of push).

Darcs commands for arch users

Because of the different models used by arch and darcs, it is difficult to provide a complete equivalence between arch and darcs. A rough correspondence for the everyday commands follows:

tla command                    darcs equivalent
tla init-tree                  darcs initialize
tla get                        darcs get
tla update                     darcs pull
tla file-diffs f | patch -R    darcs revert
tla changes -diffs             darcs whatsnew
tla logs                       darcs changes
tla file-diffs                 darcs diff -u
tla add                        darcs add
tla mv                         darcs mv (not tla move)
tla commit                     darcs record (if committing locally)
tla commit                     darcs tag (if marking a version for later use)
tla commit                     darcs push or darcs send (if committing remotely)
tla archive-mirror             darcs pull or darcs push
tla tag                        darcs get (if creating a branch)
tla tag                        darcs tag (if creating a tag)

Migrating arch repositories to darcs

Tools and instructions for migrating arch repositories to darcs are provided on the darcs community website: http://darcs.net/DarcsWiki/ConvertingFromArch

darcs-stable 2007-06-16