Best practices

Introduction

This chapter is intended to review various scenarios and describe in each case effective ways of using darcs. There is no one ``best practice'', and darcs is a sufficiently low-level tool that there are many high-level ways one can use it, which can be confusing to new users. The plan (and hope) is that various users will contribute here describing how they use darcs in different environments. However, this is not a wiki, and contributions will be edited and reviewed for consistency and wisdom.

Creating patches

This section will lay down the concepts around patch creation. The aim is to develop a way of thinking that corresponds well to how darcs is behaving -- even in complicated situations.

In a single darcs repository you can think of two ``versions'' of the source tree. They are called the working and pristine trees. Working is your normal source tree, with or without darcs alongside. The only thing that makes it part of a darcs repository is the _darcs directory in its root. Pristine is the recorded state of the source tree. The pristine tree is constructed from groups of changes, called patches (some other version control systems use the term changeset instead of patch).^5.1 Darcs will create and store these patches based on the changes you make in working.

Changes

If working and pristine are the same, there are ``no changes'' in the repository. Changes can be introduced (or removed) by editing the files in working. They can also be caused by darcs commands, which can modify both working and pristine. It is important to understand for each darcs command how it modifies working, pristine or both of them.

whatsnew (as well as diff) can show the difference between working and pristine to you. It will be shown as a difference in working. In advanced cases it need not be working that has changed; it can just as well have been pristine, or both. The important thing is the difference and what darcs can do with it.

Keeping or discarding changes

If you have a difference in working, you can do one of two things with it: record it to keep it, or revert it to lose the changes.^5.2

If you have a difference between working and pristine--for example after editing some files in working--whatsnew will show some ``unrecorded changes''. To save these changes, use record. It will create a new patch in pristine with the same changes, so working and pristine are no longer different. To instead undo the changes in working, use revert. It will modify the files in working to be the same as in pristine (where the changes do not exist).

Unrecording changes

unrecord is a command meant to be run only in private repositories. Its intended purpose is to allow developers the flexibility to undo patches that haven't been distributed yet.

However, darcs does not prevent you from unrecording a patch that has been copied to another repository. Be aware of this danger!

If you unrecord a patch, that patch will be deleted from pristine. This will cause working to be different from pristine, and whatsnew to report unrecorded changes. The difference will be the same as just before that patch was recorded. Think about it. record examines what's different with working and constructs a patch with the same changes in pristine so they are no longer different. unrecord deletes this patch; the changes in pristine disappear and the difference is back.

If the recorded changes included an error, the resulting flawed patch can be unrecorded. When the changes have been fixed, they can be recorded again as a new--hopefully flawless--patch.

If the whole change was wrong, it can be discarded from working too, with revert. revert will update working to the state of pristine, in which the changes no longer exist after the patch was deleted.
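The interplay of record, revert and unrecord described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the ToyRepo class and its method names are hypothetical, whole-file contents stand in for darcs' line-based hunks, and pending is ignored.

```python
# Toy model only: real darcs stores line-based hunks, not whole-file snapshots.

class ToyRepo:
    """Hypothetical stand-in for one repository with working and pristine trees."""

    def __init__(self):
        self.patches = []   # recorded patches, oldest first
        self.working = {}   # file name -> contents

    def pristine(self):
        """Rebuild the pristine tree by applying every recorded patch in order."""
        tree = {}
        for patch in self.patches:
            for name, (_old, new) in patch.items():
                if new is None:
                    tree.pop(name, None)
                else:
                    tree[name] = new
        return tree

    def whatsnew(self):
        """Return {name: (pristine contents, working contents)} for differing files."""
        pristine = self.pristine()
        names = set(pristine) | set(self.working)
        return {
            name: (pristine.get(name), self.working.get(name))
            for name in sorted(names)
            if pristine.get(name) != self.working.get(name)
        }

    def record(self):
        """Turn the current difference into a new patch; working is untouched."""
        changes = self.whatsnew()
        if changes:
            self.patches.append(changes)

    def revert(self):
        """Discard the difference by making working equal to pristine."""
        self.working = self.pristine()

    def unrecord(self):
        """Delete the newest patch from pristine; working is untouched."""
        return self.patches.pop()


repo = ToyRepo()
repo.working["hello.txt"] = "hello\n"
repo.record()                      # the difference becomes a patch
after_record = repo.whatsnew()     # empty: working and pristine agree

repo.working["hello.txt"] = "hello world\n"
repo.record()
repo.unrecord()                    # the patch is gone, the difference is back
after_unrecord = repo.whatsnew()

repo.revert()                      # now the change is discarded from working too
after_revert = repo.working["hello.txt"]
```

Note that in this sketch neither record nor unrecord touches working; only revert does, which matches the description above.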

Keep in mind that the patches are your history, so deleting them with unrecord makes it impossible to track what changes you really made. Redoing the patches is how you ``cover the tracks''. On the other hand, it can be a very convenient way to manage and organize changes while you try them out in your private repository. When all is ready for shipping, the changes can be reorganized into what seem to be useful and impressive patches. Use it with care.

All patches are global, so don't ever replace an already ``shipped'' patch in this way! If an erroneous patch is deleted and replaced with a better one, you have to replace it in all repositories that have a copy of it. This may not be feasible, unless it's all private repositories. If other developers have already made patches or tags in their repositories that depend on the old patch, things will get complicated.

Special patches and pending

The patches described in the previous sections have mostly been hunks. A hunk is one of darcs' primitive patch types, and it is used to remove old lines and/or insert new lines. There are other types of primitive patches, such as adddir and addfile which add new directories and files, and replace which does a search-and-replace on tokens in files.

Hunks are always calculated in place with a diff algorithm just before whatsnew or record. But other types of primitive patches need to be explicitly created with a darcs command. They are kept in pending^5.3 until they are either recorded or reverted.

Pending can be thought of as a special extension of working. When you issue, e.g., a darcs replace command, the replace is performed on the files in working and at the same time a replace patch is put in pending. Patches in pending describe special changes made in working. The diff algorithm notionally applies these changes to pristine before it compares it to working, so all lines in working that are changed by a replace command will also be changed in pending+pristine when the hunks are calculated. That's why no hunks with the replaced lines will be shown by whatsnew; it only shows the replace patch in pending responsible for the change.
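The way pending hides replaced lines from the hunk calculation can be sketched as follows. This is a simplified illustration, not darcs' implementation: the function names are hypothetical, tokens are approximated with word boundaries, and the files are assumed to keep the same number of lines so that a line-by-line comparison can stand in for a real diff.

```python
import re

# Toy model only: real darcs uses its own token definition and a real diff.

def apply_replace(lines, old, new):
    """Replace the whole-word token `old` with `new` on every line."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return [pattern.sub(lambda _match: new, line) for line in lines]

def whatsnew(pristine, pending, working):
    """Return (pending patches, hunks), diffing working against pending+pristine."""
    base = pristine
    for old, new in pending:
        base = apply_replace(base, old, new)
    hunks = [
        (number, before, after)
        for number, (before, after) in enumerate(zip(base, working))
        if before != after
    ]
    return pending, hunks


pristine = ["int count = 0;", "count += 1;", "return count;"]

# Equivalent of a replace command: edit working and remember the patch in pending.
working = apply_replace(pristine, "count", "total")
pending = [("count", "total")]

# One further edit made by hand in working.
working[0] = "int total = 1;"

_, hunks_with_pending = whatsnew(pristine, pending, working)
_, hunks_without_pending = whatsnew(pristine, [], working)
```

With the replace patch in pending, only the hand-edited line shows up as a hunk; without it, every line touched by the replace would appear as an ordinary change.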

If a special patch is recorded, it will simply be moved to pristine. If it is instead reverted, it will be deleted from pending and the accompanying change will be removed from working.

Note that reverting a patch in pending is not the same as simply removing it from pending. It actually applies the inverse of the change to working. Most notable is that reverting an addfile patch will delete the file in working (the inverse of adding it). So if you add the wrong file to darcs by mistake, don't revert the addfile. Instead use remove, which cancels out the addfile in pending.

Using patches

This section will lay down the concepts around patch distribution and branches. The aim is to develop a way of thinking that corresponds well to how darcs is behaving -- even in complicated situations.

A repository is a collection of patches. Patches have no defined order, but patches can have dependencies on other patches. Patches can be added to a repository in any order as long as all patches depended upon are there. Patches can be removed from a repository in any order, as long as no remaining patches depend on them.
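As a rough mental model, the rules in the paragraph above can be written down in a few lines. The PatchSet class and the patch names are hypothetical, and dependencies are given by hand here, whereas darcs normally works them out from the patch contents.

```python
# Toy model only: patch names and the PatchSet class are hypothetical.

class PatchSet:
    """A repository viewed as an unordered set of patches with dependencies."""

    def __init__(self):
        self.deps = {}  # patch name -> set of patch names it depends on

    def add(self, name, depends_on=()):
        """Add a patch; every patch it depends on must already be present."""
        missing = set(depends_on) - set(self.deps)
        if missing:
            raise ValueError(f"{name} needs missing patches: {sorted(missing)}")
        self.deps[name] = set(depends_on)

    def remove(self, name):
        """Remove a patch; no remaining patch may depend on it."""
        dependents = sorted(p for p, d in self.deps.items() if name in d)
        if dependents:
            raise ValueError(f"{name} is still needed by: {dependents}")
        del self.deps[name]


repo = PatchSet()
repo.add("add-readme")
repo.add("add-license")
repo.add("fix-readme-typo", depends_on=["add-readme"])

repo.remove("add-license")      # fine: nothing depends on it

try:
    repo.remove("add-readme")   # refused: fix-readme-typo depends on it
    removal_refused = False
except ValueError:
    removal_refused = True
```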

Repositories can be cloned to create branches. Patches created in different branches may conflict. A conflict is a valid state of a repository. A conflict makes the working tree ambiguous until the conflict is resolved.

Dependencies

There are two kinds of dependencies: implicit dependencies and explicit dependencies.

Implicit dependencies are by far the most common kind. These are calculated automatically by darcs. If a patch removes a file or a line of code, it will have to depend on the patch that added that file or line of code.^5.4 If a patch adds a line of code, it will usually have to depend on the patch or patches that added the adjacent lines.

Explicit dependencies can be created if you give the --ask-deps option to darcs record. This is good for assuring that logical dependencies hold between patches. It can also be used to group patches--a patch with explicit dependencies doesn't need to change anything--and pulling the patch also pulls all patches it was made to depend on.

Branches: just normal repositories

Darcs does not have branches--it doesn't need to. Every repository can be used as a branch. This means that any two repositories are ``branches'' in darcs, but it is not of much use unless they have a large portion of patches in common. If they are different projects they will have nothing in common, but darcs may still very well be able to merge them, although the result probably is nonsense. Therefore the word ``branch'' isn't a technical term in darcs; it's just the way we think of one repository in relation to another.

Branches are very useful in darcs. They are in fact necessary if you want to do more than only simple work. When you get someone's repository from the Internet, you are actually creating a branch of it. It may first seem inefficient (or if you come from CVS--frightening), not to say plain awkward. But darcs is designed this way, and it has means to make it efficient. The answer to many questions about how to do a thing with darcs is: ``use a branch''. It is a simple and elegant solution with great power and flexibility, which contributes to darcs' uncomplicated user interface.

You create new branches (i.e., clone repositories) with the get and put commands.

Moving patches around--no versions

Patches are global, and a copy of a patch either is or is not present in a branch. This way you can rig a branch almost any way you like, as long as dependencies are fulfilled--darcs won't let you break dependencies. If you suspect a certain feature from some time ago introduced a bug, you can remove the patch or patches that add the feature, and try without it.^5.5

Patches are added to a repository with pull and removed from a repository with unpull. Don't confuse these two commands with record and unrecord, which construct and deconstruct patches.

It is important not to lose patches when (re)moving them around. pull needs a source repository to copy the patch from, whereas unpull just erases the patch. Beware that if you unpull all copies of a patch it is completely lost--forever. Therefore you should work with branches when you unpull patches. The unpull command can wisely be disabled in a dedicated main repository by adding unpull disable to the repository's defaults file.

For convenience, there is a push command. It works like pull but in the other direction. It also differs from pull in an important way: it starts a second instance of darcs to apply the patch in the target repository, even if it's on the same computer. It can cause surprises if you have a ``wrong'' darcs in your PATH.

Tags--versions

While pull and unpull can be used to construct different ``versions'' in a repository, it is often desirable to name specific configurations of patches so they can be identified and retrieved easily later. This is how darcs implements what is usually known as versions. The command for this is tag, and it records a tag in the current repository.

A tag is just a patch, but it only contains explicit dependencies. It will depend on all the patches in the current repository.^5.6 Darcs can recognize if a patch is a tag; tags are sometimes treated specially by darcs commands.

While traditional revision control systems tag versions in the time line history, darcs lets you tag any configuration of patches at any time, and pass the tags around between branches.

With the --tag option to get, you can easily retrieve a named version of the repository as a new branch.
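One way to picture tags and the --tag option is as dependency bookkeeping. The sketch below is a toy model with hypothetical function and patch names; it makes the tag depend on every patch present, which is a simplification of how darcs stores the tag's dependencies.

```python
# Toy model only: a repository is a dict of patch name -> set of dependencies.

def tag(repo, name):
    """Record a tag: an empty patch that depends on every patch present."""
    repo[name] = set(repo)   # right-hand side is evaluated before the tag exists

def get_tag(repo, tag_name):
    """Return a new repository holding the tag and everything it depends on."""
    wanted, todo = set(), [tag_name]
    while todo:
        patch = todo.pop()
        if patch not in wanted:
            wanted.add(patch)
            todo.extend(repo[patch])
    return {patch: set(repo[patch]) for patch in repo if patch in wanted}


repo = {"add-parser": set(), "fix-parser": {"add-parser"}}
tag(repo, "v1.0")
repo["experimental-feature"] = set()   # recorded after the tag

branch = get_tag(repo, "v1.0")
```

Here branch contains add-parser, fix-parser and v1.0, but not the patch recorded after the tag.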

Conflicts

This part of darcs becomes a bit complicated, and the description given here is slightly simplified.

Conflicting patches are created when you record changes to the same line in two different repositories. Same line does not mean the same line number and file name, but the same line added by a common depended-upon patch.

Contrary to many other merging tools, darcs considers two patches making the same change to be a conflict. In fact, darcs doesn't even look at the contents of the conflicting lines. If you think this is wrong, think about two different patches each adding a new keyword and also changing the line ``#define NUM_OF_KEYWORDS 17'' to ``#define NUM_OF_KEYWORDS 18''.

A conflict happens when two conflicting patches meet in the same repository. This is no problem for darcs; it can happily pull together just any patches. But it is a problem for the files in working (and pristine). The conflict can be thought of as two patches telling darcs different things about what a file should look like.

Darcs escapes this problem by ignoring those parts^5.7 of the patches that conflict. They are ignored in both patches. If patch A changes the line ``FIXME'' to ``FIXED'', and patch B changes the same line to ``DONE'', the two patches together will produce the line ``FIXME''. Darcs doesn't care which one you pulled into the repository first; you still get the same result when the conflicting patches meet. All other changes made by A and B are performed as normal.
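The simplified rule above can be illustrated with a small sketch. It is a toy model, not darcs' merge algorithm: lines are identified by hypothetical keys rather than by the patch that added them, and each patch is just a mapping from a line's key to its new text.

```python
# Toy model only: line keys and the merge function are hypothetical.

def merge(base, *patches):
    """Apply all patches, ignoring every line that more than one patch changes."""
    touched = {}
    for patch in patches:
        for key in patch:
            touched[key] = touched.get(key, 0) + 1
    result = dict(base)
    for patch in patches:
        for key, text in patch.items():
            if touched[key] == 1:      # conflicting parts are skipped in all patches
                result[key] = text
    return result


base = {"status": "FIXME", "title": "draft", "owner": "nobody"}
patch_a = {"status": "FIXED", "title": "final"}
patch_b = {"status": "DONE", "owner": "alice"}

a_then_b = merge(base, patch_a, patch_b)
b_then_a = merge(base, patch_b, patch_a)

# Two patches making the *same* change still conflict in this model.
keywords = {"define": "#define NUM_OF_KEYWORDS 17"}
bump = {"define": "#define NUM_OF_KEYWORDS 18"}
both_bump = merge(keywords, bump, dict(bump))
```

Both orders give the same tree: the status line stays at its common earlier value, while the non-conflicting changes to title and owner are applied.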

Darcs can mark a conflict for you in working. This is done with resolve (which isn't a very good name). Conflicts are marked such that both conflicting changes are inserted with special delimiter lines around them. Then you can merge the two changes by hand, and remove the delimiters.

When you pull patches, darcs automatically performs a resolve for you if a conflict happens. You can remove the markup with revert. Remember that the result will be the lines from the previous version common to both conflicting patches. The conflict marking can be redone with resolve.

A special case is when a pulled patch conflicts with unrecorded changes in the repository. The conflict will be automatically marked as usual, but since the markup is also an unrecorded change, it will get mixed in with your unrecorded changes. There is no guarantee you can revert only the markup after this, and resolve will not be able to redo this markup later if you remove it. It is good practice to record important changes before pulling.

resolve can't mark complicated conflicts. In that case you'll have to use darcs diff and other commands to understand what the conflict is all about. If for example two conflicting patches create the same file, resolve will pick just one of them, and no delimiters are inserted. So watch out if darcs tells you about a conflict.

resolve can also be used to check for unresolved conflicts. If there are none, darcs replies ``No conflicts to resolve''. While pull reports when a conflict happens, unpull and get don't.

Resolving conflicts

A conflict is resolved (not marked, as with the command resolve) as soon as some new patch depends on the conflicting patches. This will usually be the resolve patch you record after manually putting together the pieces from the conflict markup produced by resolve (or pull). But it can just as well be a tag. So don't forget to fix conflicts before you accidentally ``resolve'' them by recording other patches.
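The notion of ``resolved'' used here is purely about dependencies, which a short sketch can make concrete. This is a toy model with hypothetical names, and it treats a conflict as resolved only when a single patch depends directly on both conflicting patches.

```python
# Toy model only: patches maps patch name -> set of patch names it depends on.

def unresolved(patches, conflicts):
    """Return the conflicting pairs that no patch depends on yet."""
    return [
        (a, b)
        for a, b in conflicts
        if not any({a, b} <= deps for deps in patches.values())
    ]


patches = {"base": set(), "A": {"base"}, "B": {"base"}}
conflicts = [("A", "B")]

before = unresolved(patches, conflicts)

# Recording a tag makes a patch that depends on everything, A and B included,
# so the conflict counts as resolved even though nobody merged it by hand.
patches["v1.0"] = {"base", "A", "B"}
after = unresolved(patches, conflicts)
```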

If the conflict is with one of your not-yet-published patches, you may choose to amend that patch rather than creating a resolve patch.

If you want to back out and wait with the conflict, you can unpull the conflicting patch you just pulled. Before you can do that you have to revert the conflict markups that pull inserted when the conflict happened.

Distributed development with one primary developer

This is how darcs itself is developed. There are many contributors to darcs, but every contribution is reviewed and manually applied by me. For this sort of situation, darcs send is ideal, since the barrier for contributions is very low, which helps encourage contributors.

One could simply set the _darcs/prefs/email value to the project mailing list, but I also use darcs send to send my changes to the main server, so instead the email address is set to ``Davids Darcs Repo <droundy@abridgegame.org>''. My .procmailrc file on the server has the following rule:

:0
* ^TODavids Darcs Repo
|(umask 022; darcs apply --reply darcs-devel@abridgegame.org \
             --repodir /path/to/repo --verify /path/to/allowed_keys)

This causes darcs apply to be run on any email sent to ``Davids Darcs Repo''. apply actually applies them only if they are signed by an authorized key. Currently, the only authorized key is mine, but of course this could be extended easily enough.

The central darcs repository contains the following values in its _darcs/prefs/defaults:

apply test
apply verbose
apply happy-forwarding

The first line tells apply to always run the test suite. The test suite is in fact the main reason I use send rather than push, since it allows me to easily continue working (or put my computer to sleep) while the tests are being run on the main server. The second line is just there to improve the email response that I get when a patch has either been applied or failed the tests. The third line makes darcs not complain about unsigned patches, but just forward them to darcs-devel.

On my development computer, I have in my .muttrc the following alias, which allows me to easily apply patches that I get via email directly to my darcs working directory:

macro pager A "<pipe-entry>(umask 022; darcs apply --no-test -v \
        --repodir ~/darcs)"

Development by a small group of developers in one office

This section describes the development method used for the density functional theory code DFT++, which is available at http://dft.physics.cornell.edu/dft.

We have a number of workstations which all mount the same /home via NFS. We created a special ``dft'' user, with the central repository living in that user's home directory. The ssh public keys of authorized persons are added to the ``dft'' user's .ssh/authorized_keys, and we commit patches to this repository using darcs push. As in the section ``Distributed development with one primary developer'', we have the central repository set to run the test suite before the push goes through.

Note that no one ever runs as the dft user.

A subtlety that we ran into showed up in the running of the test suite. Since our test suite includes the running of MPI programs, it must be run in a directory that is mounted across our cluster. To achieve this, we set the $DARCS_TMPDIR environment variable to ~/tmp.

Note that even though there are only four active developers at the moment, the distributed nature of darcs still plays a large role. Each developer works on a feature until it is stable, a process that often takes quite a few patches, and only once it is stable does he push to the central repository.

darcs-stable 2007-06-16