Tla 2.0 Plans

Updated: 24 Nov 2004

Soon enough, tla-1.3 will be finalized and work will begin on tla-2.0. What is the direction planned for the tla-2.0 series?

menu:

Archive Format Changes
Project Tree and Changeset Format Changes
Hardlinkless Revlibs
Annotate/Blame/File-history Support
Selective Commit
Librification
Higher-Level Commands
Smarter Caching
Windows Support
But How to Do It?
Rewriting Arch

`{Archive Format Changes}`

2.0 will be a good time to change the archive format number. In other words, all 2.0 clients will be able to read archives from earlier versions of tla, but earlier versions won't be able to read 2.0 archives.

Shallower Paths

I agree with the cries for a (dumb filesystem) archive format with directory layouts like this:


	./category
	 ./category/category--branch1--1.0
	 ./category/category--branch1--1.1
	 ./category/category--branch2--1.1
	   ./category/category--branch2--1.1/X-category--branch2--1.1
	   ./category/category--branch2--1.1/base-0
	   ./category/category--branch2--1.1/patch-1
	   ./category/category--branch2--1.1/...
	 ./category/category--branch3--1.1

	[etc.]

In other words: shorter, shallower paths.
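As a rough illustration of the layout above, here is a minimal Python sketch that maps a version name and a revision level to a path in the proposed format. The function name `revision_path`, and the rule that the category is whatever precedes the first `--`, are assumptions made for this example; neither is part of tla.

```python
import posixpath


def revision_path(version, level):
    """Return the archive-relative path for one revision.

    Assumption (hypothetical): the category is everything before the
    first "--" in the version name, as in the listing above.
    """
    category, sep, _rest = version.partition("--")
    if not sep:
        raise ValueError("expected a name like category--branch--version")
    # Only two directory levels: the category, then the full version name.
    return posixpath.join(".", category, version, level)


print(revision_path("category--branch2--1.1", "patch-1"))
# prints ./category/category--branch2--1.1/patch-1
```

The point of the sketch is only that a revision sits two directories below the archive root, with no separate branch directory in between.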

It would be easiest to just say that, in 2.0, archives are free to not bother creating empty categories, branches, and versions. 2.0 archives might not support, for example, tla make-branch (or, for that matter, ever require users to run it).

Summary Deltas, Summary Log Contents

I need to drill down through the archived discussions about how we might do this to get the details right but, in general, the 2.0 archive format has to include these.

Sub-version Branches

No, not "svn branches".

People sometimes ask whether a branch-name divides a category or a version. In real life, people want both kinds of branch name, sometimes both in combination.

2.0 should permit sub-version branching:


	gcc--apple-core--8.4--power-pc
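One possible reading of that name, sketched in Python: the components are separated by `--`, and an optional fourth component names a branch beneath the version. The type `VersionName`, the function `parse_version_name`, and the field name `sub_branch` are hypothetical; the only data point from the text above is the single example name.

```python
from typing import NamedTuple, Optional


class VersionName(NamedTuple):
    category: str
    branch: str
    version: str
    sub_branch: Optional[str]  # hypothetical fourth component


def parse_version_name(name):
    """Split a name on "--" into three or four components."""
    parts = name.split("--")
    if len(parts) == 3:
        return VersionName(parts[0], parts[1], parts[2], None)
    if len(parts) == 4:
        return VersionName(*parts)
    raise ValueError("expected 3 or 4 components separated by '--'")


print(parse_version_name("gcc--apple-core--8.4--power-pc"))
```

Single hyphens inside a component (as in `apple-core` and `power-pc`) are left alone because only the double hyphen is treated as a separator.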

Archive Cached Configurations

To get arch, currently, you do something like:

	% tla get $ARCH-TOP-LEVEL-DIR-REVISION arch
	% cd arch
	% tla buildcfg "./tla.cfg"

I think 2.0 should explore the idea of archive cached configurations.

So, I can say:


	% tla archive-cfg-cache $ARCH-TOP-LEVEL-DIR-REV ./tla.cfg

Later, the entire buildcfg tree can be created by reading it from the tar file now cached in the archive:


	% tla get-cfg $ARCH-TOP-LEVEL-DIR-REV ./tla.cfg arch

Except for the question of how they are named, archive cached configurations make a perfectly reasonable format and mechanism for source releases.

Case Insensitive Category and Branch Names

I agree that, within new-format archives, those names can (at least optionally) be case insensitive.
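A minimal sketch of what an optional case-insensitive comparison could look like, in Python. The function `same_name` and its `case_insensitive` flag are hypothetical, and using `str.casefold()` as the folding rule is an assumption; the text above does not say which rule a new-format archive would use.

```python
def same_name(a, b, case_insensitive=True):
    """Compare two category or branch names.

    Assumption: case folding is done with str.casefold(); the real
    rule for new-format archives is not specified in the text.
    """
    if case_insensitive:
        return a.casefold() == b.casefold()
    return a == b


print(same_name("GCC", "gcc"))                          # True
print(same_name("GCC", "gcc", case_insensitive=False))  # False
```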

Unicode Category and Branch Names

(See below, about unicode support in general.)

`{Project Tree and Changeset Format Changes}`

2.0 will be a good time to change the project tree format number.

Shorter Paths

I agree with the need for shallower, shorter paths to log files.

The Name `{arch}`

I agree that, optionally, {arch} should be renameable to .arch. I wonder, actually, if we can't make this a per-user option with a per-tree default? That is, actually renaming {arch} to .arch or vice versa does not count as a change to the tree. Changing some file within {arch} (or .arch) changes the default name for the tree. Users can set a persistent option to always name the directory the way they prefer, regardless of the default for the tree.
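The precedence described above (a per-user option that overrides a per-tree default) can be sketched in a few lines of Python. The function `arch_dir_name` and its parameters are hypothetical; how the tree default and the user option would actually be stored is not specified here.

```python
ALLOWED_NAMES = ("{arch}", ".arch")


def arch_dir_name(tree_default, user_preference=None):
    """Pick the control directory name for a checked-out tree.

    tree_default:    the name recorded inside the tree (hypothetical).
    user_preference: the user's persistent option, or None if unset.
    """
    for name in (tree_default, user_preference):
        if name is not None and name not in ALLOWED_NAMES:
            raise ValueError("unsupported directory name: %r" % (name,))
    # The user's persistent option wins; otherwise use the tree default.
    return user_preference if user_preference is not None else tree_default


print(arch_dir_name("{arch}"))           # {arch}
print(arch_dir_name("{arch}", ".arch"))  # .arch
```

Because the choice is made at checkout time from these two inputs, two users can see different directory names for the same revision without either one having changed the tree.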

Patch Logs are Sets not Trees

The in-tree patch log should be "pure" -- any files that aren't part of the record of logs should be treated as unrecognized by inventory. mkpatch and dopatch should treat the patch log specially: recording patches to individual log files and performing set operations on the collection of logs, rather than tree delta operations.

In other words, the changeset format should be modified to treat logs specially. The format of in-tree patch log directories should not be hard-coded in changesets.
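To make "sets not trees" concrete, here is a minimal Python sketch in which a patch log is modeled as a set of revision names, and a changeset records which log entries were added or removed rather than which files moved in a directory tree. The function names, the dictionary layout, and the sample revision names are all hypothetical; this is not the changeset format used by mkpatch or dopatch.

```python
def log_delta(orig_logs, mod_logs):
    """Describe the change between two patch logs as set operations."""
    orig, mod = set(orig_logs), set(mod_logs)
    return {"added": mod - orig, "removed": orig - mod}


def apply_log_delta(tree_logs, delta):
    """Apply a log delta to a tree's patch log, independent of layout."""
    return (set(tree_logs) - delta["removed"]) | delta["added"]


orig = {"proj--main--1.0--base-0"}
mod = {"proj--main--1.0--base-0", "proj--main--1.0--patch-1"}
delta = log_delta(orig, mod)

# The target tree has an unrelated log entry of its own; it is untouched.
target = {"proj--main--1.0--base-0", "other--main--1.0--base-0"}
print(sorted(apply_log_delta(target, delta)))
```

Nothing in the delta mentions a path, so the on-disk layout of the log directory can change without invalidating old changesets.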

Re-do `inventory`

The way tagline tags are searched for is broken in ways that can't be fully fixed until the tree format is rev'ed.

The syntax of =tagging-method is sub-optimal.

The actual implementation of inventory is a mess.

It needs to be easier to convert between explicit and tagline tags.

`{Hardlinkless Revlibs}`

AFS and some Windows filesystems lack any useful support for hardlinks.

Arch needs a revision-library-like feature giving the best approximation possible of revlib functionality.

One low-tech approach is to implement revision library revision locking and use only "sliding" revision libraries on systems without hardlinks.
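The following Python sketch is not the locking or sliding-library approach named above. It shows a different and simpler fallback, included only to illustrate how a tool might notice at run time that hardlinks are unavailable and degrade to a plain copy. The function `link_or_copy` is hypothetical.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst if possible, otherwise copy it.

    Returns "link" or "copy" so the caller knows whether the two
    names share storage (a copy costs disk space and must be
    re-verified separately).
    """
    try:
        os.link(src, dst)
        return "link"
    except FileExistsError:
        # Never silently overwrite an existing library entry.
        raise
    except (OSError, NotImplementedError, AttributeError):
        # Filesystem or platform without usable hardlink support.
        shutil.copy2(src, dst)
        return "copy"
```

A library populated by copies loses the space sharing that hardlinks provide, which is why the text above treats this whole area as an approximation of revlib functionality rather than a replacement for it.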

`{Annotate/Blame/File-history Support}`

Arch needs a fast way to show the annotated history of an individual file.

`{Selective Commit}`

commit in arch needs to feel a lot more like commit in CVS.

`{Librification}`

People want that.

`{Higher-Level Commands}`

E.g., the stuff being prototyped in gtla.

`{Smarter Caching}`

`{Windows Support}`

Among the changes above are:

shortening various paths

getting some semblance of revlib support without relying on hardlinks

With those changes in place, a native port of the resulting arch to Windows should be trivially simple (e.g. simply linking against a posix compatibility library will do most of the job).

`{But How to Do It?}`

Lots of Small Transformations or a Complete Rewrite?

I estimate that completing the above list of tasks using the technique of making correctness-preserving transformations on the current code base would be equivalent, roughly, to carefully reviewing every line of code in the core at least 5 separate times, rewriting about 30% of the lines on each pass.

(I mostly pulled the numbers out of my ass. My envelope identifies a bit more than 5 tasks there, each of which requires a nearly complete review and each of which I'm guessing will impact about 30% of the code.)

I estimate that completely rewriting arch, from scratch, getting to at least the functionality and reliability of the current code, accomplishing many of the tasks in the 2.0 list --- it's hard to say but it's certainly in the ballpark of making 5 passes over the current code, rewriting 30% each time.

Transformations and a rewrite look, for all the certainty we can guesstimate them at, about equally hard.
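The envelope arithmetic above can be spelled out in a few lines of Python. The 5 passes and the 30% figure come from the estimate above; the size of the core (`core_lines = 100_000`) is a made-up number used only so the example prints something, and the function name is hypothetical.

```python
def transformation_effort(core_lines, passes=5, rewrite_percent=30):
    """Lines reviewed and lines rewritten under the rough estimate."""
    reviewed = core_lines * passes
    rewritten = core_lines * rewrite_percent * passes // 100
    return reviewed, rewritten


core_lines = 100_000  # hypothetical size; the text gives no line count
reviewed, rewritten = transformation_effort(core_lines)
print(reviewed, rewritten)  # 500000 150000
```

Under those guessed inputs, the transformation route rewrites about one and a half times the size of the core while reading it five times over, which is the sense in which it lands in the same ballpark as a from-scratch rewrite.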

Which is More Fun?

Sometimes a program is in a state where it is fun to hack on by making transformations, and other times the program is not in that state. Right now Awiki is in that fun state, for example. It's simple code, not yet too intertwingled. You can add a lot of functionality quickly by making correctness-preserving transformations.

Arch, in the form of tla, is not so clearly in that pleasant state. For example, the assumption that all strings are ascii pervades the code and teasing that apart will be hours and hours of assured tedium mixed with opportunity for serious, subtle error. Is that really the approach to take for Unicode support, for example? Or case-insensitive filename support?

Which Produces Better Results?

I think the decision is clinched (at least in my envelope-analysis world) by an "opportunities for error" estimate.

Doing our long task list by correctness-preserving transformations appears to be the same amount of work (as far as we'd care to guess) as doing a complete rewrite of tla.

Same amount of work -- but work that is different in nature.

Transformations have a lot of mechanical steps and steps in which a programmer has to "brain shift". For example, the programmer may have to skim through every function in 5 files, looking for certain coding idioms, and rewriting them when found. There are many opportunities for error: he might skip a file; he might miss an instance of the idiom; he might mistake something else for an instance of the idiom; he might make a typo during one of the rewrites.

With the transformations approach, we accumulate all those risks for every line of core tla code, several separate times for each line.

The complete rewrite approach, on the other hand, tries to work with each resulting line of code at most once or twice (just to write and peer-review it in the first place). There is less mechanical work and less "brain shifting" between contexts.

Two programmers can spend an equal number of hours, one on the transformation approach and one on the rewrite approach. At the least, we can say confidently that the one using the transformation approach has more opportunities to commit errors.

`{Rewriting Arch}`

I've done this before -- I have some experience in these matters. :-)

I'll write a little plan, next.

Copyright

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

See the file COPYING for further information about the copyright and warranty status of this work.

Tom Lord's Hackery
