Tom Lord's Hackery

Tla 1.3.1 Librification Experiment Progress Report

2005-02-21

This document refers to the tla development branch:

Archive: lord@emf.net--librify-tla-2005

Version: tla--factor-1--1.3.1

(See links on the home page for help finding my archives.)

Context

Last week I reported having just spent a week modifying the libawk part of tla to permit string-sharing between multiple relational and associative table entries. (The bug report describes that work.)

I was surprised at how quickly that work went. I had only meant to spend some time estimating how long the task would take. Working on the estimate, I couldn't resist working on the modification itself, and when the deadline for the estimate arrived I had already finished the work I was supposed to estimate! (I provided an estimate of "-3 days" -- undertaking the already-complete task should count as adding three days to our schedule. :-)

That got me thinking: the libawk cleanups were a small part of the long list of changes that would be necessary to incrementally transform libarch from its state in 1.3 into a more librified and more portable piece of code: friendly to scripting languages, GUIs, alternative front-ends, extension languages, and non-unix platforms.

I have long assumed (as described elsewhere) that an incremental librification starting from the 1.3 code base had poor chances for success.

But if the fixes to libawk took "-3 days", perhaps the rest of incremental librification wouldn't be so impractical.

A Two Week Experiment

Today marks the middle of what will be a two week experiment in 1.3 librification.

The form of the experiment is that I am spending these two weeks librifying as much of libarch as I can, subject to the constraints listed under "Librification Experiment Constraints" below.

Results After Week 1

Executive Summary

The experiment tries to answer the yes or no question:

The Executive Question

Is pursuing the librification effort for 1.3.1 a practical strategy for pursuing the high-level objectives for GNU Arch (such as Windows support, Unicode support, scripting and extension language support, a demonstrably/visibly robust implementation, etc.)?

One week into the experiment I will leave my betting money at 90/10: there is a 90% chance that the answer to the executive question will be a clear "yes" at the end of the two week experiment (which will be 28-Feb-2005).

One modest but interesting demonstration of the benefits of librification may be the improvements to error reporting that it leads to.

Technical Summary

See also ./tla-fn-anatomy.html.

Last Week

I spent most of the first week laying down a foundation for librification. That included:

Factoring the source tree. I set up a framework for splitting up files into multiple directories organized around modular and "modular cluster" boundaries. The old contents of libarch now live in a directory called libarch-compat. Those files will be incrementally deprecated: removed one by one and replaced by librified replacements in other libarch-* directories.

Setting up a new front end. My plan is to librify "from the top down" as much as possible. I'll work in a loop: pick an arch subcommand; rewrite it to use only librified code (no code from libarch-compat); repeat until there are no unlibrified commands. Therefore, I set up a new tla.c (the home of main). The new front end first looks for a librified version of the subcommand. If it doesn't find one, it runs the subcommand from libarch-compat.

Designing and implementing the error signalling mechanism. libarch needs to signal errors consistently and robustly rather than (in the manner of 1.3 and prior) often simply exiting on discovery of an error condition. Part of what I did this week was to install run-time system support (./src/tla/libach-errors) for error management.

Rebuilding libawk. The libawk cleanup modified all callers into libawk to be robust in the face of a libawk implementation that shares strings between multiple table entries. It also modified the existing libawk code to actually share strings opportunistically, resulting in a significant run-time space savings. Many librified functions will need libawk-style functionality, but the existing libawk implementation does not provide for error signalling and recovery and, in other ways, does not conform to the requirements for a fully librified libarch. Last week, collecting ideas and code-scraps from both the existing libawk and the code base for tla 2.0, I built a new implementation of the functionality in libawk. The new libawk (now called libarch-values), in addition to being librified, adds support for table entries whose values are of types other than just string (e.g., integer-valued table entries).
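
The front-end fallback described above under "Setting up a new front end" can be sketched roughly as follows. This is only an illustration, not the code in tla.c: the type cmd_fn, the two tables, and the stand-in command functions are all hypothetical names invented for this sketch. It shows one way to look a subcommand up in a table of librified commands first and fall back to a libarch-compat table only when that lookup fails.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical entry-point signature for a subcommand (an assumption). */
typedef int (*cmd_fn) (int argc, char ** argv);

struct cmd_entry
{
  const char * name;
  cmd_fn fn;
};

/* Stand-ins for real command implementations (illustrative only). */
static int librified_my_id (int argc, char ** argv)
{ (void) argc; (void) argv; puts ("my-id: librified"); return 0; }

static int compat_my_id (int argc, char ** argv)
{ (void) argc; (void) argv; puts ("my-id: libarch-compat"); return 0; }

static int compat_file_id (int argc, char ** argv)
{ (void) argc; (void) argv; puts ("file-id: libarch-compat"); return 0; }

/* Commands already rewritten to use only librified code. */
static const struct cmd_entry librified_cmds[] =
{
  { "my-id", librified_my_id },
  { 0, 0 }
};

/* Every command, as implemented by the old code. */
static const struct cmd_entry compat_cmds[] =
{
  { "my-id", compat_my_id },
  { "file-id", compat_file_id },
  { 0, 0 }
};

/* Linear search of a null-terminated table; returns 0 if NAME is absent. */
static cmd_fn lookup (const struct cmd_entry * table, const char * name)
{
  for (; table->name; ++table)
    if (!strcmp (table->name, name))
      return table->fn;
  return 0;
}

/* Prefer the librified implementation; otherwise fall back to compat. */
static cmd_fn find_command (const char * name)
{
  cmd_fn fn = lookup (librified_cmds, name);
  return fn ? fn : lookup (compat_cmds, name);
}
```

With this arrangement, librifying one more command amounts to adding one row to the librified table; the compat row can stay in place until the old file is removed, and a main function only ever needs to call find_command on its first argument.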

I also started on librifying the my-id command. That involved working on revised support for option parsing, on the API for functions implementing tla sub-commands, and on writing librified versions of the low-level functions for manipulating a user id.

This work went well in a few senses. I was able to cut-paste-edit a certain amount of code from both tla 1.3 and tla 2.0 to write what I needed in this context. A great deal of the new code I simply rewrote from scratch: this was code that is a minor variation on code that I've rewritten from scratch 3 or 4 times over the past few months. The resulting code seems to work well, although testing has been scattershot. I'm satisfied with the emerging calling conventions.

Next Week and Possibly Beyond

In week two I have a little more work to do on the foundation: string primitive operations; better option parsing; the beginnings of a more portable file system protocol stack.

Beyond that I want to librify as much as I can in the remaining time.

I'll consider the experiment to have produced a distinctly positive result (meaning that this approach to librification is worth pursuing) if I can get through librifying the file-id command and some commands that pertain to per-user (~/.arch-params) parameters. Such an outcome implies an efficient framework for reimplementing CLI parsers, and progress on librifying namespace management, project tree file system access, project tree arch control file access, and ~/.arch-params access.

A positive outcome will warrant a follow-on series of three "wind sprints": one each to librify inventory, mkpatch, and dopatch. Past experience has shown that, once those commands are in place, implementing (in this case, librifying) the rest of tla is a relative cake walk.

Librification Experiment Constraints

This experiment asks how long it will take to make a "clean up pass" over libarch such that, at the end of the process, the constraints described below are satisfied throughout the implementation of tla.

Upward compatibility -- for several roughly one week intervals it is anticipated that only part of libarch will be librified. Nevertheless, tla must be fully operable at those intervals, passing both make test and changeset burn-in tests. The intent is that it should be possible (and ideally useful) to merge partially-complete librification work into the mainline early and often.

Perfect Error Handling -- Librified parts of libarch must have perfected error handling. That means that they do not exit the process except under truly uncontinuable conditions -- most errors are signalled to callers. Resource allocation and deallocation must be robustly handled across all paths, including error-signaling paths through the code.

Abstract String Handling -- No part of the libarch code should make presumptions about the internal representation of strings. Strings should be manipulated purely via procedural interfaces based on an ontology of code-point-index-addressable sequences of Unicode codepoints. Where specific codepoint values must be presumed, only graphical and space ASCII characters should be referred to.

Reinforced On-disk Representation Abstractions -- libarch has long internally had a rough layering of its filesystem access. The vu layer, from hackerlab, provides a low-level indirection above Posix system calls; for each of project-trees, ~/.arch-params directories, and file-system archives arch includes a roughly procedural interface. Within those three primary disk formats are ad-hoc formats for specific subcomponents (e.g., for files in ~/.arch-params or for patch logs in ./{arch}). Two of these subsystems (project tree and archive formats) have proven to need major restructuring for a clean port to Windows-based platforms. Throughout the code, abstraction barriers are unevenly preserved, with leaks across them exposing details of path names, descriptors, and so forth. A librified libarch needs to clarify the layering in these components and ensure that the API to them is sufficiently abstract that changes to them (such as for a Windows port) can be made easily.

Customizability, Extensibility, and Self-Documentation -- Third party developers have made very clear the demand for robust scripting language bindings to libarch. Work on arch GUIs, IDE bindings, and alternative front-ends suggests a similar demand. Some desirable capabilities in the core of arch, such as file-type-specific diff computation and patch application, suggest a demand for an arch which is not merely scriptable (callable as primitive routines from a scripting language) but extensible (can be configured to call out to extension language routines during core operations). The APIs, data types, error handling conventions, and available documentation used in a librified libarch must be scripting and extension language friendly.
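
To make the "Perfect Error Handling" constraint above a little more concrete, here is a minimal sketch of the general shape such a function might take in C. It is not taken from libarch: the status enum and the function read_first_line are hypothetical names invented for this illustration, and the fixed 256-byte buffer is a simplifying assumption. The point is only that every failure is reported to the caller instead of exiting the process, and that the file handle and buffer are released on every path, including the error paths.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; a real library would likely carry more detail. */
enum lib_status
{
  LIB_OK = 0,
  LIB_ERR_OPEN,
  LIB_ERR_READ,
  LIB_ERR_NOMEM
};

/* Read the first line of PATH (without its newline) into a freshly
 * allocated string stored in *OUT.  On failure, *OUT is set to 0 and an
 * error status is returned; the process is never exited from here.
 */
static enum lib_status
read_first_line (const char * path, char ** out)
{
  enum lib_status status = LIB_OK;
  FILE * f = 0;
  char * buf = 0;
  size_t len;

  *out = 0;

  f = fopen (path, "r");
  if (!f)
    { status = LIB_ERR_OPEN; goto done; }

  buf = malloc (256);          /* fixed size: a simplifying assumption */
  if (!buf)
    { status = LIB_ERR_NOMEM; goto done; }

  if (!fgets (buf, 256, f))    /* empty file or read error */
    { status = LIB_ERR_READ; goto done; }

  len = strlen (buf);
  if (len && buf[len - 1] == '\n')
    buf[len - 1] = 0;

  *out = buf;
  buf = 0;                     /* ownership passes to the caller */

 done:
  /* Single exit point: runs on success and on every error path. */
  free (buf);
  if (f)
    fclose (f);
  return status;
}
```

A caller checks the returned status and decides for itself whether to retry, report, or propagate the error; on success it owns the returned string and must free it.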

Copyright

Copyright (C) 2004 Tom Lord

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

See the file COPYING for further information about the copyright and warranty status of this work.