The LASi Project - Motivation and Strategy

Motivation

LASi was first motivated by Ed Trager's desire to provide Unicode support for scientific symbols and non-Latin scripts within his program Madeline. Madeline is a program for preparing, visualizing, and exploring human pedigree data used in genetic linkage studies. The program uses the Adobe Postscript language to produce human pedigree drawings and LOD (logarithm of odds) plots of linkage analysis results from third-party programs.

In the tradition of many scientific programs, Ed designed Madeline to be independent of any graphical user interface (GUI) or application framework library (such as Qt, GTK+, or the Microsoft Windows API). Instead, the program features a scriptable command-line interface. The lack of a GUI ensures that the program has a small memory footprint and is portable across many platforms.

In genetic linkage studies, researchers are often interested in the geographic origins and migrations of study populations. Ed realized that accurately annotating birth places and other information on pedigree drawings could easily require accented Latin characters or any of a number of non-Latin scripts used by his international research colleagues, who at various times have included Chinese, Indian, Iranian, Japanese, and Latin American researchers, among others.

Human Pedigree Drawing from Madeline.

The program Yudit by Gaspar Sinai renders any character from Unicode for which a font is available in the system's graphics engine without relying on any third-party application framework library. Instead, it renders glyph contours in Postscript directly from the information in TrueType font files and contains its own layout engine for complex scripts (like Arabic and Devanagari). Therefore, Ed asked me to develop a library that would take the same approach as Yudit but could be used by a program like Madeline.

After some research, we realized that Owen Taylor of Red Hat had already developed a Unicode-based complex script layout engine called Pango (from the Greek "Παν", pan, meaning "all", and the Japanese "語", go, meaning "language") for use in the GNOME desktop project. Although it was developed for GNOME, Pango does not depend on GNOME. Pango also features a modular architecture that permits developers to add layout modules for additional scripts. It thus seemed reasonable to employ Pango on the backend.

In addition, Ed had asked that the library be as transparent as possible when used in his or any other program. Madeline, like many other scientific programs, outputs labels and strings using the Postscript show command. Ed wanted a method that would resemble show as closely as possible, at least to someone looking at the Madeline code.

Strategy

LASi depends on Pango and on FreeType 2. Given a string of Unicode characters and a font description (e.g. "sans serif"), Pango selects glyphs from whichever system fonts correspond most closely to the font description and have the required glyphs. Pango then arranges a layout, or placement, of the glyphs on the page that satisfies the grammatical and typographical conventions of the scripts.

For each glyph encountered, LASi generates a Postscript routine that it writes to the document header. Thus, LASi takes over from Postscript the responsibility for layout and rendering of character strings that would be performed by the Postscript show command.

Since the basic Postscript show command takes one byte per character and so can address at most 256 character codes in a font, it is not suitable for rendering the thousands of possible characters present in the numerous scripts already encoded in Unicode.

LASi writes a string of characters as a series of Postscript routine invocations. For example, whereas one would write "hello" in simple Postscript as:


(hello) show

LASi writes:


12 H-Luxi-Sans-Regular-140
12 e-Luxi-Sans-Regular-170
12 l-Luxi-Sans-Regular-177
12 l-Luxi-Sans-Regular-177
12 o-Luxi-Sans-Regular-180

"H-Luxi-Sans-Regular-140" is the name of the routine that draws the glyph for capital-"H" in a "regular" Sans Serif font. The number "140" on the end is used only to help guarantee that each glyph routine's name is unique. The "12" at the beginning of the line is the font size supplied here as an argument to the routine's single parameter.
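The naming scheme and the one-invocation-per-glyph output described above can be sketched in a few lines of C++. This is only an illustration, not LASi's code: GlyphRef, routineName, and emitInvocations are hypothetical names, and the glyph list is built by hand here, whereas in LASi the glyphs come from the Pango layout.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one glyph chosen by the layout engine.
struct GlyphRef {
    std::string glyphName;  // e.g. "H"
    std::string fontFace;   // e.g. "Luxi-Sans-Regular"
    unsigned    uniqueId;   // suffix that keeps the routine name unique, e.g. 140
};

// Build a routine name such as "H-Luxi-Sans-Regular-140".
std::string routineName(const GlyphRef& g) {
    std::ostringstream name;
    name << g.glyphName << '-' << g.fontFace << '-' << g.uniqueId;
    return name.str();
}

// Write one "<font size> <routine name>" line per glyph, in layout order.
std::string emitInvocations(const std::vector<GlyphRef>& glyphs, double fontSize) {
    assert(fontSize > 0.0);
    std::ostringstream ps;
    for (const GlyphRef& g : glyphs) {
        ps << fontSize << ' ' << routineName(g) << '\n';
    }
    return ps.str();
}
```

Calling emitInvocations with the five glyphs of the listing and a font size of 12 would reproduce the five lines shown above.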

LASi generates the Postscript routine that renders the glyph for capital-"H" in the document header as:


/H-Luxi-Sans-Regular-140 {
 /myFontsize exch def
 myFontsize 1024 div
 /scalefactor exch def
 gsave
 scalefactor dup scale
 newpath
 86 0 moveto
 196 0 lineto
 196 364 lineto
 576 364 lineto
 576 0 lineto
 685 0 lineto
 685 771 lineto
 576 771 lineto
 576 446 lineto
 196 446 lineto
 196 771 lineto
 86 771 lineto
 86 0 lineto
 fill
 grestore
 scalefactor 770 mul 0 translate
} def

For a detailed understanding of the routine above, refer to the Postscript Language Reference.
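As a rough guide to how such a routine could be produced, the sketch below writes a routine of the same shape from a list of contour points. It is an assumption-laden illustration rather than LASi's implementation: Point, scaleFactor, and writeGlyphRoutine are hypothetical names, and only a single straight-edged contour is handled, whereas real glyph outlines also contain curves and multiple contours.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical point in font units.
struct Point { int x; int y; };

// Scale factor used inside the routine: font size divided by the em size in font units.
double scaleFactor(double fontSize, int unitsPerEm) {
    assert(unitsPerEm > 0);
    return fontSize / unitsPerEm;
}

// Emit a Postscript routine shaped like the listing above for one polygonal contour.
void writeGlyphRoutine(std::ostream& os, const std::string& name,
                       const std::vector<Point>& contour,
                       int unitsPerEm, int advance) {
    assert(!contour.empty());
    os << '/' << name << " {\n"
       << " /myFontsize exch def\n"
       << " myFontsize " << unitsPerEm << " div\n"
       << " /scalefactor exch def\n"
       << " gsave\n"
       << " scalefactor dup scale\n"
       << " newpath\n";
    // The first point starts the path; every later point extends it.
    for (std::size_t i = 0; i < contour.size(); ++i) {
        os << ' ' << contour[i].x << ' ' << contour[i].y
           << (i == 0 ? " moveto\n" : " lineto\n");
    }
    // Return to the first point, fill, then advance the origin for the next glyph.
    os << ' ' << contour[0].x << ' ' << contour[0].y << " lineto\n"
       << " fill\n"
       << " grestore\n"
       << " scalefactor " << advance << " mul 0 translate\n"
       << "} def\n";
}
```

With the values in the listing, a font size of 12 gives a scale factor of 12 / 1024 = 0.01171875, so the final translate moves the origin 770 × 0.01171875 = 9.0234375 points to the right, ready for the next glyph.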

This may appear a far cry from the simplicity of the Postscript show command, yet one of our goals in developing LASi was to preserve that simplicity. To this end, based on Ed's suggestion, I developed a C++ output stream interface for writing a raw Postscript document.

The class LASi::PostscriptDocument represents the Postscript document. Three methods of this class, osHeader(), osBody(), and osFooter(), each return the LASi::oPostscriptStream for the corresponding part of the document. A LASi::oPostscriptStream is a specialization (sub-class) of a std::ostream from the C++ Standard Library. LASi::oPostscriptStream behaves just like any instance of std::ostream, but the expression:


 LASi::oPostscriptStream& os = ...;
 os << LASi::show("hello");

will write the sequence of Postscript commands as seen above, instead of the simple (hello) show. The argument to LASi::show() can be any string of Unicode characters encoded in the UTF-8 encoding.

The expression:


 LASi::oPostscriptStream& os = ...;
 os << LASi::setFont("serif") << LASi::setFontSize(12);

causes all subsequent applications of LASi::show() to the stream to generate output in the default serif font face with a font size of 12 points.
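The mechanism behind expressions like these can be illustrated with ordinary operator overloading. The following is a minimal, self-contained sketch and not LASi's implementation: DemoPsStream, DemoSetFontSize, and DemoShow are hypothetical stand-ins, the "glyph-" routine names are placeholders, and the text is treated as plain ASCII, one byte per glyph, where a real implementation would decode UTF-8 and ask the layout engine for glyphs.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a Postscript output stream that remembers the font size.
class DemoPsStream : public std::ostringstream {
public:
    double fontSize = 10.0;  // assumed default, chosen only for this sketch
};

// Hypothetical manipulator objects; each simply carries its argument.
struct DemoSetFontSize { double size; };
struct DemoShow { std::string text; };

// Setting the font size only updates stream state; nothing is written.
DemoPsStream& operator<<(DemoPsStream& os, const DemoSetFontSize& m) {
    assert(m.size > 0.0);
    os.fontSize = m.size;
    return os;
}

// Showing text writes one placeholder routine invocation per byte (ASCII only).
DemoPsStream& operator<<(DemoPsStream& os, const DemoShow& m) {
    for (char c : m.text) {
        static_cast<std::ostream&>(os) << os.fontSize << " glyph-" << c << '\n';
    }
    return os;
}
```

Because the stream carries the current font size, a size set once applies to every later show on the same stream, which mirrors the behaviour described above.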

The program writes out all its other Postscript commands just as it would without LASi, but through the LASi::oPostscriptStream interface instead of directly to an output buffer. When finished, the user program calls LASi::PostscriptDocument::write(std::ostream&), which effectively closes all three streams and flushes them, in order, to the stream given as the argument. That stream then contains a valid Postscript document that generates the desired pages.
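One way to picture the three-stream arrangement is a class that buffers each part separately and concatenates the buffers only at the end. The sketch below assumes exactly that and nothing more: ThreePartDocument is a hypothetical class written for illustration, not LASi::PostscriptDocument, and it omits everything LASi does beyond buffering.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical document that buffers header, body, and footer independently.
class ThreePartDocument {
public:
    std::ostream& osHeader() { return header_; }
    std::ostream& osBody()   { return body_; }
    std::ostream& osFooter() { return footer_; }

    // Flush the three buffers, in document order, to the caller's stream.
    void write(std::ostream& out) const {
        assert(out.good());
        out << header_.str() << body_.str() << footer_.str();
        out.flush();
    }

private:
    std::ostringstream header_;
    std::ostringstream body_;
    std::ostringstream footer_;
};
```

Buffering this way means a definition can be added to the header after the body text that needs it has already been written, which is what allows glyph routines to be generated on demand while the body is being produced.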

Thus, LASi acts as a layer between the program and the Postscript document that it is generating, and hides all the complexity of rendering characters from multiple languages and scripts. In my opinion, this makes LASi's public interface an instance of the Facade design pattern: it hides the complexity of selecting font files that cover arbitrary characters from the Unicode space, as well as glyph rendering, glyph composition, and layout.

Other resources

Trager's Madeline project.
Trager's Linux Unicode Primer page.
The description of LASi on the Eyegene server.
The Unicode Consortium web site.
The FreeType Glyph Conventions documentation.
