Pymacs - Extending Emacs with Python allout -*- outline -*-

Copyright © Progiciels Bourbeau-Pinard inc., Montréal 2003.

Title: Pymacs version @value{VERSION}
Subtitle: Extending Emacs with Python, updated @value{UPDATED}
Author: @value{Francois} Pinard, @email{pinard@@iro.umontreal.ca}

* @section Presentation

.. @section What is Pymacs?

Pymacs is a powerful tool which, once started from Emacs, allows two-way communication between Emacs Lisp and Python. Pymacs is meant to use Python as an extension language for Emacs, rather than the other way around, and this asymmetry is reflected in some design choices. Within Emacs Lisp code, one may load and use Python modules. Python functions may themselves use Emacs services, and handle Emacs Lisp objects kept in Emacs Lisp space.

The goals are to write @emph{naturally} in both languages, debug with ease, fall back gracefully on errors, and allow full cross-recursion.

It is very easy to install Pymacs, as neither Emacs nor Python needs to be compiled or relinked. Emacs merely starts Python as a subprocess, and Pymacs implements a communication protocol between both processes.

@url{http://www.iro.umontreal.ca/~pinard/pymacs/} contains a copy of the Pymacs manual in HTML form. The distribution holds the documentation sources, as well as PostScript and PDF renderings. The canonical Pymacs distribution is available as @url{http://www.iro.umontreal.ca/~pinard/pymacs/Pymacs.tar.gz}. Report problems and suggestions to @email{pinard@@iro.umontreal.ca}.

.. @section Warning to Pymacs users.

I expect average Pymacs users to have a deeper knowledge of Python than of Emacs Lisp. Some examples at the end of this file are meant for Python users having limited experience with the Emacs API. Currently, there are only two examples: one is too small, the other is too big :-).

As there is neither a dedicated mailing list nor a discussion group for Pymacs, let's use @email{python-list@@python.org} for asking questions or discussing Pymacs-related matters.

This is beta status software: specifications are somewhat frozen, yet changes may still happen that would require small adaptations in your code. Report problems to @value{Francois} Pinard at @email{pinard@@iro.umontreal.ca}. For discussing specifications or making suggestions, please also copy the @email{python-list@@python.org} mailing list, to help brainstorming! :-)

.. @section History and references.

I once starved for a Python-extensible editor, and pondered the idea of dropping Emacs for other avenues, but found nothing very convincing. Moreover, looking at all the Lisp extensions I wrote for myself, and considering all those superb tools written by others that became part of my computer life, it would have been a huge undertaking for me to reprogram them all in Python. So, when I began to see that something like Pymacs was possible, I felt strongly motivated! :-)

Pymacs revisits Cedric Adjih's previous work about running Python as a process separate from Emacs. See @url{http://www.crepuscule.com/pyemacs/}, or write Cedric at @email{adjih-pam@@crepuscule.com}. Cedric presented @code{pyemacs} to me as a proof of concept. As I simplified that concept a bit, I dropped the @samp{e} in @samp{pyemacs} :-). Cedric also previously wrote patches for linking Python right into XEmacs, but abandoned the idea.

Brian McErlean independently and simultaneously wrote a tool similar to this one, and we decided to join our projects. As an amusing coincidence, he even chose @code{pymacs} as a name. Brian paid good attention to complex details that escaped my courage, so his help and collaboration have been beneficial. You may reach Brian at @email{brianmce@@crosswinds.net}.

Another reference of interest is Doug Bagley's shootout project, which compares the relative speed of many popular languages.
The first URL points to the original, the second points to a newer version oriented towards Win32 systems:

. : @example
@url{http://www.bagley.org/~doug/shootout/}
@url{http://dada.perl.it/shootout/index.html}
. :

* @section Installation.

.. @section Install Pymacs proper.

Currently, there are two installation scripts, and both should be run. If you prefer, you may use @samp{make install lispdir=@var{lispdir}}, where @var{lispdir} is some directory along the list kept in your Emacs @code{load-path}.

The first installation script installs the Python package, including the Pymacs examples, using the standard Python Distutils tool. Merely @samp{cd} into the Pymacs distribution, then execute @samp{python setup.py install}. To get an option reminder, do @samp{python setup.py install --help}. Check the Distutils documentation if you need more information about this.

The second installation script installs the Emacs Lisp part only. (It used to do everything, but is now doomed to disappear completely.) Merely @samp{cd} into the Pymacs distribution, then run @samp{python setup -ie}. This will invite you to interactively confirm the Lisp installation directory, @emph{if} the script discovers more than one reasonable possibility for it. Without @samp{-ie}, the Lisp part of Pymacs will be installed in some automatically guessed place. Use @samp{-n} to learn the guess without proceeding to the actual installation. @samp{./setup -E xemacs @dots{}} may be useful to XEmacs lovers. See @samp{./setup -H} for all options.

About Win32 systems, Syver Enstad says: @cite{For Pymacs to operate correctly, one should create a batch file with @file{pymacs-services.bat} as a name, which runs the @file{pymacs-services} script. The @file{.bat} file could be placed along with @file{pymacs-services}, wherever that may be.}

To check that @file{pymacs.el} is properly installed, start Emacs and give it the command @samp{M-x load-library RET pymacs}: you should not receive any error.

To check that @file{pymacs.py} is properly installed, start an interactive Python session and type @samp{from Pymacs import lisp}: you should not receive any error.

To check that @file{pymacs-services} is properly installed, type @samp{pymacs-services ' expected.}.

Currently, there is only one installed Pymacs example, which comes in two parts: a batch script @file{rebox} and a @code{Pymacs.rebox} module. To check that both are properly installed, type @samp{rebox ').

To bind the F1 key to the @code{helper} function in some @code{module}:

. : @example
lisp.global_set_key((lisp.f1,), lisp.module_helper)
. :

@samp{(item,)} is a Python tuple yielding an Emacs Lisp vector. @samp{lisp.f1} translates to the Emacs Lisp symbol @code{f1}. So, Python @samp{(lisp.f1,)} is Emacs Lisp @samp{[f1]}. Keys like @samp{[M-f2]} might require some more ingenuity; one may write either @samp{(lisp['M-f2'],)} or @samp{(lisp.M_f2,)} on the Python side.

* @section Debugging.

.. @section The @code{*Pymacs*} buffer.

Emacs and Python are two separate processes (well, each may use more than one process). Pymacs implements a simple communication protocol between both, and does whatever is needed so programmers do not have to worry about details. The main debugging tool is the communication buffer between Emacs and Python, which is named @code{*Pymacs*}. By default, this buffer gets erased before each transaction. To make good debugging use of it, first set @code{pymacs-trace-transit} to either @code{t} or to some @samp{(@var{keep} . @var{limit})}. As it is sometimes helpful to understand the communication protocol, it is briefly explained here, using an artificially complex example to do so. Consider:

. : @example
(pymacs-eval "lisp('(pymacs-eval \"`2L**111`\")')")
"2596148429267413814265248164610048L"
. :

Here, Emacs asks Python to ask Emacs to ask Python for a simple bignum computation.
Note that Emacs does not natively know how to handle big integers, nor does it have an internal representation for them. This is why I use backticks, so Python returns a string representation of the result instead of the result itself. Here is a trace for this example. The @samp{<} character flags a message going from Python to Emacs and is followed by an expression written in Emacs Lisp. The @samp{>} character flags a message going from Emacs to Python and is followed by an expression written in Python. The number gives the length of the message.

. : @example
<22 (pymacs-version "0.3")
>49 eval("lisp('(pymacs-eval \"`2L**111`\")')")
<25 (pymacs-eval "`2L**111`")
>18 eval("`2L**111`")
<47 (pymacs-reply "2596148429267413814265248164610048L")
>45 reply("2596148429267413814265248164610048L")
<47 (pymacs-reply "2596148429267413814265248164610048L")
. :

Python evaluation is done in the context of the @code{Pymacs.pymacs} module, so for example a mere @code{reply} really means @samp{Pymacs.pymacs.reply}. On the Emacs Lisp side, there is no concept of module namespaces, so we use the @samp{pymacs-} prefix as an attempt to stay clean. Users should ideally refrain from naming their Emacs Lisp objects with a @samp{pymacs-} prefix.

@code{reply} and @code{pymacs-reply} are special functions meant to indicate that an expected result is finally transmitted. @code{error} and @code{pymacs-error} are special functions that introduce a string explaining an exception which recently occurred. @code{pymacs-expand} is a special function implementing the @samp{copy()} methods of Emacs Lisp handles or symbols. In all other cases, the expression is a request for the other side; that request stacks until a corresponding reply is received.

Part of the protocol manages memory, and this management generates some extra noise in the @code{*Pymacs*} buffer. Whenever Emacs passes a structure to Python, an extra pointer is generated on the Emacs side to inhibit garbage collection by Emacs.
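The length-prefixed framing in the trace above can be sketched in Python. This is a minimal sketch under stated assumptions, not the actual Pymacs wire format. The function names @code{frame}, @code{read_message} and @code{wait_for_reply} are hypothetical. A single space between the length and the payload is also an assumption, and so is the choice to count only payload characters. The byte counts printed in the real trace need not agree with this sketch. The reply prefixes come from the trace, and error replies are left out.

```python
import io

# Direction markers, as in the trace above:
# '<' for Python -> Emacs, '>' for Emacs -> Python.
DIRECTIONS = ("<", ">")

# Payload prefixes marking a final result rather than a new request
# (names taken from the trace; error replies are omitted from this sketch).
REPLY_PREFIXES = ("reply(", "(pymacs-reply ")


def frame(direction, payload):
    """Prefix PAYLOAD with its direction marker and its length."""
    if direction not in DIRECTIONS:
        raise ValueError("bad direction marker: %r" % direction)
    return "%s%d %s" % (direction, len(payload), payload)


def read_message(stream):
    """Read one framed message from STREAM and return (direction, payload)."""
    direction = stream.read(1)
    if direction not in DIRECTIONS:
        raise ValueError("bad direction marker: %r" % direction)
    digits = ""
    while True:
        char = stream.read(1)
        if char == " ":
            break
        if not char.isdigit():  # also catches end of stream ("")
            raise ValueError("bad length character: %r" % char)
        digits += char
    if not digits:
        raise ValueError("missing length")
    size = int(digits)
    # The length tells exactly how much to read, so the payload may contain
    # spaces or newlines without any escaping.
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("truncated message")
    return direction, payload


def wait_for_reply(stream, serve_request):
    """Serve incoming requests until a reply arrives, then return it.

    SERVE_REQUEST handles each request from the other side; if it issues
    requests of its own, they nest, which is how pending requests stack
    up until their replies come back.
    """
    while True:
        _direction, payload = read_message(stream)
        if payload.startswith(REPLY_PREFIXES):
            return payload
        serve_request(payload)
```

For instance, @samp{frame("<", '(pymacs-version "0.3")')} gives @samp{<22 (pymacs-version "0.3")}, which happens to match the first line of the trace.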
The Python garbage collector detects when the received structure is no longer needed on the Python side, at which time the next communication will tell Emacs to remove the extra pointer. It works symmetrically as well: whenever Python passes a structure to Emacs, an extra Python reference is generated to inhibit garbage collection on the Python side. The Emacs garbage collector detects when the received structure is no longer needed on the Emacs side, after which Python will be told to remove the extra reference. For efficiency, those allocation-related messages are delayed, merged and batched together within the next communication having another purpose.

.. @section Usual Emacs debugging.

If cross-calls between Emacs Lisp and Python nest deeply, an error will raise successive exceptions alternately on both sides as requests unstack, and the diagnostic gets transmitted back and forth, slightly growing as we go. So, errors will eventually be reported by Emacs. I made no effort to transmit the Emacs Lisp backtrace to the Python side, as I do not see a purpose for it: all debugging is done within Emacs windows anyway.

On recent Emacs versions, the Python backtrace gets displayed in the mini-buffer, and the Emacs Lisp backtrace is simultaneously shown in the @code{*Backtrace*} window. One useful thing is to allow the mini-buffer to grow big, so it has a better chance to fully contain the Python backtrace, the last lines of which are often especially useful. Here, I use:

. : @example
(setq resize-mini-windows t
      max-mini-window-height .85)
. :

in my @file{.emacs} file, so the mini-buffer may use 85% of the screen, and quickly shrinks when fewer lines are needed. The mini-buffer contents disappear at the next keystroke, but you can recover the Python backtrace by looking at the end of the @code{*Messages*} buffer. In that case, the @code{ffap} package in Emacs may be yet another friend!
From the @code{*Messages*} buffer, once @code{ffap} is activated, merely put the cursor on the file name of a Python module from the backtrace, and @samp{C-x C-f RET} will quickly open that source for you.

.. @section Auto-reloading on save.

I found it useful to automatically @code{pymacs-load} some Python files whenever they get saved from Emacs. Here is how I do it. The code below assumes that Python files meant for Pymacs are kept in @file{~/share/emacs/python}.

. : @example
(defun fp-maybe-pymacs-reload ()
  (let ((pymacsdir (expand-file-name "~/share/emacs/python/")))
    (when (and (string-equal (file-name-directory buffer-file-name)
                             pymacsdir)
               (string-match "\\.py\\'" buffer-file-name))
      (pymacs-load (substring buffer-file-name 0 -3)))))
(add-hook 'after-save-hook 'fp-maybe-pymacs-reload)
. :

* @section Examples.

.. @section Example 1 --- Paul Winkler's.

. : @section Example 1 --- The problem.

Let's say I have a module, call it @file{manglers.py}, containing this simple Python function:

. , @example
def break_on_whitespace(some_string):
    words = some_string.split()
    return '\n'.join(words)
. ,

The goal is to tell Emacs about this function, so that I can call it on a region of text and replace the region with the result of the call. I also want to bind this action to a key, of course; let's say @code{[f7]}.

The Emacs buffer ought to be handled in some way. If this is not done on the Emacs Lisp side, it has to be done on the Python side, but we cannot escape handling the buffer. So, there is a balance in the work the user has to do, which may be shifted towards Emacs Lisp or towards Python.

. : @section Example 1 --- Python side.

Here is a first draft for the Python side of the problem:

. , @example
from Pymacs import lisp

def break_on_whitespace():
    start = lisp.point()
    end = lisp.mark(lisp.t)
    if start > end:
        start, end = end, start
    text = lisp.buffer_substring(start, end)
    words = text.split()
    replacement = '\n'.join(words)
    lisp.delete_region(start, end)
    lisp.insert(replacement)

interactions = @{break_on_whitespace: ''@}
. ,

For various stylistic reasons, this could be rewritten into:

. , @example
from Pymacs import lisp
interactions = @{@}

def break_on_whitespace():
    start, end = lisp.point(), lisp.mark(lisp.t)
    words = lisp.buffer_substring(start, end).split()
    lisp.delete_region(start, end)
    lisp.insert('\n'.join(words))

interactions[break_on_whitespace] = ''
. ,

The above relies, in particular, on the fact that for those Emacs Lisp functions used here, @samp{start} and @samp{end} may be given in any order.

. : @section Example 1 --- Emacs side.

On the Emacs side, one would do:

. , @example
(pymacs-load "manglers")
(global-set-key [f7] 'manglers-break-on-whitespace)
. ,

.. @section Example 2 --- Yet another Gnus backend.

@strong{Note.} This example is not fully documented yet. As it stands, it is merely a collection of random remarks from other sources.

. : @section Example 2 --- The problem.

I've been reading, saving and otherwise handling electronic mail from within Emacs for many years, even before Gnus existed. The preferred Emacs archiving disk format for email is Babyl storage, and the special @code{Rmail} mode in Emacs handles Babyl files. As the years passed, I got dozens, then hundreds, then thousands of such Babyl files, each holding anywhere from a single message to maybe a few hundred individual messages. I tailored @code{Rmail} mode in various ways for MIME, foreign charsets, and many other nitty-gritty habits.
One of these habits was to progressively eradicate paragraphs in messages I was visiting many times, as users were often using a single message to report many problems or suggestions all at once, while I was often addressing issues one at a time. When I took over maintenance of some popular packages, like GNU @code{tar}, my volume of daily email rose drastically, and I chose Gnus as a way to sustain the heavy load. I thought about converting all my Babyl files to @code{nnml} format, but this would have meant losing many tools I wrote for Babyl files, consuming a lot of i-nodes, and also much polluting my @code{*Group*} buffer. I chose instead to select and read Babyl files as ephemeral mail groups (and for doing so, developed Emacs user machinery so selection could be done very efficiently).

Gnus surely gave me for free nice MIME and cryptographic features, and a flurry of handsome and useful commands, compared to the previous @code{Rmail} mode. On the other hand, Gnus did not allow me to modify individual messages in Babyl files, so for a good while, I had to give up on some special handling, like eradicating paragraphs as I used to do. This pushed me into writing my own Gnus backend for Babyl files, making sure I correctly implemented the article editing and modification support of the backend API. I chose Python to do so because I already had various Python tools for handling Babyl files, because I wanted to connect other Python scripts to the common mechanics, and of course because Pymacs was making this project feasible.

Nowadays, Babyl file support does not go very far beyond Emacs itself, while many non-Emacs tools for handling Unix mailbox folders are available. Spam fighting concerns brought me to revisit the idea of massively transforming all my Babyl files to Unix mailbox format, and I discovered that it would be a breeze to do, if I only adapted the Python backend to handle Unix mailbox files as well as Babyl, transparently.

. : @section Example 2 --- Python side.

I started by taking the Info nodes of the Gnus manual which were describing the backend interface, and turning them all into a long Python comment. I then split that comment into one dummy function per backend interface function, meant to write some debugging information when called, and then return failure to Gnus. This was enough to explore which functions were needed, and in which circumstances. I then implemented enough of them so ephemeral Babyl groups work, while solid groups might require more such functions. The unimplemented functions are still sitting in the module, with their included comments and debugging code.

. : @section Example 2 --- Emacs side.

One difficulty is ensuring that the @code{Nn} contents (@file{nncourrier.py} and @file{folder.py}) are on the Python or Pymacs search path. The @file{__init__.py} and package nature are not essential.

.. @section Example 3 --- The @code{rebox} tool.

. : @section Example 3 --- The problem.

For comments held within boxes, it is painful to fill paragraphs while stretching or shrinking the surrounding box @emph{by hand}, as needed. This piece of Python code eases my life on this. It may be used interactively from within Emacs through the Pymacs interface, or in batch as a script which filters a single region to be reformatted.

In batch, the reboxing is driven by command options and arguments and expects a complete, self-contained boxed comment from a file. Emacs function @code{rebox-region} also presumes that the region encloses a single boxed comment. Emacs @code{rebox-comment} is different, as it has to find the extent of the surrounding boxed comment by itself.

. : @section Example 3 --- Python side.

The Python code is too big to be inserted in this documentation: see file @file{Pymacs/rebox.py} in the Pymacs distribution. You will observe in the code that Pymacs-specific features are used exclusively from within the @code{pymacs_load_hook} function and the @code{Emacs_Rebox} class.
In batch mode, @code{Pymacs} is not even imported. Here, we mean to discuss some of the design choices in the context of Pymacs.

In batch mode, as well as with @code{rebox-region}, the text to handle is turned over to Python and fully processed there, with practically no Pymacs interaction while the work gets done. On the other hand, @code{rebox-comment} is rather Pymacs-intensive: the comment boundaries are chased right from the Emacs buffer, as directed by the function @code{Emacs_Rebox.find_comment}. Once the boundaries are found, the remainder of the work is essentially done on the Python side.

Once the boxed comment has been reformatted in Python, the old comment is removed in a single delete operation and the new comment is inserted in a second operation; this occurs in @code{Emacs_Rebox.process_emacs_region}. But by doing so, if point was within the boxed comment before the reformatting, its precise position is lost. To preserve point exactly, Python would have had to drive all reformatting details directly in the Emacs buffer. We really preferred doing it all on the Python side: we gain legibility by expressing the algorithms in pure Python, the same Python code may be used in batch or interactively, and we avoid the slowdown that would result from heavy use of Emacs services. To avoid losing point completely, I kludged a @code{Marker} class, whose goal is to estimate the new value of point from the old. Reformatting may change the amount of white space, and either delete or insert an arbitrary number of characters meant to draw the box. The idea is to initially count the number of characters between the beginning of the region and point, while ignoring any problematic character. Once the comment has been reboxed, point is advanced from the beginning of the region until we get the same count of characters, skipping all problematic characters.
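The point-estimation idea just described fits in a few lines of Python. This is a minimal sketch, not the @code{Marker} class from @file{Pymacs/rebox.py}. The function names @code{count_significant} and @code{estimate_point} are hypothetical, and so is the exact set of problematic characters. Positions are plain string offsets from the beginning of the region, whereas Emacs buffer positions start at 1.

```python
# Hypothetical set of "problematic" characters: white space, plus characters
# commonly used to draw comment boxes.  The real rebox.py may use another set.
PROBLEMATIC = frozenset(" \t\n/*#;%-=+|")


def count_significant(text, point):
    """Count characters of TEXT before offset POINT, ignoring problematic ones."""
    return sum(1 for char in text[:point] if char not in PROBLEMATIC)


def estimate_point(text, count):
    """Return the offset in TEXT just before its significant character COUNT + 1.

    Problematic characters are skipped while advancing, so the estimate lands
    on the same significant character that followed point before reboxing.
    """
    seen = 0
    for offset, char in enumerate(text):
        if char in PROBLEMATIC:
            continue
        if seen == count:
            return offset
        seen += 1
    return len(text)
```

For example, with point on @samp{world} in @samp{/* hello world */}, five significant characters precede it. After reboxing the comment as two @samp{#} lines, @code{estimate_point} lands on @samp{world} again, even though the box characters and white space changed.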

This @code{Marker} class works fully on the Python side. It does not involve Pymacs at all, but it does solve a problem that resulted from my choice of keeping the data on the Python side instead of handling it directly in the Emacs buffer.

We want a comment reformatting to appear as a single operation in the context of Emacs Undo. The method @code{Emacs_Rebox.clean_undo_after} handles the general case for this. Not that we do so much in practice: a reformatting implies one @code{delete-region} and one @code{insert}, and maybe some other little adjustments at @code{Emacs_Rebox.find_comment} time. Even if this method scans and modifies an Emacs Lisp list directly in the Emacs memory, the code doing this stays neat and legible. However, I found out that the undo list may grow quickly when the Emacs buffer uses markers, with the consequence of making this routine so Pymacs-intensive that most of the CPU is spent there. I rewrote that routine in Emacs Lisp so it executes in a single Pymacs interaction.

Function @code{Emacs_Rebox.remainder_of_line} could have been written in Python, but it was probably not worth moving away from this one-liner in Emacs Lisp. Also, given this routine is often called by @code{find_comment}, a few Pymacs protocol interactions are spared this way. This function is useful when there is a need to apply a regexp already compiled on the Python side. In that case, it is probably better to fetch the line from Emacs and do the pattern match on the Python side than to transmit the source of the regexp to Emacs for it to compile and apply it.

For refilling, I could have either used the refill algorithm built into Emacs, programmed a new one in Python, or relied on Ross Paterson's @code{fmt}, distributed by GNU and available on most Linux systems. In fact, @code{refill_lines} prefers the latter. My own Emacs setup is such that the built-in refill algorithm is @emph{already} overridden by GNU @code{fmt}, and it really does a much better job.
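The refilling strategy discussed here, calling an external @code{fmt} and falling back on a dumb refill when that fails, can be sketched with the standard @code{subprocess} and @code{textwrap} modules. This is an illustration only, not @code{refill_lines} from @file{Pymacs/rebox.py}. The name @code{refill_paragraph} and its signature are hypothetical, and @code{textwrap}'s greedy wrapping merely stands in for "a dumb refilling algorithm". The sketch assumes an @code{fmt} that accepts @samp{-w @var{width}}, as GNU @code{fmt} does.

```python
import subprocess
import textwrap


def refill_paragraph(lines, width=72):
    """Refill LINES (a list of str) so no output line exceeds WIDTH columns.

    Prefers an external "fmt" program; if it cannot be run or fails,
    falls back on textwrap's simple greedy wrapping.
    """
    text = "\n".join(lines) + "\n"
    try:
        result = subprocess.run(
            ["fmt", "-w", str(width)],
            input=text,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # fmt is missing or failed: a dumb refill is better than none.
        return textwrap.wrap(" ".join(lines), width)
    return result.stdout.rstrip("\n").split("\n")
```

Catching @code{OSError} covers the case where @code{fmt} is not installed at all (@code{FileNotFoundError} is a subclass). @code{CalledProcessError} covers @code{fmt} running but exiting with an error status.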
Experience taught me that calling an external program is fast enough to be very bearable, even interactively. If Python called Emacs to do the refilling, Emacs would itself call GNU @code{fmt} in my case, so I preferred that Python call GNU @code{fmt} directly. I could have reprogrammed GNU @code{fmt} in Python. Although interesting, this is not an easy project: @code{fmt} implements the Knuth refilling algorithm, which depends on dynamic programming techniques; Ross did carefully fine-tune them, and took care of many details. If GNU @code{fmt} fails, say because it is not available, @code{refill_lines} falls back on a dumb refilling algorithm, which is better than none.

. : @section Example 3 --- Emacs side.

The Emacs recipe appears under the @samp{Emacs usage} section, near the beginning of @file{Pymacs/rebox.py}, so I do not repeat it here.