I wrote the first two white papers soon after discovering Python. The conclusions in these papers have remained largely unchanged. I wrote the third in November 2004, and rewrote it in February 2006.
Contents
The more I look at Tk, the more convinced I am that Python + Tk (aka Tkinter) is, by far, the best way to go with Leo. I now have Open Source code for tree widgets and splitter windows, and have intensely studied how to modify that code for use in Leo. It is clear, even at this early date, that this code will provide a very pleasant base on which to build Leo.
The tree code is based on code in IDLE, the Python IDE. This code is simple, good and plenty fast enough. The tree code draws directly to a Tk canvas object. The look and feel matches Windows exactly. It would be trivial to use Mac triangle icons instead of the Windows plus and minus icons. It would also be trivial to modify the look and feel for Linux.
The tree widget code solves several intractable problems with wxTreeCtrl. Moving nodes becomes trivial. Bugs in wxTreeCtrl involving editing and redrawing disappear. Using Python/Tk code simplifies the vnode class, and having access to the vnode class simplifies and speeds up the tree widget code. It will now be possible to bind keystrokes properly; this simply can not be done in wxWindows. The tree widget code shows just how trivial the Windows native tree control is. The Tk canvas is a splendid example of higher-level code being superior, in every way, to lower level code.
Another big win comes from using the Tk text widget. This widget is extraordinarily powerful. The only text control that rivals it is the MacOS/Yellow Box text control. Indeed, the Tk text widget does everything that Leo could possibly want. One can even embed images in text.
In short, using Tk for Leo will be fast enough and will greatly increase what is possible in Leo while at the same time greatly simplifying Leo's code. I am about to convert Leo from wxPython to Python + Tk. Edward K. Ream, November 4, 2001
I've known for a while that Python was interesting; I attended a Python conference last year and added Python support to Leo. But last week I got that Python is something truly remarkable. I wanted to convert Leo from wxWindows to wxPython, so I began work on c2py, a Python script that would help convert from C++ syntax to Python. While doing so, I had an Aha experience. Python is more than an incremental improvement over Smalltalk or C++ or objective-C; it is "something completely different". The rest of this post tries to explain this difference.
What struck me first as I converted C++ code to Python is how much less blah, blah, blah there is in Python. No braces, no stupid semicolons and most importantly, no declarations. No more pointless distinctions between const, char *, char const *, char * and wxString. No more wondering whether a variable should be signed, unsigned, short or long.
Declarations add clutter, declarations are never obviously right and declarations don't prevent memory allocation tragedies. Declarations also hinder prototyping. In C++, if I change the type of something I must change all related declarations; this can be a huge and dangerous task. With Python, I can change the type of an object without changing the code at all! It's no accident that Leo's new log pane was created first in Python.
Functions returning tuples are a "minor" feature with a huge impact on code clarity. No more passing pointers to data, no more defining (and allocating and deallocating) temporary structs to hold multiple values.
Python can't check declarations because there aren't any. However, there is a really nifty tool called Pychecker that does many of the checks typically done by compilers. See pychecker for details.
Python is much more powerful than C++, not because Python has more features, but because Python needs less features. Some examples:
Before using Python I never fully realized how difficult and dangerous memory allocation is in C++. Try doing:
aList[i:j] = list(aString)
in C. You will write about 20 lines of C code. Any error in this code will create a memory allocation crash or leak.
Python is fundamentally safe. C++ is fundamentally unsafe. When I am using Python I am free from worry and anxiety. When I am using C++ I must be constantly "on guard." A momentary lapse can create a hard-to-find pointer bug. With Python, almost nothing serious can ever go wrong, so I can work late at night, or after a beer. The Python debugger is always available. If an exception occurs, the debugger/interpreter tells me just what went wrong. I don't have to plan a debugging strategy! Finally, Python recovers from exceptions, so Leo can keep right on going even after a crash!
Python has almost all the speed of C. Other interpretive environments such as icon and Smalltalk have clarity, power and safety similar to Python. What makes Python unique is its seamless way of making C code look like Python code. Python executes at essentially the speed of C code because most Python modules are written in C. The overhead in calling such modules is negligible. Moreover, if code is too slow, one can always create a C module to do the job.
In fact, Python encourages optimization by moving to higher levels of expression. For example, Leo's Open command reads an XML file. If this command is too slow I can use Python's XML parser module. This will speed up Leo while at the same time raising the level of the code.
Little of Python is completely new. What stands out is the superb engineering judgment evident in Python's design. Python is extremely powerful, yet small, simple and elegant. Python allows me to express my intentions clearly and at the highest possible level.
The only hope of making Leo all it can be is to use the best possible tools. I believe Python (possibly with Tkinter) will allow me to add, at long last, the new features that Leo should have.
Edward K. Ream, October 25, 2001. P.S., September, 2005:
Four years of experience have only added to my admiration for Python. Leo could not possibly be what it is today without Python.
This white paper describes the storage allocation used in a commercial optimizing C compiler written for Tuple, Inc. ca. 1993. The last section discusses tantalizing possibilities for the pypy project. These possibilities are why I wrote this paper.
Storage allocation is crucial to any compiler because of the number, size and complexity of data which must be allocated. You might event say that a compiler consists of storage allocation and everything else. I paid a lot of attention to storage allocation in CC2, and that work paid off. The resulting compiler was a few percent faster than the CodeWarrior C compiler, perhaps the fastest C compiler in existence at that time. The original notes were written around 1993, so I would do some things differently today. However, the design discussed below still seems relevant today. Indeed, attention to low-level details can make a huge difference.
CC2 allocated objects one-at-a-time (simple allocation), in blocks of fixed-sized objects (block allocation) or in blocks of variable-sized objects (lifetime-based allocation). Simple-allocation was used for short-lived objects and will not be discussed further. Block allocation was used in several ways, the most interesting of which was to allocate "cons cells" used to represent lists. These cons cells could be reused by placing them on a global avail list. Thus, the blocks holding cons cells could (and did) have permanent lifetime: they were never deallocated.
The most interesting aspect of CC2's storage allocation scheme was what I eventually came to call lifetime-based storage allocation. This was, for me, an new discovery, though clearly I was not the first to discover it. The Aha is that a lifetime is defined by the time (or equivalently by the place in program code) at which objects are deallocated. A lifetime may hold many different kinds of objects that were allocated at many different times. The essence of lifetime-based allocation is that the proper time to specify when an object will be deallocated is when the object is created.
Lifetime-based allocation is a superb invention:
It becomes an effective design tool. Thinking of objects in terms of their lifetimes often shows the essence of a design problem.
It can be used for any kind of object, regardless of its size or type.
It reduces the number of calls to calloc and free by one or two orders of magnitude over naive schemes. Furthermore, it typically requests standard-sized blocks (say 8K or more) from calloc, further easing the burden on calloc and free.
CC2 could allocate objects with a particular lifetime using small, extremely fast, macros. The macros expanded to C code something like this:
if lifetime -> avail >= sizeof(theObjectKind) { // Allocate the storage. theObject = lifetime -> ptr lifetime -> ptr += sizeof(theObjectKind) } else { << allocate theObject in another block >> }
Importantly, the speed of the else clause makes absolutely no difference because it is so seldom executed. Thus, it uses a function call. Clearly then, this code is optimal: it could not be improved even if coded in assembly language.
What makes lifetime-based allocation so powerful is that so few lifetimes are typically required. Indeed, CC2 had only the following lifetimes:
That's all. Similarly, if Leo were recast as a C program only the following lifetimes would be needed:
The remarkable thing about dynamic languages like Python is how often objects can, in fact, be assigned static lifetimes.
Lifetime allocation isn't used in Java, Python, etc. because these languages have no way of knowing (in general) what the lifetime of any object will be. Furthermore, translating Python to C would be straightforward were it not for storage allocation issues. For example, most of Leo's code could easily be translated into C, provided that the lifetime of all objects were known. Again, just for example, the prospect of translating the Python version of Leo to a fully optimized C version is tantalizing.
This is where pypy comes in: its extensive flow analysis may be sufficient to discover lifetimes for a significant percentage of objects. Perhaps user hints may be effective. For example, pypy offers the chance to make something like region inference truly useful. Note that the pypy project might benefit from deducing lifetimes even if not all objects could be assigned a static lifetime. Another reason why lifetimes are not a standard technique is that they are a potentially dangerous optimization. Errors in specifying lifetimes will result in dangling pointer references. But this danger might disappear if pypy could deduce lifetimes automatically.