Python Notes

Tuesday, October 05, 2004

Generic template classes in Python

It's been a while since this blog was last updated, and there is a good reason for that. I've been working on code that allows complex data structures to be expressed using Python classes. This project started as an experiment with data-entry forms, and is now reaching a quite usable state. This is controversial stuff; some people argue that it is only an obfuscation, and that the same results can be achieved with simpler and more traditional approaches. I disagree, but unfortunately I'm still not able to explain why -- for now, it's just that I 'feel' that this is the correct approach.

At this point, I'm using the basic templating engine for HTML pages, data entry forms, and INI-style configuration files. The following snippet shows how to declare an INI-style configuration file using the templating system:

class CherryPyIni(IniFile):
    class server(IniSection):
        socketPort = TypedAttribute(8080)
        threadPool = TypedAttribute(10)
    class staticContent(IniSection):
        bitmaps = TypedAttribute('c:/work/sidercom/bitmaps')
    class session(IniSection):
        storageType = TypedAttribute('ram')

This code reads and writes the following ini file (with the help of the IniFile and IniSection classes, of course):

[server]
socketPort = 8080
threadPool = 10

[staticContent]
bitmaps = c:/work/sidercom/bitmaps

[session]
storageType=ram

The code above (in its full form) derives its behavior from the structure of the class declaration itself. For example, the class 'knows' the order in which declarations must appear in the generated INI file. This is helpful, and better than having the entries written in arbitrary order, as would happen if a dictionary were used. The TypedAttribute also infers its 'type' from the default value that is passed as a parameter.
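
One way the order tracking and type inference described above could work is sketched below. This is only an illustration written for current Python 3: the class here is a hypothetical stand-in for the real TypedAttribute, and the shared counter and the ordered_attributes helper are assumptions, not the actual implementation.

```python
import itertools


class TypedAttribute:
    """Hypothetical stand-in for the TypedAttribute used in this post."""

    # Shared counter: every new attribute takes the next number, which
    # records the order in which attributes were declared.
    _counter = itertools.count()

    def __init__(self, default):
        self.default = default
        self.type = type(default)  # type inferred from the default value
        self.order = next(TypedAttribute._counter)


def ordered_attributes(cls):
    """Return (name, attribute) pairs sorted by declaration order."""
    items = [(name, attr) for name, attr in vars(cls).items()
             if isinstance(attr, TypedAttribute)]
    return sorted(items, key=lambda item: item[1].order)


class server:
    socketPort = TypedAttribute(8080)
    threadPool = TypedAttribute(10)


names = [name for name, _ in ordered_attributes(server)]
```

With this sketch, names lists socketPort before threadPool regardless of how the class dictionary happens to be stored.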

After a lot of work, and some dead ends, I've come to think of this as a generic templating mechanism. The classes are declared in such a way as to act as templates that parse, process or transform native Python data representations into other types of representation. Then I hit another stumbling block. But first, some definitions.

Template classes vs Template instances

Template classes are the definitions of the templates themselves. They can be used to build new template instances. As with normal classes and objects, the difference is important, but a few rules must be added to make them conform to the restrictions imposed by Python's syntax and semantics:
  • Rule 1: Template classes may contain any number of nested template classes.
  • Rule 2: Template instances do not contain any nested template classes -- only nested template instances.

The transformation from (1) to (2) is done automatically during the instantiation of the main template class, also known as a container. The __init__ code recursively instantiates all nested classes inside a template class. This is necessary to avoid side effects that would occur as template instances modify their own member attributes; if one of these members is a class, then the modification would be reflected on the original class itself (because the class object is mutable and shared by every instance), leading to strange and undesirable side effects. Let us recall the example above to make this point clear:

class CherryPyIni(IniFile):
    class server(IniSection):
        socketPort = TypedAttribute(8080)
        threadPool = TypedAttribute(10)
    class staticContent(IniSection):
        bitmaps = TypedAttribute('c:/work/sidercom/bitmaps')
    class session(IniSection):
        storageType = TypedAttribute('ram')

myini = CherryPyIni()

Upon instantiation, the CherryPyIni class will automatically instantiate its nested class members: server, staticContent and session. If they were not instantiated, then any changes made to the myini instance would in fact affect the class declaration. For example, without automatic instantiation the following would happen:

>>> myini.server.socketPort
8080
>>> myini.server.socketPort = 1234
>>> otherini = CherryPyIni()
>>> otherini.server.socketPort
1234

This is clearly not wanted. New template instances must always be created from a clean sheet, and modifications in one instance are not supposed to change all the others.
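
The recursive instantiation can be sketched roughly as follows. The Template and Section names are hypothetical, and plain values are used instead of TypedAttributes to keep the example short; it only illustrates the idea of replacing nested classes with fresh instances in __init__.

```python
import inspect


class Template:
    """Hypothetical base class, written only to illustrate the idea."""

    def __init__(self):
        # Replace every nested template class with a fresh instance, so
        # that later assignments land on this instance tree and not on
        # the shared class objects.
        for name, value in inspect.getmembers(type(self)):
            if inspect.isclass(value) and issubclass(value, Template):
                setattr(self, name, value())


class Section(Template):
    pass


class CherryPyIni(Template):
    class server(Section):
        socketPort = 8080


myini = CherryPyIni()
myini.server.socketPort = 1234   # stored on myini's own server instance
otherini = CherryPyIni()
```

Here otherini.server.socketPort is still 8080, because the assignment modified an instance of the nested class and not the nested class itself.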

Attributes

Not everything inside the template class is another, nested template class. In the example, each nested class has a few TypedAttributes of its own. TypedAttributes are special: they store the default value, and know the datatype that can be stored. This is needed when the INI file is read, to make sure that numeric parameters are automatically converted to the correct type. The attributes include support that makes automatic instantiation unnecessary for them: they're implemented as data descriptors (similar to properties in other popular languages), which means that they implement the __get__ and __set__ methods and thus can automatically intercept instance-level modifications done at runtime. Besides TypedAttributes, there are also GenericAttributes, which don't do any runtime type checking, conversion, or enforcement. Both types of attributes know the order in which they appear in the class declaration. This information is useful in several applications, even if only for documentation purposes.
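
A minimal sketch of a type-converting data descriptor is shown below, written for current Python 3. It is not the real TypedAttribute: the per-instance storage in the instance dictionary and the use of __set_name__ (available since Python 3.6) are assumptions made for the example.

```python
class TypedAttribute:
    """Hypothetical sketch of a type-converting data descriptor."""

    def __init__(self, default):
        self.default = default
        self.type = type(default)  # inferred from the default value

    def __set_name__(self, owner, name):
        # Called by Python when the owning class is created.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class: return the descriptor
        # Fall back to the default until the instance stores its own value.
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        # Convert to the declared type and store on the instance only,
        # leaving the class-level default untouched.
        instance.__dict__[self.name] = self.type(value)


class Server:
    socketPort = TypedAttribute(8080)


s = Server()
s.socketPort = "9090"      # e.g. a string read from an INI file
fresh = Server()
```

After the assignment, s.socketPort is the integer 9090, while fresh.socketPort still returns the default.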

Advantages and applications

The system was designed to be very flexible and extensible. Simple attributes are stored as GenericAttributes, or TypedAttributes, depending on the situation. To make things even simpler, basic types such as plain strings or numbers are automatically converted to GenericAttributes (although the order information is lost in this case), which means that some cruft can be removed, making the code even more readable:

class CherryPyIni(IniFile):
    class server(IniSection):
        socketPort = 8080
        threadPool = 10
    class staticContent(IniSection):
        bitmaps = 'c:/work/sidercom/bitmaps'
    class session(IniSection):
        storageType = 'ram'

The version above works fine; the difference is that it does not enforce type checking during attribute access as the TypedAttribute does; also, there is no guarantee that the attributes inside each one of the nested classes (server, for example) will be listed in the correct order when the INI file gets written.
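
The automatic conversion of plain values could be done by a metaclass, roughly as sketched here in Python 3 syntax. TemplateMeta and this bare-bones GenericAttribute are hypothetical names used for illustration only. Note also that on current Python versions the class namespace preserves definition order, so the ordering limitation may not apply there.

```python
class GenericAttribute:
    """Hypothetical wrapper that only remembers a default value."""

    def __init__(self, default):
        self.default = default


class TemplateMeta(type):
    """Hypothetical metaclass that wraps plain values at class creation."""

    def __new__(mcls, name, bases, namespace):
        wrapped = {}
        for key, value in namespace.items():
            # Skip dunder entries such as __module__ and __qualname__;
            # wrap plain strings and numbers in a GenericAttribute.
            if not key.startswith('__') and isinstance(value, (str, int, float)):
                value = GenericAttribute(value)
            wrapped[key] = value
        return super().__new__(mcls, name, bases, wrapped)


class IniSection(metaclass=TemplateMeta):
    pass


class server(IniSection):
    socketPort = 8080
    threadPool = 10


port_attribute = vars(server)['socketPort']
```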

The stumbling block

But, as said above, there is a stumbling block. Not all attributes can be represented as simple attributes. In these cases, the use of nested classes is a requirement. This can lead to code that is hard to read. For example, this is a snippet of a form declaration:

class address(Panel):
    style = 'form-section'
    class address1(EditBox):
        caption = 'Address 1'
        size = 40
    class address2(EditBox):
        caption = 'Address 2'
        size = 40
    class city(EditBox):
        caption = 'City'
        size = 40

If the number of attributes inside the nested classes is big, the code above can become really difficult to read -- too long to fit on a page. Of course, one of the advantages of the class declaration system is that inheritance is your friend, and you can always refactor the definition into a sequence of shorter ones. Even so, after testing, I did as follows:

class address(Panel):
    style = 'form-section'
    address1 = EditBox(caption='Address 1', size=40)
    address2 = EditBox(caption='Address 2', size=40)
    city = EditBox(caption='City', size=40)

It's much shorter, and quite clear. However, the EditBox function is a hack, and that's what is bothering me.

Why is EditBox a hack? Well -- for all the reasons explained above, a nested class attribute has to be a class, and not an instance. Reading the code above, EditBox looks like a constructor call that returns an EditBox instance -- but it is not. It's a function that builds a new class definition, using the parameters provided as default values for the new class, and returns a class.
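
Such a class factory can be sketched in a few lines using the built-in type(). The EditBoxBase class and its default values are hypothetical, introduced only to give the factory something to derive from.

```python
class EditBoxBase:
    """Hypothetical base class holding the default edit-box settings."""
    caption = ''
    size = 20


def EditBox(**defaults):
    # Build and return a new subclass whose class attributes are the
    # supplied defaults. No instance is created here.
    return type('EditBox', (EditBoxBase,), dict(defaults))


class address:
    address1 = EditBox(caption='Address 1', size=40)
    city = EditBox(caption='City', size=40)
```

Here address.address1 and address.city are two distinct classes derived from EditBoxBase, even though the declarations read like constructor calls.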

I'm not entirely satisfied with this solution, but I'm still not able to replace it with a better, more generic approach. Right now, for every such class (EditBox, Button, etc.) I have to provide a 'class factory' function that builds a new class.
I'm pondering some alternatives:
  • The 'EditBox' class (and all related classes) could be more intelligent. If called from within a class declaration, the constructor would return a new, derived class. If called from within 'normal' code, it would return an instance.
  • Another solution is to replace nested instances in class declarations with classes. It's the opposite of the above, in a sense. As soon as the metaclass constructor for the container class is called, it searches through the attribute list. If it finds a nested instance that really should be a class, it then builds a new class out of the instance, and drops the instance afterwards.
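
The second alternative might look roughly like the sketch below, in Python 3 syntax. Widget, ContainerMeta and the settings attribute are hypothetical names; the sketch assumes that instances simply remember the keyword arguments they were created with.

```python
class Widget:
    """Hypothetical base: instances just remember their keyword arguments."""

    def __init__(self, **settings):
        self.settings = settings


class EditBox(Widget):
    pass


class ContainerMeta(type):
    """Hypothetical metaclass that turns nested instances into classes."""

    def __new__(mcls, name, bases, namespace):
        converted = dict(namespace)
        for key, value in namespace.items():
            if isinstance(value, Widget):
                # Build a subclass of the instance's class, using the
                # stored settings as class-level defaults; the instance
                # itself is discarded.
                converted[key] = type(key, (type(value),), dict(value.settings))
        return super().__new__(mcls, name, bases, converted)


class Panel(metaclass=ContainerMeta):
    pass


class address(Panel):
    style = 'form-section'
    address1 = EditBox(caption='Address 1', size=40)
```

With this approach EditBox stays an ordinary class with an ordinary constructor, and the conversion happens once, when the container class is created.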

More stumbling blocks

Did I say that I've found one stumbling block? Well, it seems that stumbling blocks are more common than I had realized. Another interesting one is: how to use a template class with conditionals that will be evaluated at instantiation time? For example, let us assume that I have a form template that will be used in different situations, depending on some conditions. The template is the same, but depending on some parameters, the instance will be generated in a slightly different way. The following code snippet illustrates what I mean:

class MyForm(Form):
    ...
    if can_delete:
        bt_delete = Button(caption='delete', action='delete')

While it seems weird, the code above works. Its only problem is that the test is evaluated only once -- when the class is first evaluated. So, it's not possible to do something like this:

can_delete = True
form_with_delete = MyForm()
can_delete = False
form_without_delete = MyForm()

One way to make sure that the class declaration is re-run is to put it into a module and reload it; another way is to put it inside an 'exec' statement. But in both cases the hack defeats the purpose of using classes for this type of declaration. This is another issue that I'm not at all comfortable with.
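
The behavior can be reproduced with a small self-contained example. The string placeholder stands in for a real Button declaration, and the make_form function at the end is just one more possible workaround, an assumption of this sketch and not part of the templating system: wrapping the declaration in a function makes the class statement run again on every call.

```python
can_delete = True


class MyForm:
    if can_delete:
        bt_delete = 'delete button'  # placeholder for a Button(...) declaration


can_delete = False
# The class body already ran, so changing the flag now has no effect.
still_has_delete = hasattr(MyForm(), 'bt_delete')


def make_form(can_delete):
    # The class statement is executed each time the function is called,
    # so the condition is re-evaluated with the current argument.
    class MyForm:
        if can_delete:
            bt_delete = 'delete button'
    return MyForm


form_with_delete = make_form(True)()
form_without_delete = make_form(False)()
```

The price is that every call produces a new, unrelated class, which may or may not be acceptable depending on how the templates are used.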
