Python Notes

Wednesday, September 01, 2004

Python ORM tools

An object relational mapper, or ORM, is a piece of software that sits between the object model used by an application and the relational model used by a conventional RDBMS. It's a useful piece of software, as it allows one to get the best of both worlds. The application can be written in terms of objects, and the database can be managed using conventional tools.

A little bit of history

I have very little experience with real-world ORMs, but the concept itself is not new to me. I used to write business applications a long time ago (circa 1990), long before RDBMSs became popular in the PC world. In those times we used to rely on flat file formats and external index libraries, such as Turbo Power - a set of Pascal libraries to support B-Tree indexes and mergesort methods. When Turbo Pascal 5.0 (and shortly after, 5.5) arrived with the notion of objects, I wrote a small class library of my own to encapsulate records as objects. All records were descendants of TRecord, which declared a few abstract methods to load and save records to the database, and to manage the keys used in the index files. Its descendants had to declare real methods. Turbo Pascal didn't allow for the type of magic that dynamic languages such as Python do, so a lot of it was repetitive hand-crafted work. Even parts of the code that were seemingly similar in structure were hard to automate at that time.

Enter Python. As a dynamic language, Python makes it easy to write a lot of the glue code that is needed to translate between the OO and the RDBMS worlds. Python ORMs take advantage of this to allow for simpler use.

What is an ORM good for?

Well, now we know what an ORM is, and how we arrived here. But why should one use an ORM? Implementing a business application using SQL can be a tedious task. Records in SQL are simple tuples of data. If you want to use this data in an object-oriented fashion, you have to convert this representation to an object, and vice-versa. SQL is also based on sets of data, which means that it has no native concept of working with an individual record. Another issue is that any change to the object model used by the application implies equivalent changes both to the relational database model and to the code used in the adaptation layer. If this code is spread over the application, the process will not only be tedious, but also error prone. Due to the nature of SQL (and of Python, to a certain extent), most errors will only get caught at runtime, which is too late for most situations.
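To make the tuple-to-object conversion concrete, here is a minimal sketch of the hand-written adaptation layer an ORM is meant to replace. It uses only the standard sqlite3 module; the person table, the Person class and the two helper functions are made up for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical application object; the field order mirrors the SELECT below.
    id: int
    name: str
    email: str


def row_to_person(row):
    """Convert an (id, name, email) tuple from the database into a Person."""
    return Person(id=row[0], name=row[1], email=row[2])


def save_person(conn, person):
    """Write the object's fields back to its row."""
    conn.execute(
        "UPDATE person SET name = ?, email = ? WHERE id = ?",
        (person.name, person.email, person.id),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "INSERT INTO person (name, email) VALUES (?, ?)", ("Ada", "ada@example.org")
)

# Load: tuple -> object.
row = conn.execute(
    "SELECT id, name, email FROM person WHERE id = ?", (1,)
).fetchone()
person = row_to_person(row)

# Modify the object, then save: object -> tuple.
person.email = "ada@example.com"
save_person(conn, person)

reloaded = row_to_person(
    conn.execute("SELECT id, name, email FROM person WHERE id = ?", (1,)).fetchone()
)
```

Note that adding a field to Person means touching the table definition, both SQL statements and both helper functions. That is exactly the kind of repetitive, error-prone change described above.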

An ORM can solve all these problems, and add a few niceties of its own. In the best case, it will allow for a true object persistence model. Good ORMs also allow changes in the object model to be automatically mapped to the relational one. The opposite is also true, albeit less useful, assuming that you will be writing object-oriented code. In the best scenario, the application code can be completely rid of any SQL reference, allowing the full use of an object-oriented paradigm for its development.

Challenges in ORM design

This short document doesn't intend to be a treatise on how ORMs are designed. However, some of the pitfalls are important and obvious enough to deserve special treatment.

The basic challenge in ORM design is how to map the OO and relational models seamlessly. Good ORM tools will run as transparently as possible, allowing the programmer to forget the intricacies of the relational model. The process may start with a representation of the object model, either graphical or textual, that can be compiled to generate the classes as needed. It's also possible to declare the object model in pure Python, counting on the ORM tool to generate and handle the glue code automagically.
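As a rough illustration of the pure Python approach, the sketch below declares a model as a class and derives the table definition from it. Column and Model are hypothetical names invented for this example; they are not the API of any of the tools discussed here, and a real ORM does far more than this.

```python
import sqlite3


class Column:
    """Hypothetical column declaration: it only carries a SQL type name."""

    def __init__(self, sql_type):
        self.sql_type = sql_type


class Model:
    """Hypothetical base class that collects the Column attributes of subclasses."""

    columns = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Introspect the class body; declaration order is preserved.
        cls.columns = {
            name: value
            for name, value in vars(cls).items()
            if isinstance(value, Column)
        }

    @classmethod
    def create_table_sql(cls):
        """Generate the CREATE TABLE statement from the declared columns."""
        fields = ["id INTEGER PRIMARY KEY"]
        fields += [f"{name} {col.sql_type}" for name, col in cls.columns.items()]
        return f"CREATE TABLE {cls.__name__.lower()} ({', '.join(fields)})"


class Person(Model):
    name = Column("TEXT")
    age = Column("INTEGER")


sql = Person.create_table_sql()

# The generated statement is valid SQL and can be executed as-is.
conn = sqlite3.connect(":memory:")
conn.execute(sql)
```

The point is that the class body is the only place where the structure is written down; everything else is derived from it at run time.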

To implement true object persistence, each object needs a unique identifier. Each ORM will use a different approach here. However, one issue that is common to all ORMs is how to handle multiple copies of the same object. For the sake of consistency, there should be only one in-memory object for each database row. If the ORM tool doesn't do it automatically, then it's up to the programmer to make sure that multiple copies of the same object never exist at the same time, avoiding concurrency and consistency issues.
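One common way to guarantee a single object per row is to keep a cache keyed by the row's identity (often called an identity map). The sketch below is only an illustration under that assumption: Record and IdentityMap are hypothetical names, and the actual database loading is left out.

```python
import weakref


class Record:
    """Hypothetical in-memory object standing for one database row."""

    def __init__(self, table, key):
        self.table = table
        self.key = key


class IdentityMap:
    """Hands out at most one live Record per (table, key) pair."""

    def __init__(self):
        # Weak references let unused objects be garbage collected.
        self._objects = weakref.WeakValueDictionary()

    def get(self, table, key):
        identity = (table, key)
        obj = self._objects.get(identity)
        if obj is None:
            # A real ORM would load the row from the database here.
            obj = Record(table, key)
            self._objects[identity] = obj
        return obj


cache = IdentityMap()
first = cache.get("person", 1)
second = cache.get("person", 1)
other = cache.get("person", 2)
```

Here first and second are the very same object, so a change made through one name is visible through the other, while other is a distinct object for a distinct row.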

Last, we have the age-old tradeoff: ease of use versus power. Simpler ORMs are lighter and generally easier to learn, but may fall short on the amount of automation allowed. More powerful ORMs are the opposite -- they fully support all expected functionality, but may be too complex or heavyweight for simpler applications. In any case, an ORM may impose an approach too strict to fit the programmer's mental model, so choosing the best one is as much a matter of personal preference as it is a matter of intrinsic quality.

Some Python ORM tools

The current state of ORM tools for Python is difficult to assess at first glance. There are many tools available, but some are old, not maintained, or not widely supported. I've collected a few links and my impressions as follows.

PDO

PDO is not a full-fledged ORM. It's rather a lightweight object-oriented layer for the DB-API. It adds a few helper methods and accommodates the slight differences between the available database drivers within a single framework, making the transition between them easier. However, it falls really short of a true ORM. There is no automatic mapping of the database rows to objects, and the database structure has to be created and managed manually. It would probably be better to treat PDO as part of the underlying DB-API, but that's out of the scope of this document.

SQLObject

SQLObject is a very pythonic approach to ORM. Objects are declared in Python using a very simple syntax, and a metaclass-based approach makes the translation between the objects and the database representation fully transparent. Each record in the database is represented by an object, generated on the fly according to the defined structure. Fields are automatically mapped to attributes. Joins and relationships -- including many-to-many -- are also implemented in a fairly straightforward way. Each object is uniquely associated with a row in the database, guaranteeing consistency. And better -- everything is done at run time, with no need for intermediate steps or external support tools.

The approach used by SQLObject is very easy to understand and use in actual production. A few things are surprisingly easy; for instance, many-to-many relationships normally resort to an intermediate table to store the relations; SQLObject manages this intermediate table automatically, and the user may never need to know that it exists. Creating new tables is also a breeze. On the other hand, some of the more complex relational constructs, such as left joins, are not directly supported. The primary key is always an integer (actually, there are a few workarounds, but there are compelling reasons to use it as-is). There are also a few quirks; for example, being as pythonic as it is, one is tempted to declare inherited object types. In relational database design, the descendant is usually a table that implements only the additional fields, and stores the ancestor fields in the original table; a relationship is used to form the full record. But due to the way database entities are mapped to objects, the end result is not a true inherited relational database design; fields are duplicated on the descendant object, making inheritance unusable as a means of database modeling. Transaction support is also missing.
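For readers who have not written one by hand, this is roughly what the intermediate table of a many-to-many relationship looks like in plain SQL. The sketch uses the standard sqlite3 module, not SQLObject, and the person, role and person_role tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT);
    -- The intermediate table: one row per (person, role) pair.
    CREATE TABLE person_role (
        person_id INTEGER REFERENCES person(id),
        role_id INTEGER REFERENCES role(id),
        PRIMARY KEY (person_id, role_id)
    );
    INSERT INTO person (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO role (id, name) VALUES (1, 'admin'), (2, 'editor');
    INSERT INTO person_role VALUES (1, 1), (1, 2), (2, 2);
""")


def roles_of(person_id):
    """Return the role names of one person by joining through person_role."""
    rows = conn.execute(
        "SELECT role.name FROM role "
        "JOIN person_role ON person_role.role_id = role.id "
        "WHERE person_role.person_id = ? ORDER BY role.name",
        (person_id,),
    )
    return [name for (name,) in rows]
```

An ORM that manages this table for you hides both the extra CREATE TABLE and the join; the application just asks an object for its related objects.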

Summary. SQLObject is a fantastic tool for simple database applications. It solves a big part of the problem for small apps -- the overhead of managing the database itself and the repetitive tasks that come with it. For complex ones, it still does a good job, but if you push its limits, then you have to start doing manual database coding, with little help from the framework.

Modeling

Modeling is an ambitious project that aims to write a port of Apple's Enterprise Objects Framework (abbreviated henceforth EOF) in pure Python. The author had extensive previous experience using the EOF, and decided to write a port after missing the convenience that a good ORM brings to programming. The EOF uses the Entity-Relationship Model [1], which provides the academic foundation for this type of work.

The framework can be broadly split into two parts. The design part is a relational database modeling tool that generates the object-oriented abstraction layer. The runtime code is an object-oriented framework that implements object persistence and consistency.

The design tool accepts models in either Python or XML format. The former allows the schema to be written in a conventional text editor. The latter works well with interactive tools that generate the entity description. One such tool, ZModelizationTool (for Zope), is provided as part of the package. The configuration allows the full specification of relationships and joins. Solid knowledge of relational database theory is required to take full advantage of the design tool. Besides that, default values are handled nicely, making global changes relatively easy. The following database adapters are supported, and provided as part of the package: MySQL, Oracle, PostgreSQL and SQLite.

Once processed, the schema is converted into a package of Python classes. Each entity is mapped to a class that inherits its behavior from a common CustomObject ancestor. This hierarchy allows hooks for actions such as data validation. But the best part of the framework is the EditingContext class. It creates a graph of all object instances in memory, guaranteeing consistency. Each row in the underlying dataset is uniquely mapped to an object. All relationships are also mapped into the EditingContext. One can think about it as an object-oriented database state map -- in a sense, the equivalent, in object-oriented terms, of a standard database connection.

Summary. Modeling is a well-structured tool, based on solid and well-tested concepts. The documentation of this project is well structured and professional looking, but still incomplete in parts, and the diagrams still need some polish. Although ambitious, the project is focused and well managed. It's a powerful tool, and some careful code tuning is needed to take full advantage of its power. A 1.0 release will be most welcome.

Conclusion

Choosing an ORM is an integral part of the design choices that have to be made at the start of the development cycle. In a sense, it should be easier to change the back-end database than to change ORMs -- assuming all databases support the same API (which is not actually true, but it's a good approximation for this purpose).

SQLObject is by far the easiest and most pythonic of all the approaches we have evaluated so far. Its power comes from the fact that it makes routine tasks very simple -- in some cases, fully automatic and transparent. Its programming model is really easy and non-intrusive. Its shortcomings should not be evident for most applications. Modeling, on the other hand, is more like a full-fledged framework for database applications. The EditingContext concept is great, and would in fact be a more than welcome addition to the SQLObject model. For my own applications, SQLObject seems to be better suited - it is simpler, easier to learn, and cleaner than Modeling. But I'm sure that Modeling has its place for more complex apps.
