Monday, December 15, 2008

Program verifiers are complex

I've recently found out that epitome is not only a Romanian word but also an English one. The discovery deserves a blog post.

Programs instruct (virtual) machines what to do. Programs tell type-checkers what to check. Programs can also tell program verifiers what to check. A (static) program verifier is a little like a type-checker on steroids. Unlike type-checkers, they tend to be complex beasts. ACL2, Jahob, and Spec#, all tame the complexity in different ways.

The approach in Jahob is code reuse. The developers chose to rely heavily on the work of others and innovated mainly in the area of "putting things together".

Spec# divides and conquers, much like Microsoft does in general. They broke down the process of verification into multiple simpler phases and built probably the best engineered program verifier.

The old ACL2 encodes complexity in data rather than code. More specifically Moore (yes, the same guy who did the string matching algorithm) notes that there is no need for verification condition generation if you simply tell a prover the operational semantics of the language you are processing.

It's funny how the oldest tool has the slickest approach (and, probably, the smallest number of developers). Here are some references:

Labels: , , ,

Tuesday, December 02, 2008

Data Structures in Imperative Languages

So in the past few weeks I have realized I don't really know how to write data structures in Java or in any other imperative language for that matter.

The issue is that once one wants follow certain patterns like visitors or immutable classes, lot of code has to be written by hand in a tedious and repetitive fashion.

The reason is that these design decisions affect all the data structures in a similar way: a visitor pattern requires an accept method for each class; equality, hash code, cloning have the same properties over all classes (such as deep cloning); etc.

Code generation does seem like a good path to follow here.

Thanks to a colleague who is an avid UML modeler I had the opportunity to work with classes generated via the Eclipse Modeling Framework (EMF).

So how did I like it? Not too much. There's almost no control over the code being generated as the template is well embedded in the innards of the framework.

No equals method, no hashCode method, no make methods (which create a class from the given attributes). All have to be written by hand - get me out of here!

Hence I went and looked for other technologies.

A colleague of mine Radu developed a light-weight tool for the purpose in question. The class diagram is specified in a custom language similar to grammar description languages. For instance:

BinaryOperator :> Implies, Iff;

tells us that the classes Implies and Iff extend the class BinaryOperator. The code is generated using templates, which are language-ignorant and may access properties of the classes using special keywords.

For instance, \ClassName is expanded to the name of currently processed class and \BaseName to the name of the base class. This enables us to write templates like this:

public class \ClassName extends \BaseName { ....

More details about Radu's tool can be found in this article.

Radu's approach is nice, concise, and more flexible than EMF. The problem is, however, the template language. Very soon I have discovered that there are properties of the classes that are hard to access. Moreover, it's not quite clear how one extends the generated classes with custom code.

Even though tempted to extend the existing tool with constructs of my desire, I decided to look for other existing approaches based on code generation using templates.

Surprisingly enough, little did I find. The most promising is described in a tutorial using the velocity tool. The tutorial can be found here.

The template language is richer, it supports if statements, variables, and other goodies. Theinclude construct, I believe, could be used to extend classes with custom code.

The language is a bit awkward at times, this could probably be overcome by defining some velocity variables. It seems that one would have to write support for navigating the structures describing the class diagrams one wants to support.

All in all, I have a nagging, ominous feeling that I'm missing something. Is it not the case that pretty much every object orient program needs data structures? Isn't there like million papers on UML and/or code generation?

Could someone please point me to a decent tool that would let me generate data structure code using templates driven by class diagrams?

Labels: ,