The Expression Problem and Tables

The Expression Problem

The expression problem is a well known problem in programming language theory dealing with the expressivity of a programming language. In this post I’ll examine its impact on functional and object-oriented code, then talk about solving the problem in a language with tables as a programmable entity.

Say you have some data, like Animals. And some actions that they can perform. We want to implement both actions for both animals.

catfox
sound{…}{…}
eat{…}{…}

The expression problem asks what facilities the language provides to deal with this table. Here’s Philip Wadler’s original statement of the problem.

The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).

Let’s take a look at two how the object-oriented and functional styles cope with the expression problem.

Object-Oriented

Object-oriented languages primarily look at the data (“objects”). We say that a cat can eat or a fox can make a sound.

class Cat {
    def sound {...}
    def eat {...}
}

class Fox {
    def sound {...}
    def eat {...}
}

We’re implicitly filling in columns of a table.

class Cat {class Fox {
}}
sound
    def sound {...}
    def sound {...}
eat
    def eat {...}
    def eat {...}

It’s relatively easy to add a new datatype (ie a column). It’s all contained in a new class declaration.

class Bat {
    def sound {...}
    def eat {...}
}
class Cat {class Fox {class Bat {
}}}
sound
    def sound {...}
    def sound {...}
    def sound {...}
eat
    def eat {...}
    def eat {...}
    def eat {...}

To add a new action (ie a row) is more difficult. We need to open up all the objects (columns) to modify them.

class Cat {class Fox {
}}
sound
    def sound {...}
    def sound {...}
eat
    def eat {...}
    def eat {...}
attack
    def attack {...}
    def attack {...}

Not a big deal when there are only two classes, but the point is we need to find and modify each class implementing this interface. The implementations are spread across space, likely in different files. The difficulty is not in the amount of code being written, but how many distinct places one must edit.

Perhaps more importantly, since we spend more time reading than writing code, how many places must you look to understand a concept. If I want to to understand what it means to be a Cat, I look at the Cat class. But there’s no single place where I can understand what it means to eat.

Functional

Functional languages start from functions. sound applies to cats and foxes.

data Animal = Cat | Fox | Bat

sound :: Animal -> Sound
sound Cat = {...}
sound Fox = {...}
sound Bat = {...}

eat :: Animal -> Action
eat = {...}
CatFox
soundCat -> {...}Fox -> {...}
eatCat -> {...}Fox -> {...}

It’s relatively easy to add a new function.

attack :: Animal -> Action
attack Cat = {...}
attack Fox = {...}
attack Bat = {...}
CatFox
soundCat -> {}Fox -> {}
eatCat -> {}Fox -> {}
attackCat -> {}Fox -> {}

But it’s difficult to add a new data constructor.

data Animal = Cat | Fox | Bat | Dog

sound :: Animal -> Sound
sound Cat = {...}
sound Fox = {...}
sound Bat = {...}
sound Dog = {...}

eat = {...}
CatFoxBat
soundCat -> {...}Fox -> {...}Bat -> {...}
eatCat -> {...}Fox -> {...}Bat -> {...}

We need to open up the implementation of every function dealing with the Animal datatype. Which, again, may be spread across different files.

Both approaches work well only half the time when editing objects and actions they perform.

Tables

In both the object-oriented and functional code, we’re implicitly filling the cells in a table. An object-oriented approach tends to focus on columns, while a fuctional approach tends to focus on rows.

Imagine using tables as a first-class programming interface, where you can scan the rows and columns or add new ones. You become empowered to see the table from the point of view of either the functional or object-oriented style, depending on what you’re trying to understand.

CatFoxBat
soundCat -> {...}Fox -> {...}Bat -> {...}
eatCat -> {...}Fox -> {...}Bat -> {...}
attackCat -> {...}Fox -> {...}Bat -> {...}

Aside: It's very important to understand that I'm not proposing to program this way in text files. It would be far too much work to maintain table layout manually. Rather, this type of editing can only be done in a richer environment.

Additionally, by treating the table as an object in our programming environment, we gain the power to query:

> animals which eat
CatFoxBat
soundCat -> {...}Fox -> {...}Bat -> {...}
eatCat -> {...}Fox -> {...}Bat -> {...}
attackCat -> {...}Fox -> {...}Bat -> {...}
> unimplemented cases
CatFoxBat
soundCat -> {...}Fox -> {...}Bat -> {...}
eatCat -> {...}Fox -> {...}Bat -> {...}
attackCat -> {...}Fox -> {...}Bat -> {...}

Richer Queries

To make this more interesting, say our environment also provides queryable metadata. It could conceivable provide info on:

  • who wrote a datatype or function
  • when it was first written and last modified
  • usage information
  • testing information

Then you could query interesting things, like:

> functions written by me,
  with no tests,
  that have thrown an exception in production
CatFoxBat
soundCat -> {...}Fox -> {...}Bat -> {...}
eatCat -> {...}Fox -> {...}Bat -> {...}
attackCat -> {...}Fox -> {...}Bat -> {...}
> implementations calling a deprecated function
CatFoxBat
soundCat -> {...}Fox -> {...}Bat -> {...}
eatCat -> {...}Fox -> {...}Bat -> {...}
attackCat -> {...}Fox -> {...}Bat -> {...}
> implementations marked TODO
CatFoxBat
soundCat -> {...}Fox -> {...}Bat -> {...}
eatCat -> {...}Fox -> {...}Bat -> {...}
attackCat -> {...}Fox -> {...}Bat -> {...}

Simulating Inheritance and Type Classes

Inheritance and type classes are both means of coping with the combinatorial explosion of table cells (as n datatypes and m operations on each requires n*m function implementations) by filling in cells automatically.

  • Inheritance provides free methods on your data, depending on what it is.
  • Type classes provide free functions available to any data instantiating a type class.

By making tables programmable - think spreadsheets - we can implement rules to simulate inheritance and type classes.

I’ve been thinking about what it might look like to simulate inheritance or type classes in this framework. Here’s a sketch of the direction I’m going.

We have a few different tables - animal, machine, inheritance, and the animal / machine composite table.

First a couple tables implementing the operations for the parent class.

animal
sound{...}
eat{...}
machine
power up{...}
manufacture{...}

Now a table specifying the inheritance pattern.

animalmachine…
catx
carx
…

Those three tables are the whole implementation. Let’s look at a derived table of all animals and machines together.

animal / machine

catcar
sound(animal)
eat(animal)
power up(machine)
manufacture(machine)

Some benefits:

  • It appeals to my “explicit is better” self to be able to just see what’s going to be invoked, rather than having to understand the rules for method dispatch like in today’s languages.
  • I can click any cell in the table and see:
    • The type of the function.
    • What implementations its neighbors use.
    • If there are any functions in scope with that type signature I could use.
  • I can see patterns in the table.
  • Filling in the table in a programmatic way allows me to use inheritance, typeclasses, or some new way of filling in implementations automatically.

The Expression Problem and Testing

There’s a related problem with testing. Given two orthogonal axes - code and tests - we need to decide how to lay them out. Most code I’ve worked with has had code in one file and tests in another:

// code
function f() {...}
function g() {...}

// tests
function test_f() {...}
function test_g() {...}

Although I have seen code that does this all inline

// f
function f() {...}
function test_f() {...}

// g
function g() {...}
function test_g() {...}

There are good arguments to be made for both styles. It’s more important to note that they’re both valid. Sometimes I want to see f{.javascript} and g{.javascript} at the same time. Sometimes I want to see f{.javascript} and f_test{.javascript} at the same time.

Let’s solve this in the same way as the expression problem

fg
implementationfg
testf_testg_test

Viewing a column I can see code and test together. Viewing a row I can see all code or all tests.

Conclusion

The expression problem is really a layout problem. It’s impossible to write a one-dimensional (plaintext) file such that data and actions are both immediate. Inheritance and typeclasses are useful, but they’re really rules for building an implicit table.

My proposed solution is to change the constraints of the problem. By considering a richer representation of our program we gain the power to view it from multiple directions.