Customizable Datatypes

📅 Friday, August 23, 2019

This post is based on a post here: https://mazzo.li/posts/customizable-data-types.html I suggest reading that post first. It's a very short (but quite interesting) read. I will, however, give a quick summary here.

I recently decided to use this pattern on my pet language project, Kima. I had a problem with the number of parameters on my AST type getting out of hand. I have to make multiple changes to it on the way from parsing to interpretation/compilation and so it needs to be heavily parameterized, nodes have to be enabled/disabled, etc. This pattern allows parameters to be added in an easy way without cluttering up type signatures and has helped clean up a lot of code.

The code here will require the following GHC extensions:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}

Original article

Essentially, the author describes a way to reduce the number of type parameters to a data type to a single "index" type parameter. They do that by introducing a single "index" type parameter as well as a set of type families which operate on that "index".

So for example, this:

data ExprBasic hasIf var where
    IfBasic :: ExprBasic 'True var -> ExprBasic 'True var -> ExprBasic 'True var
        -> ExprBasic 'True var
    VarBasic :: var
        -> ExprBasic hasIf var

becomes this:

-- Full path to a name
newtype QualName = QualName [String]

data Parsed
data Renamed

type family Var idx :: * where
    Var Parsed = String
    Var Renamed = QualName

type family HasIf idx :: Bool where
    HasIf Parsed = 'True
    HasIf Renamed = 'True
-- Maybe some later stage desugars ifs into something else

data Expr idx where
  If :: HasIf idx ~ 'True => Expr idx -> Expr idx -> Expr idx -> Expr idx
  Var :: Var idx -> Expr idx

Benefits

Obviously the second version is a lot more code but it does allow for some interesting differences from the first.

If we add a parameter to Expr we don't have to change every signature where that type appears.
I would argue that it shows intent a lot better than the original. Compare
```
f :: Expr Parsed -> Expr Renamed
```
to
```
f :: Expr 'True String -> Expr 'True QualName
```
You could type synonyms to make the second look more like the first, and it would make the type signatures a lot cleaner but they won't show up in error messages, where you will have to decypher what each type parameter means.
With the closed type families used in the example, we can clearly enumerate all the possible configurations our Expr type could be in.

Using associated type families

The way I use this pattern is a little different from the original is in how I use type families. The original article used closed type families. I have a few problems with this choice.

It doesn't allow adding more stages (although the author does suggest using open type families which would fix this)
It doesn't give a compiler warning if a stage has some necessary variable missing. This will give cryptic errors in the form Var t does not equal String, at the call site because of the way type families are evaluated.
It spreads out the definition of each data type across type family definitions (This could also be fixed by using open type families)

We can fix all of the above by grouping all the type families into a typeclass as associated types like so:

class ExprIndex idx where
  type Var idx :: *
  type HasIf idx :: Bool

instance ExprIndex Parsed where
  type Var Parsed = String
  type HasIf Parsed = True

instance ExprIndex Renamed where
  type Var Parsed = QualName
  type HasIf Parsed = True

I find this much more pleasing to read and it also gives a warning if we miss a a definition for one of the indices.

Warning: Prefer concrete types!

When I first started with this pattern, I tried using polymorphic functions, and only constraining the parameters that would be transformed. For example:

rename
  :: (Var idx2 ~ QualName, HasIf idx1 ~ HasIf idx2)
  => Expr idx1 -> Expr idx2
rename expr = ...

This style works ok for traversals and other simple functions that have no intermediate stages, but in more complicated transformations type inference can be an issue. There will very often be errors involving ambiguous type variables or unresolved type families. Therefore, I suggest leaving the polymorphic types for the functions that will be widely used throughout your application (like traversals) and prefering monomorphic types wherever possible elsewhere.