Jan 21, 2025

Making my life easier with GADTs

In which I showcase one concrete example where dependent types — and more precisely, Generalized Algebraic Datatypes (GADTs) — allow me to turn the handling of different but similar compiler targets into a breeze of fresh air.

Introduction

Dependent types and assimilated type-level features get a bad rep. They are often ~~misrepresented~~ presented as a futile toy for “galaxy-brain people”, providing no benefit to the regular programmer. I think this opinion can stem from a ~~severe~~ misconception about the presumed complexity of dependent type systems. Not only can simple dependent type theories be very intuitive, they are fully opt in. You can write regular “simply-typed” programs in them, and use dependent types only when necessary. Perhaps another explanation for the confusion could be the lack of resources demonstrating the usefulness of such features — when used sparingly — on day-to-day programming tasks.

(I've toned down the wording a bit, as I think it was being unjustifiably harsh on people having reservations about dependent types. There are very valid reasons to stay cautious about this style of programming. Especially in languages with partial support. See this post from Matt Parsons in reply to my article.)

I am by no means a gifted programmer. Competent at the very best. Ever since I’ve been introduced to dependent type systems, it has completely changed the relationship I entertain with my compilers. Types are the language with which I hold conversations with them. I tell them my intentions, and they reply back! Their job is to keep me in check and ensure that I am a man of my word. And it does it pretty well, when I’m talking clear enough.

Today I want to showcase one such time when my typechecker became a very helpful assistant, reminding me to take care of corner cases without fault.

Let’s set the scene. This January, I am interning at IOG, working on a new Agda backend targetting the Lambda Box (λ□) intermediate language. To the uninitiated,

Agda is a dependently-typed programming language.
λ□ is an intermediate language very similar to the Lambda calculus. It was introduced in Pierre Letouzey’s PhD thesis, and is now formally defined as part of the MetaCoq project.

The backend I’m writing is just a program — written in Haskell — that takes as input the internal representation of Agda programs, and outputs λ□ programs. A compiler of sorts.

Given an AgdaProgram, my backend generates a λ□ environment. Simplifying a bit, an environment contains named function and datatype declarations. Here is how I encode those in Haskell.

data GlobalEnv = GlobalEnv [(Name, GlobalDecl)]

data GlobalDecl
  = DataDecl DataBody
  | FunDecl  FunBody

data DataBody = DataBody
  { indConstructors :: [ConstructorBody]
  }

data ConstructorBody = ConstructorBody
  { ctorName :: Name
  , ctorArgs :: Int
  }

data FunBody = FunBody
  { funBody :: LamBox.Term
  }

So a function in the environment stores its expression body — a LamBox.Term — and a datatype declaration tells which constructors come with it, and their respective number of arguments.

Morally, the heart of the backend lies in a function like this:

compile :: AgdaProgram -> IO GlobalEnv

Introducing Typed λ□

There is also a typed variant of λ□, whose environments contain type information. This typed intermediate language enables compilation down to targets such as Rust. In the official formalization of this variant, typed environments are completely distinct from untyped ones. If we did something similar, we would get the following:

data TGlobalEnv = TGlobalEnv [(Name, TGlobalDecl)]

data TGlobalDecl
  = TDataDecl       TDataBody
  | TFunDecl        TFunBody
  | TTypeAliasDecl  TTypeAliasBody

data TDataBody = TDataBody
  { indConstructors :: [TConstructorBody]
  }

data TConstructorBody = TConstructorBody
  { ctorName  :: Name
  , ctorArgs  :: Int
  , ctorTypes :: [LamBox.Type]
  }

data TFunBody = TFunBody
  { funBody :: LamBox.Term
  , funType :: LamBox.Type
  }

type TTypeAliasBody = Lambox.Type

Now the big question: how do I structure the backend to support both compilation targets — typed and untyped λ□ — without hurting my brain?

Pick your poison

I could indeed keep typed environments completely separate. Different datatypes, different information. But this would lead to a lot of code duplication. Given that the compilation logic will be mostly identical for these two targets, I don’t want to be responsible for the burden of keeping both implementations in sync.

compileUntyped :: AgdaProgram -> IO GlobalEnv
compileTyped   :: AgdaProgram -> IO TGlobalEnv

Another way around would be to only define the typed environment — as it contains strictly more information than its untyped counterpart — but make all type information optional. One simple way to do so is to wrap all the type annotations in Maybes. Then, both compilation targets can be handled by the same program.

data GlobalEnv = GlobalEnv [(Name, GlobalDecl)]

data GlobalDecl
  = DataDecl      DataBody
  | FunDecl       FunBody
  | TypeAliasDecl TypeAliasBody

data DataBody = DataBody
  { indConstructors :: [ConstructorBody]
  }

data ConstructorBody = ConstructorBody
  { ctorName  :: Name
  , ctorArgs  :: Int
  , ctorTypes :: Maybe [LamBox.Type]
  }

data FunBody = FunBody
  { funBody :: LamBox.Term
  , funType :: Maybe LamBox.Type
  }

type TypeAliasBody = Lambox.Type

But here’s the problem: I — the programmer — need to be careful that I don’t forget to put the type information when targetting typed λ□. If I put Nothing where a type signature is expected, my backend will proceed happily. But the generated programs will be incomplete programs, and thus invalid.

How do I make sure that my compiler gets in the way to counter my lack of discipline? How can I have it force me to consistently put the type annotations when we are targetting typed λ□?

Dependent types to the rescue.

Type discipline

I begin by defining the two typing modes my backend supports:

{-# LANGUAGE DataKinds, GADTs #-}

data Typing = Typed | Untyped

Then, I introduce the two targets of the backend. A Target is indexed by the typing mode, which makes the mode we’re in appear at the type level.

import Data.Kind (Type)

data Target :: Typing -> Type where
  ToTyped   :: Target Typed
  ToUntyped :: Target Untyped

I can define a wrapper for values that should be available during the Typed compilation mode.

data WhenTyped :: Typing -> Type -> Type where
  None ::      WhenTyped Untyped a
  Some :: a -> WhenTyped Typed   a

Again, this datatype is indexed by the mode, which means:

None can only construct a value of type WhenTyped t a when t is equal to Untyped.
Some x can only construct a value of type WhenTyped t a when t is equal to Typed.

Or the other way around:

WhenTyped Untyped a has exactly one inhabitant: None.
WhenTyped Typed a has for inhabitants values of the form Some x for x :: a.

When we are in Typed mode, we can always retrieve the wrapped value.

getTyped :: WhenTyped Typed a -> a
getTyped (Some x) = x

Note that the type system is already helping us here! It knows for sure that the function is total. If we add a clause for the None case, it throws a warning:

getTyped None = undefined

Pattern match is redundant

Finally, let’s have a way to run a computation if and only if the target is in type mode:

whenTyped :: Applicative m => Target t -> m a -> m (WhenTyped t a)
whenTyped ToUntyped _ = pure None
whenTyped ToTyped   x = Some <$> x

Now we can go back to defining environments.

A function in the environment always has a body. But it only has a type annotation in Typed mode.

data FunBody t = FunBody
  { funBody :: LamBox.Term
  , funType :: WhenTyped t LamBox.Type
  }

Same thing for datatype constructors:

data DataBody t = DataBody
  { indConstructors :: [ConstructorBody t]
  }

data ConstructorBody t = ConstructorBody
  { ctorName  :: Name
  , ctorArgs  :: Int
  , ctorTypes :: WhenTyped t [LamBox.Type]
  }

Finally, we get environments:

type TypeAliasBody = LamBox.Type

data GlobalEnv t = TGlobalEnv [(Name, GlobalDecl t)]

data GlobalDecl :: Typing -> Type where
  FunDecl       :: FunBody t     -> GlobalDecl t
  DataDecl      :: DataBody t    -> GlobalDecl t
  TypeAliasDecl :: TypeAliasBody -> GlobalDecl Typed

Note that type alias declarations are only allowed in Typed mode.

Then, our only compile function is updated accordingly:

compile :: Target t -> AgdaProgram -> IO (GlobalEnv t)

The kind of environment we get out is fully determined by the kind of target we get in! If we immediately pattern-match on the target, the Haskell typechecker kindly informs us of the expected return type in each branch:

compile :: Target t -> AgdaProgram -> IO (GlobalEnv t)
compile target prog =
  case target of
    ToUntyped -> _untyped
    ToTyped   -> _typed

• Found hole: _untyped :: IO (GlobalEnv Untyped)
• Found hole: _typed   :: IO (GlobalEnv Typed)

And that’s exactly what I was looking for! It immediately prevents common mistakes, say, like generating a type alias declaration when targetting untyped λ□.

compileAlias :: AgdaDefinition -> IO (GlobalDecl Untyped)
compileAlias def = return $ TypeAliasDecl ...

• Couldn't match type ‘Typed’ with ‘Untyped’
      Expected: IO (GlobalDecl Untyped)
        Actual: IO (GlobalDecl Typed)

What’s more, it allows me to keep most of the compilation logic the same, and only conditionally do more work for the typed target.

For example, when compiling an Agda function to a λ□ function body:

compileTerm :: AgdaTerm -> IO LamBox.Term
compileType :: AgdaTerm -> IO LamBox.Type

compileFunction :: Target t -> AgdaFunction -> IO (FunBody t)
compileFunction target fn = do
  bdy <- compileTerm (funBody fn)
  typ <- whenTyped target $ compileType (funType fn)
  return (FunBody bdy typ)

I never explicitely match on the target, yet compilation of the function type only happens when targetting typed λ□, thanks to our convenient whenTyped utility.

Recall that typ must have the following type:

typ :: WhenTyped t LamBox.Type

When targetting untyped λ□, its value is None and no type compilation happens. Simply wonderful.

Note that using a full-fletched dependently-typed language like Idris2 would make the implementation even simpler! There would be no need for Target, we could use Typing directly at the value level and the type level.

And that’s it for today! I’m not breaking new grounds here, but I feel like the world is missing short and sweet practical examples in which sprinkles of dependent types help you avoid mistakes — for free!

Any feedback appreciated, as always.

I actually do have one thing to add! One common misconception about GADTs is that it’s tedious if not impossible to construct their values, by coming up with the type index out of thin air. But that’s what existential types are for!

In my backend, the target is chosen from a command line flag. Here is the datatype for compilation options:

data Options = forall t. Options
    { optTarget :: Target t
      -- ...
    }

The constructor Options is existentially quantified over t, the target mode. This means that even though the target mode does not show up in the type Options, any time we pattern-match on the constructor Options, the target mode will suddenly become available.

The rest is fairly standard:

defaultOptions :: Options
defaultOptions = Options { optTarget = ToUntyped }

typedOpt :: Flag Options
typedOpt opts = pure opts { optTarget = ToTyped }

run :: Options -> AgdaProgram -> IO ()
run (Options optTarget) prog = do
  env <- compile optTarget prog

  -- do something with the environment
  -- like write it to file

  return ()

Inside run, env has type GlobalEnv t, where t comes from matching on Options. optTarget itself has type Target t. What we cannot do is return the environment, as run couldn’t be typed as is, but that doesn’t prevent us from doing useful things with the compiled environment before exiting!

Now that’s really it. So long! Best.